4 Myths About Taxonomies –
The
second ASIST PVC Taxonomy meeting introduced
Metadata, however, is not the same thing as a taxonomy, though Busch has seen much confusion between the two terms.
A taxonomy is also not the same thing as a thesaurus, though the two have many similarities. Thesauri develop out of key terms contained in documents or in a body of knowledge. They are controlled vocabularies. Taxonomies, on the other hand, are the backbone of a website. They govern how a website is structured and organized. They are also controlled vocabularies, but they are drawn from the subject matter of the website itself as a means of organizing its content.
Busch says taxonomies are created so that people can find the right information on a website, at the right time, to solve whatever problem is at hand. Information gets created and then it gets classified. Taxonomies are ways of classifying information. An office catalog doesn’t list every pencil. It lists office supplies at the highest level, and then you drill down to writing instruments and then to the type of writing instrument, maybe mechanical pencils, which may have a further breakdown by type of lead, with eraser or without, metal casing or plastic, etc. The top level of the catalog provides some organizing principles that allow you to begin your investigation of the information on a site.
Office supplies
  Writing instruments
    graphite pencil
    mechanical pencil
      plastic
        with eraser
          .5
          .7
        without
          .5
          .7
      metal
        with eraser
          .5
          .7
        without
          .5
          .7
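As a rough illustration of drilling down through the catalog hierarchy above, the tree can be sketched as a nested dictionary. The names `CATALOG` and `drill_down` are hypothetical and chosen only for this example; nothing here comes from Busch's lecture.

```python
# Hypothetical nested-dict encoding of the catalog hierarchy shown above.
# Lead sizes are the leaves and are stored as a list; other levels are dicts.
LEADS = [".5", ".7"]
ERASER = {"with eraser": LEADS, "without": LEADS}
CATALOG = {
    "Office supplies": {
        "Writing instruments": {
            "graphite pencil": {},
            "mechanical pencil": {"plastic": ERASER, "metal": ERASER},
        }
    }
}


def drill_down(tree, path):
    """Follow `path` from the top level and return the choices at that point."""
    node = tree
    for term in path:
        node = node[term]  # raises KeyError if the term is not at this level
    return list(node)  # dict -> its keys, list -> its leaf values


# The top level offers a single organizing principle to start from.
print(drill_down(CATALOG, []))
# One level further down, the user chooses a type of writing instrument.
print(drill_down(CATALOG, ["Office supplies", "Writing instruments"]))
```

Each call returns only the choices at one level, which is the point of the catalog analogy: the user never sees the whole inventory at once.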
Facet analysis of a controlled vocabulary (CV) of taxonomy terms is one way to derive taxonomies. Dividing CVs into facet groupings means you have fewer branches to navigate, and the resulting organization is easier to maintain. Also, many questions can be answered with just a few queries. Organizing the information helps users get at it more quickly. In the example above, office supplies is the family of items, writing instruments is the class of items, and the commodities, in this case mechanical pencils, are further broken down into types.
Office supplies                       Family of items
Writing instruments                   Class of items
graphite pencil | mechanical pencil   Type of items
plastic | metal                       Casing node of type
with eraser | without                 Eraser node of type
.5 | .7                               Lead node of type
Myth #1 is that taxonomies are huge hierarchies. They aren’t. They are structured hierarchies based on a careful arrangement of terms into a few facets with a small number of nodes beneath each. Nodes can be expanded for more information. Instead of searching 10,000 nodes (i.e., the total number of writing instruments in the warehouse inventory), the searcher learns to navigate within the facets and nodes to find what they need.
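To make the size argument concrete, here is a minimal sketch that treats the mechanical-pencil example as three facets. The facet names and values follow the diagram; the item records and the `select` helper are invented for illustration.

```python
from itertools import product

# Facets from the mechanical-pencil example; the item records are made up.
FACETS = {
    "casing": ["plastic", "metal"],
    "eraser": ["with eraser", "without"],
    "lead": [".5", ".7"],
}

# Every combination of facet values is one leaf in the equivalent hierarchy.
ITEMS = [dict(zip(FACETS, combo)) for combo in product(*FACETS.values())]


def select(items, **wanted):
    """Return the items whose facet values match every requested value."""
    return [it for it in items if all(it[f] == v for f, v in wanted.items())]


facet_values = sum(len(values) for values in FACETS.values())
print(len(ITEMS), "leaf nodes versus", facet_values, "facet values")
print(select(ITEMS, casing="metal", lead=".7"))
```

Leaf nodes multiply with each added facet while facet values only add up, so the gap widens quickly as an inventory grows.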
Myth #2 is the idea that people retrieve content by topical subject. They usually don’t. They look for keywords and ideas that relate to what they are looking for. They may not know exactly what subject they want or how to articulate their information need. Again, it is better to break the content into facets and arrange it by facet rather than by subject matter. If users can drill up and down through the organization of the website, they will find what they want more quickly than by trying to guess at the subject matter. In addition, a large percentage of questions usually fits into a much smaller number of queries, and general facets can help users answer those questions quickly.
The 3rd myth is that nobody else can index content. Busch thinks any effort toward indexing is better than no effort, and that dirty metadata is better than no metadata at all. Perhaps not just anybody can do a great job, but any job is better than none.
There are 4 rules to help with indexing content that can be taught to anyone:
1. The Specificity Rule: specific terms can always be generalized, but general terms can’t easily be made specific.
2. Make everything repeatable. It’s better to repeat than to miss something.
3. Keep the attributes sensible and appropriate. Different groups look at concepts in different ways. People have different viewpoints and different meanings for words that sound the same. Pilots, for example, are interested in their equipment, their schedules, and some navigation information, while NASA has a completely different set of needs for similar terms. Think about the word “entrée”. In a local restaurant it means the main course. To a French tourist it means the appetizer, the beginning of the meal. One size fits all does not work.
4. Think about usability. Think about how an item will be searched for and how people will find it. What is the orientation of your average user? How will they approach the material on the site?
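The Specificity Rule in the list above can be sketched with a simple child-to-parent lookup. The `BROADER` table and both helper functions are hypothetical, reusing terms from the office-supplies example.

```python
# Hypothetical child -> parent links, using terms from the office-supplies example.
BROADER = {
    "mechanical pencil": "writing instruments",
    "graphite pencil": "writing instruments",
    "writing instruments": "office supplies",
}


def generalize(term):
    """Return the term followed by each broader term above it."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def narrower(term):
    """Return the terms filed directly under `term`, in alphabetical order."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


# A specific tag rolls up to exactly one chain of broader terms.
print(generalize("mechanical pencil"))
# A general tag has several possible narrower terms, so it cannot be
# made specific without looking at the item again.
print(narrower("writing instruments"))
```

Going up the chain is mechanical, while going down requires a choice, which is why it pays to index with the most specific term available.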
The 4th myth is that search engines do nothing but produce lists. A search produces a lot of other content. Searches can uncover links to new ideas and tie in with other taxonomies. The search engine is a tool, not just a list maker. It uncovers relationships.
Busch doesn’t think the automated tools that index documents are very useful. At present some work well for general categorization within uncomplicated sets of terms, but most websites are too complicated for such tools.
When building taxonomies, one should look for the broader, shallower view rather than the narrower, deeper view, though one shouldn’t look for a black-box solution, i.e., a magic, one-size-fits-all taxonomy.
In an organization, taxonomies should be owned by the CIO rather than the library. Ownership should be pushed up the chain of command so that taxonomies get attention.
More information from this lecture can be found in the PowerPoint slides from Busch’s lecture in the links below.
You can also find a very good bibliography on taxonomies and content management at www.taxonomystrategies.com.