4 Myths About Taxonomies – Joseph Busch spoke at the ASIST PVC meeting on 2/10/2004


The second ASIST PVC Taxonomy meeting introduced Joseph A. Busch, an energetic speaker who describes himself as a technology specialist and content management evangelist.  He currently serves on the Board of Directors for one of his particular interests, the Dublin Core metadata set, which, according to Busch, was developed to catalog websites, not books.  He was president of ASIST in 2001.


Metadata, however, is not the same thing as taxonomy, though Busch has seen much confusion between the two terms.


A taxonomy is also not the same thing as a thesaurus, though the two have many similarities.  Thesauri are controlled vocabularies that develop out of key terms contained in documents or in a body of knowledge.  Taxonomies, on the other hand, are the backbone of a website.  They govern how a website is structured and organized.  They are also controlled vocabularies, but they are drawn from the subject matter of the website itself as a means of organizing its content.


Busch says taxonomies are created so that users can find the right information on a website, at the right time, to solve whatever problem is at hand.  Information gets created and then it gets classified.  Taxonomies are ways of classifying information.  An office catalog doesn’t list every pencil.  It lists office supplies at the highest level, and then you drill down to writing instruments and then to the type of writing instrument, maybe mechanical pencils, which may be broken down further by type of lead, with or without eraser, metal or plastic casing, etc.  The top level of the catalog provides organizing principles that let you begin your investigation of the information on a site.


Office supplies
    Writing instruments
        graphite pencil
        mechanical pencil
            plastic
                with eraser
                    .5
                    .7
                without
                    .5
                    .7
            metal
                with eraser
                    .5
                    .7
                without
                    .5
                    .7


Facet analysis on controlled vocabularies (CVs) of taxonomy terminology is a way to derive taxonomies.  Dividing CVs into facet groupings means you have fewer branches to navigate, and the resulting organization is easier to maintain.  Also, many questions can be answered with just a few queries.  Organizing the information helps users get at it more quickly.  In the example above, office supplies are the family of items, writing instruments are the class of items, and the commodities, in this case mechanical pencils, are further broken down into types.


Office supplies                                  Family of items
    Writing instruments                          Class of items
        graphite pencil, mechanical pencil       Type of items
            plastic, metal                       Casing node of type
                with eraser, without             Eraser node of type
                    .5, .7                       Lead node of type


Myth #1 is that taxonomies are huge hierarchies.  They aren’t.  They are structured hierarchies based on a careful arrangement of terms into a few facets with a small number of nodes beneath them.  Nodes can be expanded for more information.  Instead of searching 10,000 nodes (i.e., the total number of writing instruments in the inventory at the warehouse), the searcher learns to navigate within the facets and nodes to find what they need.
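The difference between one large hierarchy and a few small facets can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the facet names and values come from the pencil example above, while the `INVENTORY` records and the `select` helper are hypothetical and are not something Busch presented.

```python
from itertools import product

# Facets and their values, taken from the mechanical pencil example.
FACETS = {
    "casing": ["plastic", "metal"],
    "eraser": ["with eraser", "without"],
    "lead": [".5", ".7"],
}

# Hypothetical inventory: one record for every combination of facet values.
INVENTORY = [dict(zip(FACETS, combo)) for combo in product(*FACETS.values())]


def select(items, **wanted):
    """Return the items whose facet values match every requested value."""
    return [
        item
        for item in items
        if all(item[facet] == value for facet, value in wanted.items())
    ]


# Number of facet values a user has to learn versus leaves in the full tree.
facet_value_count = sum(len(values) for values in FACETS.values())
leaf_count = len(INVENTORY)

# Picking values in two facets narrows the inventory without walking a tree.
matches = select(INVENTORY, casing="metal", lead=".5")
print(facet_value_count, leaf_count, len(matches))
```

With three facets of two values each, the user chooses among six facet values rather than eight leaves, and the gap widens quickly as facets and values are added, since the leaves multiply while the facet values only add up.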


Myth #2 is the idea that people retrieve content by topical subjects.  They usually don’t.  They look for key words and ideas that relate to what they are looking for.  They may not know exactly what subject they want or how to articulate their information need.  Again, it is better to break the content into facets and arrange it by facet than to arrange it by subject matter.  If users can drill up and down in the organization of the website, they will find what they want more quickly than by trying to guess at subject matter.  In addition, a large percentage of questions usually fits into a much smaller number of queries, and general facets can help users answer those questions quickly.


The 3rd myth is that nobody else can index content.  Busch thinks any effort toward indexing is better than no effort, and that dirty metadata is better than no metadata at all.  Perhaps not just anybody can do a great job, but any job is better than none.


There are 4 rules to help with indexing content that can be taught to anyone:

1.          The Specificity Rule, i.e., specific terms can always be generalized, but general terms can’t easily be made specific.

2.          Make everything repeatable – it’s better to repeat than to miss something.

3.          Keep the attributes sensible and appropriate.  Different groups look at concepts in different ways.  People have different viewpoints and attach different meanings to the same words.  Pilots, for example, are interested in their equipment, their schedules, and some navigation information, while NASA has a completely different set of needs for similar terms.  Think about the word “entrée”.  In a local restaurant it means the main course.  To a French tourist it means the appetizer, the beginning of the meal.  One size fits all does not work.

4.          Think about usability.  Think about how an item will be searched for and how people will find it.  What is the orientation of your average user?  How will they approach the material on the site?
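The Specificity Rule in the list above can be illustrated with a small Python sketch.  The terms come from the office supplies example earlier in this write-up; the `BROADER` mapping and the `generalize` and `narrower` helpers are hypothetical names used only for illustration, not part of Busch's talk.

```python
# Child -> parent links, taken from the office supplies example.
BROADER = {
    "mechanical pencil": "writing instruments",
    "graphite pencil": "writing instruments",
    "writing instruments": "office supplies",
}


def generalize(term):
    """Return the term followed by each broader term, up to the top level."""
    path = [term]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return path


def narrower(term):
    """Return the immediate narrower terms; more than one means a guess is needed."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


# A specific index term rolls up to exactly one chain of broader terms.
print(generalize("mechanical pencil"))

# A general index term cannot be made specific without more information.
print(narrower("writing instruments"))
```

An item indexed as "mechanical pencil" can always be found under "writing instruments" or "office supplies", but an item indexed only as "writing instruments" gives no way to tell which kind of pencil it is.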

The 4th myth is that search engines do nothing but produce lists.  There’s a lot of other content produced in a search.  Searches can uncover links to new ideas and tie in with other taxonomies.  The search engine is a tool, not just a list maker.  It uncovers relationships.


Busch doesn’t think the automated tools that index documents are very useful.  At present some work well for general categorization within uncomplicated sets of terms, but most websites are too complicated for such tools.


When building taxonomies, one should look for the broader, shallower view rather than the narrower, deeper view.  At the same time, one shouldn’t look for a black box solution, i.e., a magic, one-size-fits-all taxonomy.


In an organization, taxonomies should be owned by the CIO rather than the library.  Ownership should be pushed up the chain of command so that the taxonomy gets attention.


More information can be found in the PowerPoint slides from Busch’s lecture in the links below.


You can also find a very good bibliography on taxonomies and content management at www.taxonomystrategies.com.