Bulletin of the American Society for Information Science and Technology   Vol. 31, No. 6   August/September 2005



Using Software to Teach Thesaurus Development and Indexing in Graduate Programs of LIS and IAKM

by Marcia Lei Zeng

Marcia Lei Zeng is with the School of Library and Information Science at Kent State University, Kent, OH 44242; email: mzeng@kent.edu. This article was originally developed as a presentation for the Data Harmony User’s Conference held in Albuquerque in January 2005.

The School of Library and Information Science (SLIS) at Kent State University offers the only ALA-accredited MLIS (Master of Library and Information Science) program in the state of Ohio. It is also the home of a new interdisciplinary program, information architecture and knowledge management (IAKM). To obtain the master of science degree, IAKM students must take the core program and elect one of three concentrations: information architecture, information use or knowledge management. The IAKM program prepares information professionals to develop and manage information interfaces, products, systems and services for specific information ecologies.

I teach two courses related to thesaurus development and indexing. Indexing and Abstracting (LIS60649) covers manual and machine-aided indexing, creation of various types of traditional indexes, thesaurus construction and creation of sitemaps and site indexes. Knowledge Organization Systems: Taxonomy, Thesaurus, Ontology (IAKM60002) is a core course for IAKM students and a special topic course for LIS students.

Figure 1 shows the types of knowledge organization systems (KOS), arranged according to the degree of control introduced (from natural language to controlled language) and the strength of their semantic structure (from weakly structured to strongly structured). It is my visual summary of the Taxonomy of Knowledge Organization Sources/Systems (http://nkos.slis.kent.edu/KOS_taxonomy.htm) adopted by the NKOS (Networked Knowledge Organization Systems/Services) group and based on Gail Hodge’s article on KOS (www.clir.org/pubs/abstract/pub91abst.html). The marks in the figure indicate the types of KOS that students use and/or construct in assignments for my IAKM class.

Students must understand not only how to construct KOS but also how to use KOS for various tasks, ranging from indexing, browsing and retrieval to knowledge management and information architecture design. Cases dealing with KOS applications are presented to let students connect what they are learning to the real world. These cases include government agency indexing systems, intranets, search engines, digital collections, digital learning environments, international trading and collaborative projects.

Several software programs have been introduced to the classes. In the next section, I will compare the pros and cons of using some of them.

Pros and Cons of KOS Construction Software Used in Classes

Thesaurus Management System. Several years ago, I designed and created a software program with the assistance of a programmer. It has been available on the Web. I have used this software to check students’ manually created thesauri. Students have used it to create their thesauri and to merge thesauri. The software allows online editing, including defining and changing terms and term relationships (facet/category, second language term, scope note and relationships such as USE, UF, BT, NT, RT, TT and CC). The strengths and weaknesses of the Thesaurus Management System follow:


Strengths:

  • It is open source.

  • It is Web-based, allowing anyone to access it from anywhere at any time.

  • In addition to online editing, it is also designed for cross-thesaurus searching; therefore, it could be used for thesaurus merging and mapping.

  • The established or imported candidate terms are automatically stored and provided in the editing process through the editing template’s pull-down lists.


Weaknesses:

  • It still has bugs when long scope notes are entered.

  • It cannot process some special characters.

  • The products are not protected. A thesaurus could be changed or deleted by anyone who enters the system’s website.

  • No statistics (such as word frequency) are available.

  • The output is limited to the traditional thesaurus format; there is no XML (eXtensible Markup Language), RDF (Resource Description Framework) or OWL (Web Ontology Language) output.

  • The experience has not been linked with any indexing process.
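The term relationships edited in the Thesaurus Management System (USE, UF, BT, NT, RT and TT) can be illustrated with a minimal Python sketch. This is not the code of the system described above; the class name `Thesaurus`, its methods and the sample terms are hypothetical, chosen only to show how reciprocal relationships stay in sync and how a top term can be derived from BT links.

```python
from collections import defaultdict


class Thesaurus:
    """Toy in-memory thesaurus (hypothetical) that keeps reciprocal links in sync."""

    # Each relationship tag and the tag recorded on the other term.
    RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}

    def __init__(self):
        # term -> relationship tag -> set of target terms
        self.links = defaultdict(lambda: defaultdict(set))

    def relate(self, term, tag, target):
        """Record `term tag target` and the reciprocal entry on `target`."""
        if tag not in self.RECIPROCAL:
            raise ValueError(f"unknown relationship tag: {tag}")
        self.links[term][tag].add(target)
        self.links[target][self.RECIPROCAL[tag]].add(term)

    def preferred(self, term):
        """Follow a USE reference to the preferred term, if there is one."""
        use = self.links[term]["USE"]
        return next(iter(use)) if use else term

    def top_terms(self, term):
        """Walk BT links upward to the terms with no broader term (TT).

        Assumes the hierarchy contains no cycles.
        """
        broader = self.links[term]["BT"]
        if not broader:
            return {term}
        tops = set()
        for parent in broader:
            tops |= self.top_terms(parent)
        return tops


# Sample terms are illustrative only.
t = Thesaurus()
t.relate("Cats", "BT", "Mammals")
t.relate("Mammals", "BT", "Animals")
t.relate("Felines", "USE", "Cats")

print(t.preferred("Felines"))            # Cats
print(sorted(t.links["Mammals"]["NT"]))  # ['Cats']
print(t.top_terms("Cats"))               # {'Animals'}
```

Recording the reciprocal entry inside `relate` means an editor enters each relationship once, which is the kind of consistency check that is tedious to perform on a manually created thesaurus.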

Protégé. Protégé is an open-source ontology editor and knowledge-base framework. The tool allows users to construct domain ontologies, customize data entry forms and enter data. It is also a platform that can easily be extended to include graphical components, media and various storage formats such as OWL, RDF, XML and HTML. Products like Protégé can be used to host thesauri as well, as illustrated by Wielinga's work, which employed the Art and Architecture Thesaurus and the Visual Resources Association (VRA) Core Metadata Categories (www.cs.vu.nl/~guus/papers/Wielinga01a.pdf). In my class, ontology creation is taught after students have learned different types of KOS, semantic relationships, encoding systems and metadata schemas. Teaching basic Protégé in my class has been a pleasant experience. The strengths and weaknesses of Protégé follow:

Strengths:

  • It is open source.

  • It allows defining both a taxonomy structure and attributes of the classes.

  • It provides rules to control the relationships among the classes.

  • It allows including instances and, thus, leads to building a whole knowledge base.

  • The newer version provides more output options, such as OWL, RDF Schema, XML and HTML (Hypertext Markup Language).

  • Any components (such as a class and its slots) are re-usable.

Weaknesses (for thesaurus use):

  • It is not related to documents/text indexing and processing.

  • It is usually used in small domains.

  • It tends to be more easily applied to “things” or concrete concepts, not to higher level and abstract concepts.
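The combination of a taxonomy structure, class attributes (slots) and instances noted for Protégé above can be sketched in a few lines of Python. This is only an illustration of the idea, not Protégé's API or data model; `KClass`, `Instance` and the sample class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KClass:
    """A class in the taxonomy; slots are the attributes its instances may carry."""
    name: str
    parent: Optional["KClass"] = None
    slots: dict = field(default_factory=dict)  # slot name -> expected Python type

    def all_slots(self):
        """Collect slots inherited from ancestors, then add this class's own."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}

    def is_a(self, other):
        """True if this class is `other` or one of its descendants."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class Instance:
    """An individual entered into the knowledge base under one class."""
    cls: KClass
    values: dict

    def __post_init__(self):
        # Reject values for slots the class does not define or of the wrong type.
        allowed = self.cls.all_slots()
        for slot, value in self.values.items():
            if slot not in allowed:
                raise ValueError(f"{self.cls.name} has no slot {slot!r}")
            if not isinstance(value, allowed[slot]):
                raise TypeError(f"slot {slot!r} expects {allowed[slot].__name__}")


# Illustrative classes only.
work = KClass("VisualWork", slots={"title": str})
painting = KClass("Painting", parent=work, slots={"medium": str})
item = Instance(painting, {"title": "Untitled", "medium": "oil"})

print(sorted(painting.all_slots()))  # ['medium', 'title']
print(painting.is_a(work))           # True
```

The sketch shows where an ontology goes beyond a thesaurus hierarchy: the subclass inherits the `title` slot, and an instance is checked against the slots its class allows.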

MAIstro. Data Harmony has been providing me with demo and full versions of MAIstro for several years. In the beginning I used it to demonstrate how a thesaurus could be built with this software and how machine-aided subject indexing could be performed based on a thesaurus. Later we were given permission to install the full software package on our server and install the administrator and client parts on all 25 desktops of our electronic teaching classroom. With the software installed, each student was able to create his/her own thesaurus using the thesaurus master component and also to test the thesaurus against selected documents through the machine-aided indexer component of the software.

MAIstro allows for editing the term record (related terms, non-preferred terms, scope notes, editor’s notes, facet and history) through a user-friendly interface. A specific portion of a thesaurus can be separately stored or printed. This portion could be any major thesaurus component, such as the hierarchical display, all term records (the typical printed thesaurus main body), permuted display or alphabetical index.

In addition, the software makes it possible to store and use candidate terms, to show word frequencies and to trace the deleted terms. These three functions are the most useful maintenance and management methods. The records can be stored in various formats including XML, HTML, MARC and a newly added OWL format. In general, MAIstro enables the ANSI/NISO Z39.19-compliant creation and maintenance of thesauri, allows users to create a vocabulary or import one from external sources and employs a knowledge-based indexing process.
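To make the idea of testing a thesaurus against documents concrete, here is a minimal Python sketch of thesaurus-based term suggestion and word-frequency counting. It is a simple string-matching illustration and does not represent how MAIstro's rule base works; the vocabulary in `ENTRY_TERMS`, the function names and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical entry vocabulary: text string -> preferred thesaurus term.
# Non-preferred strings point to the term that should be used instead.
ENTRY_TERMS = {
    "thesaurus": "Thesauri",
    "thesauri": "Thesauri",
    "controlled vocabulary": "Controlled vocabularies",
    "controlled vocabularies": "Controlled vocabularies",
    "subject indexing": "Indexing",
    "indexing": "Indexing",
}


def suggest_terms(text, entry_terms=ENTRY_TERMS):
    """Return a Counter of preferred terms whose entry strings occur in `text`."""
    lowered = text.lower()
    hits = Counter()
    # Match longer strings first and blank them out, so that
    # "subject indexing" is not counted again as "indexing".
    for phrase in sorted(entry_terms, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        lowered, count = re.subn(pattern, " ", lowered)
        if count:
            hits[entry_terms[phrase]] += count
    return hits


def word_frequencies(text, top=5):
    """Plain word counts, a statistic that can help spot candidate terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(top)


doc = (
    "A thesaurus supports subject indexing. Indexing with a controlled "
    "vocabulary improves retrieval; thesauri also aid browsing."
)
for term, count in sorted(suggest_terms(doc).items()):
    print(term, count)
```

In a classroom exercise, suggestions like these would still go to a human indexer for review, which matches the machine-aided (rather than fully automatic) process described above.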

MAIstro has the following strengths and weaknesses in a teaching environment:


Strengths:

  • It provides the option of building custom thesauri directly or importing from the documents prepared by students in previous assignments such as a thesaurus vocabulary building assignment.

  • It allows a human editor to create, edit, and review rules for the use of indexing terms.

  • It is an easy-to-learn and easy-to-use tool once the students get into the Client.

  • It provides students with the experience of creating, managing, and maintaining a deliverable product.

  • It is free for educational purposes.

Weaknesses (in a teaching environment):

  • Because we are using it in a multiple desktop classroom, not in the dedicated user environment in which MAIstro is typically used, we had to discuss installation options with MAIstro staff.

  • The instructor needs to ask for technical assistance from the school’s network administrator.

  • Installing the administrative and client portions of the software on the instruction station and all classroom desktops usually requires a significant amount of time from the technical staff.

  • There is an issue of how to store the projects and to allow students to work on the project continually in a publicly accessible environment.

  • The instructor needs to re-write step-by-step instructions, especially on how to run a project in the school’s environment.

  • Students can only access the software from the classroom. This limitation could be a problem for those who work full time and only come to school in the evenings and on weekends.

  • Currently we can only run the software on one server with all projects saved under the same directory. There is an issue of how to manage and protect the projects when all 25 students can access the server and run any of the projects.


Access to both open-source and established commercial products during a student’s study is invaluable. Without the support of Data Harmony, it would probably never have been possible for the school to have access to a $60,000 software package, not to mention obtaining multiple copies for our classes. Data Harmony also provided prompt and professional support for any questions we had. The developer is considering our suggestions for further enhancement. I believe that the use of the software by a large group of student users provided useful feedback to the developer.

The impact of using MAIstro and other software in teaching and training is significant. After completing this course, students are able to recognize the importance of vocabulary control and knowledge organization, understand the relationships between vocabulary control and indexing, understand the principles of user warrant and literary warrant, and apply this knowledge to larger content management tasks.

For Further Reading

Hodge, G. (2000, April). Systems of knowledge organization for digital libraries: Beyond traditional authority files. CLIR Pub91. Washington, D.C.: Council on Library and Information Resources. Accessed June 10, 2005, at www.clir.org/pubs/abstract/pub91abst.html.

National Aeronautics and Space Administration, Scientific and Technical Information Program. NASA thesaurus machine aided indexing (MAI). (software). Accessed June 10, 2005, at http://mai.larc.nasa.gov/

Protégé ontology editor and knowledge acquisition system. Accessed June 10, 2005, at http://protege.stanford.edu/.

World Wide Web Consortium (W3C). (2004). Web Ontology Language (OWL). Accessed June 11, 2005, at www.w3.org/2004/OWL/.

World Wide Web Consortium (W3C). (2004). Resource Description Framework (RDF). Accessed June 11, 2005, at www.w3.org/RDF/.

Wielinga, B., Schreiber, G., Wielemaker, J., & Sandberg, J.A.C. From thesaurus to ontology. International Conference on Knowledge Capture, Victoria, Canada, October 2001. Accessed June 10, 2005, at www.cs.vu.nl/~guus/papers/Wielinga01a.pdf

Zeng, M. L. (1999). Thesaurus management system. (software). Accessed June 10, 2005, at http://circe.slis.kent.edu/mzeng/thesaurihome.html.

Zeng, M. L. (2000, June 7). Taxonomy of knowledge organization sources/systems. Revised July 31, 2000. Accessed June 10, 2005, at http://nkos.slis.kent.edu/KOS_taxonomy.htm.


Copyright © 2005, American Society for Information Science and Technology