of the American Society for Information Science and Technology Vol. 27, No. 4 April / May 2001
ASIST SIG/CR Classification Workshop 2000: Classification for User Support and Learning by Dagobert Soergel
Editor's Note: This report from the ASIST 11th Classification Research Workshop, presented by ASIST SIG/CR, was prepared by Dagobert Soergel, with contributions from the session rapporteurs Edie Rasmussen, Corinne Jörgensen, Linda Rudell-Betts, Jian Quin and Barbara Kwasnik.
The 11th Classification Research Workshop of the ASIST Special Interest Group in Classification Research (SIG/CR) was held on Sunday, November 12, 2000, as part of the 62nd ASIST Annual Meeting. The ASIST SIG/CR 2000 co-chairs were Dagobert Soergel, Padmini Srinivasan and Barbara Kwasnik. A highly competitive selection process brought together papers under the theme Classification for User Support and Learning. The program is given in Figure 1.
Some of the papers are available on the workshop website at http://uma.info-science.uiowa.edu/sigcr/. Final versions of the papers will be published mid-2001 by Information Today as Advances in Classification Research, v. 11. The first part of this report gives short synopses of the papers; the second part lists themes and research questions that emerged.
Introduction and Foundation
The leadoff speaker was David Jonassen, Distinguished Professor, School of Information Science and Learning Technologies, University of Missouri. His talk, "Knowledge is complex: accommodating human ways of knowing," set out the perspective underlying the workshop. The paper's main message: we need classifications for the different kinds of knowledge that users hold and seek, particularly the types of knowledge that are intimately tied to doing. The types of knowledge he outlined are shown in Table 1.
Session 1: Developing user-oriented classifications
Following the leadoff presentation, Session 1 covered a wide range of methodological tools for constructing thesauri, classifications and ontologies. There were two papers. In her paper, "Domain analysis, an important part of thesaurus construction: methodologies and approaches," Marianne Lykke Nielsen introduced and illustrated domain analysis, using a thesaurus for a pharmaceutical company as the example. Domain analysis is a multi-pronged method for discovering users' approaches to tasks, the resulting information needs, conceptual frameworks and terminology, which together form the basis for constructing a truly user-oriented thesaurus. Domain analysis focuses on the following factors:
It uses the following methods:
In the second paper, "Terminology development and organization in multi-community environments: the case of statistical information," Stephanie Haas and Carol Hert presented a conceptual framework and methodology for discovering concepts, concept relationships and terminologies used by different user communities concerned with the same subject matter. Their example concerned statistical data. The method consists of three parts:
The experts who create a statistics website can use the results to construct a crosswalk from the search terminology of "lay" users to their own terminology.
Session 2: Classification in the User Interface (Reported by Corinne Jörgensen)
Session 2 dealt with using classification for enhanced searching and display. It brought to the fore issues of user interaction and navigation with graphical and text-based information and the role that a thesaurus or structured display can play in these areas.
Nina Wacholder et al., in "Accessing and browsing 3D anatomical images with a navigational ontology," presented the Vesalius Anatomy Browser: http://cpmcnet.columbia.edu/vesalius/. The Browser is an elegant system for searching and displaying anatomical images that is based on an ontology of body systems and body parts using several types of relationships. The presentation elucidated the problem of learning how to make massive amounts of data in a visual display useful and comprehensible. The solution taken by the project team was to add explicit conceptual information to the system; otherwise the displayed information is meaningful only to an expert, in this case an anatomist. Their "navigational ontology" supports restricted inference and restricted relationships. In this case, the anatomically significant relationships are conceptual, functional and spatial. The two major types of relationships in the system are part-of (component-structure) and is-a (taxonomic). Is-a and part-of, however, are not simple relationships; in a visual environment their complexity becomes more obvious, as there are matters of granularity and scale and multiplicity of types. For example, in addition to component structure there are other kinds of part-whole relationships, such as region and marker relationships; some things only make sense as part of a larger structure.
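To make the two relationship types concrete, here is a minimal Python sketch of an ontology that stores typed links and follows one relationship type upward. It is not the Vesalius implementation: the class name `NavigationalOntology`, its methods and the three sample links are illustrative assumptions, and treating part-of as freely transitive is a simplification that the presentation itself cautions against.

```python
class NavigationalOntology:
    """Toy store of typed links between named structures (hypothetical design)."""

    def __init__(self):
        # relation type -> child structure -> set of direct parents
        self._parents = {}

    def add(self, relation, child, parent):
        """Record that `child` stands in `relation` to `parent`."""
        self._parents.setdefault(relation, {}).setdefault(child, set()).add(parent)

    def ancestors(self, relation, node):
        """Return every structure reachable from `node` by following
        links of one relation type upward (assumes transitivity)."""
        links = self._parents.get(relation, {})
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in links.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative sample data, not taken from the Vesalius ontology.
onto = NavigationalOntology()
onto.add("part-of", "left ventricle", "heart")
onto.add("part-of", "heart", "cardiovascular system")
onto.add("is-a", "heart", "organ")

print(sorted(onto.ancestors("part-of", "left ventricle")))
print(sorted(onto.ancestors("is-a", "heart")))
```

Keeping each relationship type in its own link table means that further types, such as region or marker relationships, could be added without changing the traversal code.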
One important point that holds implications for thesaurus display is that not all combinations of structures have names; the system is therefore designed to show relationships among structures, enabling the user to choose non-named sets or groups. One question is whether a visual navigational ontology needs relationships beyond is-a and component structure. Wacholder stated that more spatial relationships are currently needed, such as "nearby" or "part of two systems," enabling the user to search on more combinations. Adding 3-D creates another set of relationships, and the major issue becomes one of classification, not interface design. What other non-visual relationships may need to be added in the domain of anatomy? And when does one start adding relationships outside the scope of anatomy but of interest in a wider medical research domain, such as similar biochemical processes? One can envision, far in the future, a system that encompasses many types of knowledge about the functioning of the human body and that not only displays non-named "things" but also facilitates new discoveries by relating previously unrelated structures, processes and outcomes.
Susan Dumais et al., in "Use of classified displays of Web search results," presented empirical evidence that classified displays of Web search results are indeed useful. They perform better than simple ranked-list displays both in user task performance and in user preference. Figure 2 is an example of a category display that has been abridged and simplified. All titles are hyperlinks.
Here, automatic techniques map search results to a pre-established scheme of categories. The advantages of this approach are that a user can quickly grasp the structure of the information and that users easily understand this type of display. In contrast, clustering techniques are used primarily to discover structure. In a retrieval interface clustering is slow, and it is hard for the user to interpret the resulting unlabeled groups. The user study reported here confirmed the advantages of a category display over a list display, in terms of both search times and user satisfaction.
Interestingly, the researchers found that users could tolerate some ambiguity and "fuzziness" in the display. Items could appear in multiple places; in the study they could be placed in up to 13 categories. Subjects noticed this cross-listing and liked it. The automatic classification is not perfect, and users noticed errors. However, the errors did not bother them.
These results raise the question of how "perfect" a classification process should be. While a large amount of error will cause users to distrust a system, greater accuracy requires more time and, thus, money. To what extent should we strive for "perfection" in classifying heterogeneous documents in very large databases such as the Web? Since the standard for a worthwhile improvement is generally taken to be a 10% increase in precision, techniques that create only small incremental improvements may not be worth the time and money invested in their creation. Related to this argument is the idea that, while we reinforce domain boundaries when we create a classification system tailored to the needs of a particular user community, classification of large heterogeneous collections would seem to need some permeability across these boundaries. As the work of Dumais, Cutrell and Chen shows, improvement in the display of search results can minimize the impact of "imperfect" categorization.
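The category display discussed above can be sketched in a few lines of Python. This is only an illustration of the grouping step, not the system the authors evaluated: the function names, the category scheme and the sample results are all made up, and the category assignments are assumed to come from an automatic classifier that is not shown.

```python
from collections import defaultdict


def group_by_category(results, scheme):
    """Group ranked (title, categories) pairs under a fixed category scheme.

    A result assigned to several categories is cross-listed under each one.
    Within a category, results keep their original ranked order.
    """
    grouped = defaultdict(list)
    for title, categories in results:
        for category in categories:
            grouped[category].append(title)
    # Show only categories from the pre-established scheme, in scheme order.
    return [(c, grouped[c]) for c in scheme if c in grouped]


def render(grouped):
    """Format the grouped results as an indented plain-text display."""
    lines = []
    for category, titles in grouped:
        lines.append(category)
        lines.extend(f"  {title}" for title in titles)
    return "\n".join(lines)


# Hypothetical scheme and classifier output, in ranked order.
SCHEME = ["Automotive", "Computers", "Science"]
RESULTS = [
    ("Jaguar cars dealer listing", ["Automotive"]),
    ("Big cats of South America", ["Science"]),
    ("Racing simulation software review", ["Automotive", "Computers"]),
]

grouped = group_by_category(RESULTS, SCHEME)
print(render(grouped))
```

Because the scheme is fixed in advance, every group arrives with a readable label, which is the property the report contrasts with unlabeled clusters.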
In "SERUBA - A new search and learning technology for the Internet and intranets," Winfried Schmitz-Esser gave a preview of a Web search system that uses a thesaurus with a rich set of relationship types to help the user explore her search topic. The relationships used are
Abstract/generic
When the user enters a search term, the system uses synonym relationships to identify the corresponding concept and then displays other concepts in an array arranged by type of relationship. An abridged display is shown in Figure 3, where each referenced concept is a hyperlink.
The system displays results using its Basic Semantic Reference Structure, a frame whose slots can be seen in Table 2.
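The lookup described for SERUBA, from search term through synonym to concept and then to related concepts arranged by relationship type, can be sketched as follows. This is a hedged illustration, not SERUBA's design: the `Thesaurus` class, its methods and the sample vocabulary are assumptions, and the only relationship type taken from the report is abstract/generic. The second type is a labeled placeholder.

```python
class Thesaurus:
    """Toy thesaurus with synonym lookup and typed relationships (hypothetical)."""

    def __init__(self):
        self._synonyms = {}   # lowercased term -> preferred concept label
        self._relations = {}  # concept -> {relationship type: [related concepts]}

    def add_concept(self, concept, synonyms=()):
        """Register a concept and the terms that should resolve to it."""
        self._relations.setdefault(concept, {})
        for term in (concept, *synonyms):
            self._synonyms[term.lower()] = concept

    def relate(self, source, relation_type, target):
        """Link `source` to `target` under the given relationship type."""
        self._relations.setdefault(source, {}).setdefault(relation_type, []).append(target)

    def explore(self, search_term):
        """Resolve a search term to its concept and return the related
        concepts grouped by relationship type, or None if unknown."""
        concept = self._synonyms.get(search_term.strip().lower())
        if concept is None:
            return None
        return concept, dict(self._relations.get(concept, {}))


# Illustrative vocabulary only.
thesaurus = Thesaurus()
thesaurus.add_concept("automobile", synonyms=["car", "motorcar"])
thesaurus.relate("automobile", "abstract/generic", "vehicle")
thesaurus.relate("automobile", "placeholder: has part", "engine")  # hypothetical type

print(thesaurus.explore("Car"))
```

Each key of the returned dictionary corresponds to one group in an array-style display, so adding a relationship type adds a group without changing the lookup.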
Session 3: Automatic Creation of Representation
Session 3 dealt with automated methods to create the knowledge structures necessary for good user support. In "Automatic indexing by discipline and high-level categories: Methodology and potential applications," Susanne Humphrey et al. described a system for automatically indexing documents with broad descriptors that express the general nature and orientation of the document and are thus useful complements to specific descriptors. Two types of broad descriptors are assigned:
Rules for assigning journal descriptors were developed based on statistical association of document features such as title words with journal descriptors assigned to documents in a training set. The rules for assigning semantic types rely on a more complex indirect method.
In the second paper of the session, "Classification of research papers using citation links and citation types: Toward automatic review article generation," Hidetsugu Nanba et al. presented a toolbox for the automated or computer-assisted generation of reviews based on analyzing citation relationships. The three tools would each be useful individually. They are as follows:
The citation area tool starts from the sentence containing the citation and adds preceding or following sentences based on the occurrence of cue words that indicate text cohesion. The citation type tool is also based on cue words and assigns a citation to one of three types: (1) shows other researchers' theories and methods; (2) points out problems or gaps in related works; and (3) other. The paper discusses both word-based and citation-based approaches to automatic classification.
The Idea Mart
In the middle of the day an "idea mart" was held. It was devoted to extensive discussion of emergent research ideas or projects in small groups, in five parallel sessions covering two topics each. For a list of presenters and topics, see Figure 1. This experiment turned out very well, producing many useful suggestions for the researchers who presented at the sessions.
As the final section of this report, we present the themes that emerged from the papers and discussions. Some themes are clearly tied to one paper while others emerged in several papers. We list the eight themes, followed by brief summaries and/or outlines of the points developed.
Theme 1: Expanded use of classifications
Several presentations called into question the restricted role that classification schemes have played, being used primarily to organize information for retrieval. Other roles that need to be explored more fully include roles in learning, such as the use of the visual anatomist for training and education, exploration and browsing, creativity, discourse, problem solving, and information.
Question: How can we build classification systems that would enable us to discover and see relationships that have not yet been established?
Theme 2: Requirements for diversity in classification
Classifications should serve a given purpose for a given user community.
Language – terms and their relationships – is complex; it shows differences not only across domains but also across user groups in the same domain. This introduces many sources of diversity in the design of classifications:
Implications
Role of classification in bridging diversity
Classification should honor diversity by reflecting different perspectives, etc. But classification should also bridge diversity by mediating between different points of view, different knowledge and cultural systems. For example, a classification of concepts in "alternative" medicine could include scope notes and relationships that relate its concepts to concepts in "standard" medicine. By elaborating concepts, concept relationships and conceptual structure in different realms, classification can help identify commonalities and differences and the nature of differences, supporting an effort at sharing and mutual refinement of conceptual structures.
Theme 3: The quest for unity – multipurpose classifications, reuse
Classifications require considerable intellectual investment, so one would like to reuse them. Tension with diversity! Some thoughts from the discussion:
Can a thesaurus be reorganized for multiple purposes?
Classification modules that can be used in different schemes: How do we build modular ontologies to better represent dynamic domains? These would be ontologies that could flexibly extend the working ontology, for example extending the ontology of basic business processes by adding a module about auctions.
How can we build classification schemes that store basic-level (mid-level) attributes that are neither too abstract nor overly specified so that they can be used effectively by people in a variety of contexts, when we know neither who the people are nor what the contexts are?
The mapping of ontologies one to another must include more than just terms and their relationships; it must also include information about the context/situation.
Is it possible to reorganize an existing thesaurus into a "navigational ontology" to support searching and browsing? Or does such a tool have to be created initially with these goals in mind (re question one)?
Can one thesaurus be reorganized in different ways to serve multiple purposes, such as searching, navigation, instruction, "stimulation" (creativity)?
Theme 4: Types of knowledge covered in classifications
Implications
Importance of stepping back from what we "know" about building an ontology based on domain knowledge.
Theme 5: Orientation of classification
Should classification reflect
How can a classification be constructed that mediates between these two orientations?
Theme 6: Types of relationships in a thesaurus / classification / ontology
Traditional thesauri use just BT/NT and RT (broader term/narrower term and related term) as conceptual relationships. However,
Theme 7: Display and user interaction issues
The following points were noted:
Theme 8: Practical issues
Classified displays are useful but
What can be automated? (Session 3)
Dagobert Soergel, professor in the College of Library and Information Services, can be reached at 4105 Hornbake Library, University of Maryland - College Park, College Park, MD 20742-4345; 301/405-2037; e-mail: ds52@umail.umd.edu
Copyright © 2001, American Society for Information Science and Technology