Journal of the Association for Information Science and Technology

Table of Contents

Volume 55, Issue 10

In This Issue
Bert R. Boyce

Automatic Generation of Domain Representations Using Thesaurus Structures
Juan Lloréns, Manuel Velasco, Antonio de Amescua, José A. Moreiro, and Vicente Martínez

Lloréns et alia are interested in the software reuse problem, and particularly in organizing reusable routines for future retrieval. They begin by manually selecting a set of documents and software that is representative of the domain to be organized and is homogeneous as measured by a large number of bibliometric values. Index terms are then extracted from text documents using inverse document frequency and n-gram techniques, and from software by applying reverse engineering techniques to object-oriented source code. Clusters of terms and hierarchies are then created. A biology domain was created from 74 documents, yielding 953 descriptors after the manual removal of a small set of spurious descriptors and the manual addition of about 30% of the set. Some clusters were considered effective, others not. The automatic selection of root terms was successful in only 25% of the cases despite the use of seven different classification methods. However, some successful hierarchies were constructed.
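
The text-side term extraction can be illustrated with a minimal sketch. It assumes plain-text documents, word unigrams and bigrams as the n-grams, and a simple tf × idf score summed over the collection. The function names are hypothetical, and this is not the authors' actual procedure, which also draws on reverse engineering of object-oriented source code.

```python
import math
import re
from collections import Counter


def candidate_terms(text, max_n=2):
    """Lower-case the text, split it into words, and emit word n-grams of length 1..max_n."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return terms


def score_terms(docs, max_n=2):
    """Score each candidate n-gram by tf * idf, summed over all documents (assumed scoring)."""
    per_doc = [Counter(candidate_terms(d, max_n)) for d in docs]
    df = Counter()
    for counts in per_doc:
        df.update(counts.keys())  # document frequency: each term counted once per document
    scores = Counter()
    for counts in per_doc:
        for term, tf in counts.items():
            scores[term] += tf * math.log(len(docs) / df[term])
    return scores


def extract_descriptors(docs, top_k=10, max_n=2):
    """Return the top_k highest-scoring n-grams as candidate descriptors."""
    return [term for term, _ in score_terms(docs, max_n).most_common(top_k)]
```

Terms that occur in every document, such as "the", receive an idf of zero and therefore never rank highly; in the approach summarized above, the resulting list would still be pruned and extended by hand.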

Fusion of Effective Retrieval Strategies in the Same Information Retrieval System
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David Grossman, Ophir Frieder, and Nazli Goharian

Beitzel et alia examine the effectiveness of fusing several effective methods of computing the similarity between a document and a query within a single retrieval system, while holding all other variables constant. They show that a greater overlap of relevant documents than of non-relevant documents does not lead to improved effectiveness with fusion methods, and that any improvement from fusion techniques is likely to come from improved recall due to highly ranked relevant documents in the fused set. Using the IIT, BM25, and Self-Relevance matching techniques with all query topics from the ad hoc tracks of TREC-6, 7, and 8, and from the ad hoc task of the web track of TREC-9 and 10, they fused the component result sets using CombMNZ, evaluated the fused results for effectiveness, and analyzed the overlap of the individual result sets. The fused retrieval never outperforms the best single method, and a relevant overlap greater than the non-relevant overlap does not guarantee improvement.
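
As a rough illustration of the fusion step, the sketch below implements CombMNZ in its common form: each run's scores are min-max normalized, summed per document, and multiplied by the number of runs that retrieved the document. It also includes an overlap ratio of the form 2|A ∩ B| / (|A| + |B|), which can be applied separately to the relevant and the non-relevant documents of two runs. The normalization choice, tie-breaking, and function names are assumptions, not details taken from the article.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's {doc_id: score} map into [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_mnz(runs):
    """CombMNZ: sum of normalized scores times the number of runs retrieving the doc."""
    total = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            total[doc] += score
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Highest fused score first; ties broken by doc id for a deterministic order.
    return sorted(fused, key=lambda d: (-fused[d], d))


def overlap(set_a, set_b):
    """Overlap ratio 2|A & B| / (|A| + |B|); use once for relevant, once for non-relevant docs."""
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))
```

Comparing `overlap` on the relevant documents of two runs with `overlap` on their non-relevant documents gives the kind of relevant-versus-non-relevant overlap contrast the study analyzes.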

Special Topic Section: Document Search Interface Design for Large-Scale Collections

Document Search Interface Design: Background and Introduction to Special Issue
Javed Mostafa

Depending on the information need, the library user searching for high-quality and authoritative information may have to navigate among resources that are in different formats (bibliographic versus full-text), are stored in different media (text versus images), have different levels of coverage (news versus scholarly reports), or are published in different languages. Beyond this heterogeneity, the user faces specific challenges related to the search experience itself. These factors and their impact on searching can best be described using a four-phase framework: formulation, action, presentation, and refinement (Shneiderman, Byrd, & Croft, 1998).

EBizPort: Collecting and Analyzing Business Intelligence Information
Byron Marshall, Daniel McDonald, Hsinchun Chen, and Wingyan Chung

In this article, Marshall, McDonald, Chen, and Chung take a different approach to supporting search services for large and heterogeneous document collections. They propose developing a domain-specific collection by crawling the content of a small set of highly reputable sites, maintaining a local index of that content, and providing browsing and searching services over the specialized content. This resource, known as a vertical portal, has the potential to overcome several problems associated with bias, update delay, reputation, and the integration of scattered information. The article discusses the architecture of a vertical portal system called EBizPort, the rationale behind its major components, and the algorithms and techniques used for building collections and search functions. Collection (or, more broadly, content) has an obvious relationship to the nature of the search interface, as it can affect the types of search functions that can be offered. Powerful search interface functions were built for EBizPort by exploiting the underlying content representation and a relatively narrow, well-defined domain focus. Particularly noteworthy are the innovative browsing functions, which include a summarizer, a categorizer, a visualizer, and a navigation sidebar. The article ends with a discussion of an evaluation study that compared EBizPort with a baseline system called Brint. Results are presented on effectiveness and efficiency, usability and information quality, and the quality of the local collection and of content retrieved from other sources (an extended search operation, called a meta-search service, was also provided in the system). Overall, the authors find that EBizPort outperforms the baseline system and provides a viable way to support access to business information.
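
The collection-building idea (crawl only a small set of trusted sites and index the result locally) can be sketched as a breadth-first crawl restricted to the seed sites' hosts. To keep the sketch runnable offline, a hypothetical `links` dictionary stands in for fetching and parsing pages; EBizPort's actual crawler is not described at this level of detail here.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_vertical(seeds, links, max_pages=100):
    """Breadth-first crawl that only follows links staying on the seed sites' hosts.

    `links` maps a URL to the URLs it links to (a stand-in for real fetching).
    """
    allowed_hosts = {urlparse(url).netloc for url in seeds}
    seen = set(seeds)
    queue = deque(seeds)
    collected = []
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        collected.append(url)  # in a real portal, this page would be indexed locally
        for target in links.get(url, []):
            # Skip off-site links so the collection stays within the trusted sources.
            if urlparse(target).netloc in allowed_hosts and target not in seen:
                seen.add(target)
                queue.append(target)
    return collected
```

Restricting the frontier by host is what keeps such a collection narrow; the local index built over `collected` is what the browsing and search functions would then operate on.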

Topic Modeling for Mediated Access to Very Large Document Collections
Gheorghe Muresan and David J. Harper

The article by Muresan and Harper focuses on search mediation, which is most closely related to the formulation phase described above. Muresan and Harper observe that users have difficulty identifying terms with high discriminatory power and usually enter common or ambiguous terms. When searches do not involve known-item clues (e.g., searching by author or title), expressing the search need becomes more difficult than usual because users have to translate a relatively vague need into appropriate topics and queries. The authors describe an interaction framework that supports search mediation and its implementation in a system called WebCluster. The essence of the framework is a two-stage search process. In the first stage, users are encouraged to explore representative document collections by browsing or searching and to identify relevant documents. In the second stage, the system automatically generates queries based on evidence collected in the exploration stage and retrieves documents from target sources identified by the user. The article includes results from simulation experiments demonstrating that the mediation-based approach has the potential to improve retrieval effectiveness on large collections.
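
The second stage, generating a query from documents the user marked relevant during exploration, can be approximated with a simple term-weighting sketch. It assumes that each term's weight is its frequency in the relevant documents multiplied by its idf over the exploration collection, which favors discriminating terms over common or ambiguous ones. This weighting and the function names are illustrative assumptions, not WebCluster's documented method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def generate_query(collection, relevant_ids, num_terms=3):
    """Build a query from the documents marked relevant in the exploration stage.

    collection: {doc_id: text} for the representative (exploration) collection.
    Each term is weighted by tf in the relevant docs times idf over the collection.
    """
    n_docs = len(collection)
    df = Counter()
    for text in collection.values():
        df.update(set(tokenize(text)))  # count each term once per document
    weights = Counter()
    for doc_id in relevant_ids:
        for term, tf in Counter(tokenize(collection[doc_id])).items():
            weights[term] += tf * math.log(n_docs / df[term])
    # Drop zero-weight terms (those occurring in every document).
    return [term for term, w in weights.most_common(num_terms) if w > 0]
```

In a small collection where "jaguar" appears in both animal and car documents, marking the animal documents as relevant pushes down the ambiguous "jaguar" and pushes up the discriminating animal-related terms, which is the kind of help the mediation stage is meant to provide.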

Multiple Viewpoints as an Approach to Digital Library Interfaces
James C. French, A. C. Chapin, and Worthy N. Martin

French, Chapin, and Martin describe an approach called multiple viewpoints for constructing search interfaces to large and complex collections. When querying a system, users often go through an iterative process in which clues discovered in the early stages of a search (for example, the indexing vocabularies used by the system) are used to improve subsequent queries. In typical systems, however, the user must refine the query under the constraints of a single indexing vocabulary and query language. The authors convincingly argue that this is an excessively strict constraint. A multiple viewpoint interface would offer the user different windows into the same collection based on different vocabularies and access methods (query languages), thus expanding the ways in which the user can reformulate queries and find relevant information. The proposed approach is relevant to both the formulation and refinement phases described above. The authors compare their idea with other related approaches and examine its impact on searching, interaction style, and interface design. They point out some of the key challenges in implementing multiple viewpoints, namely query parallelism across viewpoints, transparency of viewpoints, translation of a query in one viewpoint into a query in another, and merging of results when the same query is applied in parallel against different viewpoints. These challenges receive careful attention, and the authors describe concrete functions to address them. The authors also present several examples of implemented systems, covering diverse domains such as earth science, biomedicine, and nature imagery, that employ the multiple viewpoint approach for improved searching.
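
One simple way to merge result lists when the same query runs in parallel against several viewpoints is round-robin interleaving with duplicate removal, which avoids comparing scores computed over different vocabularies. The sketch below assumes each viewpoint returns a ranked list of document identifiers; it is a generic strategy chosen for illustration, not the specific merging function the authors describe.

```python
from itertools import zip_longest


def merge_viewpoints(ranked_lists):
    """Interleave ranked lists from several viewpoints, keeping each doc's first appearance.

    Round-robin keeps every viewpoint's top hits near the top of the merged list
    without needing scores that are comparable across vocabularies.
    """
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):  # tier = the i-th hit from each viewpoint
        for doc in tier:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```

For example, merging `["a", "b", "c"]`, `["b", "d"]`, and `["e"]` takes the first hit from each viewpoint, then the second, and so on, skipping "b" the second time it appears.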

Observing Users, Designing Clarity: A Case Study on the User-Centered Design of a Cross-Language Information Retrieval System
Daniela Petrelli, Micheline Beaulieu, Mark Sanderson, George Demetriou, Patrick Herring, and Preben Hansen

Providing interface functions for searching content in one language by querying in a different language has become an important challenge. The report by Petrelli et al. discusses search interfaces for cross-language retrieval. This report differs from the previous two in that it covers a wider range of the search steps users typically follow. It also strikes a good balance between theoretical aspects (the proposed design and its rationale) and empirical aspects (the methodology of design implementation and usability tests). Based on an examination of past research, the authors developed a six-phase framework of interaction appropriate for cross-language retrieval. The individual phases can be roughly mapped to the four-phase framework presented earlier; they are system setup (choosing source and target languages), query formulation, result overview, list and document examination, document comparison, and document accumulation. The authors developed interface mockups that support these six key phases of interaction. Taking a user-centered approach to design, they subsequently conducted a field study to establish user requirements for the system. Support for compound name searching, user-created dictionaries, sorting of results, and simultaneous searching of multiple text collections and languages were among the primary requirements established. The authors describe how they refined the initial interface mockups based on these requirements. In the final section of the article, results from a usability study of the refined and implemented search interface are presented. The usability study used a system called Clarity, which permitted choices among three languages (English, Finnish, and Swedish) for source and target languages. The study was conducted in two locations: the University of Sheffield, UK, and the University of Tampere, Finland. The article discusses several findings from the usability study that are valuable for designers implementing cross-language search interfaces.


Letters to the Editor

Pearson's r and Author Cocitation Analysis: A Commentary on the Controversy
Stephen J. Bensman

Rejoinder: In Defense of Formal Methods
Per Ahlgren, Bo Jarneving, and Ronald Rousseau

Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810 | E-mail:

Copyright © 2004, Association for Information Science and Technology