
Journal of the Association for Information Science

IN THIS ISSUE

 

Bert R. Boyce

735

RESEARCH

 

What Is Information Discovery About?
H. A. Proper and P. D. Bruza

Proper and Bruza believe that information discovery is an attempt to model information retrieval broadly, outside the context of any operational retrieval model. They define user need in terms of supply and demand for something undefined called infions, or information particles. Relevance is meeting a set of requirements stated in the same terms. Aboutness is apparently a relation on a set of key words, but satisfaction is a relationship on information carriers and their logical descriptions. A distinction is made between system and user satisfaction, which will be close in factual descriptions but not necessarily so in cases of aboutness.

 

737

 

Text Segmentation for Chinese Spell Checking
Kin Hong Lee, Qin Lu, and Mau Kit Michael Ng

Since Chinese text has no natural delimiters, text must be segmented into valid words before error correction can take place. Many words are represented by single characters but others require multiple-character strings. Lee, Lu, and Ng test the Block of Combinations (BOC) segmentation method, which uses a 60,000-word dictionary with the 2,000 most frequently used words grammatically tagged. A user dictionary for adding words not predefined is available, and otherwise unidentified words are stored in a temporary file. The 200 most frequent single-character words are accepted, but others are suspected to be errors and presented to the user for clarification with similar words as suggestions. Since the number of possible segmentations increases rapidly as the number of characters grows, a sliding five-word window is used rather than a complete sentence. The procedure is more accurate than another method which takes about the same computational time.
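The growth in candidate segmentations noted above can be illustrated with a small sketch. This is not the BOC method itself, whose details the summary does not give; it is a plain recursive enumeration of dictionary-valid splits, and the function name `segmentations`, the `max_word_len` parameter, and the six-word toy dictionary are all assumptions made for illustration.

```python
def segmentations(text, dictionary, max_word_len=4):
    """Return every split of `text` into words that all appear in `dictionary`."""
    if not text:
        return [[]]  # exactly one way to segment the empty string: no words
    results = []
    # Try every prefix up to max_word_len characters as the first word.
    for end in range(1, min(len(text), max_word_len) + 1):
        word = text[:end]
        if word in dictionary:
            # Combine this first word with every segmentation of the remainder.
            for rest in segmentations(text[end:], dictionary, max_word_len):
                results.append([word] + rest)
    return results


# Toy dictionary (hypothetical; the method summarized above uses 60,000 words).
toy_dictionary = {"中", "国", "人", "中国", "国人", "中国人"}
candidates = segmentations("中国人", toy_dictionary)
for words in candidates:
    print(" / ".join(words))
```

Even this three-character string has several valid readings under the toy dictionary, and the count can grow exponentially with length, which is consistent with the decision to examine a short sliding window rather than a whole sentence.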

 

751

 

A Fuzzy Genetic Algorithm Approach to an Adaptive Information Retrieval Agent
Maria J. Martin-Bautista, Maria-Amparo Vila, and Henrik Legind Larsen

The Genetic Information Retrieval Agent Filter (GIRAF) is a software agent, tested here by Martin-Bautista, Vila, and Larsen, that can work offline to filter and rank retrieved documents from an Internet search engine. Query terms are extracted from evaluated initial search documents or from ideal documents provided by a user, and each of these terms is given a weight that is the average of its occurrence frequency in all documents analyzed. One of four gene types is assigned randomly to each term to form a triple with its weight, and chromosomes (strings of these gene triples) are randomly formed. Type one genes use as their weight a number of occurrences of the term in a document that will give complete satisfaction (a higher number reducing satisfaction). Type two genes are satisfied completely by documents that have no occurrences of the term. Type three genes use the weight as a traditional threshold with complete satisfaction achieved if the number is reached or exceeded. Type four divides each document into three parts and will be satisfied by any of the parts. Chromosomes are ranked by their similarity to relevant documents and modified by choosing two parents, the first at random and the second ranked higher than the first, and performing a gene crossover. Mutation also randomly occurs. New chromosomes cause those at the bottom of the list to be removed. Tests with virtual users indicate that type three genes work best, except that when user profiles are permitted to change during the process, type one gains some advantage.
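The four gene types described above can be sketched as satisfaction functions. This is only an illustration under stated assumptions, not the authors' formulation: the summary fixes where satisfaction is complete, but the partial-satisfaction shapes used here (ratios below or above the weight, all-or-nothing for type two, a per-part threshold for type four) and the name `gene_satisfaction` are hypothetical.

```python
def gene_satisfaction(gene_type, weight, counts_by_part):
    """Degree (0.0 to 1.0) to which a document satisfies one gene.

    counts_by_part holds the term's occurrence count in each of the three
    parts of the document; their sum is the count for the whole document.
    """
    total = sum(counts_by_part)

    def threshold(count):
        # Complete once the weight is reached or exceeded (per the summary);
        # the linear ramp below the threshold is an assumption.
        return 1.0 if count >= weight else count / weight

    if gene_type == 1:
        # Complete at exactly `weight` occurrences; more occurrences reduce
        # satisfaction (per the summary). The ratio shape is an assumption.
        if total == 0 and weight == 0:
            return 1.0
        return min(total, weight) / max(total, weight)
    if gene_type == 2:
        # Complete only when the term is absent; all-or-nothing is assumed.
        return 1.0 if total == 0 else 0.0
    if gene_type == 3:
        return threshold(total)
    if gene_type == 4:
        # Satisfied by any one of the three parts; applying the type-three
        # threshold to each part is an assumption.
        return max(threshold(count) for count in counts_by_part)
    raise ValueError(f"unknown gene type: {gene_type}")


# Example: a term with weight 4 occurring once in each of the first two parts.
for gene_type in (1, 2, 3, 4):
    print(gene_type, gene_satisfaction(gene_type, 4, (1, 1, 0)))
```

A chromosome's score for a document could then be some aggregate of these per-gene values; the summary does not say which aggregate is used, so none is shown.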

 

760

 

A Distance and Angle Similarity Measure Method
Jin Zhang and Robert R. Korfhage

The typical angularity measure (cosine) identifies documents whose index term distributions are similar. Despite this similarity, they may be far apart in the document space if the level of detail in the discussion of the topics is different. Document similarity with a distance measure depends upon the length of the hypersphere radius from the reference point to the document. Zhang and Korfhage present a similarity measure that combines distance-based and angularity-based measures. This measure "s" is the product of two terms built from the parameters "a" and "c": "a" raised to the negative radius (a typical distance measure) and "c" (between 0 and 1) raised to the angle divided by the maximum value of the angle. The maximum value of "s" is equal to the distance-based measure and the minimum is smaller but in the same position as the cosine measure. Varying "a" and "c" will reflect a user's emphasis on distance or angularity.
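Read literally, the description above gives s = a^(-d) * c^(angle / max angle). A minimal sketch follows, assuming Euclidean distance from the reference point for d, the angle between the document and reference vectors as seen from the origin, a maximum angle of pi/2 (non-negative term weights), and a > 1. These choices, the default parameter values, and the function name are assumptions rather than details confirmed by the summary.

```python
import math


def distance_angle_similarity(doc, ref, a=2.0, c=0.5, max_angle=math.pi / 2):
    """Combined measure s = a**(-d) * c**(theta / max_angle).

    d is the Euclidean distance between doc and ref; theta is the angle
    between them measured at the origin (both are assumptions).
    """
    norm = math.hypot(*doc) * math.hypot(*ref)
    if norm == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    d = math.dist(doc, ref)
    dot = sum(x * y for x, y in zip(doc, ref))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    theta = math.acos(max(-1.0, min(1.0, dot / norm)))
    return a ** (-d) * c ** (theta / max_angle)


# Same direction as the reference: the angle term is 1, so s = a**(-d).
print(distance_angle_similarity((2.0, 0.0), (1.0, 0.0)))
# Orthogonal to the reference: the angle term shrinks s by the factor c.
print(distance_angle_similarity((0.0, 1.0), (1.0, 0.0)))
```

With the angle at zero the value reduces to the pure distance measure, and at the maximum angle it is scaled down by "c", which matches the maximum and minimum described above. A value of "c" near 1 makes the measure behave like a distance measure, while a small "c" penalizes angular deviation heavily.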

 

772

 

DARE: Distance and Angle Retrieval Environment: A Tale of the Two Measures
Jin Zhang and Robert R. Korfhage

In a second paper, Zhang and Korfhage present a visualization model which can display both distance and angle measures simultaneously and handle both conjunction and intersection. Changing the slope and position of a straight line in the visualization space results in a modification of the threshold-defined contour in the document vector space and thus expands or contracts the scope and emphasis of the retrieved set.

779

PERSPECTIVES ISSUE ON . . . VISUAL INFORMATION RETRIEVAL INTERFACES
 

 

Introduction and Overview: Visualization, Retrieval, and Knowledge
Mark Rorvig and Lois F. Lunin

This Perspectives issue is assembled to provide an historical background to visualization in information retrieval. It is a review of the assumptions and technology configurations by which the current literature may be interpreted. The techniques of the authors of this issue differ, but all treat their techniques as manuals of description flowing from a history of common mathematical and technical influences.  All technologies have histories of development. The historical forces of visualization frame the current efforts and comprise the field in which new problem dimensions are addressed. No field of scientific inquiry emerges without a background. This issue adds to the depth necessary for the study of visualization by new students and new scholars.

 

790

 

The NASA Image Collection Visual Thesaurus
M. E. Rorvig, C. H. Turner, and J. Moncada

The first visual interface to a collection was designed and implemented at the Johnson Space Center of NASA in the years 1988-1992. In this interface, described in the article entitled "The NASA Visual Thesaurus," Rorvig et al. assumed that the task of inferring images from terms and terms from images would introduce invariance in image indexing. The system remained in use for two years, but eventually failed because no automatic method to assign terms to images could be discovered, and the manual cost of such term assignment was too great to be supported.

Rorvig et al. attempted to use image descriptions clustered by cosine vector methods to identify a unique image for every thesaurus term. The candidate images suggested by this method were often heartbreakingly close to the mark. But close was not good enough. These developments were described in detail by Seloff (1990). Although the Seloff article has been widely cited, the initial article which specified the design parameters for the system of his report has remained unpublished. It appears in this issue in the form originally presented at the ASIS mid-year conference of 1988. The article is significant because it represents the first identification of the components of a visual interface.

 

794

 

Visualizing Science by Citation Mapping
Henry Small

The article by Henry Small of the Institute for Scientific Information addresses the two-decade-long historical use of visualization techniques in calculating the relationships among scientific fields by their patterns of co-citation. Small begins with the simplest of algorithms as conceived within the computational limitations of the 1970s and ends with the most ambitious ones presently available through Sandia National Laboratories (SNL). In this article, students and scholars will find algorithms applicable to many different aspects of the co-citation problem, as Small frankly describes the research paths that were successful and led to further enhancements as well as those that were eventually discarded, either because of their inefficiency in computation or because of their failure to yield truthful insights validated by earlier techniques. Many of these algorithms may be transplanted to address similar problems with data that may be encountered by researchers who require some intermediate processing alternatives.

 

799

 

The Ecological Approach to Text Visualization
James A. Wise

The article by Jim Wise of Integral Visuals Corporation details the technical advances of researchers at the Pacific Northwest National Laboratories (PNNL) over an intense five-year history of development. Wise's "The Ecological Approach to Text Visualization" offers a rich archive of techniques. This descriptive tour de force begins with the most brutal techniques (e.g., vectors of length 200K analyzed through Multidimensional Scaling (MDS)) and, in completely clear and intelligible detail, describes the shortcut methods developed for computational efficiency. These efforts have resulted in the presently available commercial product "ThemeMedia" offered through a subsidiary of the Smaby Group of investors (which purchased rights to further develop the technology). Among the highlights of this article is a description of the discovery that single and multiple link cluster centroids can be used to approximate the full text collections originally required for visual display. Additionally, the transformations of the 2D dot displays to the present terrain models simultaneously developed at PNNL and SNL are described in sufficient detail for engineers to reproduce the same progression of results.

 

814

 

A Collection of Visual Thesauri for Browsing Large Collections of Geographic Images
Marshall C. Ramsey, Hsinchun Chen, Bin Zhu, and Bruce R. Schatz

In Ramsey et al., earth observing images are parsed as texts are parsed. These authors use Gabor filters to combine like terrains. No clearer description of this process is available in the present literature. A Gabor filter yields textures. By segmenting images into component texture boundaries, search classes may be derived without resorting to textual description. This technology thus succeeds where the NASA effort by Rorvig et al. failed. The results reported in this article are concrete and verifiable; indeed, anyone who has ever traveled over Arizona highways can authenticate these data. The authors acknowledge the contributions of the Alexandria Digital Libraries Project, particularly the work of Manjunath and Ma (1996), but claim their own extensions to this work as well.

 

836

 

Conference Notes--1996: Foundations of Advanced Information Visualization for Visual Information (Retrieval) Systems
Mark Rorvig and Matthias Hemmje

One of the landmark developments in visual retrieval occurred at a workshop held in Zurich in the summer of 1996 in conjunction with the Association for Computing Machinery's Special Interest Group on Information Retrieval Annual Meeting. For the first time, both European and North American interests were represented in the development of criteria for evaluation of visual information retrieval. Among the Europeans, the newly formed FADIVA (Foundations of Advanced Information Visualization) group played the dominant role. The workshop report reproduced in this issue has been widely circulated, but never before published. This conference led to the first visualization of native TREC/Tipster data as a prelude to formal visual information retrieval evaluation strategies (Rorvig and Fitzpatrick, 1998; Rorvig, 1998).

[Robert Korfhage, one of the intellectual fathers of the visual information retrieval effort in both Europe and North America, has contributed his bibliography on this issue.  The bibliography is comprehensive for all work in this field c. 1997.  Such documents are of interest in determining the scope of future advances. In 1997, this was the known world view of this area of scholarly effort. For practical use in permitting users to copy this bibliography, it is available through the ASIS SIGVIZ website http://www.asis.org/SIG/SIGVIS/references.html where future editions may be conveniently updated.]

845

BOOK REVIEWS

 

Foundations of Library and Information Science
by Richard E. Rubin
Reviewed by: Boyd P. Holmes

 

848

 

Into the Future: The Foundation of Library and Information Services in the Post-Industrial Era
by Michael Harris, Stan A. Hannah, and Pamela C. Harris
Reviewed by: Ebrahim Afshar
 

849

 

Newspapers of Record in a Digital Age: From Hot Type to Hot Link
by Shannon E. Martin and Kathleen A. Hansen
Reviewed by: Amy E. Sanidas

850

ERRATUM

   

852


© 1999, Association for Information Science
Last update: June 07, 1999