Journal of the Association for Information Science



Bert R. Boyce




Using Latent Semantic Indexing for Literature Based Discovery
Michael D. Gordon and Susan Dumais

Gordon and Dumais applied Latent Semantic Indexing (LSI) to documents containing the term "Raynaud's". The top 40 words and phrases from the several statistics produced 136 unique terms or phrases. These were compared to the 136 nearest neighbors of the term "Raynaud's" in the LSI analysis. The lists' top ten items are nearly permutations of one another, and a Spearman rank correlation of the top 40 two-word phrases by both methods indicates that approximate position on one list predicts approximate position on the other. LSI appears to be a means of identifying "intermediate literatures" for Swanson's process for identifying undiscovered public knowledge.
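
The rank comparison described above can be illustrated with a minimal sketch of the Spearman rank correlation for two orderings of the same items. The function name and the placeholder terms are hypothetical and are not taken from the paper; the sketch also assumes there are no tied ranks.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two orderings of the same items.

    Assumes no ties: each item appears exactly once in each ranking.
    """
    if set(ranking_a) != set(ranking_b) or len(ranking_a) != len(set(ranking_a)):
        raise ValueError("rankings must contain the same distinct items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    # Position of every item in each list (0-based rank).
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    # Sum of squared rank differences, then the standard no-ties formula.
    d_squared = sum((pos_a[item] - pos_b[item]) ** 2 for item in ranking_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Placeholder terms only; these are not the terms reported in the study.
by_statistics = ["term_a", "term_b", "term_c", "term_d"]
by_lsi = ["term_b", "term_a", "term_c", "term_d"]
rho = spearman_rho(by_statistics, by_lsi)
```

A value near 1 means that position on one list predicts position on the other; a value near -1 means the lists run in opposite order.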




The Ambiguity of Negation in Natural Language Queries to Information Retrieval Systems
April R. McQuire and Caroline M. Eastman

Negation, as it is used in natural language expressions, is often ambiguous and unclear as to scope. McQuire and Eastman report on 64 students who interpreted twenty queries containing negations with multiple nouns connected by "and", noun phrases, infinitive phrases, and prepositional phrases. The multiple nouns received a relatively consistent interpretation, but other forms resulted in multiple interpretations. An algorithm that parses a query applies negation across all terms when the case is simple and presents a set of choices to the patron when ambiguity seems likely.
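
As a rough illustration of that strategy, the sketch below returns a single reading when the case is flagged as simple and a list of candidate scopes otherwise. It is not the authors' algorithm: the function name, the `simple` flag, and the Boolean rendering of each reading are assumptions made for the example, and deciding whether a query is simple is left to the caller.

```python
def negation_readings(terms, simple):
    """Return candidate Boolean readings for a negation followed by `terms`.

    Assumption: a reading is modeled as the negation covering the first k
    terms, with the remaining terms left positive.
    """
    if not terms:
        raise ValueError("at least one term is required")

    def reading(k):
        negated = [f"NOT {t}" for t in terms[:k]]
        return " AND ".join(negated + list(terms[k:]))

    if simple:
        # Simple case: the negation is applied across all terms.
        return [reading(len(terms))]
    # Ambiguous case: offer every scope so the patron can choose.
    return [reading(k) for k in range(1, len(terms) + 1)]


choices = negation_readings(["cats", "dogs"], simple=False)
```

In the ambiguous case the returned list would be shown to the patron, who selects the intended scope.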




A Machine Learning Approach to Inductive Query by Examples: An Experiment Using Relevance Feedback, ID3, Genetic Algorithms, and Simulated Annealing
Hsinchun Chen, Ganesan Shankaranarayanan, Linlin She, and Anand Iyer

Chen et al. use standard relevance feedback procedures as a benchmark for testing three machine learning approaches to retrieval. ID3, using a training set of relevant and non-relevant documents, computes an entropy value for each attribute and then builds a tree in which the attribute determining each node is chosen for its ability to reduce entropy. Genetic algorithms, in this adaptation, operate on documents whose fitness is determined by a similarity measure with the training set; fitness in turn determines the probability of selection into a new population, with the addition of mutation operators. Simulated annealing, a modification of the GA technique, applies a random mutation point to each document to generate a new candidate configuration and accepts those in which the similarity measure increases.

All three algorithms improve similarity scores, but not significantly more than relevance feedback does. When more complex queries with larger answer sets are considered alone, the more sophisticated algorithms perform significantly better than relevance feedback. In a small search experiment, genetic algorithms outperformed the other methods in both precision and simulated recall. A review of the terms chosen suggests that relevance feedback did not identify the most crucial concepts, that ID3 overgeneralized, and that the other two methods struck a good balance between these extremes.
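
The entropy-based attribute choice that ID3 makes at each node can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are modeled as sets of terms, labels as relevant (True) or non-relevant (False), and all function names and sample data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting the documents on presence of `term`."""
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    total = len(labels)
    # Weighted entropy of the two partitions after the split.
    remainder = sum(
        len(part) / total * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder


def best_term(docs, labels):
    """Term whose presence/absence split reduces entropy the most."""
    vocabulary = sorted(set().union(*docs))
    return max(vocabulary, key=lambda t: information_gain(docs, labels, t))


# Hypothetical training set: each document is a set of index terms.
training_docs = [{"a", "b"}, {"a"}, {"b"}, {"c"}]
relevance = [True, True, False, False]
root = best_term(training_docs, relevance)
```

A full ID3 tree would apply `best_term` recursively to each partition until the labels in a partition are uniform or no terms remain.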




Minimal Level Cataloging: What Does It Mean for Maps in the Contexts of Card Catalogs, Online Catalogs, and Digital Libraries?
Zorana Ercegovac

Ercegovac provides a review of the LC cataloging crisis of the 1940s, and in particular of the simplified cataloging recommendation which led to what is now called minimal level cataloging (MLC). Little evidence of empirical studies of the value of MLC is to be found in the literature. However, the failure to include geospatial reference data, or to exceed a single subject heading, limits wide area availability.




Bibliometric Analysis of the Impact of Internet Use on Scholarly Productivity
Noam Kaminer and Yale M. Braunstein

Kaminer and Braunstein collect publication counts for the 122 faculty, Internet use counts from computer logs, and other variables from a questionnaire and published biographies. Average yearly publication correlates with biological age and years since Ph.D., but time assigned for research, number of grants as principal investigator, time to Ph.D., and Carnegie status of source of Ph.D. are not significantly related. A 10% increase in Internet logins results in an increase of 0.21 publications per year.




Orthography as a Fundamental Impediment to Online Information Retrieval
Terrence A. Brooks

Written language is culturally determined, dynamic, and idiosyncratic, and variations in punctuation, spelling, and characters must be dealt with directly by text-based information retrieval systems. Using examples from the OCLC Online Union Catalog, DataStar, and Dialog, Brooks illustrates the differences between the everyday orthography of the normal IR user and that required to use a standard commercial retrieval system.




Optimizing Similarity Using Multi-Query Relevance Feedback
Brian T. Bartell, Garrison W. Cottrell, and Richard K. Belew

Bartell, Cottrell, and Belew describe a system that adjusts its parameters to maximize the match between its ordering of a set of retrieved documents and a target ordering, and so locates a similarity measure that approaches optimum performance. Optimization is carried out by the function gradient method with the 1460 documents in the CISI collection and 51 training queries, which produced a similarity measure unlike any of the classic measures. Using 25 new queries, the measure located by optimization outperformed the cosine, inner product, pseudo-cosine, and number-of-terms similarity measures and is within one percent of estimated optimal performance.
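
To make the baseline measures concrete, the sketch below implements three of the classic similarity measures named above (inner product, cosine, and number of shared terms) and shows that they can order the same documents differently. It does not reproduce the optimized measure from the paper; the sparse-dictionary representation, the function names, and the sample weights are assumptions for illustration.

```python
from math import sqrt


def inner_product(query, doc):
    """Sum of products of matching term weights."""
    return sum(w * doc.get(term, 0.0) for term, w in query.items())


def cosine(query, doc):
    """Inner product normalized by the Euclidean lengths of both vectors."""
    norm = sqrt(sum(w * w for w in query.values())) * sqrt(
        sum(w * w for w in doc.values())
    )
    return inner_product(query, doc) / norm if norm else 0.0


def shared_terms(query, doc):
    """Number of terms the query and document have in common."""
    return len(query.keys() & doc.keys())


def rank(query, docs, measure):
    """Document names ordered from most to least similar under `measure`."""
    return sorted(docs, key=lambda name: measure(query, docs[name]), reverse=True)


# Hypothetical term-weight vectors, stored as sparse dictionaries.
query = {"x": 1.0, "y": 1.0}
docs = {"d1": {"x": 1.0, "y": 1.0}, "d2": {"x": 5.0, "z": 5.0}}
by_inner = rank(query, docs, inner_product)
by_cosine = rank(query, docs, cosine)
```

Here the inner product favors the document with the larger weights, while the cosine favors the document pointing in the same direction as the query. A system of the kind described searches over the parameters of such a measure for the ordering that best matches a target ordering.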




Issues and Applications of Case-Based Reasoning in Design
Edited by Mary Lou Maher and Pearl Pu
Reviewed by Peter G. Underwood




Internet Besieged: Countering Cyberspace Scofflaws
Edited by Dorothy E. Denning and Peter J. Denning
Reviewed by Derek G. Smith




Digital Image Access & Retrieval
Edited by P. Bryan Heidorn and Beth Sandore
Reviewed by James M. Turner






© 1998, Association for Information Science
Last update: November 06, 1998