Journal of the Association for Information Science



Bert R. Boyce




Images of Similarity: A Visual Exploration of Optimal Similarity Metrics and Scaling Properties of TREC Topic-Document Sets
Mark Rorvig

In our first two papers, Rorvig takes a visual look at the TREC data. In "Images of Similarity," five different similarity measures used on five TREC document sets are scaled and plotted using multidimensional scaling with ordinal, interval, and maximum likelihood assumptions. Cosine and, surprisingly, overlap provide the desired bull's-eye pattern under maximum likelihood assumptions and tighten as assumptions move from ordinal to maximum likelihood. Ordinal assumptions with MDS are not adequate for a visual information retrieval interface. A regularity in the pattern of relevant documents would seem to indicate a consistency in human relevance assignments not indicated in previous work.




A Visual Exploration of the Orderliness of TREC Relevance Judgments
Mark Rorvig

In the second paper, multidimensional scaling of topical sets from the TREC database indicates that Shaw's criticism of clustering techniques does not extend to similarity data transformed to spatial proximities since the isomorphic relations between topic distances do appear. Only two of 200 randomly introduced documents are found in the center of the dense area of relevant documents, suggesting that while the TREC evaluation methods do exclude relevant documents, the problem may not be as severe as Harter has proposed. The semantic relevance of others from the 200 found close to the dense area is unclear and will require investigation.




Automatic Indexing of Documents from Journal Descriptors:  A Preliminary Investigation
Susanne M. Humphrey

Humphrey outlines a technique for associating the journal descriptors (JDs) in NLM's serials authority file SERLINE with words commonly occurring in the titles and abstracts of papers found in journals that have been assigned these descriptors. A Medline training set will produce a table of text words associated with particular journals, and the descriptors assigned. The test process involved text indexing of titles and abstracts of 3,995 training set documents covering 1,466 journals to extract terms occurring 13 or more times in the set. A measure can be based upon the number of occurrences of a term in association with a particular journal descriptor divided by its total occurrences, or on the number of papers containing the term for each descriptor divided by the total number of papers containing the word in the training set. This produces a ranked list of descriptors for each word extracted from a paper. The average rankings over all of a document's terms are then used for the document's JD ranking. Tests of papers outside the training set return the JDs of these papers' journals and other JDs as well. The inverse of the citation count for a JD is shown to be a likely normalization factor for JDs with high citation counts.
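The paper-count measure described above can be sketched in a few lines of Python. This is only an illustration, not Humphrey's implementation: the function names and the toy training data are hypothetical, and the sketch averages per-word scores rather than rank positions, which is a simplification of the averaging step in the abstract.

```python
from collections import Counter, defaultdict


def term_jd_scores(training_docs):
    """Score each (word, JD) pair as: papers containing the word under that JD
    divided by all training papers containing the word.

    training_docs is a list of (words, jds) pairs (hypothetical input shape).
    """
    docs_with_word = Counter()
    docs_with_word_and_jd = defaultdict(Counter)
    for words, jds in training_docs:
        for word in set(words):  # count each paper once per word
            docs_with_word[word] += 1
            for jd in set(jds):
                docs_with_word_and_jd[word][jd] += 1
    return {
        word: {jd: n / docs_with_word[word] for jd, n in jd_counts.items()}
        for word, jd_counts in docs_with_word_and_jd.items()
    }


def rank_jds(words, scores):
    """Average each JD's score over the document's known words, best first.

    Assumption: averaging scores stands in for averaging rank positions.
    """
    known = [w for w in words if w in scores]
    if not known:
        return []
    totals = defaultdict(float)
    for word in known:
        for jd, score in scores[word].items():
            totals[jd] += score
    averaged = {jd: total / len(known) for jd, total in totals.items()}
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


# Toy training set (invented for illustration only).
training = [
    (["heart", "valve"], ["Cardiology"]),
    (["heart", "lung"], ["Pulmonary Medicine"]),
    (["lung"], ["Pulmonary Medicine"]),
]
scores = term_jd_scores(training)
ranking = rank_jds(["heart", "valve"], scores)
```

The occurrence-based variant of the measure would replace the per-paper counts with raw term frequencies; the inverse-citation-count normalization mentioned in the abstract is not modeled here.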




Bibliometric Overview of Library and Information Science Research in Spain
V. Cano

The 345 papers that constitute the 17-year output of two leading Spanish library and information science journals were analyzed by Cano to collect author affiliation, number of authors per paper, country of the first author, number of authors publishing in both journals, and number of authors publishing in other journals indexed by LISA. Sixty-eight percent of the papers have single authors, and only seven authors published in both journals. Empirical and descriptive methods dominate.




User Reactions as Access Mechanism: An Exploration Based on Captions for Images
Brian C. O'Connor, Mary K. O'Connor, and June M. Abbas

O'Connor et al. believe verbal user reactions to images may be collected and used to represent the knowledge state of the reactor to the image, with the assumption that future users may want an image that evokes a similar state. A World Wide Web site of 300 images was shown to 120 respondents, who were asked to supply words describing the image, words describing the feelings evoked by the image, and a caption. The 82 images receiving 10 or more responses were then characterized by counts of total responses, adjectives describing the image as a whole, narrative responses, and captions with narrative responses, as well as the percentage of responses with adjectival descriptors, the percentage of narrative responses, and the percentage of captions with narrative responses. There is a tendency for different respondents to assign diametrically opposed adjectives.




Medical Students' Confidence Judgments Using a Factual Database and Personal Memory: A Comparison
Karen M. O'Keefe, Barbara M. Wildemuth, and Charles P. Friedman

Measuring need fulfillment by their subjects' confidence in the accuracy of their answers, O'Keefe et al. examine medical students' ability to recognize when an information need has been met, both from memory and from using a factual database. Twelve of 43 students, randomly selected and tested three times, completed a sufficient number of questions with confidence rankings to be analyzed. Two passes were recorded each time: first, short answers from memory; then, answers with the aid of a database search. An ANOVA shows no significant effect on Brier scores, the sum of the squares of the differences between the confidence rating and the score for each question divided by the number of questions, on the basis of memory versus database support. Confidence, the difference between the average of the confidence probabilities for a set of questions and the proportion answered correctly, increased with experience over the three repetitions.
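The two measures defined above can be worked through with a small Python sketch. The function names and the sample values are hypothetical and are not data from the study; confidence ratings are assumed to be probabilities between 0 and 1, and question scores are assumed to be 1 (correct) or 0 (incorrect).

```python
def brier_score(confidences, outcomes):
    """Mean squared difference between confidence ratings and question scores."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty sequences")
    squared = [(c - o) ** 2 for c, o in zip(confidences, outcomes)]
    return sum(squared) / len(squared)


def confidence_measure(confidences, outcomes):
    """Average confidence minus the proportion answered correctly.

    Positive values indicate overconfidence, negative values underconfidence.
    """
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty sequences")
    n = len(confidences)
    return sum(confidences) / n - sum(outcomes) / n


# Invented example: four questions, three answered correctly.
ratings = [0.9, 0.6, 0.8, 0.5]
correct = [1, 0, 1, 1]
brier = brier_score(ratings, correct)
bias = confidence_measure(ratings, correct)
```

A Brier score of 0 would mean every rating matched its outcome exactly; lower is better.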




Employing Multiple Representations for Chinese Information Retrieval
K. L. Kwok

Kwok finds that difficult Chinese word segmentation can be avoided if bigrams (instances of two consecutive characters) are extracted and used, despite the fact that this method leads to an index space three times as large as word extraction. Bigrams extract the two-thirds of Chinese words which are two characters in length, but while meaningless combinations of very high and low occurrence may be removed, many meaningless bigrams will remain. Single character words, which make up about 9% of the language, would also not be represented. 

Using a dictionary of 2,175 common one-, two-, and three-character words, strings are processed left to right, with useful terms retained when found. The remaining strings are segmented using rules. A probabilistic feedback model is then used to generate RSVs indicating matches between queries and documents, or document segments. Using the TREC 5 and 6 collections, Kwok finds that mixing single character with bigram or with short-word indexing improves average precision in four of five cases. Short-word and character is most efficient and gives the best results. Combining the results of short-word and character with bigram and character yields an additional 5% improvement at substantial overhead cost. 
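The bigram representation described in the first paragraph of this abstract can be illustrated with a minimal Python sketch. It is not Kwok's system: the function names, the sample strings, and the frequency thresholds are hypothetical, and the sketch ignores punctuation handling and the dictionary-based short-word segmentation.

```python
from collections import Counter


def char_bigrams(text):
    """Return every pair of consecutive characters, in order (overlapping)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_index_terms(strings, min_count=2, max_count=1000):
    """Keep bigrams whose collection frequency lies within the thresholds.

    Assumption: the thresholds are illustrative stand-ins for removing
    combinations of very low and very high occurrence.
    """
    counts = Counter(bg for s in strings for bg in char_bigrams(s))
    return {bg for bg, n in counts.items() if min_count <= n <= max_count}


# Two invented sample strings sharing one two-character word.
sample = ["信息检索", "信息系统"]
bigrams = char_bigrams(sample[0])
terms = bigram_index_terms(sample, min_count=2)
```

Because the bigrams overlap, a string of n characters yields n - 1 index terms, which is consistent with the larger index space noted in the abstract.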




Deep Information: The Role of Information Policy in Environmental Sustainability
by John Felleman
Reviewed by: Mike Steckel




Electronic Databases and Publishing
edited by Albert Henderson
Reviewed by: Marianne Afifi




Localist Connectionist Approaches to Human Cognition
edited by Jonathan Grainger and Arthur M. Jacobs
Reviewed by: Chaomei Chen




Ethics, Information and Technology: Readings
edited by Richard N. Stichler and Robert Hauptman
Reviewed by: Thomas A. Peters




Indexing and Abstracting in Theory and Practice
by F. W. Lancaster
Reviewed by: Jens-Erik Mai




Remediation: Understanding New Media
by Jay David Bolter and Richard Grusin
Reviewed by: Ronald Day






1999, Association for Information Science
Last update: April 21, 1999