
Journal of the Association for Information Science and Technology



In This Issue
Bert R. Boyce





User Perspectives on Relevance Criteria: A Comparison among Relevant, Partially Relevant, and Not-Relevant Judgments
Kelly L. Maglaughlin and Diane H. Sonnenwald
    Published online 28 January 2002

In this issue Maglaughlin and Sonnenwald provided 12 graduate students with searches related to each student's work and asked them to judge the twenty most recent retrieved representations by highlighting passages thought to contribute to relevance, marking out passages detracting from relevance, and assigning a relevant, partially relevant, or not-relevant judgement to each. In recorded interviews the students were asked how these decisions were made and were asked to describe the three classes of judgement.

The union of criteria identified in past studies did not seem to fully capture the information supplied, so a new set was produced and coding agreement was found to be adequate. Twenty-nine criteria were identified and grouped into six categories based upon the focus of the criterion. Multiple criteria are used for most judgements, and most criteria may have either a positive or a negative effect. Content was the most frequently mentioned criterion.











Strategic Help in User Interfaces for Information Retrieval
Giorgio Brajnik, Stefano Mizzaro, Carlo Tasso, and Fabio Venuti
    Published online 8 February 2002

Brajnik et alia describe their view of an effective retrieval interface, one which coaches the searcher using stored knowledge not only of database structure, but of strategic situations which are likely to occur, such as repeating failed tactics in a low return search, or failing to try relevance feedback techniques. The emphasis is on the system suggesting search strategy improvements by relating them to an analysis of work entered so far and selecting and ranking those found relevant. FIRE is an interface utilizing these techniques. It allows the user to assign documents to useful, topical and trash folders, maintains thesauri files automatically searchable on query terms, and it builds, using user entries and a rule system, a picture of the retrieval situation from which it generates suggestions.

Six participants used FIRE in INSPEC20K database searches, two for their own information needs and four for needs provided by the authors. Satisfaction was measured in a structured post-search interview, behavior by log analysis, and performance by recall and precision in the canned searches. Participants found the suggestions helpful, but insisted they would have taken those approaches without such assistance. Users took the suggestions offered and preferred those demanding the least effort.













Getting Answers to Natural Language Questions on the Web
Dragomir R. Radev, Kelsey Libner, and Weiguo Fan
    Published online 31 January 2002

Seven hundred natural language questions from TREC-8 and TREC-9 were sent by Radev, Libner, and Fan to each of nine web search engines. The top 40 sites returned by each system were stored and evaluated for how productive they were of correct answers. Each question per engine was scored as the sum of the reciprocal ranks of identified correct answers. The large number of zero scores gave a positive skew violating the normality assumption for ANOVA, so values were transformed to zero for no hit and one for one or more hits. The non-zero values were then square-root transformed to remove the remaining positive skew. Interactions were observed between search engine and answer type (name, place, date, et cetera), search engine and number of proper nouns in the query, search engine and the need for time limitation, and search engine and total query words. All effects were significant. The shortest queries had the highest mean scores. The presence of one or more proper nouns provides a significant advantage. Non-time-dependent queries have an advantage. Place, name, person, and text description had mean scores between .85 and .9, with date at .81 and number at .59. There were significant differences in score by search engine. Search engines found at least one correct answer in between 75.45% and 87.7% of the cases. Google and Northern Light were just short of a 90% hit rate. No evidence indicated that a particular engine was better at answering any particular sort of question.
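The scoring and transformation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the engine names and rank lists are hypothetical, ranks are assumed to be 1-based, and the square-root step is assumed to apply to the raw non-zero scores.

```python
import math

def reciprocal_rank_score(correct_ranks):
    """Sum of reciprocal ranks of the correct answers (ranks assumed 1-based)."""
    return sum(1.0 / rank for rank in correct_ranks)

def hit_indicator(score):
    """Binary transform: 0 for no hit, 1 for one or more hits."""
    return 1 if score > 0 else 0

# Hypothetical data: ranks at which correct answers appeared for one question.
ranks_by_engine = {
    "engine_a": [1, 4],   # correct answers at ranks 1 and 4
    "engine_b": [],       # no correct answer in the stored results
    "engine_c": [2],
}

scores = {name: reciprocal_rank_score(r) for name, r in ranks_by_engine.items()}
hits = {name: hit_indicator(s) for name, s in scores.items()}

# Square-root transform applied only to the non-zero scores, to reduce skew.
nonzero_sqrt = {name: math.sqrt(s) for name, s in scores.items() if s > 0}

print(scores)        # engine_a scores 1/1 + 1/4
print(hits)
print(nonzero_sqrt)
```

Splitting the data into a hit indicator and a transformed non-zero score is one way to keep the many zero scores from dominating the analysis.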














Using Statistical and Contextual Information to Identify Two- and Three-Character Words in Chinese Text
Christopher S.G. Khoo, Yubin Dai, and Teck Ee Loh
    Published online 8 February 2002

Khoo, Dai, and Loh examine new statistical methods for the identification of two- and three-character words in Chinese text. Some meaningful Chinese words are simple (independent units of one or more characters in a sentence that have independent meaning), but others are compounds of two or more simple words. For manual segmentation they follow the Modern Chinese Word Segmentation for Application of Information Processing, with some modifications to focus on meaningful words. About 37% of meaningful words are longer than two characters, indicating a need to handle three- and four-character words. Four hundred sentences from news articles were manually broken into overlapping bi-grams and tri-grams. Using logistic regression, the log of the odds that such bi-grams and tri-grams were meaningful words was calculated. Variables like relative frequency, document frequency, local frequency, and contextual and positional information were incorporated in the model only if the concordance measure improved by at least 2% with their addition. For two- and three-character words, relative frequency of adjacent characters and document frequency of overlapping bi-grams were found to be significant. Using measures of recall and precision, where correct automatic segmentation is normalized either by manual segmentation or by automatic segmentation, the contextual information formula for two-character words provides significantly better results than previous formulations, and using the two- and three-character formulations in combination significantly improves the two-character results.
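A minimal Python sketch of the pieces named above: overlapping n-grams, the log odds from a logistic regression model, and recall and precision for segmentation. Everything specific here is an assumption for illustration: the example sentence, the feature names, the coefficient values, and the representation of words as (start, end) character spans are hypothetical and are not taken from the article.

```python
import math

def overlapping_ngrams(sentence, n):
    """Every run of n adjacent characters, stepping one character at a time."""
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]

def log_odds(features, coefficients, intercept):
    """Linear predictor of a logistic regression: log(p / (1 - p))."""
    return intercept + sum(coefficients[k] * v for k, v in features.items())

def probability(z):
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def recall_precision(manual_spans, automatic_spans):
    """Correct words normalized by manual (recall) or automatic (precision) counts."""
    correct = len(manual_spans & automatic_spans)
    return correct / len(manual_spans), correct / len(automatic_spans)

sentence = "信息检索系统"                      # hypothetical six-character example
bigrams = overlapping_ngrams(sentence, 2)    # 5 overlapping bi-grams
trigrams = overlapping_ngrams(sentence, 3)   # 4 overlapping tri-grams

# Hypothetical coefficients and feature values for one candidate bi-gram.
coefficients = {"relative_frequency": 0.5, "document_frequency": 1.0}
features = {"relative_frequency": 2.0, "document_frequency": 1.0}
z = log_odds(features, coefficients, intercept=-2.0)
p = probability(z)

# Words as (start, end) character spans in the sentence.
manual = {(0, 2), (2, 4), (4, 6)}
automatic = {(0, 2), (2, 4), (4, 5), (5, 6)}
recall, precision = recall_precision(manual, automatic)
```

A candidate n-gram would be accepted as a word when its estimated probability exceeds some threshold; the threshold itself is not given in the summary above.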















Combining and Selecting Characteristics of Information Use
Ian Ruthven, Mounia Lalmas, and Keith van Rijsbergen
    Published online 4 February 2002

Ruthven, Lalmas, and van Rijsbergen use traditional term importance measures: inverse document frequency (idf); noise, based upon in-document frequency; and term frequency. These are supplemented by theme value, which is calculated from the differences between the expected and the actual positions of words in a text, on the assumption that even distribution indicates term association with a main topic, and by context, which is based on a query term's distance from the nearest other query term relative to the average expected distribution of all query terms in the document. They then define two document characteristics. Specificity, or document complexity, is the sum of all idf values in a document over the total terms in the document, that is, the document's average idf value. Information-to-noise ratio (info-noise) is the number of tokens after stopping and stemming over the number of tokens before these processes, and measures the ratio of useful to non-useful information in a document. Retrieval tests are then carried out using each characteristic, combinations of the characteristics, and relevance feedback to determine the correct combination of characteristics. A file ranks independently of query terms by both specificity and info-noise, but if the presence of a query term is required, unique rankings are generated.
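The two document characteristics defined above can be sketched as follows. This is an illustration only: the idf variant log(N / df), the tiny stop list, the document frequencies, and the omission of stemming are all assumptions made to keep the example short, and the article may use different formulations.

```python
import math

def idf(term, doc_freq, n_docs):
    """Inverse document frequency, assumed here to be log(N / df)."""
    return math.log(n_docs / doc_freq[term])

def specificity(doc_tokens, doc_freq, n_docs):
    """Sum of idf values in the document over its total number of terms."""
    return sum(idf(t, doc_freq, n_docs) for t in doc_tokens) / len(doc_tokens)

def info_noise(tokens_before, tokens_after):
    """Tokens remaining after stopping and stemming over tokens before."""
    return len(tokens_after) / len(tokens_before)

# Hypothetical document and collection statistics.
STOP_WORDS = {"the", "of"}
tokens_before = ["the", "retrieval", "of", "documents"]
tokens_after = [t for t in tokens_before if t not in STOP_WORDS]  # stemming omitted

doc_freq = {"retrieval": 10, "documents": 100}
n_docs = 100

doc_specificity = specificity(tokens_after, doc_freq, n_docs)
doc_info_noise = info_noise(tokens_before, tokens_after)
```

Neither value depends on the query, which is why a file ranked by these characteristics alone has the same order for every query unless the presence of a query term is also required.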

Tested on five standard collections, the traditional characteristics outperformed the new characteristics, which did, however, outperform random retrieval. All possible combinations of characteristics were also tested, both with and without a set of scaling weights applied. All characteristics can benefit from combination with another characteristic or set of characteristics, and performance as a single characteristic is a good indicator of performance in combination. Larger combinations tended to be more effective than smaller ones, and weighting increased precision measures of middle-ranking combinations but decreased the ranking of poorer combinations. The best combinations vary for each collection, and in some collections with the addition of weighting.

Finally, with all documents ranked by the combination of all characteristics, they take the top 30 documents and calculate the characteristic scores for each term in both the relevant and the non-relevant sets. Then, taking for each query term the characteristics whose average was higher for relevant than for non-relevant documents, they re-rank the documents. The relevance feedback method of selecting characteristics can select a good set of characteristics for query terms.
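The selection step described above, keeping for a query term only the characteristics whose average is higher in relevant than in non-relevant documents, might look like the sketch below. The function name, characteristic names, and score values are hypothetical, and the re-ranking itself is not shown.

```python
from statistics import mean

def select_characteristics(relevant_scores, nonrelevant_scores):
    """Keep characteristics whose mean score is higher in the relevant set.

    Both arguments map a characteristic name to the scores one query term
    received in the relevant or non-relevant documents among the top ranks.
    """
    return [
        name
        for name in relevant_scores
        if mean(relevant_scores[name]) > mean(nonrelevant_scores[name])
    ]

# Hypothetical scores for a single query term.
relevant = {"idf": [2.0, 3.0], "tf": [0.1, 0.2], "theme": [0.5, 0.9]}
nonrelevant = {"idf": [1.0, 1.5], "tf": [0.3, 0.4], "theme": [0.6, 0.6]}

selected = select_characteristics(relevant, nonrelevant)
print(selected)
```

In this example term frequency is dropped for the term because it averages lower in the relevant documents, so only the remaining characteristics would contribute to the term's score when the documents are re-ranked.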




















A View to the Future of the Library and Information Science Profession: A Delphi Study
Shifra Baruchson-Arbib and Jenny Bronstein
    Published online 6 February 2002

Baruchson-Arbib and Bronstein present the results of a Delphi study held in Israel from 1998 to 2000. One hundred and twenty directors of large public and academic libraries, heads of LIS departments, and heads of corporate information centers in the USA, Canada, Europe, and Israel participated, using a 47-statement website as a base. Consensus on most points was reached in the first round. A second round included only 26 participants whose responses fell outside the group consensus. Seventy-seven percent believe the traditional model of the library will not be replaced in their lifetimes. A user-centered approach is highly favored, as is more assertive behavior, including marketing and promotion. Less than 8% believe the profession will disappear.











Mathematical Foundations of Information Retrieval, by Sandor Dominich
Leo Egghe
    Published online 31 January 2002




Automatic Summarization, by Inderjeet Mani
Shirley J. Lincicum
    Published online 23 January 2002




Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810 | E-mail:

Copyright © 2001, Association for Information Science and Technology