JASIST Table of Contents

Journal of the Association for Information Science and Technology



In This Issue
Bert R. Boyce





An Algorithm for Term Conflation Based on Tree Structures
    Irene Diaz, Jorge Morato, and Juan Llorens
    Published online 28 December 2001

In this issue Diaz et al. describe Normalizer, a conflation (stemming) algorithm that stores its prefixes and suffixes in tree structures to reduce required space and complexity. A comparison run against the Porter and Krovetz stemmers on five English text documents shows a reduction in error percentages. Thirty-six Spanish documents were also conflated, and the results were evaluated for the effectiveness of the different rule sets in use. Regular verb transformations were applied most often and had the highest error rate; irregular verb rules were the next most used, followed by regular noun rules.
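The idea of storing affixes in a tree can be illustrated with a minimal suffix trie that strips the longest matching suffix from a word. This is a generic sketch of the technique, not the Normalizer algorithm itself, and the suffixes shown are arbitrary examples:

```python
# Minimal trie-based suffix stripper. Shared suffix endings share trie nodes,
# which is the space saving a tree structure buys over a flat suffix list.

class SuffixTrie:
    def __init__(self):
        self.root = {}

    def add(self, suffix):
        """Insert the suffix reversed, so matching walks from the word's end."""
        node = self.root
        for ch in reversed(suffix):
            node = node.setdefault(ch, {})
        node['$'] = suffix  # marker: a complete suffix ends at this node

    def strip_longest(self, word):
        """Remove the longest stored suffix matching the end of the word."""
        node, best = self.root, None
        for ch in reversed(word):
            if ch not in node:
                break
            node = node[ch]
            if '$' in node:
                best = node['$']
        return word[: len(word) - len(best)] if best else word
```

Because "s", "ing", and "ings" share trailing characters, they occupy overlapping paths in the trie, and one backward walk over the word finds the longest match.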








A New Model that Generates Lotka's Law
    John C. Huber
    Published online 28 December 2001

Huber assumes in his new bibliometric model that the author productivity rate follows the Generalized Pareto Distribution, that the author's career duration follows an exponential distribution, that productivity and career duration are related, and that publications are distributed across the career duration in a Poisson manner. By simulating authors' contributions over time, he uses the model to generate Lotka's law and obtains good fits to empirical distributions.
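A toy simulation in the spirit of the model's ingredients (a heavy-tailed productivity rate, an exponential career length, and a Poisson publication count over the career) shows how such assumptions yield a Lotka-like concentration of output. The parameter values are invented for illustration, and the correlation Huber assumes between productivity and career duration is omitted for brevity:

```python
import math
import random

def poisson(rng, lam):
    """Knuth's Poisson sampler (adequate for the modest rates used here)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_counts(n_authors, seed=0):
    """Per author: exponential career length, heavy-tailed productivity rate,
    Poisson publication count over the career. Parameters are illustrative."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_authors):
        career = rng.expovariate(1.0 / 10.0)         # mean 10-year career
        rate = 0.1 * (rng.paretovariate(1.5) - 1.0)  # heavy-tailed pubs/year
        lam = min(rate * career, 30.0)               # cap to bound the sampler
        pubs = poisson(rng, lam)
        if pubs > 0:                                 # keep published authors
            counts.append(pubs)
    return counts
```

Tallying the counts shows the familiar Lotka shape: single-paper authors dominate, with a long tail of prolific ones.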








Collaborative Relevance Judgment: A Group Consensus Method for Evaluating User Search Performance
    Xiangmin Zhang
    Published online 28 December 2001

Zhang argues that since research is often a collaborative process, the evaluation of retrieved search results in such cases should not rest on the subjective judgment of a single searcher; relevance judgment should instead be a collaborative evaluation, as is currently practiced in TREC's pooled assessments. He suggests that a group can be formed from any set of people with common interests, represented by the questions they search. This group then pools all retrieved documents or, ideally, all retrieved documents judged relevant by each searcher; in practice this may mean simply the best retrieved set, not the documents in that set deemed relevant. Each document is weighted by the number of users who retrieved it, and the relevant set is taken to be those documents above some chosen threshold. A user's relevance score is then the sum of the weights of the documents the user retrieved, divided by the number of items in the consensus set minus the number of those documents the user retrieved, plus the number of nonrelevant documents the user retrieved.
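The pooling and scoring steps can be sketched as follows. This is one plausible reading of the prose description, not Zhang's exact formulation: the function names and threshold value are assumptions, the numerator is taken to sum only the weights of the user's retrieved documents that fall in the consensus set, and a zero denominator is guarded against:

```python
# Sketch of consensus-set weighting and a per-user relevance score.
# One plausible reading of the description, not Zhang's published formula.
from collections import Counter

def consensus_set(retrieved_by_user, threshold):
    """Weight each document by how many users retrieved it; keep those at
    or above the threshold as the consensus (relevant) set."""
    weights = Counter(doc for docs in retrieved_by_user.values() for doc in docs)
    consensus = {doc for doc, w in weights.items() if w >= threshold}
    return weights, consensus

def user_score(user_docs, weights, consensus):
    """Summed weights of the user's consensus hits, normalized by
    |consensus| - |hits| + |nonrelevant documents retrieved|."""
    hits = user_docs & consensus
    nonrelevant = user_docs - consensus
    denom = len(consensus) - len(hits) + len(nonrelevant)
    return sum(weights[d] for d in hits) / denom if denom else 0.0
```

On a toy pool of three users, a document retrieved by all three carries weight 3, and a user who retrieved one consensus document and one stray document is scored against a denominator of 2.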

Zhang then conducts an experiment with 56 student volunteers in which educational level, academic background, computer experience, and native language were tested for their effect on search performance, as measured by the collaborative relevance score on the four questions each participant searched. The collaborative relevance scores appear to vary with standard recall and precision measures. Educational level had a significant effect, but being a native English speaker did not; science and engineering students scored higher than those in the social sciences and humanities, and differences in computer experience were not statistically significant.
















Will This Paper Ever Be Cited?
    Quentin L. Burrell
    Published online 28 December 2001

For a homogeneous set of papers, given the average rate at which a paper attracts citations, Burrell calculates the probability that a paper will ever be cited, assuming it has not been cited within a given time. The longer the elapsed time without citation, the greater the likelihood the paper will never be cited.
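The closing observation can be illustrated in stylized form. This is not Burrell's model, only a simple assumption for illustration: citations arrive as a homogeneous Poisson process at a known rate, observed over a finite window, so the chance that a so-far-uncited paper is ever cited shrinks as the citation-free period grows. The rate and horizon values are invented:

```python
import math

def prob_ever_cited(rate, elapsed, horizon):
    """Probability a so-far-uncited paper attracts a citation before the
    horizon, under a homogeneous Poisson process at the given rate.
    A stylized illustration of the monotone decline, not Burrell's model."""
    remaining = max(horizon - elapsed, 0.0)
    return 1.0 - math.exp(-rate * remaining)
```

With the rate fixed, a paper uncited after five years has a strictly smaller chance of being cited within the window than one uncited after two.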






A Context Vector Model for Information Retrieval
    Holger Billhardt, Daniel Borrajo, and Victor Maojo
    Published online 28 December 2001

Billhardt, Borrajo, and Maojo create a matrix of term context vectors whose values are the normalized co-occurrence frequencies of term pairs in a document, computed across the whole collection. The usual Vector Space Model document vectors are then transformed into document context vectors by summing, over the document's terms, the product of each term's weight and its context vector divided by that vector's length, so that each value is the average influence of the term on all terms in the document. Queries are handled in the normal manner, using term frequency vectors, binary vectors, or query context vectors obtained in the same way as the document vectors above.
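A small sketch of the construction, under assumptions the summary leaves open: co-occurrence is counted at the document level, rows are frequency-normalized, and context vectors are scaled to unit length before weighting. The vocabulary and weights are toy examples, not the authors' exact scheme:

```python
# Term context matrix and document context vectors, in plain Python lists.
# Illustrative normalization choices; not the authors' exact formulation.

def context_matrix(docs, vocab):
    """Normalized term-pair co-occurrence frequencies over the collection."""
    n = len(vocab)
    idx = {t: i for i, t in enumerate(vocab)}
    cooc = [[0.0] * n for _ in range(n)]
    for doc in docs:
        terms = set(doc) & set(vocab)
        for a in terms:
            for b in terms:
                cooc[idx[a]][idx[b]] += 1.0
    for row in cooc:  # row-normalize: each context vector sums to 1
        s = sum(row)
        if s:
            row[:] = [v / s for v in row]
    return cooc

def doc_context_vector(weights, cooc):
    """Sum of (term weight * unit-length context vector) over the doc's terms."""
    n = len(cooc)
    out = [0.0] * n
    for i, w in enumerate(weights):
        if w == 0.0:
            continue
        norm = sum(v * v for v in cooc[i]) ** 0.5 or 1.0
        for j in range(n):
            out[j] += w * cooc[i][j] / norm
    return out
```

A document vector is thus smeared across the terms its terms co-occur with, which is what lets a query match documents sharing context rather than exact terms.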

In tests run on the MED, CRANFIELD, CISI, and CACM collections, the terms were first run against a stop list, then stemmed, and single-occurrence stems were eliminated. Comparisons were made to the Vector Space Model using IDF weights. In general, small improvements are noted in all collections, with different variants of the context approach performing best in different collections. The procedure is expensive in time and memory, particularly if query context vectors are computed while a response is awaited.













The Map Library in the New Millennium, edited by R.B. Parry and C.R. Perkins
    Lisa A. Ennis
    Published online 27 December 2001




Data Privacy in the Information Age
    Alan T. Schroeder, Jr.
    Published online 14 December 2001







Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810 | E-mail:

Copyright © 2001, Association for Information Science and Technology