Journal of the Association for Information Science



Bert R. Boyce




Informetric Distributions. III. Ambiguity and Randomness
A. Bookstein

Bookstein shows that the informetric laws can all be transformed into close approximations of Lotka's law. A model with a general stable component providing an expected value for the counts is supplemented by a generating function for a random component, effectively supplied by the family of compound Poisson distributions. This introduction of ambiguities does not affect the regularity.




The Self-Sufficient Library Collection: A Test of Assumptions
F. C. A. Exon and Keith F. Punch

Exon and Punch repeated the 1981 Paustian analysis, which noted a weak but significant positive correlation between numbers of items borrowed on interlibrary loan and the collection size of borrowing academic libraries, finding a correlation of considerably greater strength. There is no evidence that building a collection will reduce interlibrary borrowing and thus lead to a self-sufficient collection.




A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System
Hsinchun Chen, Tobun D. Ng, Joanne Martinez, and Bruce R. Schatz

Chen and others look at linkages between two specialized vocabularies with a 30% term overlap. Terms were weighted using the product of term frequency and inverse document frequency and clustered using an asymmetric function, which also penalized terms of very high document occurrence and set a maximum cluster size of 100. These vocabularies can be browsed by user-designed queries or used in a spreading activation process with the Hopfield net algorithm. Terms suggested by biologists to link concepts in the two areas are in the conjoined thesaurus 60% to 85% of the time.
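The term weighting step described above can be sketched as follows. This is only an illustration: the summary does not give the exact formula, so the common form tf * log(N / df) is assumed, the function name and the toy documents are hypothetical, and the asymmetric clustering function and the Hopfield net step are not reproduced.

```python
import math
from collections import Counter

def tf_idf_weights(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: term frequency times log(N / document frequency).
    The summarized paper may use a different variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy documents, each a list of index terms.
docs = [
    ["gene", "worm", "gene"],
    ["worm", "neuron"],
    ["worm", "gene", "mutation"],
]
w = tf_idf_weights(docs)
```

A term that occurs in every document, such as "worm" here, receives a weight of zero, which is the simplest way the inverse document frequency discounts very common terms.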




Automatic Classification of E-Mail Messages by Message Type
Andrew D. May

May provides an interesting contribution to the e-mail filtering problem by suggesting grouping by message type rather than topic. Over 1300 messages were grouped into question, response, announcement, and administrative categories based on matching with selected pattern strings of text, and these groups were compared with the investigator's manual groupings. Cohen's Kappa indicates agreement beyond chance between the automatic and manual categorizations for the 897 messages classified by the automatic method. Unfortunately, 34% of the messages could not be automatically classified.
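For readers unfamiliar with the agreement statistic mentioned above, Cohen's Kappa for two label sequences can be computed as in the sketch below. The function name and the four example labels are hypothetical and are not taken from May's data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Proportion of items on which the two classifications agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each classifier's category proportions.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical example: automatic versus manual message types.
automatic = ["question", "response", "announcement", "question"]
manual = ["question", "response", "question", "question"]
kappa = cohens_kappa(automatic, manual)
```

A value of 0 means agreement no better than chance and a value of 1 means perfect agreement.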




Map Displays for Information Retrieval
Xia Lin

Lin shows that display structures of related documents provide a way of handling large retrieved sets. In a trial using index-term-generated vectors for documents in a conventionally retrieved set, Kohonen's algorithm modifies term weights to strengthen the links between close document vectors and finally partitions the set. The areas formed correspond in size to the word frequencies, and their closeness represents co-occurrence.




A Statistical Learning Approach to Automatic Indexing of Controlled Index Terms
Chi-Hong Leung and Wing-Kay Kan

Leung and Kan use a "positive training set," words from titles and abstracts of documents that have been indexed by a particular controlled term, and a "negative training set," words from documents that have not been so indexed, and compute a z-score for each word in each set. The difference in the z-scores in the two sets is used to weight the words in both cases. A vector of the words in the training sets with their difference weights is created for each controlled term. An indexing score is then generated for each controlled term and document: the sum of the weights times the frequency of each word in the document, divided by the number of words in the document. If this indexing score is greater than the sum of the average indexing scores in the positive and negative training sets, the controlled term is assigned. In an extensive sample in INSPEC, 88% of the positive evaluation set was properly indexed and 8% of the negative evaluation set was improperly indexed. In a similar MEDLINE sample the percentages were 88% and 6%. Reruns of the experiment with changing thresholds indicate that the sum of the average indexing scores in the two training sets is an optimal threshold.
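The scoring and threshold rule described above can be sketched as follows. The word weights stand in for the z-score differences, whose exact computation the summary does not give, so the weights, documents, and function names here are hypothetical illustrations rather than the authors' implementation.

```python
from collections import Counter

def indexing_score(doc_words, weights):
    """Sum of weight * word frequency, divided by the number of words in the document."""
    if not doc_words:
        return 0.0
    freq = Counter(doc_words)
    total = sum(weights.get(word, 0.0) * count for word, count in freq.items())
    return total / len(doc_words)

def assignment_threshold(positive_docs, negative_docs, weights):
    """Sum of the average indexing scores of the two training sets."""
    mean_pos = sum(indexing_score(d, weights) for d in positive_docs) / len(positive_docs)
    mean_neg = sum(indexing_score(d, weights) for d in negative_docs) / len(negative_docs)
    return mean_pos + mean_neg

def assign_term(doc_words, weights, cutoff):
    """Assign the controlled term when the score exceeds the cutoff."""
    return indexing_score(doc_words, weights) > cutoff

# Hypothetical weights for one controlled term (assumed z-score differences).
weights = {"neural": 2.0, "network": 1.5, "library": -1.0}
positive_docs = [["neural", "network", "model"],
                 ["neural", "network", "neural", "training"]]
negative_docs = [["library", "catalog"],
                 ["library", "network", "loan", "policy"]]
cutoff = assignment_threshold(positive_docs, negative_docs, weights)
```

With these toy values, `assign_term(["neural", "network", "design"], weights, cutoff)` assigns the term, while `assign_term(["library", "loan"], weights, cutoff)` does not.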




Disciplinary Variation in Automatic Sublanguage Term Identification
Stephanie W. Haas

Using abstracts from eight different disciplines, Haas manually selected target sets of terms representing the domain of each discipline. Words that did not appear, or appeared with a domain marker, in a standard college dictionary were termed seed words. Words in a 1 to 9 word window about the seed words were extracted and categorized as stop words, domain terms that matched the targets, and general words. In all disciplines, the percentage of target terms extracted increases and the percentage of domain terms among all the words extracted decreases as window size grows. The extraction process is more successful in the sciences, which have more domain terms occurring in sequences, than in other disciplines.
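The window extraction described above might look like the following sketch. It assumes that a window of size k takes up to k words on each side of a seed word and includes the seed word itself; the summary does not say how the window is positioned, and the sentence, seed word, and word lists are hypothetical.

```python
def extract_window_words(tokens, seed_words, window):
    """Collect words within `window` positions of any seed word (seed included)."""
    extracted = set()
    for i, token in enumerate(tokens):
        if token in seed_words:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            extracted.update(tokens[lo:hi])
    return extracted

def categorize(words, stop_words, target_terms):
    """Split extracted words into stop words, matched domain terms, and general words."""
    stops = words & stop_words
    domain = (words - stops) & target_terms
    general = words - stops - domain
    return stops, domain, general

# Hypothetical sentence, seed word, stop list, and manually selected targets.
tokens = "the enzyme catalyzes hydrolysis of the peptide bond".split()
seeds = {"hydrolysis"}
stop_words = {"the", "of"}
targets = {"enzyme", "hydrolysis", "peptide"}

narrow = extract_window_words(tokens, seeds, 1)
wide = extract_window_words(tokens, seeds, 3)
stops, domain, general = categorize(wide, stop_words, targets)
```

In this toy case the wider window recovers more of the target terms but also pulls in stop words and general words, which mirrors the trade-off reported in the summary.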




Citation Theories in the Framework of International Flow of Information: New Evidence with Translation Analysis
Ziming Liu

Liu compares citation data from seven Chinese journals and translation data from several sources to determine that a strong relationship exists between the number of items translated into Chinese from a particular country and the number of citations of items from that country in Chinese journals. The correlation between translations into Chinese from a language and citations to items in that language is even stronger.




Backpropagation: Theory, Architectures, and Applications
edited by Yves Chauvin and David E. Rumelhart
reviewed by: Paul V. Biron




Scholarly Publishing: The Electronic Frontier
edited by Robin P. Peek and Gregory B. Newby
reviewed by: Jaap A. Jasperse






© 1998, Association for Information Science
Last update: November 06, 1998