Journal of the Association for Information Science and Technology

Table of Contents

Volume 55  Issue 5


In This Issue



In this issue
Bert Boyce














Trends and Issues in Establishing Interoperability Among Knowledge Organization Systems
Marcia Lei Zeng and Lois Mai Chan
Published online 16 December 2003

In this issue Zeng and Chan review current work on facilitating searching across indexing systems, using three categories of systems for organizing digital records: term lists, classifications, and relational vocabularies. They view these systems over what they consider two distinct eras of development: the online era and the internet era. In the online era more than 150 specialized indexing languages were in use, and compatibility efforts concentrated on concept mapping, yielding meta-thesauri, micro-thesauri, or macro-vocabularies to facilitate cross-database searching in a particular retrieval system. The growth of distributed data repositories in the internet era has led to an explosion of new metadata standards and to increased interoperability efforts, which are briefly reviewed. These include derivation (simple vocabularies developed from more complex ones), translation (transformation to a new natural language), leaf node linking (where specialized schemes can take the place of low-level terms in broad structures), direct mapping (between concept expressions in different controlled vocabularies), co-occurrence mapping (where terms from two vocabularies occurring in the same metadata record imply association), switching (where an intermediate switch language gathers like terms from source languages), linking by temporary union list (where whole or partial word match of query terms provides a linkage), and linking by thesaurus protocol (where a programmatic process creates the linkage). The links so created are then stored in authority records, concordances, semantic networks, or lexical databases. These efforts are encouraging and work will continue.
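The switching approach described above can be illustrated with a minimal sketch. It is not taken from Zeng and Chan's article: the vocabulary names, terms, and switch codes (`vocab_a`, `S1`, and so on) are invented for illustration, and the functions `build_switch_index` and `switch` are hypothetical helpers.

```python
from collections import defaultdict

def build_switch_index(mappings):
    """Invert per-vocabulary term -> switch-code maps into
    switch code -> vocabulary -> list of terms."""
    index = defaultdict(lambda: defaultdict(list))
    for vocab, term_to_code in mappings.items():
        for term, code in term_to_code.items():
            index[code][vocab].append(term)
    return index

def switch(term, source, target, mappings, index):
    """Translate a term from the source vocabulary to the target
    vocabulary by way of the shared switch code, if any."""
    code = mappings[source].get(term)
    if code is None:
        return []
    return sorted(index.get(code, {}).get(target, []))

# Invented example data: two vocabularies mapped to the same switch codes.
mappings = {
    "vocab_a": {"neoplasms": "S1", "tumours": "S1"},
    "vocab_b": {"cancer": "S1", "cardiology": "S2"},
}
index = build_switch_index(mappings)
print(switch("neoplasms", "vocab_a", "vocab_b", mappings, index))
```

Because each source vocabulary is mapped only to the switch language, adding another vocabulary requires one new mapping rather than one per existing vocabulary.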










Text Mining: Generating Hypotheses From MEDLINE
Padmini Srinivasan
Published online 16 December 2003

Srinivasan has created text mining algorithms that implement the Swanson-Smalheiser hypothesis-generating framework, which in the past has required considerable manual processing. By applying MeSH-based profiles, the algorithms generate terms indicating novel relationships. A profile begins with a vector of MeSH terms, weighted by term frequency and inverse document frequency, built by extracting terms from a set of documents relevant to the initial topic. These terms are then grouped by their UMLS semantic types, creating 134 semantic-type vectors of MeSH terms. The profile represents the relative importance of the terms within the semantic types, which are ranked by term count, and in each type the terms whose weight exceeds a threshold are retained. Documents retrieved by searching the retained terms from user-specified vector components are used to create new profiles, which are combined while deleting terms whose postings overlap the original document set. The remaining terms in each semantic type are ranked and displayed as indicators of potential novel relations. The algorithm can also be modified to a closed form that begins with two topical document sets when searching for a hypothesized link. Five tests are conducted in this manner, and two in the open form with one initial topic. All tests yield term lists in which terms suggestive of novel relationships rank high.
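The profile-building step can be sketched roughly as follows. This is an illustration under stated assumptions, not Srinivasan's implementation: the function `build_profile`, the choice to normalize weights by the largest weight within each semantic type, and all example terms, counts, and type names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_profile(topic_docs, collection_df, collection_size,
                  semantic_type, threshold):
    """Build a topic profile: TF-IDF weights for MeSH terms, grouped
    by semantic type, keeping terms above a per-type threshold.

    topic_docs      -- list of term lists, one per topic document
    collection_df   -- term -> document frequency in the whole collection
    collection_size -- number of documents in the whole collection
    semantic_type   -- term -> semantic type name (assumed one per term)
    """
    tf = Counter(term for doc in topic_docs for term in doc)
    by_type = defaultdict(dict)
    for term, freq in tf.items():
        idf = math.log(collection_size / collection_df[term])
        by_type[semantic_type[term]][term] = freq * idf

    profile = {}
    for sem_type, weights in by_type.items():
        top = max(weights.values())
        if top <= 0:
            continue  # every term occurs in all documents; nothing to rank
        # Assumption: weights are made relative to the strongest term in the type.
        kept = {t: w / top for t, w in weights.items() if w / top > threshold}
        profile[sem_type] = dict(sorted(kept.items(), key=lambda kv: -kv[1]))
    return profile

# Invented example data.
docs = [["Magnesium", "Migraine Disorders"], ["Magnesium", "Calcium"]]
df = {"Magnesium": 10, "Calcium": 100, "Migraine Disorders": 10}
types = {"Magnesium": "Element", "Calcium": "Element",
         "Migraine Disorders": "Disease"}
print(build_profile(docs, df, 1000, types, threshold=0.5))
```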







Making Digital Libraries Effective: Automatic Generation of Links for Similarity Search Across Hyper-textbooks
Massimo Melucci
Published online 20 January 2004

Melucci wants to generate, without manual intervention, hyperlinks between pages of digital textbooks that allow useful reader navigation. Using van Rijsbergen's and Belew's textbooks as a base, with the personal names removed from the Belew index, page text and index text were stripped of stop words, stemmed, and weighted by occurrence. The resulting page-by-page matrix was used to compute similarity measures, and links were established where a threshold was exceeded. The terms in the index-derived matrix are expanded using similar keywords from the page matrix; a term-by-term matrix is then generated, the terms are clustered, and a matrix of centroids is produced to create broad content-based links that can be matched with initial information requests.
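The page-to-page linking step might look something like the sketch below. The summary does not say which similarity measure Melucci uses, so cosine similarity over raw occurrence counts is an assumption here, as are the function names and the toy pages (which are taken to be already stop-worded and stemmed).

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

def link_pages(pages, threshold):
    """Return (page, page, similarity) for every pair of pages whose
    similarity exceeds the threshold.

    pages -- page id -> list of stemmed, stop-worded tokens
    """
    vectors = {pid: Counter(tokens) for pid, tokens in pages.items()}
    links = []
    for p, q in combinations(sorted(vectors), 2):
        score = cosine(vectors[p], vectors[q])
        if score > threshold:
            links.append((p, q, score))
    return links

# Invented example pages.
pages = {
    1: ["index", "term", "weight"],
    2: ["index", "term", "cluster"],
    3: ["probabl", "model"],
}
for p, q, score in link_pages(pages, threshold=0.5):
    print(p, q, round(score, 3))
```

Raising the threshold yields fewer, stronger links; lowering it gives denser navigation at the cost of weaker associations.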









Characterization of the Impact of Sets of Scientific Papers: The Garfield (Impact) Factor
Peter Vinkler
Published online 16 January 2004

Vinkler notes that if, within a set of journals, the number of citations obtained and the number of references given are identical (a meta-journal including all papers on a topic), then the set's Garfield Impact factor is the same as the mean chance for citedness of the journal papers. Vinkler's Relative Publication Growth index is the number of papers published in an initial year divided by the number of papers generated in a chosen time period; the Garfield Impact factor is thus equal to the Relative Publication Growth index times the mean number of references in the set for the time period. If the annual publication rate is constant, both the Garfield Impact factor and the chance for citedness are constant, but if the relative number of publications increases while references are stable or increasing, the chance for citedness increases. The Garfield Impact factor measures the relative contribution of a journal to the total impact of journals in the field and is shown to be equivalent to a measure specifically designed to represent this relative contribution.
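The relationship stated above can be checked with a small worked example. It reads the summary's definitions literally; the function names are hypothetical, the publication counts are invented, and `mean_references` is assumed to mean the average number of references per paper that point to papers of the set within the chosen period.

```python
def relative_publication_growth(papers_in_year, papers_in_period):
    """Papers published in the initial year divided by the papers
    published in the chosen time period."""
    return papers_in_year / papers_in_period

def garfield_factor(papers_in_year, papers_in_period, mean_references):
    """Garfield factor of a closed set, computed as the relative
    publication growth times the mean number of references."""
    rpg = relative_publication_growth(papers_in_year, papers_in_period)
    return rpg * mean_references

# Invented counts: 1,000 papers a year, a two-year period, 3 references each.
constant = garfield_factor(1000, 2000, 3.0)
# Same period and reference rate, but output has grown to 1,200 papers.
growing = garfield_factor(1200, 2000, 3.0)
print(round(constant, 2), round(growing, 2))
```

With a constant publication rate the ratio stays fixed from year to year, so the factor is constant; when output grows against the same period, the factor rises, matching the behaviour described in the summary.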










Constructing an Associative Concept Space for Literature-Based Discovery
C. Christiaan van der Eijk, Erik M. van Mulligen, Jan A. Kors, Barend Mons, and Jan van den Berg
Published online 20 January 2004

Van der Eijk et al. are interested in automating Swanson-Smalheiser-type knowledge discovery by use of an Associative Concept Space (ACS). Extracted, normalized terms are mapped to MeSH headings, and these weighted headings are added to the extracted term set to characterize each document. Co-occurrence in a document causes terms to be considered related, with the amount of co-occurrence reflecting distance in the space. Close concepts not normally associated may then be observed. Simulated data for a two-dimensional space were created in the form of 100 ten-term characterizations of documents, with four concepts mapped to each. Starting from a random location, 5 cycles of the clustering algorithm grouped the concepts, and 20 cycles resulted in clear separations in the space, with relative positions similar for different starting points. A 13,423-item Medline subset yielded 9,770 concepts, which a threshold of 0.4 reduced to 5,378, yielding 109,430 edges that the algorithm converted into an eight-dimensional ACS. Deafness and macular degeneration are in close proximity, and a search of Medline provides documents where the association is explicit, even though no such association exists in the studied subset.
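As a rough illustration of how co-occurrence can be turned into weighted edges between concepts, consider the sketch below. It does not reproduce the authors' ACS algorithm: the strength formula (co-occurrences divided by the frequency of the rarer concept) and the `min_strength` cutoff are assumptions made for the example and are unrelated to the 0.4 threshold mentioned above.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(doc_concepts, min_strength=0.0):
    """Build weighted edges between concepts that share documents.

    doc_concepts -- list of concept lists, one per document
    Returns a dict mapping (concept_a, concept_b) -> strength in (0, 1].
    """
    freq = Counter()   # number of documents containing each concept
    pairs = Counter()  # number of documents containing both concepts
    for concepts in doc_concepts:
        unique = sorted(set(concepts))
        freq.update(unique)
        pairs.update(combinations(unique, 2))

    edges = {}
    for (a, b), together in pairs.items():
        # Assumed normalization: share of the rarer concept's documents.
        strength = together / min(freq[a], freq[b])
        if strength >= min_strength:
            edges[(a, b)] = strength
    return edges

# Invented example: four documents described by concept labels.
docs = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["d"]]
print(cooccurrence_edges(docs, min_strength=0.6))
```

A layout step would then place strongly connected concepts near each other; concepts that end up close without sharing an edge are the candidates for undiscovered associations.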











Time-Tracking of the Research Profile of a Drug Using Bibliometric Tools
María Bordons, Carmen Bravo, and Santos Barrigón
Published online 16 January 2004

Bordons, Bravo, and Barrigón analyze the 11,626 documents found in MEDLINE from 1960 to 2001 with aspirin as a major term, in order to determine and graphically represent research trends on the drug over time. Check tags, aspirin sub-headings, and other MeSH terms assigned to the retrieved documents were identified, and the frequency of index term occurrence by year was determined. Using correspondence factor analysis, year profiles for index terms and index term profiles for years are created, and those above a threshold are represented using chi-square distance, interpreted as deviations from the average profiles. Thirty-seven one-year clusters of terms were created, and an ascending hierarchical method was used to reach a single cluster. The number of clusters that will decrease inter-cluster inertia is determined, and clusters are then reduced by an iterative computation of centers of gravity and reallocation of years. Over the 37-year period the main aspects of study were therapeutic use (28%), pharmacodynamics (28%), adverse effects (18%), and administration and dosage (10%). Therapeutic use and administration and dosage show an increasing trend over time. Three time periods were identified based upon sub-headings: 1965 to 1971 is characterized by adverse effects, 1972 to 1987 is a pharmacological period, and 1988 to 2001 includes new therapeutic uses. Based upon other indexing terms, four periods appeared: one of 14 years, the second of 11, and a third and fourth of six each.
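The chi-square distance between two year profiles, as used in correspondence analysis, can be sketched as follows. The formula is the standard one (squared differences between row profiles, each divided by the column's share of the grand total); the function name and the tiny table of counts are invented and do not come from the aspirin data.

```python
import math

def chi_square_distance(table, row_a, row_b):
    """Chi-square distance between the profiles of two rows.

    table -- row label -> list of counts, all with the same column order
    """
    total = sum(sum(counts) for counts in table.values())
    n_cols = len(table[row_a])
    # Column masses: each column's share of the grand total.
    col_mass = [sum(counts[j] for counts in table.values()) / total
                for j in range(n_cols)]
    # Row profiles: each row's counts as proportions of the row total.
    sum_a, sum_b = sum(table[row_a]), sum(table[row_b])
    prof_a = [x / sum_a for x in table[row_a]]
    prof_b = [x / sum_b for x in table[row_b]]
    return math.sqrt(sum((prof_a[j] - prof_b[j]) ** 2 / col_mass[j]
                         for j in range(n_cols) if col_mass[j] > 0))

# Invented counts: years as rows, two sub-headings as columns.
table = {
    1965: [8, 2],
    1966: [8, 2],
    1990: [2, 8],
}
print(chi_square_distance(table, 1965, 1966))
print(round(chi_square_distance(table, 1965, 1990), 3))
```

Years with identical index term proportions are at distance zero regardless of how many documents each year contributed, which is why the measure suits comparing years of very different publication volume.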


Brief Communication






Unlocking the Museum: A Manifesto
Corinne Jörgensen
Published online 16 January 2004

Jörgensen expresses her concern about the growing loss of bibliographic control due to increasing digital production, and about our society's inability to preserve and make accessible many more traditional information sources. She suggests that a path to a solution is provided by distributed description (description by local communities using open standards); distributed collection building (linked surrogation of materials in private hands); and distributed knowledge creation (local creative processes that will, it is assumed, be generated by distributed description and collection).


Book Reviews



The Bottom Line: Determining and Communicating the Value of the Special Library, by Joseph R. Matthews
Sara R. Tompson
Published online 16 December 2003



Virtual Inequality: Beyond the Digital Divide, by Karen Mossberger, Caroline J. Tolbert, and Mary Stansbury
Wallace Koehler
Published online 18 December 2003


Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810

Copyright © 2004, Association for Information Science and Technology