
Journal of the Association for Information Science and Technology



In This Issue
Bert R. Boyce





Subject Categorization of Query Terms for Exploring Web Users' Search Interests
Hsiao-Tieh Pu, Shui-Lung Chuang, and Chyan Yang
Published online 3 April 2002

In this issue Pu, Chuang, and Yang, given a subject taxonomy, wish to automatically assign terms or phrases found in search engine queries to the appropriate categories in that taxonomy. Using high-frequency terms from three Taiwanese search engine query logs, human analysts assigned terms to the appropriate taxonomic categories, providing a set of terms characterizing each topic. The average length of a query was 3.18 Chinese characters, while the average word length in Chinese is approximately 1.5 characters. The average English query was 1.1 words in length. Thus queries were treated as single phrases to be categorized, and since a small number of queries were repeated very often, the top 20 thousand query terms (about 81%) of each log were extracted. The taxonomy was built with knowledge of these high-frequency terms and of the structures of several commercial search services, using the Saracevic-Kantor methodology, and it incorporated 15 major categories and 85 sub-categories. The 9,709 terms common to the top 20 thousand from each log were chosen as seed terms for 400 hours of manual categorization. To categorize a term, it is searched on multiple engines and the terms co-occurring with it in the hits are collected. These are then used to measure similarity with the seed term sets of the existing taxonomic categories. Auto-categorization results seem encouraging: matching rates with human assignment increase as more core terms are assigned to categories and as more suggested categories are counted as matches. Certainly the automatic categorization process is much faster. While the relative rankings of category use do not change over a twelve-month period, each top category has its own distribution over time.
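
The categorization step described above can be illustrated with a minimal sketch. The summary does not say which similarity measure the authors used, so cosine similarity over simple term counts is assumed here, and the function names, category names, and toy data are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-terms count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_categories(context_terms, category_seeds, top_n=3):
    """Rank categories by the similarity between the terms that co-occur
    with a query term in search hits and each category's seed term set."""
    context = Counter(context_terms)
    scored = [
        (name, cosine(context, Counter(seeds)))
        for name, seeds in category_seeds.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical seed term sets for three categories (not from the paper).
category_seeds = {
    "Computers": ["software", "download", "windows", "driver"],
    "Entertainment": ["movie", "music", "star", "download"],
    "Travel": ["hotel", "flight", "map", "tour"],
}

# Hypothetical terms co-occurring with an unknown query term in search hits.
context_terms = ["download", "driver", "software", "music"]

ranking = suggest_categories(context_terms, category_seeds, top_n=2)
print(ranking)
```

Returning the top few categories rather than a single one mirrors the observation that matching rates rise when more suggested categories are counted as matches.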















Disciplinary Differences and Undergraduates' Information-Seeking Behavior
Ethelene Whitmire
Published online 4 April 2002

Whitmire attempts to find differences in the information-seeking behavior of undergraduates based upon their disciplines, using Biglan's disciplinary dimensions: hard/soft, characterized by degree of agreement on questions and methods; pure/applied, characterized by the extent of practical applications; and organic/inorganic, characterized by the degree of study of living objects. The analysis covers 5,175 student responses to the College Student Experiences Questionnaire, along with demographic data, academic discipline, and responses to ten questions on the degree of library use. T-tests, significant for seven of the ten questions, revealed that students in soft disciplines engaged in more library activity than those in hard disciplines, except for use of the library "as a place to read or study." Pure disciplines demonstrated more library activity than applied, with significant results in nine of the ten questions. Organic disciplines made more use of the library than did inorganic, with the exception of "as a place to read or study," with differences statistically significant in six of ten cases.











Multitasking Information Seeking and Searching Processes
Amanda Spink, H. Cenk Ozmutlu, and Seda Ozmutlu
Published online 4 April 2002

Data from three previous studies and one new investigation are analyzed by Spink and the Ozmutlus to determine whether searchers in fact search for multiple topics during single searching processes. One study was based upon an interactive survey of use of the Excite system, and had both textual and quantitative data. From a second study, 1000 logged Excite sessions were drawn and manually sifted to identify multitasking. The third study was of data collected on 87 mediated Dialog searches, which produced pre- and post-search questionnaires for each seeker and a post-search questionnaire for the mediator that indicated shifts in topic. The fourth was based on an in-house user survey of information-seeking volunteers in an academic library. A qualitative analysis revealed that in the first study 13 of the 287 respondents were multitasking. In the second study of the 1000 sessions, multitasking was identified in 114 sessions, or 11.4%. Of the 87 mediated searches, 4 cases were identified, and in the fourth study 13 of the 95 participants indicated they were multitasking. A four-level model is provided: single search on a single topic; successive searching, where multiple searches on the same but evolving topic are carried out over time; multitasking searching, where multiple topics are searched concurrently; and finally multitasking successive searching.
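
Only the second study's count is given as a percentage above. A short sketch puts the four reported counts on the same footing; the counts come from the summary, while the study labels are shorthand introduced here for illustration.

```python
# Counts as reported in the summary: (multitasking cases, total cases).
# The dictionary keys are informal labels, not names used by the authors.
studies = {
    "Excite interactive survey": (13, 287),
    "Excite session logs": (114, 1000),
    "Mediated Dialog searches": (4, 87),
    "Academic library survey": (13, 95),
}

# Share of cases in each study in which multitasking was identified.
rates = {name: hits / total for name, (hits, total) in studies.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

The rates should be compared with care, since each study identified multitasking by a different means (survey responses, manual sifting of logs, and questionnaires).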













Automatic Extraction of Document Keyphrases for Use in Digital Libraries: Evaluation and Applications
Steve Jones and Gordon W. Paynter
Published online 9 April 2002

Jones and Paynter describe an evaluation of Kea, a key phrase extraction algorithm that has been shown to perform as well as other such systems when author-assigned key phrases are used as the standard. In the test, 28 subjects were given 2 documents each from 6 papers of 8 pages and 60 candidate key phrases to rank from 0 to 10 as to subject representation, to provide a standard. Both Kea and other extraction systems are tested, and the results are compared to author-provided phrases. Kea extracts every sequence of one to four words in its training documents, stems them, and discards phrases that begin or end with stop words, those that consist of only a proper noun, those that do not match length constraints, and those that occur only once in the document. The Kappa statistic and the Kendall Coefficient of Concordance show significant subject agreement on phrase choice. Precision was defined as the proportion of the extracted phrases that were human-selected phrases, and recall as the proportion of human-selected phrases extracted. At least sixty percent of the extracted phrases were in the ideal set, and recall rises from a third to a half as more phrases are permitted to be selected. Proper settings appear to be crucial for maximum performance. Author-assigned phrases perform quite well.
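
The precision and recall definitions above can be written out directly. The following is a minimal sketch that assumes exact string matching between phrases; the function name and the example phrases are hypothetical and are not data from the study.

```python
def precision_recall(extracted, human_selected):
    """Precision: share of extracted phrases that were human selected.
    Recall: share of human-selected phrases that were extracted."""
    extracted_set = set(extracted)
    human_set = set(human_selected)
    hits = extracted_set & human_set
    precision = len(hits) / len(extracted_set) if extracted_set else 0.0
    recall = len(hits) / len(human_set) if human_set else 0.0
    return precision, recall


# Hypothetical phrase lists for one document.
extracted = [
    "digital library",
    "keyphrase extraction",
    "user study",
    "web page",
    "machine learning",
]
human_selected = [
    "digital library",
    "keyphrase extraction",
    "machine learning",
    "evaluation",
    "subject agreement",
    "information retrieval",
]

precision, recall = precision_recall(extracted, human_selected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Allowing the extractor to return more phrases can only add hits, which is consistent with recall rising as more phrases are permitted, while precision may fall if the extra phrases are not in the human-selected set.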












Visualizing and Tracking the Growth of Competing Paradigms: Two Case Studies
Chaomei Chen, Timothy Cribbin, Robert Macredie, and Sonali Morar
Published online 9 April 2002

Chen et al. search the Web of Science for "mass extinction" and for "BSE or CJD" (mad cow disease or Creutzfeldt-Jakob disease), two highly debated topics likely to demonstrate multiple paradigms, and select only those publications that have been cited ten or more times. These are then classed using Principal Component Analysis on their co-citation matrix, and the resulting specialities are superimposed upon a Pathfinder network scaling model. A publication is visualized as a stack of color-marked segments, where each segment represents a year and its size the total number of citations in that year. Thus one can easily visualize rapidly growing areas. In the mass extinction literature, 273 publications were chosen using a ten-citation threshold on a twenty-year sample. Impact, gradualism (stressing vulcanism), periodicity, and Permian (an older extinction than the Cretaceous) extinction segments are identifiable. For the "mad cow-CJD" link, 379 publications were chosen with thresholds of 600 and 155 respectively. Five thematic areas are identified: BSE, CJD, variant CJD, GSS (which the authors do not define or discuss), and prions, with the growth of the prion paradigm evident in both areas.













Information, Knowledge, Text, by Julian Warner
Jack Andersen
Published online 19 April 2002




Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810

Copyright © 2001, Association for Information Science and Technology