EDITORIAL
In This Issue
Bert R. Boyce
615
RESEARCH
Subject Categorization of Query Terms for Exploring Web Users' Search Interests
Hsiao-Tieh Pu, Shui-Lung Chuang, and Chyan Yang
Published online 3 April 2002
In this issue Pu, Chuang, and Yang, given a subject taxonomy, wish to automatically assign terms, or phrases, found in search engine queries to the appropriate categories in that taxonomy. Using high-frequency terms from three Taiwanese search engine query logs, human analysts assigned terms to the appropriate taxonomic categories, providing a set of terms characterizing each topic. The average query was 3.18 Chinese characters long, while the average Chinese word is approximately 1.5 characters; the average English query was 1.1 words long. Queries were therefore treated as single phrases to be categorized, and since a small number of queries were repeated very often, the top 20 thousand query terms of each log (about 81%) were extracted. The taxonomy, built using the Saracevic-Kantor methodology with knowledge of these high-frequency terms and of the structures of several commercial search services, has 15 major categories and 85 sub-categories. The 9,709 terms common to the top 20 thousand of each log were chosen as seed terms and categorized manually, a task of 400 hours. To categorize a term automatically, it is searched on multiple engines and the terms co-occurring with it in the hits are collected; these are then used to measure its similarity to the seed-term set of each existing taxonomic category. Automatic categorization results seem encouraging, and matching rates with human assignment increase with the core terms assigned to categories and with the number of suggested categories included as matches. Certainly, the automatic categorization process is much faster. While the relative rankings of category use do not change over a twelve-month period, each top category has its own distribution over time.
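The similarity step can be pictured as comparing the profile of terms that co-occur with a query term against each category's seed-term set. The sketch below is a minimal illustration only, assuming cosine similarity over term-frequency vectors (the abstract does not name the measure); the category names, seed terms, and co-occurring terms are hypothetical, and the `k` parameter stands in for the number of suggested categories counted as matches.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def suggest_categories(cooccurring, seed_sets, k=3):
    """Rank categories by the similarity between the terms collected from search
    hits for a query term and each category's seed-term set; return the top k."""
    profile = Counter(cooccurring)
    scores = {cat: cosine(profile, Counter(seeds)) for cat, seeds in seed_sets.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical seed-term sets for three categories.
seed_sets = {
    "Computers": ["software", "download", "windows", "driver"],
    "Entertainment": ["movie", "music", "mp3", "singer"],
    "Shopping": ["price", "store", "discount", "order"],
}
# Hypothetical terms co-occurring with one query term in search-engine hits.
hits_terms = ["mp3", "download", "music", "music", "player"]
print(suggest_categories(hits_terms, seed_sets, k=2))
```

Here the shared terms "mp3" and "music" put Entertainment first and the single shared term "download" puts Computers second; raising `k` lets more suggested categories count toward a match with the human assignment.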
617
Disciplinary Differences and Undergraduates' Information-Seeking Behavior
Ethelene Whitmire
Published online 4 April 2002
Whitmire looks for differences in the information-seeking behavior of undergraduates according to their disciplines, using Biglan's disciplinary dimensions: hard/soft, characterized by the degree of agreement on questions and methods; pure/applied, characterized by the extent of practical application; and organic/inorganic, characterized by the degree to which living objects are studied. The analysis covers 5,175 student responses to the College Student Experiences Questionnaire, including demographic data, academic discipline, and answers to ten questions on the degree of library use. The t-tests, significant for seven of the ten questions, revealed that students in soft disciplines engaged in more library activity than those in hard disciplines, except in using the library "as a place to read or study." Students in pure disciplines showed more library activity than those in applied disciplines, with significant results for nine of the ten questions. Students in organic disciplines made more use of the library than those in inorganic disciplines, again except "as a place to read or study," with differences statistically significant in six of the ten cases.
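Each comparison above contrasts the mean response of two groups of students on one question. A minimal sketch of a two-sample t statistic follows; it assumes Welch's unequal-variance form (the abstract does not say which t-test variant was used), and the "soft" and "hard" scores are hypothetical answers on a 1 to 4 scale, not data from the study.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors of the means
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical library-use scores for one questionnaire item.
soft = [3, 4, 3, 2, 4, 3, 3, 4]
hard = [2, 3, 2, 2, 3, 1, 2, 3]
t, df = welch_t(soft, hard)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With real data the p-value would come from the t distribution with `df` degrees of freedom; SciPy's `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same Welch test and returns the p-value directly.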
631
Multitasking Information Seeking and Searching Processes
Amanda Spink, H. Cenk Ozmutlu, and Seda Ozmutlu
Published online 4 April 2002
Spink and the Ozmutlus analyze data from three previous studies and one new investigation to determine whether searchers in fact search for multiple topics during a single searching process. The first study was based on an interactive survey of use of the Excite system and produced both textual and quantitative data. For the second, 1000 logged Excite sessions were drawn and manually sifted to identify multitasking. The third used data collected on 87 mediated Dialog searches, including pre- and post-search questionnaires for each seeker and a post-search questionnaire for the mediator that indicated shifts in topic. The fourth was an in-house survey of volunteer information seekers in an academic library. A qualitative analysis revealed that 13 of the 287 respondents in the first study were multitasking. In the second study, multitasking was identified in 114 of the 1000 sessions (11.4%). Among the 87 mediated searches, 4 cases were identified, and in the fourth study 13 of the 95 participants indicated that they were multitasking. A four-level model is presented: single searching on a single topic; successive searching, in which multiple searches on the same but evolving topic are carried out over time; multitasking searching, in which multiple topics are searched concurrently; and, finally, multitasking successive searching.
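Because the four studies differ greatly in size, their counts are easier to compare as proportions. This short sketch only recomputes rates from the counts reported above; the study labels are descriptive names chosen here, not the authors' own.

```python
# Multitasking counts reported for the four studies: (cases, total).
studies = {
    "Excite interactive survey": (13, 287),
    "Excite session logs": (114, 1000),
    "Mediated Dialog searches": (4, 87),
    "Academic library survey": (13, 95),
}

def multitasking_rate(cases: int, total: int) -> float:
    """Proportion of respondents, sessions, or searches showing multitasking."""
    return cases / total

for name, (cases, total) in studies.items():
    print(f"{name}: {cases}/{total} = {multitasking_rate(cases, total):.1%}")
```

The rates come out at roughly 4.5%, 11.4%, 4.6%, and 13.7%, so the session-log and library-survey figures are about two to three times those of the other two studies.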
639
Automatic Extraction of Document Keyphrases for Use in Digital Libraries: Evaluation and Applications
Steve Jones and Gordon W. Paynter
Published online 9 April 2002
Jones and Paynter describe an evaluation of Kea, a keyphrase extraction algorithm previously shown to perform as well as other such systems when author-assigned keyphrases are used as the standard. In the test, 28 subjects were each given 2 documents drawn from a set of 6 eight-page papers, together with 60 candidate keyphrases to rate from 0 to 10 according to how well they represent the subject; these ratings provide a standard against which Kea and other extraction systems are tested, and the results are compared with the author-provided phrases. Kea extracts every sequence of one to four words in its training documents, stems them, and discards phrases that begin or end with a stop word, that consist only of a proper noun, that do not meet length constraints, or that occur only once in the document. The Kappa statistic and the Kendall Coefficient of Concordance show significant agreement among subjects on phrase choice. Precision was defined as the proportion of extracted phrases that were human-selected phrases, and recall as the proportion of human-selected phrases that were extracted. At least sixty percent of the extracted phrases were in the ideal set, and recall rises from a third to a half as more phrases are permitted to be selected. Proper parameter settings appear to be crucial for maximum performance. Author-assigned phrases also perform quite well.
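The candidate-generation filters and the precision and recall definitions above can be sketched as follows. This is a simplified illustration, not Kea's code: the tiny stop-word list, the plural-stripping stand-in for a real stemmer, and the capitalization test used as a proxy for "only a proper noun" are all assumptions, and no length constraint is applied beyond the one-to-four-word window. Kea goes on to score the surviving candidates with a learned model, which is not sketched here.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is", "of", "on", "the", "to", "with"}

def stem(word: str) -> str:
    """Crude plural stripping, standing in for a real stemmer (assumption)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def candidate_phrases(text: str, max_len: int = 4) -> set:
    """Stemmed 1..max_len word sequences that survive Kea-style filters."""
    tokens = re.findall(r"[A-Za-z]+", text)
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            words = tokens[i:i + n]
            # Discard phrases that begin or end with a stop word.
            if words[0].lower() in STOP_WORDS or words[-1].lower() in STOP_WORDS:
                continue
            # Discard a lone capitalized word, a rough proxy for "only a proper noun"
            # (it also drops sentence-initial words, one reason it is only a proxy).
            if n == 1 and words[0][0].isupper():
                continue
            counts[" ".join(stem(w.lower()) for w in words)] += 1
    # Discard phrases that occur only once in the document.
    return {phrase for phrase, c in counts.items() if c > 1}

def precision_recall(extracted: set, human: set) -> tuple:
    """Precision: share of extracted phrases that humans selected.
    Recall: share of human-selected phrases that were extracted."""
    hits = len(extracted & human)
    return hits / len(extracted), hits / len(human)

text = ("the keyphrase extraction method ranks candidate phrases. "
        "keyphrase extraction uses candidate phrases from the text.")
extracted = candidate_phrases(text)
print(sorted(extracted))
human = {"keyphrase extraction", "candidate phrase", "digital library"}  # hypothetical human choices
print(precision_recall(extracted, human))
```

In this toy example six candidates survive, two of which are in the hypothetical human set, giving a precision of 1/3 and a recall of 2/3.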
653
Visualizing and Tracking the Growth of Competing Paradigms: Two Case Studies
Chaomei Chen, Timothy Cribbin, Robert Macredie, and Sonali Morar
Published online 9 April 2002
Chen et al. search the Web of Science for "mass extinction" and for "BSE or CJD" (mad cow disease or Creutzfeldt-Jakob disease), two highly debated topics likely to exhibit multiple paradigms, and select only those publications that have been cited ten or more times. These are then classified using Principal Component Analysis of their co-citation matrix, and the resulting specialties are superimposed on a Pathfinder network scaling model. Each publication is visualized as a stack of color-coded segments, each segment representing a year and its size the total number of citations in that year, so rapidly growing areas are easy to spot. In the mass extinction literature, 273 publications were chosen using a ten-citation threshold on a twenty-year sample. Impact, gradualism (stressing vulcanism), periodicity, and Permian extinction (an older extinction than the Cretaceous) segments are identifiable. For the "mad cow-CJD" link, 379 publications were chosen with thresholds of 600 and 155 respectively. Five thematic areas are identified: BSE, CJD, variant CJD, GSS (which the authors do not define or discuss), and prions, with the growth of the prion paradigm evident in both areas.
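Pathfinder network scaling prunes a dense proximity matrix down to its most salient links. The sketch below assumes the commonly used parameters r = infinity and q = n - 1, under which a link survives only if no indirect path between its endpoints has a smaller largest edge; the abstract does not state which parameters Chen et al. used. The conversion of co-citation counts to distances (reciprocal of the count) and the four-publication matrix are hypothetical.

```python
import math

def pathfinder_inf(dist):
    """PFNET(r=inf, q=n-1): keep edge (i, j) only if its distance is not greater
    than the minimax distance (smallest possible largest edge) over all i-j paths."""
    n = len(dist)
    minimax = [row[:] for row in dist]
    # Floyd-Warshall relaxation where a path's length is its largest edge (r = infinity).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] < math.inf and dist[i][j] <= minimax[i][j]}

def cocitation_to_distance(counts):
    """Hypothetical conversion: a higher co-citation count gives a shorter distance."""
    n = len(counts)
    return [[0.0 if i == j else (1.0 / counts[i][j] if counts[i][j] else math.inf)
             for j in range(n)] for i in range(n)]

# Hypothetical co-citation counts among four publications.
counts = [
    [0, 8, 2, 1],
    [8, 0, 5, 0],
    [2, 5, 0, 6],
    [1, 0, 6, 0],
]
print(sorted(pathfinder_inf(cocitation_to_distance(counts))))
```

The weak direct links 0-2 and 0-3 are pruned because the chain 0-1-2-3 connects those publications through stronger co-citation links, leaving the backbone (0, 1), (1, 2), (2, 3) on which specialties could then be overlaid.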
678
BOOK REVIEW
Information, Knowledge, Text, by Julian Warner
Jack Andersen
Published online 19 April 2002
690