
Journal of the Association for Information Science and Technology

 EDITORIAL

 

In This Issue
   
Bert R. Boyce
 

605

 

 RESEARCH

 

Mooers' Law: In and Out of Context
   
Brice Austin
    Published online 26 April 2001

In this issue we begin with "Mooers' Law: In and Out of Context." Austin points out that Mooers meant that a user does not always consider having information a good thing, since making use of it requires the expenditure of effort; he did not mean that a system might go unused because its use is itself an expenditure of extra effort. While the latter may be a principle of retrieval usage, it is not the one stated by Mooers. This leads to a suggestion that system use depends upon the level of desire for information in the user's environment: if it is high, any IR system will be used; if it is low, none will.

607

Author Inflation Leads to a Breakdown of Lotka's Law
    Hildrun Kretschmer and Ronald Rousseau
    Published online 27 April 2001

Fractional counting of authors of multi-authored papers has been shown to lead to a breakdown of Lotka's Law, despite the law's robust character under most circumstances. Kretschmer and Rousseau use the normal count method, which gives full credit to each author, on two five-year bibliographies from each of 13 Dutch physics institutes where high co-authorship is a common occurrence. Kolmogorov-Smirnov tests were performed to see if the Lotka distribution fit the data. All bibliographies whose papers have up to 40 authors fit acceptably; no bibliography containing a paper with over 100 authors fits the distribution. The underlying traditional "success breeds success" mechanism assumes new items are added on a one-by-one basis, but Egghe's generalized model would still account for the process. It seems unlikely that Lotka's Law will hold in a high co-authorship environment.

610
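
The two counting methods and the goodness-of-fit check described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the default exponent alpha = 2, and the truncation of the Lotka distribution at the largest observed count are all assumptions made here, and the sample data are invented.

```python
from collections import Counter

def normal_counts(papers):
    """Normal count: every listed author gets full credit (1) per paper."""
    credit = Counter()
    for authors in papers:
        for author in authors:
            credit[author] += 1
    return credit

def fractional_counts(papers):
    """Fractional count: each of n co-authors gets 1/n credit per paper."""
    credit = Counter()
    for authors in papers:
        for author in authors:
            credit[author] += 1 / len(authors)
    return credit

def lotka_ks_statistic(credit, alpha=2.0):
    """Kolmogorov-Smirnov statistic against f(k) proportional to k**-alpha.

    Expects integer credits (normal counts). The theoretical distribution
    is truncated at the largest observed count, which is an assumption.
    """
    freq = Counter(credit.values())      # number of authors with k papers
    n_authors = sum(freq.values())
    k_max = max(freq)
    norm = sum(k ** -alpha for k in range(1, k_max + 1))
    observed = expected = d_max = 0.0
    for k in range(1, k_max + 1):
        observed += freq.get(k, 0) / n_authors
        expected += k ** -alpha / norm
        d_max = max(d_max, abs(observed - expected))
    return d_max

# Invented toy bibliography: each inner list is one paper's author list.
papers = [["A", "B"], ["A"], ["A", "C"], ["B"]]
print(normal_counts(papers))
print(fractional_counts(papers))
print(lotka_ks_statistic(normal_counts(papers)))
```

The statistic would then be compared with a critical value for the chosen significance level; that step is omitted here.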
 
Visualization of Term Discrimination Analysis
    Jin Zhang and Dietmar Wolfram
    Published online 26 April 2001

Zhang and Wolfram compute the discrimination value for terms as the difference between the centroid value of all terms in the corpus and that value without the term in question, and suggest selection be made by comparing density changes with a visualization tool. The Distance Angle Retrieval Environment (DARE) visually projects a document or term space by presenting distance similarity on the X axis and angular similarity on the Y axis. Thus a document icon appearing close to the X axis would be relevant to reference points in terms of a distance similarity measure, while those close to the Y axis are relevant to reference points in terms of an angle-based measure. Using 450 Associated Press news reports indexed by 44 distinct terms, the removal of the term "Yeltsin" causes the cluster to fall on the Y axis, indicating a good discriminator. For an angular measure, cosine say, movement along the X axis to the left will signal good discrimination, while movement to the right will signal poor discrimination. A term density space could also be used. Most terms are shown to be indifferent discriminators. Different measures result in different choices of good and poor discriminators, as does the use of a term space rather than a document space. The visualization approach is clearly feasible, and provides some additional insights not found in the computation of a discrimination value.

    615
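
A computed discrimination value of the kind this abstract refers to might look like the sketch below. It assumes the classical centroid-density formulation with cosine similarity, in which a positive value marks a good discriminator; the abstract does not specify the measure or sign convention Zhang and Wolfram use, and the function names and toy vectors are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def density(docs):
    """Average cosine similarity between each document and the centroid."""
    n_terms = len(docs[0])
    centroid = [sum(d[j] for d in docs) / len(docs) for j in range(n_terms)]
    return sum(cosine(d, centroid) for d in docs) / len(docs)

def discrimination_value(docs, term):
    """Density without the term minus density with it.

    Positive: removing the term packs the documents closer together, so the
    term was spreading them apart (a good discriminator in this convention).
    """
    reduced = [[w for j, w in enumerate(d) if j != term] for d in docs]
    return density(reduced) - density(docs)

# Invented document-by-term matrix: term 0 occurs in every document,
# terms 1 and 2 each occur in only one.
docs = [[1, 1, 0], [1, 0, 1]]
for t in range(3):
    print(t, round(discrimination_value(docs, t), 4))
```

In this toy space the shared term 0 comes out negative (a poor discriminator) and the distinguishing terms come out positive.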

Scholarly Use of Internet-Based Electronic Resources
    Yin Zhang
    Published online 11 April 2001

By Internet resources Zhang means any electronic file accessible by any Internet protocol. Their usage is determined by an examination of the citations to such sources in a nine-year sample of four print and four electronic LIS journals, by a survey of editors of these journals, and by a survey of scholars with "in press" papers in these journals. Citations were gathered from Social Science Citation Index and manually classed as e-sources by the format used. All authors with "in press" papers were asked about their use and opinion of Internet sources and for any suggestions for improvement. Use of electronic sources is heavy and access is very high. Access and ability explain most usage, while satisfaction was not significant. Citation of e-journals increases over the eight years. Authors report under-citation of e-journals in favor of print equivalents. Traditional reasons are given for citing and not citing, but additional reasons are also present for e-journals.

    628


 FEATURES

 

Real-Time Adaptive Feature and Document Learning for Web Search
    Zhixiang Chen, Xiannong Meng, Richard H. Fowler, and Binhai Zhu
    Published online 27 April 2001

Chen et al. report on the design of FEATURES, a web search engine with adaptive features based on minimal relevance feedback. Rather than developing user profiles from previous searcher activity at either the server or the client location, or updating indexes after search completion, FEATURES allows index and user characterization files to be updated during query modification on retrieval from a general-purpose search engine. Indexing terms relevant to a query are defined as the union of all terms assigned to documents retrieved by the initial search run and are used to build a vector space model on this retrieved set. The top ten weighted terms are presented to the user for a relevant/non-relevant choice, which is used to modify the term weights. Documents are chosen if their summed term weights are greater than some threshold. A user evaluation of the top ten ranked documents as non-relevant will decrease these term weights, and a positive judgement will increase them. A new ordering of the retrieved set will generate new display lists of terms and documents. Precision is improved in a test on Alta Vista searches.

655
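
The feedback loop described in this abstract can be sketched roughly as below. The abstract does not give the update rule, so the multiplicative promote/demote step, the factor of 2, the starting weight of 1.0, and all function names are assumptions made for illustration rather than the FEATURES implementation.

```python
def initial_weights(docs):
    """Start every term found in the retrieved set at weight 1.0 (assumed)."""
    return {term: 1.0 for terms in docs.values() for term in terms}

def apply_feedback(weights, doc_terms, relevant, factor=2.0):
    """Raise the weights of a document's terms if it was judged relevant,
    lower them if it was judged non-relevant (hypothetical update rule)."""
    updated = dict(weights)
    for term in doc_terms:
        if relevant:
            updated[term] *= factor
        else:
            updated[term] /= factor
    return updated

def rank(docs, weights, threshold):
    """Ids of documents whose summed term weight exceeds the threshold,
    highest score first."""
    scored = [(sum(weights[t] for t in terms), doc_id)
              for doc_id, terms in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)
            if score > threshold]

# Invented retrieved set: document id -> set of index terms.
docs = {
    "d1": {"java", "coffee"},
    "d2": {"java", "island"},
    "d3": {"coffee"},
}
weights = initial_weights(docs)
weights = apply_feedback(weights, docs["d1"], relevant=True)
weights = apply_feedback(weights, docs["d2"], relevant=False)
print(rank(docs, weights, threshold=1.75))
```

After one positive and one negative judgement, the documents sharing terms with the relevant one move above the threshold and the rejected one drops below it.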

An Empirical Comparison of Visualization Tools to Assist Information Retrieval on the Web
    Misook Heo and Stephen C. Hirtle
    Published online 26 April 2001

The reader of a hypertext document in a web environment, if maximum use of the document is to be obtained, must visualize the overall structure of the paths through the document as well as the document space. Graphic visualization displays of this space, produced to assist in navigation, are classified into four groups, and Heo and Hirtle compare three of these classes as to their effectiveness. Distortion displays expand regions of interest while relatively diminishing the detail of the remaining regions. This technique will show both local detail and global structure. Zoom techniques use a series of increasingly focused displays of smaller and smaller areas, and can reduce cognitive overload, but do not provide an easy movement to other parts of the total space. Expanding outline displays use a tree structure to allow movement through a hierarchy of documents, but if the organization has a wide horizontal structure, or is not particularly hierarchical in nature, such a display can break down. Three-dimensional layouts, which are not evaluated here, place objects by location in three-space, providing more information and freedom. However, the space must be represented in two dimensions, resulting in difficulty in visually judging depth, size and positioning. Ten students were assigned to each of eight groups composed of viewers of the three techniques and an unassisted control group using either a large (583 selected pages) or a small (50 selected pages) web space. Sets of 10 questions, which were designed to elicit the use of a visualization tool, were provided for each space. Accuracy and time spent were extracted from a log file. Users' views were also surveyed after completion. ANOVA shows significant differences in accuracy and time based upon the visualization tool in use. A Tukey test shows zoom accuracy to be significantly less than expanding outline and zoom time to be significantly greater than both the outline and control groups. Size significantly affected accuracy and time, but had no interaction with tool type. While the expanding outline tool class outperformed zoom and distortion, its performance was not significantly different from the control group.
 

666

Use of Relevance Criteria across Stages of Document Evaluation: On the Complementarity of Experimental and Naturalistic Studies
    Rong Tang and Paul Solomon
    Published online 26 April 2001

Tang and Solomon, based upon their review of the history of topical and other than topical criteria in relevance evaluation, decide to look at a two-stage model where judgements are first made on surrogate records and then on full document text, to determine if a shift in criteria takes place and, if so, in what manner and to what degree. Both a controlled experiment and a naturalistic study were used to study the staging of relevance judgement criteria. In the controlled environment 90 undergraduate Psychology students were instructed to choose papers that would help them meet an assignment from 20 preselected papers on a broader topic that included the one assigned. They first selected on the basis of citation and abstract, then read the papers, and in each process filled out a questionnaire on the importance of each of 15 criteria at each stage of the two-stage process. In the naturalistic study 9 Ph.D. Psychology students conducted literature searches to support their own research and were asked to think aloud while making their decisions from retrieved surrogates; they later filled out a questionnaire while reading the materials that they selected and were interviewed at the end of the process. Apparently understandability is important at both stages. Importance increased at stage two. Cognitive criteria do not all follow the same pattern across stages. The controlled group thought quality of information was most important in stage one and topicality most important in stage two. In the naturalistic study topicality was most frequent for stage one and research structure for stage two. A classification of criteria by their functionality is suggested as a better approach: first a division as to whether a criterion is objectively associated with the document as opposed to being subjectively associated with a person's expectations; then a division based on primary (essential) or secondary (for assistance) status.
 

676

Multimedia Exploratory Data Analysis for Geospatial Data Mining: The Case for Augmented Seriation
    Myke Gluck
    Published online 1 May 2001

To prevent type-one error, statisticians tend to accept the possibility of type-two error, which leads to the rejection of hypotheses later shown to be true. In both Exploratory Data Analysis (EDA) and data mining, the emphasis is more appropriately on the elimination of type-two error. Thus EDA methods, including their visualization tools, may be appropriate for data mining. Seriation creates a matrix of observations and variables, where each cell contains an icon whose size represents the cell's value, and permits the movement of rows and columns in order to visually discern patterns. Augmented Seriation, a method of data mining, adds computer graphics, sound, color, and extra dimensions to the matrix so that the analyst has different modalities for pattern observation. Gluck has developed software for such analysis.

686
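
The row-and-column movement at the heart of seriation can be sketched in a few lines. The abstract describes the analyst moving rows and columns by hand; the automatic ordering by totals below is a hypothetical stand-in for that step, and the function names and sample matrix are invented. It is not Gluck's software.

```python
def permute(matrix, row_order, col_order):
    """Return a copy of the matrix with rows and columns rearranged."""
    return [[matrix[r][c] for c in col_order] for r in row_order]

def order_by_totals(matrix):
    """Hypothetical heuristic: put the largest row and column totals first."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_order = sorted(range(n_rows), key=lambda r: -sum(matrix[r]))
    col_order = sorted(range(n_cols),
                       key=lambda c: -sum(row[c] for row in matrix))
    return row_order, col_order

# Invented observations-by-variables matrix; each value stands for icon size.
data = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
rows, cols = order_by_totals(data)
seriated = permute(data, rows, cols)
for row in seriated:
    print(row)
```

After reordering, the non-zero cells gather toward the upper-left corner, which is the kind of visual pattern the movement of rows and columns is meant to expose.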


LETTERS TO THE EDITOR

697

     


Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810 | E-mail: asis@asis.org

Copyright 2001, Association for Information Science and Technology