Journal of the American Society for Information Science



Bert R. Boyce




A Cognitive Model of Document Use during a Research Project. Study II. Decisions at the Reading and Citing Stages
P. Wang and M. D. White

We begin with Wang and White's follow-up to a study of the process of document selection, which investigates post-selection reading and citing behavior. Fifteen of the original subjects took part in structured interviews probing the reasoning behind the bibliographies of their research products. Analysis of the interviews found fourteen new criteria. Ten of the eleven criteria from the original study were used in reading and citing decisions, but four of these (novelty, expected quality, reading time, and availability) were not used in citing decisions.




Combining Mapping and Citation Analysis for Evaluative Bibliometric Purposes: A Bibliometric Study
E. C. M. Noyons, H. F. Moed, and M. Luwel

Noyons, Moed, and Luwel collected ten years of the published output of the Interuniversity Micro-Electronics Center (IMEC) in Belgium from INSPEC. Citation data for each publication were extracted from the Science Citation Index, with self-citations counted separately. The INSPEC classification codes were then clustered by co-occurrence into subdomains, and publications by other institutes in these subdomains were counted, identifying six comparable institutes to provide baseline data. The publications of all seven institutes are used to produce cognitive maps of the topic areas and to compare relative activity in those areas between IMEC and the benchmark institutes.
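The co-occurrence clustering step can be illustrated with a minimal Python sketch. The summary does not say which clustering method was used, so this assumes a simple single-linkage rule: two codes fall into the same subdomain when they are assigned together to at least `min_cooccurrence` publications (a hypothetical parameter). The function names and the codes in the example are hypothetical stand-ins, not the authors' data or procedure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(publications):
    """Count how often each pair of codes is assigned to the same publication."""
    counts = Counter()
    for codes in publications:
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return counts


def cluster_codes(publications, min_cooccurrence=2):
    """Single-linkage grouping (an assumption): codes sharing at least
    min_cooccurrence publications are merged into one subdomain."""
    parent = {}

    def find(code):
        parent.setdefault(code, code)
        while parent[code] != code:
            parent[code] = parent[parent[code]]  # path halving
            code = parent[code]
        return code

    for codes in publications:
        for code in codes:
            find(code)  # register every code, including isolated ones

    for (a, b), n in cooccurrence_counts(publications).items():
        if n >= min_cooccurrence:
            parent[find(b)] = find(a)  # merge the two roots

    groups = {}
    for code in list(parent):
        groups.setdefault(find(code), set()).add(code)
    return sorted(groups.values(), key=sorted)
```

With four hypothetical publications coded `["B2550", "B2570"]`, `["B2550", "B2570", "B1130"]`, `["B1130", "B6150"]`, and `["B6150", "B1130"]`, the default threshold yields two subdomains, `{"B1130", "B6150"}` and `{"B2550", "B2570"}`; lowering the threshold to 1 merges them all.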




Meta-Information, and Time: Factors in Human Decision Making
M. Higgins

Higgins has students choose between pairs of school applicants. In each pair, one applicant had references from well-known schools and the other had references from less prestigious schools. Source credibility was a significant variable in the decision process, both with and without time constraints. Preference levels were considerably lower under time constraints, and the interaction between time and credibility was negative for both choice and preference level.




A Study of the Use of Variables in Information Retrieval User Studies
W. Yuan and C. T. Meadow

Yuan and Meadow contend that common use of highly similar variables in published studies indicates a common subject matter or approach. A set of twelve papers was chosen, and their variables were recorded and formed into categories to the satisfaction of their authors. After several iterations, a six-category scheme of variables was formed. If a listed variable was used or implied, it was coded for that paper.

Similarity measures for authors and for papers were computed for each category and over all categories of variables, and associations within and among author groups are shown. Since the links are based on what authors do, not, as in citation, on what they say, such links may well be stronger.
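The summary does not name the similarity measure, so the sketch below assumes the Jaccard coefficient over the sets of variables coded for each paper, computed either within one category or pooled across all categories. The category and variable names in the example are hypothetical; the paper's six categories are not listed here.

```python
def jaccard(a, b):
    """|A intersect B| / |A union B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def paper_similarity(coding_a, coding_b, category=None):
    """coding_a, coding_b map a category name to the set of variables coded
    (used or implied) for a paper. With category=None, variables are pooled
    across all categories, tagged by category so names cannot collide."""
    if category is not None:
        return jaccard(coding_a.get(category, set()), coding_b.get(category, set()))
    pooled_a = {(c, v) for c, vs in coding_a.items() for v in vs}
    pooled_b = {(c, v) for c, vs in coding_b.items() for v in vs}
    return jaccard(pooled_a, pooled_b)
```

For two hypothetical codings, `{"user": {"experience", "age"}, "system": {"database"}}` and `{"user": {"experience", "age"}, "system": {"database", "interface"}}`, the "user" similarity is 1.0, the "system" similarity is 0.5, and the pooled similarity is 0.75. Author similarity could then be derived by aggregating over each author's papers, though how the study did so is not stated.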




Abstracting of Legal Cases: The Potential of Clustering Based on the Selection of Representative Objects
M.-F. Moens, C. Uyttendaele, and J. Dumortier

Moens, Uyttendaele, and Dumortier automatically extract informative paragraphs of text from Belgian legal cases. Text is parsed into a structured representation, and paragraphs are converted to vectors of terms weighted by their occurrence within the paragraph. The vectors are compared with the cosine coefficient, and clusters are formed around a given number of representative paragraphs. These central paragraphs are then chosen for the extract.
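A rough Python sketch of this pipeline, under stated assumptions: terms are lowercase word tokens weighted by raw in-paragraph frequency, and representatives are chosen by a greedy rule in the spirit of the build step of k-medoid clustering (add the paragraph that most raises every paragraph's similarity to its closest representative). The parsing into a structured representation and the authors' exact clustering algorithm are not reproduced; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(paragraph):
    """Term vector weighted by raw occurrence within the paragraph."""
    return Counter(re.findall(r"\w+", paragraph.lower()))


def cosine(u, v):
    """Cosine coefficient between two sparse term vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def select_representatives(paragraphs, k):
    """Greedily pick k representative paragraphs (indices, ascending).

    Each step adds the paragraph that most increases the summed similarity of
    every paragraph to its closest representative so far."""
    vecs = [tf_vector(p) for p in paragraphs]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    chosen = []
    closest = [0.0] * n  # similarity of each paragraph to its nearest representative
    for _ in range(min(k, n)):
        candidates = [c for c in range(n) if c not in chosen]
        best = max(candidates, key=lambda c: sum(max(closest[i], sim[i][c]) for i in range(n)))
        chosen.append(best)
        closest = [max(closest[i], sim[i][best]) for i in range(n)]
    return sorted(chosen)
```

Given two paragraphs about a dismissed appeal and two about a contract breach, `select_representatives(paragraphs, 2)` picks one paragraph from each topic, which is the behavior an extract built from cluster centers is meant to have.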

Using paragraphs extracted by experts as the ideal set, recall, precision, fallout, and overgeneration (the percentage of spurious responses among the generated responses) were computed. The measures indicate considerable success, and the method appears to have some generality.
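The four evaluation measures can be computed from sets of paragraph identifiers, as in this sketch. It assumes exact paragraph matching, in which case overgeneration (spurious over generated) is simply one minus precision; a scoring scheme with partial credit would make the two differ. The function name is hypothetical.

```python
def extraction_scores(generated, ideal, total_paragraphs):
    """generated, ideal: sets of paragraph ids; total_paragraphs: size of the case.

    Returns (recall, precision, fallout, overgeneration)."""
    correct = len(generated & ideal)
    spurious = len(generated - ideal)  # generated but not in the ideal set
    non_relevant = total_paragraphs - len(ideal)
    recall = correct / len(ideal) if ideal else 0.0
    precision = correct / len(generated) if generated else 0.0
    fallout = spurious / non_relevant if non_relevant else 0.0
    overgeneration = spurious / len(generated) if generated else 0.0
    return recall, precision, fallout, overgeneration
```

For a ten-paragraph case where the system extracts paragraphs {1, 2, 3, 4} and the experts chose {2, 3, 5}, recall is 2/3, precision 0.5, fallout 2/7, and overgeneration 0.5.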




An Analysis of Web Page and Web Site Constancy and Permanence
W. Koehler

Koehler treats web site constancy as a binary measure: a site either has or has not changed within a given time sample. Permanence is the probability that a web document will carry the same URL over time. An intermittent site fails to resolve at a given time but later returns. A comatose site fails to resolve and has failed to resolve for six or more weekly queries. Of a random selection of 361 sites, 343 responded and remained available at the start of monitoring.
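Koehler's status definitions can be expressed as a classifier over a run of weekly probe results. This is a minimal sketch: the input format (one boolean per week, oldest first) is an assumption, and the `"resolving"` and `"failing"` labels (never failed; currently down for fewer than six weeks) are hypothetical additions covering cases the two defined categories do not.

```python
def trailing_failures(probes):
    """Number of consecutive failed probes at the end of the series."""
    count = 0
    for ok in reversed(probes):
        if ok:
            break
        count += 1
    return count


def classify(probes, comatose_weeks=6):
    """probes: weekly results, True if the URL resolved that week (oldest first)."""
    if trailing_failures(probes) >= comatose_weeks:
        return "comatose"
    first_fail = next((i for i, ok in enumerate(probes) if not ok), None)
    if first_fail is None:
        return "resolving"  # hypothetical label: never failed
    if any(probes[first_fail + 1:]):
        return "intermittent"  # failed at some point but came back
    return "failing"  # hypothetical label: down, but for fewer than comatose_weeks
```

A site that resolves, fails once, and then resolves again is classified as intermittent; one whose last six probes all failed is comatose, even if it was intermittent earlier.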

In three whole-site probes, 5.2% of the sample sites could be judged intermittent. For web pages, 31% failed to respond at the end of the sample period, 90% of these being comatose. Sixty-four percent of the comatose pages were part of live sites. Only 2.7% of sites remained unchanged at the second probe, and none at the final probe. Variables such as size, density, and directory structure depth are not clearly related to constancy or permanence.




The Monte Carlo Method and the Evaluation of Retrieval System Performance
R. Burgin

Burgin suggests that the Monte Carlo method can identify levels of retrieval performance that are unusual relative to those achieved by random retrieval. As test collections increase in size, the results converge with those of Shaw's hypergeometric model, and the thresholds are very similar. The method is far less computationally intensive than Shaw's approach to determining a level of retrieval system performance attributable to chance, and it can account for rank-ordered retrieval output. Because no assumptions are made about the nature or distribution of the data, the significance of differences between results can be determined, which allows meaningful comparisons of retrieval results.
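The idea of a chance baseline can be sketched in Python by simulating random retrieval: draw the retrieved set at random from a collection with a known number of relevant documents, then see how often random precision reaches a given level. This assumes unranked set retrieval scored by precision, a simplification of the rank-ordered output Burgin handles; the function names, the 10,000 trials, and the add-one p-value correction are illustrative choices, not Burgin's.

```python
import random


def random_precision(n_docs, n_relevant, n_retrieved, rng):
    """Precision of one random retrieval of n_retrieved documents (no replacement).
    Documents 0 .. n_relevant - 1 are treated as the relevant ones."""
    retrieved = rng.sample(range(n_docs), n_retrieved)
    hits = sum(1 for doc in retrieved if doc < n_relevant)
    return hits / n_retrieved


def chance_threshold(n_docs, n_relevant, n_retrieved, trials=10_000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of precision under random retrieval."""
    rng = random.Random(seed)
    scores = sorted(random_precision(n_docs, n_relevant, n_retrieved, rng) for _ in range(trials))
    return scores[min(int((1 - alpha) * trials), trials - 1)]


def monte_carlo_p_value(observed, n_docs, n_relevant, n_retrieved, trials=10_000, seed=0):
    """Share of random retrievals doing at least as well as the observed precision,
    with the usual add-one correction so the estimate is never exactly zero."""
    rng = random.Random(seed)
    at_least = sum(
        random_precision(n_docs, n_relevant, n_retrieved, rng) >= observed for _ in range(trials)
    )
    return (at_least + 1) / (trials + 1)
```

For a collection of 1,000 documents with 50 relevant and 20 retrieved, random retrieval averages a precision of 0.05 (50/1000), the same expectation the hypergeometric model gives, so an observed precision of 0.8 should receive a p-value near the minimum of 1/10,001.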



© 1999, American Society for Information Science
Last update: February 18, 1999