Journal of the Association for Information Science and Technology


 Bert R. Boyce




The E-volution of Preprints in the Scholarly Communication of Physicists and Astronomers
Cecelia Brown
Published online 30 November 2000

    In one of two bibliometric papers in this issue, Brown looks at formal publication and citation of e-prints as shown by the policies and practices of 37 top-tier physics journals, and by citation trends in ISI's SciSearch database and Journal Citation Reports. Citation analysis was carried out if e-print cites were indicated by editor response, instructions-to-authors sections, reports in the literature, or actual examination of citation lists. Total contributions to 12 archives and their citation counts in the journals were compiled. Of the 13 editors surveyed who responded, 8 published papers that had appeared in the archive. Two of these required removal from the archive at publication; two of the 13 did not publish papers that had appeared as e-prints. A review journal that solicits its contributions allowed citation of e-prints. Seven allowed citations to e-prints, but were less than enthusiastic. Nearly 36,000 citations were made to the 12 archives. Citations to the 37 journals and their impact factors remained constant over the period 1991 to 1998. E-print citations appear to peak about 3 years after appearance, as do citations to published papers. Contribution to the archives, and their use as measured by citation, is clearly growing. Citation form and publishing policy vary from journal to journal.












Indirect-Collective Referencing (ICR) in the Elite Journal Literature of Physics. I. A Literature Science Study on the Journal Level
Endre Szava-Kovats
Published online 22 December 2000

       In the second bibliometric paper, Szava-Kovats uses ``indirect-collective references, ICR'' to mean such instances as those in which an author refers to ``the references contained therein'' when referring to another source. Having previously shown a high incidence of such references in Physical Reviews, he now uses the January 1997 issues of 40 journals from the ISI physics category, plus two optics journals, an instrumentation journal, and a physics journal launched in 1997, to locate ICR. The phenomenon exists in all but one of the sampled journals, and it appears in the next, unsampled, issue of that journal. Overall, 17% of papers sampled display ICR, with little fluctuation within internal categories.










Learning User Interest Dynamics with a Three-Descriptor Representation
Dwi H. Widyantoro, Thomas R. Ioerger, and John Yen
 Published online 21 December 2000

    The use of documents ranked high by user feedback to profile user interests is commonly done with Rocchio's algorithm, which uses a single list of attribute-value pairs, called a descriptor, to carry term value weights for an individual. Negative feedback on old preferences or positive feedback on new preferences adjusts the descriptor at a fixed, predetermined, and often slow pace. Widyantoro et al. suggest a three-descriptor representation (TDR), which adds two short-term interest descriptors, one each for positive and negative feedback. User short-term interest in a particular document is computed by subtracting the similarity measure with the negative descriptor from the similarity measure with the positive descriptor. Using a constant to represent the desired impact of long- and short-term interests, these values may be summed for a single interest value. Using the Reuters-21578 1.0 test collection split into training and test sets, topics with at least 100 documents in a tight cluster were chosen. The TDR handles change well, showing better recovery speed and accuracy than the single-descriptor model. The nearest-neighbor update strategy appears to keep the category concept relatively consistent when multiple TDRs are used.
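
    The interest computation summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the use of cosine similarity, the function names, the weight name alpha, and the choice of a convex combination of long- and short-term interest are all assumptions made here for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight descriptors (dicts).

    Cosine similarity is an assumption; the summary only says "similarity measure".
    """
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def interest(doc, long_term, short_pos, short_neg, alpha=0.5):
    """Combine long- and short-term interest in a document into one value.

    Short-term interest is the similarity with the positive descriptor minus
    the similarity with the negative descriptor, as the summary describes.
    alpha (hypothetical name) is the constant weighting long- against
    short-term interest; the convex-combination form is an assumption.
    """
    short_term = cosine(doc, short_pos) - cosine(doc, short_neg)
    return alpha * cosine(doc, long_term) + (1.0 - alpha) * short_term


# Hypothetical descriptors: the user has long liked "oil" stories but has
# just given negative feedback on them and positive feedback on "grain".
doc = {"oil": 1.0}
print(interest(doc, {"oil": 1.0}, {"grain": 1.0}, {"oil": 1.0}))
```

    In this sketch the recent negative feedback cancels the long-standing preference, which is the kind of quick adjustment a single slowly updated descriptor cannot make.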











  Searching the Web: The Public and Their Queries
  Amanda Spink, Dietmar Wolfram, Major B. J. Jansen, and Tefko Saracevic
  Published online 21 December 2000

    Spink and Greisdorf examine the areas between clearly relevant and clearly nonrelevant documents. Twenty-one users conducted 43 searches and made judgements on 1059 retrieved items. A point on a 77 mm line indicating a range from low to high relevance was marked for each item for an interval measure. Boxes for relevant, partially relevant, partially not relevant, and not relevant were provided for a categorical measure. Judgements were also characterized in a binary fashion on systematic, topical, pertinence, utility, and motivational levels, and additionally users provided a brief written description of why they made the judgements they did. The previously apparent bimodal distribution of relevance judgements is confirmed. There is some evidence that topicality is more useful for deselecting than selecting items, and it appears that including partially relevant retrieved items with retrieved relevant items can skew precision measures in a positive direction. The median of the bimodal distribution of judgements is inversely correlated with the number of items judged, since the larger group of nonrelevant items will pull it down. Since the median correlates with the distribution percentages of relevant items, if normalized by the number of points in the interval scale, the median becomes a possible measure of precision.
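
    The normalized median suggested in the last sentence can be sketched in a few lines. This is only an illustration under stated assumptions: the "number of points in the interval scale" is taken here to be the 77 mm length of the marking line, the function name is hypothetical, and the example marks are invented, not data from the study.

```python
from statistics import median

LINE_LENGTH_MM = 77  # length of the marking line reported in the study


def normalized_median(marks_mm, scale_points=LINE_LENGTH_MM):
    """Median relevance mark divided by the number of points on the scale.

    Treating the 77 mm line as a 77-point scale is an assumption.
    """
    if not marks_mm:
        raise ValueError("at least one judgement is required")
    return median(marks_mm) / scale_points


# Invented marks for illustration: a bimodal spread in which the larger
# group of low (nonrelevant) marks pulls the median down.
example_marks = [3, 5, 8, 10, 12, 70, 74]
print(round(normalized_median(example_marks), 3))
```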












 A Review of Web Searching Studies and a Framework for Future Research
 Bernard J. Jansen and Udo Pooch
 Published online 12 December 2000

   Jansen and Pooch review three major search engine studies and compare them to three traditional search system studies and three OPAC search studies to determine whether user search characteristics differ. The web search engine studies indicate that most searchers use two queries of two search terms each per session, use no Boolean operators, and look only at the top ten items returned, while reporting the location of relevant information. In traditional search systems we find seven to 16 queries of six to nine terms, with about ten documents viewed per session. The OPAC studies indicated two to five queries per session of two or fewer terms, with Boolean searching at about 1% and fewer than 50 documents viewed.








Corpus-Based Statistical Screening for Content-Bearing Terms
Won Kim and W. John Wilbur
Published online 21 December 2000

   Kim and Wilbur present three techniques for the algorithmic identification in text of content-bearing terms and phrases intended for human use as entry points or hyperlinks. Using a set of 1,075 terms from MEDLINE evaluated on a zero-to-four scale, from stop word to definite content word, they evaluate the ranked lists of their three methods based on their placement of content words in the top ranks. Data consist of the natural language elements of 304,057 MEDLINE records from 1996, and 173,252 Wall Street Journal records from the TIPSTER collection. Phrases are extracted by breaking at punctuation marks and stop words, and normalized by lowercasing, replacement of nonalphanumerics with spaces, and the reduction of multiple spaces. In the ``strength of context'' approach each document is a vector of binary values for each word or word pair. The words or word pairs are removed from all documents, and the Robertson-Sparck Jones relevance weight for each term is computed; negative weights are replaced with zero, those below a randomness threshold are ignored, and the remainder are summed for each document to yield a score for the document. The term is then assigned the average document score for documents in which it occurred. The average of these word scores is assigned to the original phrase. The ``frequency clumping'' approach defines a random phrase as one whose distribution among documents is Poisson in character. A p-value, the probability that a phrase's frequency of occurrence would be equal to, or less than, Poisson expectations, is computed, and a score is assigned which is the negative log of that value. In the ``database comparison'' approach, if a phrase occurring in a document allows prediction that the document is in MEDLINE rather than in the Wall Street Journal, it is considered to be content bearing for MEDLINE. The score is computed by dividing the number of occurrences of the term in MEDLINE by occurrences in the Journal, and taking the product of all these values.
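
   The ``database comparison'' score is the simplest of the three to illustrate. The sketch below is one possible reading of the summary, not Kim and Wilbur's code: it assumes the product is taken over the words of a phrase, it adds a smoothing constant (not mentioned in the summary) so that words absent from one collection do not cause division by zero, and the function name and word counts are hypothetical.

```python
def database_comparison_score(phrase, target_counts, background_counts, smoothing=1.0):
    """Product, over the words of a phrase, of target/background occurrence ratios.

    target_counts and background_counts map words to occurrence counts in the
    target collection (e.g., MEDLINE) and the comparison collection (e.g., the
    Wall Street Journal). The additive smoothing is an assumption.
    """
    score = 1.0
    for word in phrase.lower().split():
        in_target = target_counts.get(word, 0) + smoothing
        in_background = background_counts.get(word, 0) + smoothing
        score *= in_target / in_background
    return score


# Invented counts for illustration only.
medline = {"renal": 99, "failure": 49, "market": 4}
journal = {"renal": 0, "failure": 9, "market": 499}

print(database_comparison_score("renal failure", medline, journal))
print(database_comparison_score("market failure", medline, journal))
```

   A phrase scoring well above 1 is more typical of the target collection; one scoring well below 1 is more typical of the comparison collection.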

   The one hundred top- and bottom-ranked phrases that occurred in at least 500 documents were collected for each method. The union set had 476 phrases. A second selection was made of two-word phrases, each occurring in only three documents, with a union of 599 phrases. A judge then ranked the two sets of terms as to subject specificity on a 0 to 4 scale.

   Precision was the average subject specificity of the first r ranks, and recall the fraction of the subject-specific phrases in the first r ranks; eleven-point average precision was used as a summary measure. The three methods all move content-bearing terms forward in the lists, as does the use of the sum of the logs of the three methods.
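
   These evaluation measures can be sketched as follows. The sketch rests on assumptions not stated in the summary: a phrase counts as ``subject specific'' when the judge's score reaches a threshold (3 is chosen arbitrarily), and the eleven-point average uses the conventional interpolation, taking at each recall level the best precision at any rank with at least that recall. Because precision here is an average of 0 to 4 scores, the result also lies between 0 and 4.

```python
def precision_at(scores, r):
    """Average judged specificity (0 to 4) of the first r ranked phrases."""
    return sum(scores[:r]) / r


def recall_at(scores, r, threshold=3):
    """Fraction of all subject-specific phrases found in the first r ranks.

    The threshold defining "subject specific" is an assumption.
    """
    total = sum(1 for s in scores if s >= threshold)
    found = sum(1 for s in scores[:r] if s >= threshold)
    return found / total if total else 0.0


def eleven_point_average_precision(scores, threshold=3):
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    points = [(recall_at(scores, r, threshold), precision_at(scores, r))
              for r in range(1, len(scores) + 1)]
    interpolated = []
    for level in (i / 10 for i in range(11)):
        candidates = [p for rec, p in points if rec >= level]
        interpolated.append(max(candidates) if candidates else 0.0)
    return sum(interpolated) / 11


# Judge scores listed in ranked order; invented for illustration.
good_ranking = [4, 4, 0, 0]
mixed_ranking = [4, 0, 4, 0]
print(eleven_point_average_precision(good_ranking))
print(eleven_point_average_precision(mixed_ranking))
```

   A method that moves content-bearing phrases toward the top of its list scores higher on this summary measure, which is the sense in which the three methods are compared.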




















An Analysis of Image Queries in the Field of Art History
Hsin-liang Chen
Published online 21 December 2000

       Chen arranged with an art history instructor to require 20 medieval art images in papers received from 29 students. Participants completed self-administered presearch and postsearch questionnaires, and were interviewed after questionnaire analysis, in order to collect both the keywords and phrases they planned to use and those actually used. Three MLIS student reviewers then mapped the queries to Enser and McGregor's four categories, Jorgensen's 12 classes, and Fidel's 12 feature data and object poles, providing a degree of match on a seven-point scale (1, not at all, to 7, exact). The reviewers gave the highest scores to Enser and McGregor's categories. Modifications to both the Enser and McGregor and Jorgensen schemes are suggested.











XML: A Manager's Guide, by Kevin Dick
Mark R. Wademan




High Technology and Low-Income Communities: Prospects for the Positive Use of Advanced Information Technology, edited by Donald A. Schon, Bish Sanyal, and William J. Mitchell
M. Zoe Holbrooks




Knowledge Management for the Information Professional, T. Kanti Srikantaiah and Michael E. D. Koenig, editors
Dale A. Stirling




The Information Resources Policy Handbook: Research for the Information Age, Benjamin M. Compaine and William H. Read, editors
Alan T. Schroeder, Jr.








2001, Association for Information Science and Technology