Journal of the Association for Information Science and Technology

Table of Contents

Volume 54  Issue 11


In This Issue



In this issue
Bert Boyce


Research Article


Quality Control in Scholarly Publishing: A New Proposal
Stefano Mizzaro
Published Online 4 Jun 2003

Mizzaro presents a model for scholarly communication that accommodates electronic journals and removes the reviewing process while still maintaining the quality of papers and measuring the quality of researchers' contributions. Journal subscribers, both as authors and as readers, have scores associated with them, as do contributed papers. An author's score increases with the publication of papers judged positively by readers; a reader's score decreases when a judgement highly at variance with the mean judgement is expressed; and papers' scores depend upon cumulated readers' judgements. A steadiness score is associated with each of the other scores. Judgements on papers lead to updates of the paper's score, and thus the scores of its authors and readers. A paper's score is the mean of the judgements of its readers, each weighted by that reader's score. An author's score is the weighted mean of the scores of previously published papers, and a reader's score is the weighted mean of the goodness of previously expressed judgements.
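The weighted-mean scoring just described can be sketched as follows. This is a minimal illustration of the arithmetic only; the function names and flat-list representation are mine, not Mizzaro's notation, and the author-score weighting is an assumed placeholder.

```python
def paper_score(judgements, reader_scores):
    """A paper's score: the mean of its readers' judgements,
    each judgement weighted by that reader's score."""
    total_weight = sum(reader_scores)
    if total_weight == 0:
        return 0.0
    return sum(j * w for j, w in zip(judgements, reader_scores)) / total_weight


def author_score(paper_scores, weights):
    """An author's score: a weighted mean over the author's previously
    published papers (the weights here are an assumed placeholder)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in zip(paper_scores, weights)) / total
```

A reader with a high score thus pulls a paper's score toward their judgement more strongly than a low-scored reader does.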


Peripheral Social Awareness Information in Collaborative Work
Michael B. Spring, Vichita Vathanophas
Published Online 12 Jun 2003

Spring and Vathanophas investigate the effect on productivity of team members' awareness of the work of other members of their team. Sixty undergraduates were assigned to twenty groups of three, all using the CASCADE collaborative authoring system. Each subject in a group worked in a different location on assigned tasks, communicating with the team only by e-mail. Information on the number of actions taken by a team member, the percentage of required minutes actually worked, and a measure of commitment to the project was collected and made available to half the participant teams. Use of the awareness tool was associated with a decrease in work quality and in within-group communication. It is possible that the tool reduced the need for communication and that it negatively influenced the effort of some subjects.


Performance Measurement Framework for Hierarchical Text Classification
Aixin Sun, Ee-Peng Lim, Wee-Keong Ng
Published Online 4 Jun 2003

The evaluation of automatic document classification has normally taken place in flat schemes, where hierarchical structure is not taken into account. Since partial success is possible if a document is classified correctly at a high level but misclassified at a lower level, new measures should reflect hierarchical information. Traditional recall- and precision-based measures will not indicate that classification into classes similar to the correct ones is superior to classification into totally unrelated groupings. Sun, Lim, and Ng advocate maintaining pair-wise category similarity values and an average category similarity. If a wrong assignment occurs, the values in the contingency table for recall and precision are modified using the similarity values, limited to a zero-to-one range. Category similarity can be replaced with the number of links between categories in the hierarchical tree if an acceptable distance is specified by a user. Since, in a hierarchical classification, mis-categorization at a higher level will lead to mis-categorization by lower-level classifiers, the number of such blocked documents as a proportion of those that should be classified at a lower level is termed the blocking factor for the higher level. This value can provide valuable information on the performance of subtree classifiers. Using the Reuters-21578 news collection, which is organized into 135 categories, three category trees were manually derived. Binary classifiers were trained at each level of the hierarchy and run on the test portion of the collection, and the new measures were computed. Support Vector Machine classifiers outperformed Naive Bayes classifiers.
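The blocking factor has a direct arithmetic reading: of the test documents that truly belong in a subtree, what fraction did the higher-level classifier wrongly reject? A minimal sketch, with names of my own choosing rather than the authors':

```python
def blocking_factor(true_subtree_docs, passed_down_docs):
    """Fraction of documents that belong in a subtree but were wrongly
    rejected ("blocked") by the classifier guarding that subtree.

    true_subtree_docs: ids of test documents truly belonging to the subtree.
    passed_down_docs:  ids the higher-level classifier forwarded into it.
    """
    true_docs = set(true_subtree_docs)
    if not true_docs:
        return 0.0
    blocked = true_docs - set(passed_down_docs)
    return len(blocked) / len(true_docs)
```

A blocking factor near zero means the subtree's classifiers are at least being given the documents they should see; a high value means lower-level performance is capped before those classifiers ever run.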


A Comparison of Youngsters' Use of CD-ROM and the Internet as Information Resources
Andrew K. Shenton, Pat Dixon
Published Online 12 Jun 2003

Shenton and Dixon draw a sample from six high-performing English schools in the town of Whitley Bay: three first schools, two middle schools, and one high school. Choosing at random from one class in each year group, 188 subjects were selected, all of whom had been exposed to CD-ROM searching and some to Internet searching. Twelve focus groups and 121 individual interviews were used to gather subjects' articulations of their own information behavior. Subjects generally attempted to converge upon a particular article of interest or, even more specifically, material within such an item. The target item on CD-ROM was often an encyclopedia entry; on the Internet, a web page. Subjects often had favorite encyclopedias or search engines which they used repeatedly, and often a favorite website for keeping aware of developments in an area of interest. Single-word or short-phrase searches without Boolean operators were the norm in either medium. There was an expectation of quick satisfaction and little concern for the accuracy or authority of retrieved sources. Home use of CD-ROM files was common, while many children had no home Internet access or had such access restricted by their parents. Internet use increased with respondent age, but older subjects found it slow, noisy, and less than user friendly. CD-ROM usage decreased with age.


Relevance Data for Language Models Using Maximum Likelihood
David Bodoff, Bin Wu, K. Y. Michael Wong
Published Online 12 Jun 2003

Bodoff, Wu, and Wong use a relevance feedback model that requires the searcher to establish hypothetical distributions for the relevance assessments of each document-query pair, for the observed documents around the true document vector, and for the observed queries around the true query vector. They then use maximum likelihood estimation to find optimized document and query representations, thus adjusting both document and query vectors. One such model might use cos(D, Q) for relevant documents and 1 − cos(D, Q) for non-relevant documents, while assuming normal distributions for document and query error, and use maximum likelihood to minimize the angles between document vectors, between query vectors, and between document and query vectors, with the resulting new values used for later queries. It would also be possible to assume both true and observed vectors to be of unit length, so that the distributions all depend upon the angle between observation and mean, yielding a (cosine, cosine, cosine) model rather than a (cosine, normal, normal) model; this results in a maximum likelihood function similar to the traditional Rocchio heuristic. Using five vector space models (tf*idf, plus four feedback methods: the Rocchio heuristic, Bartell, maximum likelihood, and an alpha-beta heuristic that adjusts documents toward adjusted rather than original queries) with the Cranfield and CISI data, two thirds of the queries were randomly chosen for training, the document indexes were trained for each method, and the remaining one third was tested. Both maximum likelihood models ran rapidly and, measured by average precision, yielded highly significant improvement over the baseline and over both heuristics.
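A rough sketch of the relevance component of such a model: the cosine of each judged document-query pair contributes cos(D, Q) to the likelihood if the pair was judged relevant and 1 − cos(D, Q) otherwise. This is only the shape of the likelihood being maximized, not the full (cosine, normal, normal) model or its optimization, and the names are illustrative.

```python
import math


def cosine(d, q):
    """Cosine of the angle between a document and a query vector."""
    num = sum(x * y for x, y in zip(d, q))
    den = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return num / den if den else 0.0


def log_likelihood(pairs):
    """Log-likelihood of observed relevance data, modelling the probability
    of relevance as cos(D, Q) and of non-relevance as 1 - cos(D, Q).

    pairs: list of (doc_vector, query_vector, is_relevant) tuples.
    """
    total = 0.0
    for d, q, rel in pairs:
        c = cosine(d, q)
        total += math.log(c if rel else 1.0 - c)
    return total
```

In practice the cosine would be clipped away from exactly 0 and 1 before taking logarithms; maximizing this quantity over the document and query representations is what yields the adjusted vectors.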


An IP-level Analysis of Usage Statistics for Electronic Journals in Chemistry: Making Inferences about User Behavior
Philip M. Davis, Leah R. Solla
Published Online 4 Jun 2003

Davis and Solla study downloads of 29 ACS electronic journals at Cornell University over a three-month period, by individual IP address rather than by unidentifiable individual users. Chemistry and Chemical Engineering accounted for 42% of downloads, followed by other engineering departments at 12.5%, the Medical College at 6.5%, Food Science at 4.9%, and Molecular Biology at 2.6%. Libraries accounted for 3.4% and the remote modem pool for only 1.5%. Three percent of users downloaded more than 100 articles, 14% more than 20, and 38% downloaded one or two articles during the sample period. With the exception of two outliers, JACS and Biochemistry, the relationship between number of downloads and number of IP addresses is linear: a thousand downloads lead to an expectation of 114 using addresses. The relationship between number of journals consulted and number of articles downloaded is quadratic, and the outliers are heavy users of one or two journals. Journals consulted per IP address seems to fit a Lotka distribution. The system appears to be used heavily for print-on-demand copies.


Greeklish: An Experimental Interface for Automatic Transliteration
Alexandros Karakos
Published Online 12 Jun 2003

In script transliteration the generated character string may not always be pronounced as the source string was, since the phonetic habits of those using the alphabet of the generated string govern. Since the Internet normally uses ASCII and is thus restricted to the Roman alphabet, transliteration is a problem for users of non-Roman alphabets, but string conversion is nonetheless useful. Greeklish is the expression of Greek words in the Roman alphabet and, in this paper, also the name of the C++ Windows application described herein, provided by Karakos, which transliterates any text on the Windows clipboard from Greek characters to Roman characters or vice versa.
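The core of such a transliteration is a character map applied over the text. A minimal illustration in Python rather than the author's C++, with a toy map of my own; real Greeklish schemes vary (θ may become "th" or "8", for instance), and this is not Karakos' actual table.

```python
# Toy Greek-to-Roman character map; an illustrative subset only.
GREEK_TO_ROMAN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def to_greeklish(text):
    """Transliterate Greek characters to Roman; other characters,
    including accented Greek letters absent from the toy map, pass through."""
    return "".join(GREEK_TO_ROMAN.get(ch, ch) for ch in text.lower())
```

Note that the mapping is not one-to-one ("η", "ι", and "υ" can all end up as vowels pronounced "i" or "y" by Roman-alphabet readers), which is exactly the pronunciation drift the abstract describes.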


Letters to the Editor



The Sample Size Dependency of Statistical Measures and Synchronic Potentiality in Informetrics: Some Comments on "Some Comments" by Professor Burrell
Fuyuki Yoshikane, Kyo Kageura, Keita Tsuji
Published Online 25 Jun 2003



The Sample Size Dependency of Statistical Measures in Informetrics? Some Comments
Quentin L. Burrell
Published Online 12 Jun 2003

ASIST Home Page

Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810 | E-mail:

Copyright © 2003, Association for Information Science and Technology