JASIST Table of Contents

Journal of the Association for Information Science and Technology



In This Issue
Bert R. Boyce





User Preferences in the Classification of Electronic Bookmarks: Implications for a Shared System
    Lisa Gottlieb and Juris Dilevko
    Published online 9 March 2001

In this issue we begin with Gottlieb and Dilevko's look at the manner in which website users create a classified organization for their bookmarks of sites of recurring interest, with an eye toward possible multiple usage as a knowledge management tool. Automatic categorization software working with URLs has not been particularly successful and does not reflect local usage. Fifteen graduate accounting students were given 60 financial industry URLs and asked to visit them, bookmark them, and categorize the bookmarks into folders. A week later, a questionnaire tailored to reflect options in each participant's created structure was used to solicit information on the motivation for decisions concerning that structure and its use. Categories for division were induced from the responses in a manner similar to Kwasnik's 1991 study and grouped by their context (user) or content (site) character. Content factors accounted for 48% of the decisions, and 23% were attributed to a combination of content and context factors; this contrasts with the Kwasnik study, which used materials selected by participants rather than provided for them. Absence of detailed familiarity with a site may make categorization by context (use) more difficult than by content. Based on the results, however, subjective factors seem to play a small role in the development of bookmark structures for multiple users.













Small Worlds: Normative Behavior in Virtual Communities and Feminist Bookselling
    Gary Burnett, Michele Besant, and Elfreda A. Chatman
    Published online 12 March 2001

Burnett, Besant, and Chatman take a preliminary look at communication within small, shared cultural spaces, as realized both in virtual communities and in the feminist bookselling community, through the lens of normative behavior theory. After reviewing the literature on both groups, they point out that analysis of archived texts makes study of the behavior of virtual communities as small worlds a serious possibility, just as does the existence of Feminist Bookstore News, which has provided a communication channel for some 25 years despite the presence of some face-to-face contact as well. Normative behavior theory (NBT) is seen as providing a framework for further study of these communities. The social norm context of NBT for a virtual community can be found in FAQ files and observed in regular textual exchanges. The social norms of the feminist bookstore are more directly observable and focused, although one prevalent norm would appear to be resistance to socially enforced norms. The world view context of NBT is found in shared interest for the virtual community; for the feminist bookseller it is embodied in the idea of gender inequality as a social evil to be actively opposed. The social types context of NBT in virtual communities is dominated by the insider/outsider split, with messages from outsiders less highly valued, although such sub-types as lurkers and newbies exist. In the feminist bookstore world the insiders are the owners and staff of the stores, although this group is divided into those considered professional booksellers and those who just work there; feminist typing external to the bookstore environment also matters within the community. Finally, in the information behavior context of NBT in virtual communities, most participants exchange queries and replies, post announcements, and provide links to relevant information sources, while the feminist bookseller's dialog is pointed toward social change and readers' advisory.















Using Concepts in Literature-Based Discovery: Simulating Swanson's Raynaud-Fish Oil and Migraine-Magnesium Discoveries
    Marc Weeber, Henny Klein, Lolkje T.W. de Jong-van den Berg, and Rein Vos
    Published online 19 March 2001

Weeber et al. reproduce the results of Swanson's hypothesis discovery method by translating text into Unified Medical Language System (UMLS) Metathesaurus terms. The literatures in need of review in the Swanson process are likely to be quite large, and the number of n-grams generated larger still, so a method of reduction has value. From 1246 documents containing the word Raynaud, 8627 different words were extracted, and using the ARROWSMITH stop list filter plus bigrams and trigrams, 8362 interesting terms were identified. The same documents yield 5998 UMLS concepts, only 145 of which are grouped in functional semantic types in the thesaurus. If these are searched and only dietary semantic types are generated, fish oil is a prominent result. Swanson's migraine-magnesium link is also identified through this process. To limit by semantic types, some subject expertise will be necessary to the operation.
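The filtering pipeline described above can be sketched in a few lines: gather candidate linking terms from a source literature, drop uninformative ones with a stop list, and keep only terms of a wanted semantic type. This is a minimal illustration, not the authors' implementation; the stop list and the semantic-type table below are invented stand-ins for the ARROWSMITH stop list and UMLS semantic typing.

```python
from collections import Counter

STOP_LIST = {"patient", "study", "result", "treatment"}    # toy stand-in for the ARROWSMITH stop list
SEMANTIC_TYPE = {                                          # toy stand-in for UMLS semantic types
    "fish oil": "dietary",
    "blood viscosity": "functional",
    "magnesium": "dietary",
}

def candidate_terms(documents, wanted_types):
    """Count terms across documents, then filter by stop list and semantic type."""
    counts = Counter(term for doc in documents for term in doc)
    return {
        term: n for term, n in counts.items()
        if term not in STOP_LIST and SEMANTIC_TYPE.get(term) in wanted_types
    }

# Two invented "documents" as lists of already-extracted terms.
docs = [
    ["blood viscosity", "patient", "fish oil"],
    ["fish oil", "study", "magnesium"],
]
print(candidate_terms(docs, {"dietary"}))  # → {'fish oil': 2, 'magnesium': 1}
```

Restricting to the "dietary" type leaves fish oil as the most frequent candidate, mirroring how the semantic-type restriction surfaces the Raynaud-fish oil link.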











Hyperauthorship: A Postmodern Perversion or Evidence of a Structural Shift in Scholarly Communication Practices?
    Blaise Cronin
    Published online 13 March 2001

Cronin examines the implications of the extraordinary growth in the number of authors of single papers in the biomedical and high-energy physics domains. Such practices call into question the assumption that the author(s) bear sole responsibility and deserve sole credit for the work. While collaboration and multiple authorship certainly existed before the 20th century, the advent of big science has stimulated them. Papers with 100 authors increased from one in 1981 to 182 in 1994. Increasing collaboration and the highly technical nature of modern research mean coauthorship is inescapable, but honorific, guest, and gift authorship are not currently uncommon, and major contributors are at times excluded. Generally accepted guidelines for authorship do not exist. The difficulty of fixing the degree of an individual's contribution is not trivial, and the standard model of authorship is in need of revision, with likely variations across different genres.










The Effect of Pool Depth on System Evaluation in TREC
    Sabrina Keenan, Alan F. Smeaton, and Gary Keogh
    Published online 2 March 2001

In TREC evaluation, pool size for a query ranges between 100 (the top 100 documents of each run are evaluated) and 100 times the number of runs made by participants. The pool is assessed to determine the actual relevant documents used to compute recall and precision measures on the top 1000 documents generated by each run. Keenan, Smeaton, and Keogh look at the relevant sets resulting from each run with pool depths restricted to levels below 100. Increases in depth increase the number of relevant documents found by each system and decrease the number of new relevant documents found per step. The relationship is modeled by y = 1/([alpha] + [beta]/x), where y is the number of relevant documents in thousands and x is the pool depth; this is rewritten as 1/y = [alpha] + [beta](1/x) to estimate [alpha] and [beta] by least squares. A good fit results. The worst-case error at depth 100 is 6.9%, suggesting that depth 100 is robust. Large values of 1000/[alpha] indicate good performance in long runs, while low [beta] indicates good performance in short runs. Interestingly, large values of 1000/[alpha] are associated with small values of [beta], and it seems likely that good systems can be identified from short runs.
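The estimation step is a standard linear regression after a reciprocal transform: fitting 1/y = [alpha] + [beta](1/x) by ordinary least squares recovers the two parameters. A minimal sketch, using invented depth/relevant-count pairs rather than TREC figures:

```python
def fit_pool_model(depths, rel_counts):
    """Least-squares estimates of alpha and beta for 1/y = alpha + beta*(1/x).

    depths: pool depths x; rel_counts: relevant documents found y (in thousands).
    """
    xs = [1.0 / d for d in depths]        # transformed predictor 1/x
    ys = [1.0 / r for r in rel_counts]    # transformed response 1/y
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)
    alpha = my - beta * mx
    return alpha, beta

depths = [10, 25, 50, 100]
rel = [0.8, 1.4, 1.8, 2.1]                # invented counts, in thousands
a, b = fit_pool_model(depths, rel)
print(f"alpha={a:.3f}, beta={b:.3f}, asymptote 1000/alpha = {1000/a:.0f} docs")
```

As x grows, y approaches 1/[alpha], which is why 1000/[alpha] serves as the model's estimate of the total relevant documents a run can find at unbounded depth.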











Employing the Resolution Power of Search Keys
    Ari Pirkola and Kalervo Järvelin
    Published online 19 March 2001

In any database some search terms will more effectively discriminate between relevant and non-relevant documents for a given search. By resolution power Pirkola and Jarvelin mean a term's ability, relative to other terms in a search, to increase query performance. Utilizing the InQuery retrieval system, which computes a probability value for a document for each term, a 515,825-document TREC subset, and the 47 queries from topic set 101-150 that contained at least 5 substantive terms, they attempt to identify the good and bad terms, their statistical properties, and how such knowledge can improve effectiveness. All possible non-empty combinations of the five terms for each query were generated and run, resulting in 1457 searches, which were then ranked by average precision. The order of terms in the best case was then used to rank the terms: the lead term was considered best; terms that appeared in a combination exceeding the best term searched alone were considered good; the others bad. For each term, document frequency, collection frequency, and within-document frequency were calculated. Terms were assumed to have high resolution power if collection frequency over document frequency exceeded some threshold, and either within-document frequency over document frequency was greater than the same measure for the other query terms times some constant, or, alternatively, document frequency was less than or equal to that of any other term in the query divided by a constant. The frequency statistics of good and bad keys are quite similar. High resolution power terms can be identified automatically, but the ranking in between does not correspond well to after-the-fact analysis. Use of the best term in a structured query will improve performance.
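The selection rule described above combines three frequency statistics per key. The sketch below encodes that rule directly; the thresholds, constants, and frequency figures are illustrative assumptions, not the authors' parameter values.

```python
def high_resolution_keys(stats, cf_df_min=1.5, dens_factor=1.2, df_factor=2.0):
    """Flag high-resolution-power keys in one query.

    stats maps term -> (df, cf, wdf): document frequency, collection
    frequency, and average within-document frequency.
    """
    selected = []
    for term, (df, cf, wdf) in stats.items():
        if cf / df < cf_df_min:                # cf/df must exceed a threshold
            continue
        others = [t for t in stats if t != term]
        # Either denser within documents than every other key, times a constant...
        denser = all(wdf / df > dens_factor * (stats[t][2] / stats[t][0])
                     for t in others)
        # ...or much rarer in the collection than every other key.
        rarer = all(df <= stats[t][0] / df_factor for t in others)
        if denser or rarer:
            selected.append(term)
    return selected

# Invented statistics for a three-key query.
stats = {"recycl": (120, 300, 2.5),
         "waste": (5000, 9000, 1.8),
         "tire": (800, 1600, 2.0)}
print(high_resolution_keys(stats))  # → ['recycl']
```

Here the rare, concentrated key "recycl" passes both the cf/df threshold and the density test, while the broad keys fail, matching the intuition that discriminating keys are concentrated in few documents.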














Chinese Document Indexing Based on a New Partitioned Signature File: Model and Evaluation
    Wai Lam, Kam-Fai Wong, and Chi-Yin Wong
    Published online 21 March 2001

Lam et al. suggest partitioning the signature file, when that method is used to index documents in Chinese, into segments based upon the number of characters in each word. Since very few Chinese words exceed 5 characters in length, this normally partitions the signature file into five segments. The result is faster searching than with a single signature file, but at the cost of either increased overhead or increased false drops; the model provides a parameter that allows the searcher to control this trade-off. A hashing function is considered superior to a pre-stored signature table in a dynamic environment, although duplicate word signatures, and thus false drops, are sure to occur. A frequency-based hashing function uses the frequency of each word in the corpus to rank the words in descending order. Words are then grouped by assigning the highest-frequency ungrouped word to the group with the smallest sum of frequencies, in order to balance the total group frequencies. The initial bits of the hashed value indicate the group and the remainder is a simple hash of the word. This approach reduces collisions, signature file duplications, and false drops. Groups could also be created based upon the first Chinese character of the word, which would remove the need for the frequency processing. The partitioned and single signature file methods were compared using the TREC-5 Chinese collection: a 70% improvement in precision and a 50% reduction in search space were achieved, and both grouped hashing methods equaled table look-up performance.
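The frequency-balancing step is a greedy assignment: each word, taken in descending frequency order, goes to the group whose frequency sum is currently smallest, and the group index then supplies the leading bits of the word's signature. A minimal sketch with an invented five-word corpus and an invented simple hash (the paper's actual hash function and bit widths are not specified here):

```python
def balanced_groups(word_freqs, n_groups=2):
    """Greedily assign words (by descending frequency) to the lightest group."""
    groups = [[] for _ in range(n_groups)]
    sums = [0] * n_groups
    for word, freq in sorted(word_freqs.items(), key=lambda kv: -kv[1]):
        g = sums.index(min(sums))   # group with the smallest frequency sum so far
        groups[g].append(word)
        sums[g] += freq
    return groups, sums

def word_signature(word, group_index, sig_bits=7):
    """Initial bits give the group; the remainder is a simple hash of the word."""
    h = sum(ord(c) for c in word) % (1 << sig_bits)   # illustrative hash only
    return (group_index << sig_bits) | h

# Invented corpus frequencies for five Chinese words.
freqs = {"的": 90, "中国": 40, "香港": 35, "经济": 30, "发展": 25}
groups, sums = balanced_groups(freqs)
print(groups, sums)  # → [['的', '发展'], ['中国', '香港', '经济']] [115, 105]
```

Balancing the group sums keeps high-frequency words from crowding into one segment, which is what reduces duplicate signatures and hence false drops.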
















The Robot in the Garden: Telerobotics and Telepistemology in the Age of
the Internet, edited by Ken Goldberg
Alexander Halavais
    Published online 2 March 2001





After the Internet: Alien Intelligence, by James Martin
Luca I. G. Toldo
    Published online 22 March 2001











ASIST Home Page

Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810 | E-mail:

Copyright 2001, Association for Information Science and Technology