|
EDITORIAL |
| |
In This Issue Bert R. Boyce |
515
|
RESEARCH |
| |
User Preferences in the Classification of Electronic Bookmarks: Implications for a Shared System Lisa Gottlieb and Juris Dilevko
Published online 9 March 2001
In this issue we begin with Gottlieb and Dilevko's look at the manner in which website users create a classified organization for their bookmarks of sites of recurring
interest, with an eye toward possible multiple usage as a knowledge management tool. Automatic categorization software working with URLs has not been particularly successful and does not reflect local usage. Fifteen
graduate accounting students were given 60 financial industry URLs and asked to visit them, bookmark them, and categorize the bookmarks into folders. A week later a questionnaire tailored to reflect options in each
participant's created structure was used to solicit information on motivation for decisions concerning that structure and its use. Categories for division were induced from the responses to the questions in a manner
similar to Kwasnik's 1991 study, and grouped by their context (user) or content (site) character. Content factors accounted for 48% of the decisions, and 23% were attributed to a combination of content and context
factors, in contrast to the results of the Kwasnik study, which used materials selected by the participants rather than provided to them. Absence of detailed familiarity with a site may make categorization by context
(use) more difficult than by content. However, based on the results, subjective factors seem to play a small role in the development of bookmark structures for multiple users. |
517
|
| |
Small Worlds: Normative Behavior in Virtual Communities and Feminist Bookselling Gary Burnett, Michele Besant, and Elfreda A. Chatman
Published online 12 March 2001
Burnett, Besant, and Chatman take a preliminary look at small common cultural space communication as realized in both virtual communities and in the feminist book
selling community through the lenses of normative behavior theory. After reviewing the literature on both groups, they point out that analysis of archived texts makes study of the behavior of virtual communities as small
worlds a serious possibility, just as does the existence of Feminist Bookstore News, which has provided a communication channel for some 25 years despite the existence here of some face-to-face contact as well. Normative
behavior theory (NBT) is seen as providing a framework for further study of these communities. The social norm context of NBT for a virtual community can be found in FAQ files and also observed in regular textual
exchanges. The social norms of the feminist bookstore are more directly observable and focused, although one prevalent norm would appear to be a resistance to socially enforced norms. The world view context of NBT is to
be found in a context of shared interest in the virtual community and for the feminist bookseller it is embodied in the idea of gender inequality as a social evil to be actively opposed. The social types context of NBT
in virtual communities is dominated by the insider-outsider split, with messages from outsiders less highly valued. However, such sub-types as lurkers and newbies exist. In the feminist bookstore world the insiders are
the owners and staff of the stores, although this group is divided into those considered professional booksellers and those who just work there. Feminist typing external to the bookstore environment also matters
within the community. Finally, in the information behavior context of NBT in virtual communities, most participants exchange queries and replies, post announcements, and provide links to relevant information
sources. The feminist bookseller's dialog is pointed toward social change and readers' advisory. |
536
|
| |
Using Concepts in Literature-Based Discovery: Simulating Swanson's Raynaud-Fish Oil and Migraine-Magnesium Discoveries Marc Weeber, Henny Klein, Lolkje T.W. de Jong-van den Berg, and Rein Vos
Published online 19 March 2001
Weeber, et alia, reproduce the results of Swanson's hypothesis
discovery method by translating text into Unified Medical Language System Metathesaurus (UMLS) terms. The literatures in need of review in the Swanson process are likely to be quite large, and the number of n-grams
generated larger still, so a method of reduction has value. A total of 8627 different words were extracted from 1246 documents containing the word raynaud, and, using the ARROWSMITH stop list
filter together with bigrams and trigrams, 8362 interesting terms were identified. The same documents yield 5998 UMLS concepts, and only 145 of these are grouped in functional semantic types in the thesaurus. If these are searched
and dietary semantic types only are generated, fish oil is a prominent result. Swanson's migraine-magnesium link is also identified through this process. Limiting by semantic types requires some subject
expertise. |
548
|
| |
Hyperauthorship: A Postmodern Perversion or Evidence of a Structural Shift in Scholarly Communication Practices? Blaise Cronin
Published online 13 March 2001
Cronin examines the implications of the extraordinary growth in the number of authors of single papers in the biomedical and high energy physics domains. Such
practices call into question the assumption that the author(s) bear sole responsibility and deserve sole credit for the work. While collaboration and multiple authorship certainly existed before the 20th century, the
advent of big science has stimulated it. Papers with 100 authors increased from one in 1981 to 182 in 1994. Increasing collaboration and the highly technical nature of modern research mean coauthorship is inescapable,
but honorific, guest and gift authorship are not currently uncommon and major contributors are at times excluded. Generally accepted guidelines for authorship do not exist. The difficulty of fixing the degree of an
individual's contribution is not trivial and the standard model of authorship is in need of revision with likely variations in different genera. |
558
|
| |
The Effect of Pool Depth on System Evaluation in TREC Sabrina Keenan, Alan F. Smeaton, and Gary Keogh
Published online 2 March 2001
In TREC evaluation
pool size for a query is in a range between 100 (the top 100 of each run are evaluated) and 100 times the number of runs made by participants. The pool is assessed to determine actual relevant documents for computation
of recall and precision measures on the top 1000 documents generated by each run. Keenan, Smeaton, and Keogh look at the relevant sets resulting from each run with pool depths restricted to levels below 100. Increases
in depth increase the numbers of relevant documents found by each system, and decrease the number of new relevant documents found per step. The relationship is modeled by y = 1/(α + β/x), where y = number of
relevant documents in thousands, and x = pool depth, and rewritten as 1/y = α + β(1/x) to estimate α and β by least squares. A good fit results. The worst case error at depth 100 is 6.9%, suggesting
that depth 100 is robust. Large values of 1000/α indicate good performance in long runs, while low β will indicate good performance in short runs. Interestingly, large values of 1000/α are
associated with small values of β, and it seems likely that good systems can be identified from short runs. |
570
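The linearized least-squares fit described in the Keenan, Smeaton, and Keogh summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the pool depths, and the parameter values 0.25 and 5.0 are hypothetical and serve only to generate synthetic data on which the fit can be checked.

```python
def fit_inverse_model(depths, relevant_thousands):
    """Ordinary least squares on the linearized form 1/y = alpha + beta * (1/x)."""
    u = [1.0 / x for x in depths]              # transformed predictor, 1/x
    v = [1.0 / y for y in relevant_thousands]  # transformed response, 1/y
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    sxy = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    beta = sxy / sxx                  # slope of the fitted line
    alpha = mean_v - beta * mean_u    # intercept of the fitted line
    return alpha, beta


def predict(alpha, beta, depth):
    """Relevant documents (in thousands) predicted at a given pool depth."""
    return 1.0 / (alpha + beta / depth)


# Synthetic data only: parameter values and depths are made up for illustration.
TRUE_ALPHA, TRUE_BETA = 0.25, 5.0
depths = [10, 20, 40, 60, 80, 100]
relevant = [predict(TRUE_ALPHA, TRUE_BETA, x) for x in depths]

alpha, beta = fit_inverse_model(depths, relevant)
print(alpha, beta)
```

Because 1/y approaches alpha as the depth x grows, 1/alpha is the limiting value of y, which is why 1000/alpha (with y in thousands) reflects performance in long runs.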
|
| |
Employing the Resolution Power of Search Keys Ari Pirkola and Kalervo Jarvelin
Published online 19 March 2001
In any database some search terms
will more effectively discriminate between relevant and non-relevant documents for a given search. By resolution power Pirkola and Jarvelin mean a term's ability relative to other terms in a search to increase query
performance. Utilizing the InQuery retrieval system, which computes a probability value for a document for each term, a 515,825 document TREC subset, and the 47 queries from topic set 101-150 that contained at least 5
substantive terms, they attempt to identify the good and bad terms, their statistical properties, and how such knowledge can improve effectiveness. All possible nonempty combinations of the five terms for each query
were generated and utilized, resulting in 1457 searches, which were then ranked by average precision. The order of terms in the best case was then used to rank the terms: the lead term was considered best, terms that
were in a combination that exceeded the best term searched alone were considered good, and the others bad. For each term, document frequency, collection frequency, and within document frequency were calculated. Terms were
assumed to have high resolution power if collection frequency over document frequency exceeded some threshold and either within document frequency over document frequency was greater than the same measure for other
query terms times some constant, or document frequency was less than or equal to that of any other term in the query divided by a constant. The frequency statistics of good and bad keys are quite
similar. High resolution power terms can be identified automatically, but ranking in between does not correspond well to after-the-fact analysis. Use of the best term in a structured query will improve performance. |
575
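The count of 1457 searches in the Pirkola and Jarvelin summary follows from enumerating every nonempty subset of five terms for each of the 47 queries. A minimal sketch of that enumeration is given below; the function name and the placeholder term labels are assumptions for illustration, and no actual retrieval is performed.

```python
from itertools import combinations


def nonempty_combinations(terms):
    """Return every nonempty subset of the given terms, smallest subsets first."""
    return [
        combo
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    ]


# Placeholder labels standing in for the five substantive terms of one query.
terms = ["t1", "t2", "t3", "t4", "t5"]
combos = nonempty_combinations(terms)

NUM_QUERIES = 47  # number of queries reported in the summary
total_searches = NUM_QUERIES * len(combos)
print(len(combos), total_searches)
```

Five terms give 2^5 - 1 = 31 nonempty combinations, and 47 queries times 31 combinations gives the 1457 searches reported.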
|
| |
Chinese Document Indexing Based on a New Partitioned Signature File: Model and Evaluation Wai Lam, Kam-Fai Wong, and Chi-Yin Wong
Published online 21 March 2001
Lam, et alia, suggest that, when a signature file is used to index documents in Chinese, the file be partitioned into segments based upon the number of characters in
each word. Since the number of Chinese words that exceed 5 characters in length is very small, this normally will partition the signature file into five segments. The result will be faster searching than with a
single signature file, but either increased overhead or increased false drops. The model provides a parameter which will allow the searcher to control this trade off. A hashing function is considered superior to a
pre-stored signature table in a dynamic environment although duplicate word signatures and thus false drops are sure to occur. A frequency based hashing function uses the frequency of each word in the corpus to rank
them in descending order. Words are then grouped by assigning the highest frequency ungrouped term to the group with the smallest sum of frequencies in order to balance the total group frequencies. The initial bits of
the hashed value indicate the group and the remainder are a simple hash of the word. This value will reduce collisions, signature file duplications, and false drops. Groups could also be created based upon the first
Chinese character of the word which would alleviate the need for the frequency processing. The partitioned signature file method and single signature file method were compared using the TREC-5 Chinese collection. A 70%
improvement in precision and a 50% reduction in search space were achieved. Both grouped hashing methods equaled table look-up performance. |
584 |
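The frequency-based grouping step described in the Lam, Wong, and Wong summary (rank words by corpus frequency, then give each word in turn to the group with the smallest running total) can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the function names, the sample words and frequencies, the bit widths, and the use of CRC32 as the "simple hash" are all hypothetical choices.

```python
import zlib


def balance_groups(word_freqs, num_groups):
    """Greedily assign words, highest frequency first, to the lightest group."""
    totals = [0] * num_groups
    assignment = {}
    # Sort by descending frequency; ties are broken by the word itself.
    ranked = sorted(word_freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    for word, freq in ranked:
        group = min(range(num_groups), key=lambda i: totals[i])
        assignment[word] = group
        totals[group] += freq
    return assignment, totals


def grouped_hash(word, group, hash_bits):
    """Leading bits carry the group; the remaining bits are a simple hash."""
    simple = zlib.crc32(word.encode("utf-8")) % (1 << hash_bits)  # stand-in hash
    return (group << hash_bits) | simple


# Made-up frequencies for illustration only.
freqs = {"中国": 50, "经济": 30, "发展": 20, "香港": 20, "大学": 10}
assignment, totals = balance_groups(freqs, num_groups=2)
value = grouped_hash("中国", assignment["中国"], hash_bits=14)
print(assignment, totals, value)
```

Shifting the group number into the leading bits means two words in different groups can never collide, which is the source of the reduction in duplicate signatures that the summary describes.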
| |
|
|
BOOK REVIEWS |
| |
The Robot in the Garden: Telerobotics and Telepistemology in the Age of the Internet, edited by Ken Goldberg Alexander Halavais
Published online 2 March 2001 |
598
|
| |
After the Internet: Alien Intelligence, by James Martin Luca I. G. Toldo
Published online 22 March 2001 |
599 |
| |
|
|
| |
LETTERS TO THE EDITOR |
601 |
| |
|
|
| |
CALL FOR PAPERS |
603 |
|