Journal of the Association for Information Science and Technology

Table of Contents

Volume 55, Issue 11

Editorial

937

In This Issue
Bert R. Boyce

Research

939

A Multidimensional Approach to the Study of Human-Information Interaction: A Case Study of Collaborative Information Retrieval
Raya Fidel, Annelise Mark Pejtersen, Bryan Cleal, and Harry Bruce

Fidel et al. use Cognitive Work Analysis to study collaborative information behavior in several contexts: the work activities within which it occurs, the existing organizational relationships, any workplace-imposed constraints, and the actor's priorities and cognitive and social values. Each of these dimensions consists of a set of identified attributes. Motivations are also considered, since information seeking is viewed as a goal-directed activity. In the case study reported here, collaborative IR meant the involvement of colleagues in the same work process in any activity used to acquire information they did not possess. The data were collected through interviews, observation, and documentary evidence. They focused on a member of a Microsoft design team who needed information for a decision, and on two collaborating colleagues. Factors from multiple dimensions are shown to motivate the actor to seek collaborative IR, although these factors may not be clearly independent in their effects. The study nonetheless suggests that collaborative IR may arise in three kinds of situation. The first is when actors are in unfamiliar situations and need informal feedback and access to tacit knowledge. The second is when they are pressed for time and need diverse sources that are hard to understand and open to differing interpretations. The third is when the required information is not documented but will significantly affect a team's efforts.

954

Rough Set Approach for Attribute Reduction and Rule Generation: A Case of Patients With Suspected Breast Cancer
Aboul-Ella Hassanien

Hassanien uses rough set analysis to reduce the attributes in a file of records to the minimal subset needed to determine a class label. Classification rules are then generated from this subset and applied to the file. The test file held ten attributes of data from 360 patients with suspected breast cancer, and the method was used to classify each case as benign or malignant. The rough set method was compared with the ID3 decision tree method. It produced 60% fewer rules and a classification accuracy of 95%, against 85% for ID3.
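The summary does not detail the reduction step, so what follows is only a minimal Python sketch of the general rough set idea, under stated assumptions. A reduct is taken to be a smallest attribute subset that preserves the positive region, meaning the rows whose indiscernibility class maps to a single class label. The sketch finds such subsets by brute force and then reads one IF-THEN rule off each consistent class of the reduced table. The table, the attribute names (`a1` to `a3`) and the helper functions are hypothetical. Hassanien's actual reduction and rule-generation procedure may differ.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations

# Hypothetical toy decision table (values invented for illustration, NOT the
# study's patient data): three condition attributes and a decision label.
TABLE = [
    {"a1": 1, "a2": 0, "a3": 1, "class": "benign"},
    {"a1": 1, "a2": 1, "a3": 0, "class": "malignant"},
    {"a1": 0, "a2": 0, "a3": 1, "class": "benign"},
    {"a1": 0, "a2": 1, "a3": 1, "class": "malignant"},
    {"a1": 1, "a2": 1, "a3": 1, "class": "malignant"},
]


def partition(rows: list[dict], attrs: list[str]) -> list[list[int]]:
    """Indiscernibility classes: row indices grouped by their values on attrs."""
    blocks: dict[tuple, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def positive_region_size(rows, attrs, decision="class") -> int:
    """Number of rows whose indiscernibility class maps to a single decision."""
    return sum(
        len(block)
        for block in partition(rows, attrs)
        if len({rows[i][decision] for i in block}) == 1
    )


def dependency(rows, attrs, decision="class") -> float:
    """Degree of dependency of the decision on attrs: |POS| / |U|."""
    return positive_region_size(rows, attrs, decision) / len(rows)


def minimal_reducts(rows, attrs, decision="class") -> list[list[str]]:
    """Brute force: smallest subsets whose positive region equals that of all attrs."""
    full = positive_region_size(rows, attrs, decision)
    for k in range(1, len(attrs) + 1):
        found = [list(c) for c in combinations(attrs, k)
                 if positive_region_size(rows, list(c), decision) == full]
        if found:
            return found
    return [list(attrs)]


def rules(rows, attrs, decision="class") -> list[tuple[tuple, str]]:
    """One IF-THEN rule per decision-consistent indiscernibility class."""
    out = set()
    for block in partition(rows, attrs):
        labels = {rows[i][decision] for i in block}
        if len(labels) == 1:
            condition = tuple((a, rows[block[0]][a]) for a in attrs)
            out.add((condition, labels.pop()))
    return sorted(out)


reduct = minimal_reducts(TABLE, ["a1", "a2", "a3"])[0]
for condition, label in rules(TABLE, reduct):
    print("IF", " AND ".join(f"{a}={v}" for a, v in condition), "THEN", label)
```

The brute-force search checks every subset. With ten attributes that is 1023 subsets, which is workable here but not for large attribute sets.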

963

The Influence of Relevance Levels on the Effectiveness of Interactive Information Retrieval
Pertti Vakkari and Eero Sormunen

Vakkari and Sormunen examine how assigning different levels of relevance to documents affects measured query expansion effectiveness, and how TREC's liberal definition of relevance affects it. Using the Okapi system, 26 students at City University, London, searched TREC topics 388, 403, 427, and 442 on a subset of the TREC records from the LA Times and the Financial Times. Half used a complete expansion with the top 15 terms from the documents they indicated as relevant. The other half were permitted to delete terms from the expansion list. All TREC relevant documents and a 5% sample of non-relevant documents were reassessed on a four-level scale: highly relevant, fairly relevant, marginally relevant, and only TREC relevant. Users had difficulty identifying relevant documents, so some feedback was generated from non-relevant documents. Users identified 45% of the retrieved documents that were relevant at the TREC level, but 85% of those judged highly relevant. Both users' ability to identify relevant documents and the use of levels of relevance affect the research results. Of the documents users selected as relevant, 46% were not relevant by the official judgement. Users also failed to recognize over one third of the retrieved, officially relevant documents as relevant. This lack of recognition produced a statistical difference between the performance of the two expansion methods.

970

The Influence of Document Presentation Order and Number of Documents Judged on Users' Judgments of Relevance
Mu-hsuan Huang and Hui-yu Wang

Huang and Wang revisit the order effect on relevance judgements in retrieval presentation lists, focusing on its relationship to display set size. The subjects were a convenience sample of 48 people experienced in online searching. The documents were the top 80 results of a search of the LISA CD-ROM database, presented in random order. Two judgement sessions, two months apart, used a seven-point relevance scale. In the first session each document was judged by 24 participants, with a random subset of the 80 documents chosen for each subject. Judgement and position showed a slight, non-significant positive correlation. The documents were then ranked by their mean relevance scores and separated into five layers of 16 documents each. Documents selected randomly and proportionally from the layers formed simulated retrieved lists of 5, 15, 30, 45, 60, and 75 documents, each ordered both low to high and high to low. Six participant sets each received 2 of the 12 resulting lists for the phase 2 judgements.

With five documents, no order effect is evident. In the low-to-high list the lowest document is overestimated, while in the high-to-low list the highest is underestimated. With 15 documents, the high-to-low rating mean is lower than that of the random order and the low-to-high mean is slightly higher. MANOVA shows a significant difference between these two ordered scores, but not between either of them and the random order. An order effect is present. At a retrieval size of 30, the differences between both ordered ratings and the random ratings are significant, and judgement is significantly affected by order. At 45 documents the ratings are very close and reflect an order effect that does not reach significance. At 60 documents both ordered ratings are lower than the random ratings, and an order effect, while present, is neither distinct nor significant. At 75 documents both ordered ratings are higher than the random ratings, and no order effect can be demonstrated. The lack of an effect at the low end is due to the small number of documents. At the high end, it may be due to an exhaustion factor.
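The list-construction step can be sketched in Python under stated assumptions. "Proportionally" is taken to mean an equal number of documents drawn at random from each of the five equal-sized layers, for example one per layer for a list of 5 and fifteen per layer for a list of 75. The document identifiers, mean scores and function names are hypothetical. The study's own selection procedure may differ.

```python
from __future__ import annotations

import random


def build_layers(mean_scores: dict[str, float], n_layers: int = 5) -> list[list[str]]:
    """Rank documents by mean relevance (highest first) and cut into equal layers."""
    ranked = sorted(mean_scores, key=mean_scores.get, reverse=True)
    size = len(ranked) // n_layers  # 80 documents / 5 layers = 16 per layer
    return [ranked[i * size:(i + 1) * size] for i in range(n_layers)]


def simulated_list(layers, list_size, mean_scores, order="high_to_low", rng=None):
    """Draw list_size // len(layers) documents at random from every layer, then order them."""
    if list_size % len(layers):
        raise ValueError("list_size must be a multiple of the number of layers")
    rng = rng or random.Random()
    per_layer = list_size // len(layers)
    picked = [doc for layer in layers for doc in rng.sample(layer, per_layer)]
    return sorted(picked, key=mean_scores.get, reverse=(order == "high_to_low"))


# Hypothetical mean scores on the seven-point scale (invented, not the study's data).
scores = {f"d{i:02d}": 1 + 6 * i / 79 for i in range(80)}
layers = build_layers(scores)
rng = random.Random(42)

# Six list sizes x two orders = 12 simulated retrieved lists.
lists = {
    (n, order): simulated_list(layers, n, scores, order, rng)
    for n in (5, 15, 30, 45, 60, 75)
    for order in ("high_to_low", "low_to_high")
}
```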

980

Evolution, Continuity, and Disappearance of Documents on a Specific Topic on the Web: A Longitudinal Study of "Informetrics"
Judit Bar-Ilan and Bluma C. Peritz

Bar-Ilan and Peritz searched for web pages on informetrics in 1998, 1999, 2002, and 2003. They studied the growth of the web literature on this topic and its tendencies toward modification, disappearance, and resurfacing of items. The original 1998 search was carried out on the six largest search engines, and the resulting union list of 886 URLs was examined. In 1999 the search found 1297 pages, and these did not include all of the previous results. However, a direct search of the missing URLs located 219 relevant pages missed by the 1999 engine search. The 2002 and 2003 searches used a new set of five top search engines, and formats beyond HTML and TXT began to appear. In 2002 the searches found 3746 traditional pages and 329 in other formats. In 2003 they found 4389 traditional pages and 766 in other formats. Over the five-and-a-half-year period the topic grew sixfold, while the web grew about tenfold. Of the 5034 pages that satisfied the search through 2002, only 3144 were available in 2003. Not only did nearly 40% disappear, but about half of the remaining pages had been modified.
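The growth and attrition figures can be checked with a few lines of Python. This assumes that the sixfold growth compares the 2003 total across all formats (4389 + 766) with the 886 URLs of 1998. That comparison is an assumption, not something the summary states. On that reading the ratio is about 5.8. The share of the 5034 pages no longer available in 2003 is about 37.5%, which the summary rounds up.

```python
# Figures as reported in the summary.
pages_1998 = 886
pages_2003 = 4389 + 766    # traditional pages + other formats (assumed basis for "sixfold")
pool_through_2002 = 5034
still_available_2003 = 3144

growth = pages_2003 / pages_1998                           # 5155 / 886
share_gone = 1 - still_available_2003 / pool_through_2002  # 1890 / 5034

print(f"topic growth 1998-2003: {growth:.1f}x")       # topic growth 1998-2003: 5.8x
print(f"share no longer available: {share_gone:.1%}")  # share no longer available: 37.5%
```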

991

The University–Industry Knowledge Relationship: Analyzing Patents and the Science Base of Technologies
Loet Leydesdorff

Leydesdorff uses a Visual Basic routine to retrieve two sets of 2002 patents from the USPTO database (http://www.uspto.gov). The first set is patents with an address containing the root "univ*" (3291). The second is patents with a Dutch address (2827). The aim is to study the relationship of patents to their knowledge base in the scientific literature. The title words of the patents and of the documents they cite are analyzed with the Pajek software for visualizing asymmetrical matrices. The cosine similarity measure, rather than the Pearson correlation, is used for normalization. The relationship between scientific literature and patents is sector-specific: it exists in the biomedical sector but is less apparent elsewhere. The biomedical sector also shows university-industry relationships, but generalizing these to other disciplines is questionable.
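The summary does not say why cosine was preferred, but one commonly cited difference between the two measures is easy to show. Pearson's r is the cosine of mean-centered vectors. Padding two word-occurrence vectors with shared zeros (documents in which neither word appears) therefore changes r but leaves the cosine unchanged. The short Python sketch below uses invented counts. It illustrates the two measures and does not reconstruct Leydesdorff's routine or matrices.

```python
import math


def cosine(x: list[float], y: list[float]) -> float:
    """Cosine of the angle between two vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson r, computed as the cosine of the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


# Hypothetical title-word counts for two words over a few documents.
x, y = [1, 2], [2, 1]
x0, y0 = x + [0, 0, 0], y + [0, 0, 0]  # same counts plus three shared absences

print(round(cosine(x, y), 4), round(cosine(x0, y0), 4))    # 0.8 0.8
print(round(pearson(x, y), 4), round(pearson(x0, y0), 4))  # -1.0 0.6875
```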

1002

Using N-Grams for Arabic Text Searching
Suleiman H. Mustafa and Qasem A. Al-Radaideh

Mustafa and Al-Radaideh investigate the effectiveness of the N-gram conflation technique on Arabic text. Arabic's complex prefix, infix, and suffix structure makes affix-removal stemming difficult. Using 6000 distinct words extracted from their own papers, they constructed di-gram and tri-gram representations. Fifty words were then chosen as queries. The Dice similarity between each query word and each remaining word was computed for both the di-gram and the tri-gram representations. For each query word, the words whose similarity exceeded 0.6 formed a ranked list of possible variants. This threshold is the point at which the recall and precision curves, plotted over various thresholds, cross. The listed words were checked to determine whether they were actual variants. Precision was measured as the number of actual variants in the list divided by the number of listed words. After the rest of the file was evaluated, recall was measured as the number of actual variants in the list divided by all actual variants identified. The two measures vary inversely as the threshold is changed. A chi-square test showed that the differences between the average measures for di-grams and tri-grams were not significant, although di-grams gave better results. The error rate is over 30%, which calls into question the usefulness of the technique for Arabic text.
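Here is a minimal Python sketch of N-gram conflation with the Dice coefficient. It computes 2|A ∩ B| / (|A| + |B|) over each word's set of character di-grams or tri-grams. Several details are assumptions not taken from the summary:

- n-grams are unpadded and counted once per word;
- similarity must strictly exceed the threshold;
- the sample Arabic words are illustrative only.

`candidate_variants` and `precision_recall` are hypothetical helper names.

```python
from __future__ import annotations


def ngrams(word: str, n: int) -> set[str]:
    """Set of overlapping character n-grams (no padding, duplicates collapsed)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def candidate_variants(query: str, vocabulary: list[str], n: int = 2,
                       threshold: float = 0.6) -> list[tuple[float, str]]:
    """Ranked list of words whose n-gram Dice similarity to query exceeds threshold."""
    q = ngrams(query, n)
    scored = [(dice(q, ngrams(w, n)), w) for w in vocabulary if w != query]
    return sorted((pair for pair in scored if pair[0] > threshold), reverse=True)


def precision_recall(listed: list[str], actual: list[str]) -> tuple[float, float]:
    """Precision = actual variants listed / all listed; recall = actual variants listed / all actual."""
    hits = len(set(listed) & set(actual))
    precision = hits / len(listed) if listed else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Illustrative words only: "the library", "library", "the libraries", "writer".
vocabulary = ["المكتبة", "مكتبة", "المكتبات", "كاتب"]
for score, word in candidate_variants("المكتبة", vocabulary, n=2):
    print(f"{score:.3f} {word}")
```

Raising `threshold` shortens the candidate list, which tends to raise precision and lower recall. This is the inverse relationship the study reports.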

1008

Finding Governmental Statistical Data on the Web: A Study of Categorically Organized Links for the FedStats Topics Page
Irina Ceaparu and Ben Shneiderman

Ceaparu and Shneiderman summarize three studies of alternative organization concepts for the FedStats portal, which was designed to provide a single point of access to statistical data from multiple federal agencies. Three questions were used: one broad and loosely defined, one specific, and one requiring a comparison. The questions were run against three structures. The first was FedStats' original alphabetical list of links. The second was a categorical grouping of these links. The third was a categorical listing that links to the providing agency's site rather than directly to the information. For each structure, a different group of fifteen graduate students searched the three questions using a think-aloud protocol. They then completed a post-search questionnaire on satisfaction and ease of use. The rate of correct answers, as judged by the experimenter, rose from 15.6% in the first study to 24.4% in the second and 42.2% in the third. Judgement of the site as useful rose from 35% to 47% and then to 69%. Perceived ease of use rose from 42% to 56% and finally to 73%. Several design principles are seen as important for statistical data: universal usability, easy navigation, common language, comparative search and an advanced search facility, and granularity of data by time and geography.

Brief Communication

1016

Open Knowledge Management: Lessons From the Open Source Revolution
Yukika Awazu and Kevin C. Desouza

Awazu and Desouza examine open source communities to derive insights for augmenting knowledge management projects. Open source material is available to all, redistributable, non-discriminatory, and modifiable. Knowledge viewed as an organizational resource typically lacks these characteristics and is instead protected as a scarce commodity. Since 80% of knowledge is contributed by 20% of employees, there is a large free-rider problem. Open source communities address this problem by giving high status to contributors. An open knowledge agenda modeled on open source communities is seen as having great potential.

1020

Spelling and Grammar Checking Using the Web as a Text Repository
Kai A. Olsen and James G. Williams

Olsen and Williams are interested in selecting the proper preposition, as a native speaker would choose it, for a prepositional phrase where multiple possibilities exist. Each candidate phrasing is submitted to a search engine, and the hit count serves as a vote for the most likely phrasing. Examples show the efficacy of the procedure. The authors suggest that examining the text surrounding a noun phrase can likewise indicate the proper verb to use with that phrase. They also suggest that the method has implications for grammar, style, and spell checking.
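The voting procedure can be sketched in Python. No real search API is called. `hit_count` is a caller-supplied function standing in for an exact-phrase search. The counts in `FAKE_COUNTS` are made-up placeholders, not measured results. The function and variable names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable


def vote_by_hits(template: str, candidates: Iterable[str],
                 hit_count: Callable[[str], int]) -> tuple[str, dict[str, int]]:
    """Fill the slot with each candidate, query hit_count, and return the top-voted one."""
    votes = {c: hit_count(template.format(c)) for c in candidates}
    return max(votes, key=votes.get), votes


# Stand-in for a search engine: these counts are invented placeholders.
FAKE_COUNTS = {
    '"interested in information retrieval"': 9000,
    '"interested on information retrieval"': 300,
    '"interested at information retrieval"': 40,
}

best, votes = vote_by_hits('"interested {} information retrieval"',
                           ["in", "on", "at"],
                           lambda q: FAKE_COUNTS.get(q, 0))
print(best, votes)
```

Passing the search function in as a parameter keeps the voting logic separate from any particular engine's query interface.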

Book Reviews

1024

Virtual Inequality: Beyond the Digital Divide
Wallace Koehler

1025

Net Effects: How Librarians Can Manage the Unintended Consequences of the Internet
Denise E. Agosto

1026

JSTOR: A History
Stephen Ferguson

Call for Papers

1028

Second Symposium on Intelligence and Security Informatics (ISI-2004)

Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900 | Fax: 301-495-0810 | E-mail: asis@asis.org

Copyright © 2004, Association for Information Science and Technology