
Journal of the Association for Information Science

 

Bert R. Boyce
We begin this issue with four diverse papers on clustering as a retrieval method and end with three even more diverse papers on user studies.


RESEARCH

 

Order-Theoretical Ranking
Claudio Carpineto and Giovanni Romano

First we have Carpineto and Romano, who use a document file clustered on the basis of set inclusion relations among terms, merge queries into the clustered document space, and take the shortest path between a query and a document as the basis of a retrieval status value. Typical hierarchical clustering methods do not produce all likely clusters because of arbitrary tie breaking, and they fail to discriminate between documents with significantly different degrees of similarity to a query. In their concept lattice ranking (CLR), a lattice is built from term co-occurrence in documents and is updated incrementally, rather than recomputed from scratch, as each new document or query is added.
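
The lattice-and-shortest-path idea can be sketched in Python on a toy collection. This is a naive illustration under our own assumptions, not Carpineto and Romano's algorithm: it rebuilds the whole lattice (closing the document term sets under intersection) instead of updating it incrementally, treats the query as one more object, and ranks each document by the number of edges between its concept and the query's concept in the Hasse diagram. The function names (`concept_intents`, `hasse_neighbours`, `lattice_rank`) and the toy data are hypothetical.

```python
from collections import deque
from itertools import combinations


def concept_intents(context):
    """Return every concept intent of a formal context (object -> set of terms).

    Intents are the object term sets closed under pairwise intersection,
    plus the full term set (the intent of the bottom concept).
    """
    all_terms = frozenset().union(*context.values())
    intents = {all_terms} | {frozenset(terms) for terms in context.values()}
    changed = True
    while changed:  # naive fixed-point closure; only practical for tiny contexts
        changed = False
        for a, b in combinations(list(intents), 2):
            meet = a & b
            if meet not in intents:
                intents.add(meet)
                changed = True
    return intents


def hasse_neighbours(intents):
    """Undirected adjacency of the Hasse diagram (covering pairs under set inclusion)."""
    adj = {node: set() for node in intents}
    for a in intents:
        for b in intents:
            if a < b and not any(a < c < b for c in intents):
                adj[a].add(b)
                adj[b].add(a)
    return adj


def lattice_rank(documents, query_terms):
    """Rank documents by shortest-path distance from the query's concept (hypothetical CLR-like scheme)."""
    query = frozenset(query_terms)
    context = dict(documents)
    context["__query__"] = query  # merge the query into the context as one more object
    adj = hasse_neighbours(concept_intents(context))
    dist = {query: 0}
    queue = deque([query])
    while queue:  # breadth-first search yields shortest path lengths in edges
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return sorted((dist[frozenset(terms)], doc) for doc, terms in documents.items())
```

For documents d1 = {a, b}, d2 = {b, c}, d3 = {c, d} and the query {b}, the sketch places d1 and d2 one step from the query concept and d3 three steps away. The quadratic closure and cubic cover check are fine for illustration but not for real collections.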

Using the CACM and CISI collections and queries, the authors computed weighted term vectors for best match retrieval and, for comparison with CLR, a single link hierarchical clustering based on cosine similarity. Lattice construction took 15 minutes for CACM and 2 hours for CISI. Both best match and CLR achieve better precision and recall than hierarchical clustering, but little difference appears between the two. CLR and hierarchical clustering were then compared on documents that do not match the query, using expected search length as the measure. CLR outperforms hierarchical clustering here and may be useful in discovering relevant documents that do not match the query.
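
Expected search length, the measure used in the second comparison, can be computed directly for a ranking with ties. The sketch below follows Cooper's usual definition (the expected number of non-relevant documents a user examines before finding the desired number of relevant ones); how Carpineto and Romano grouped tied documents is an assumption here, and `expected_search_length` is a hypothetical name.

```python
from fractions import Fraction


def expected_search_length(levels, wanted):
    """Cooper-style expected search length for a weak ordering.

    levels: (relevant, non_relevant) counts for each tie level, best level first.
    wanted: how many relevant documents the user wants to find.
    Returns the expected number of non-relevant documents examined.
    """
    nonrel_before = 0  # non-relevant documents in levels examined in full
    need = wanted
    for relevant, non_relevant in levels:
        if relevant >= need:
            # Within a tie level the order is random: on average the s-th relevant
            # document is preceded by s * i / (r + 1) of the level's i non-relevant ones.
            return nonrel_before + Fraction(non_relevant * need, relevant + 1)
        need -= relevant
        nonrel_before += non_relevant
    raise ValueError("the ranking holds fewer relevant documents than wanted")
```

For example, `expected_search_length([(0, 2), (2, 3)], 1)` evaluates to 3: the two non-relevant documents in the first level, plus on average one of the three non-relevant documents tied with the two relevant ones. Using `Fraction` keeps the tie-level expectation exact.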

A Linear Algebra Measure of Cluster Quality
Laura A. Mather

Mather proposes a new measure of cluster quality that does not depend on retrieval measures computed for queries against the clustered file. It rests on the premise that the clustering quality of a term-document matrix is determined by how disjoint the terms are across the clusters. In the ideal case, terms that occur in one cluster occur only in that cluster; that is, they are mutually exclusive across clusters. Such clusters occur if and only if the matrix is "block diagonal": its rows and columns can be permuted to produce a matrix with a set of blocks on the diagonal that contain the nonzero elements, while everything outside those blocks is zero. When terms are disjoint, the singular values of the blocks, taken together, are the same as the singular values of the whole matrix; as the structure departs from block diagonal and more terms are shared between clusters, the two sets of singular values diverge. A measure of the distance between the singular values of the term-document matrix and those of the cluster matrices therefore indicates cluster quality, but it is difficult to interpret on its own. By taking random permutations of the matrix and clustering them, one can approximate the mean and standard deviation of this distance for random clusterings; subtracting that mean from the distance for the observed clustering and dividing by the standard deviation gives the number of standard deviations the observation lies from a random clustering. These values can be compared to identify the best clustering. The method requires computing the singular values of many large matrices and is therefore expensive. Experimentally, the metric correlates significantly with Shaw's F and with precision, increasing as these measures increase.
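
The procedure can be sketched with NumPy. The abstract does not give the exact distance, so this sketch assumes the Euclidean norm between the two sorted, zero-padded vectors of singular values. Random clusterings are made by shuffling the cluster labels over documents (equivalent to permuting the matrix columns while keeping cluster sizes), and the score is oriented so that larger means better, matching the direction reported above. Function names, the sample count, and the seed are hypothetical choices.

```python
import numpy as np


def sv_distance(A, labels):
    """Distance between the singular values of the whole term-document matrix A
    (terms x documents) and the pooled singular values of its cluster matrices.
    labels[j] is the cluster id of document (column) j."""
    full = np.linalg.svd(A, compute_uv=False)  # returned largest first
    parts = [np.linalg.svd(A[:, labels == k], compute_uv=False)
             for k in np.unique(labels)]
    pooled = np.sort(np.concatenate(parts))[::-1]
    n = max(len(full), len(pooled))
    # Zero-pad so both vectors have the same length (assumed convention).
    full = np.pad(full, (0, n - len(full)))
    pooled = np.pad(pooled, (0, n - len(pooled)))
    return float(np.linalg.norm(full - pooled))


def cluster_quality(A, labels, samples=200, seed=0):
    """Standard deviations by which the observed clustering beats random
    clusterings with the same cluster sizes (larger is better)."""
    rng = np.random.default_rng(seed)
    observed = sv_distance(A, labels)
    random_d = np.array([sv_distance(A, rng.permutation(labels))
                         for _ in range(samples)])
    return float((random_d.mean() - observed) / random_d.std())
```

On a block diagonal matrix clustered along its blocks, `sv_distance` is zero up to rounding and `cluster_quality` is positive; labels that split the blocks give a larger distance. Every call runs one SVD per cluster, which is where the expense noted above comes from.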

A Unified Mathematical Definition of Classical Information Retrieval  
Sandor Dominich

Dominich reviews the basic retrieval models, concentrating on the vector space and probabilistic representations. He shows that both models define systems of vicinities of documents around queries, so that both can be represented as a similarity space and thus given a unified mathematical definition.

Validating a Geographical Image Retrieval System
Bin Zhu and Hsinchun Chen

Zhu and Chen compare the performance of their Geographical Knowledge Representation System with image retrieval by human subjects. Gabor filters are used to extract low-level texture features from 128 × 128 pixel tiles cut from aerial photographs. Each tile is described by a 60-feature vector, and a Euclidean distance measure is used to rank the tile images, least distance first. Adjacent similar tiles are grouped into regions, which in turn are represented by derived vectors. A Kohonen Self-Organizing Map (SOM) is created showing tiles that represent the textures found in the data; clicking on one of these displays the tiles in the same category.
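
Ranking tiles by least Euclidean distance is simple to sketch. Gabor feature extraction is left out, and the three-dimensional vectors in the example are made-up stand-ins for the 60-feature texture vectors; `rank_tiles` and the tile ids are hypothetical.

```python
import math


def rank_tiles(reference, tiles, k=None):
    """Order tile ids by Euclidean distance of their feature vectors to `reference`.

    tiles maps a tile id to its feature vector; all vectors must have the same
    length as `reference` (math.dist raises ValueError otherwise).
    """
    ranked = sorted(tiles, key=lambda tile_id: math.dist(reference, tiles[tile_id]))
    return ranked if k is None else ranked[:k]
```

For instance, `rank_tiles([0, 0, 0], {"t1": [0, 0, 1], "t2": [5, 5, 5], "t3": [0, 1, 1]}, k=2)` returns `["t1", "t3"]`.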

Thirty human subjects were each assigned an image and six randomly selected reference tiles, and scored each of the 192 tiles in the image for similarity to the reference tiles. A second group of ten subjects was asked to draw lines around areas they found similar to the reference tiles. A third group of ten subjects was given the SOM-selected reference tiles and asked to assign each tile in the whole image to one of the categories these reference tiles represent. The system showed no significant difference in precision from the human subjects but performed less well on recall. Humans selected more tiles as similar, and the top five tiles chosen by the system and by the subjects were consistently different. Both had difficulty with tiles that texture alone did not distinguish. In grouping tiles into regions, humans outperformed the system on both measures, but in image categorization no significant difference existed. The system's performance is close to that of non-expert humans, and adding features other than texture may improve it.
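
Comparing system and human selections amounts to precision and recall with the human picks treated as the relevant set. The sketch assumes that reading; the abstract does not say exactly how judgments were pooled across subjects, and the tile numbers in the example are invented.

```python
def precision_recall(system_tiles, human_tiles):
    """Precision and recall of the system's selection, taking the human picks as relevant."""
    system, human = set(system_tiles), set(human_tiles)
    hits = len(system & human)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(human) if human else 0.0
    return precision, recall
```

With system picks {1, 2, 3, 4} and human picks {1, 2, 3, 5, 6, 7, 8, 9}, the function returns `(0.75, 0.375)`: when humans select many more tiles than the system, recall drops even though most of the system's picks agree with theirs.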

How Can We Investigate Citation Behavior? A Study of Reasons for Citing Literature in Communication
Donald O. Case and Georgeann M. Higgins

Case and Higgins review previous studies that provide lists of authors' reasons for citing, and studies in which investigators use these categories to classify citation behavior on the basis of content analysis. They also reexamine the smaller set of studies that survey authors about the reasons for their behavior. The authors chose the two most highly cited authors appearing in both of two recent studies of the communication literature and collected all citations to their work in 1995 and 1996. In all, 133 unique citers were identified and sent 32-item questionnaires using the questions from a recent study of the psychology literature. Fifty-six questionnaires were returned, 31 concerning author A and 25 concerning author B, and responses for the two authors were not significantly different. No new reasons for citation were identified. The top reasons were that the cited work reviews past work, represents a genre of studies, or serves as a source of a method. Negative citation is quite rare. Twenty-five non-redundant items with some indication of importance were subjected to a factor analysis. Seven factors explain 69% of the variance: classic citation, social reasons, negative citation, creative citation, contrasting citation, similarity citation, and citation of a review. The factors predicting citation are the perception of novelty and representation of a genre, the perception that the citation will promote the cognitive authority of the citing work, and the perception that the cited item deserves criticism.

Children's Use of the Yahooligans! Web Search Engine: I. Cognitive, Physical, and Affective Behaviors on Fact-Based Search Tasks
Dania Bilal

In Bilal's study, twenty-two middle school students were assigned a question to search in Yahooligans! as part of their science class. The teacher rated the children's topic knowledge, general science knowledge, and reading ability, and a quiz measured their knowledge of the Internet and of Yahooligans! in particular. Lotus ScreenCam was used to record 18 of the student-system interactions. The students' transcribed moves were classified and counted, with a score of 1 (relevant) for selecting a link that appears appropriate and leads to the desired information, .05 for selecting a link that appears appropriate but is not successful, and 0 for selecting a link that gives no indication of leading to success. Weighted effectiveness and efficiency scores were then computed.
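
The move scoring can be sketched as a weighted average. The weights are the ones printed above; dividing the summed weights by the number of moves is our assumption about how a weighted effectiveness score might be normalized, not necessarily Bilal's formula, and the names are hypothetical.

```python
# Move weights as given in the abstract above.
MOVE_WEIGHTS = {
    "relevant": 1.0,    # appropriate-looking link that leads to the desired information
    "partial": 0.05,    # appropriate-looking link that is not successful
    "irrelevant": 0.0,  # link with no indication of leading to success
}


def weighted_effectiveness(moves, weights=MOVE_WEIGHTS):
    """Mean weighted score per move (hypothetical normalization by move count)."""
    if not moves:
        return 0.0
    return sum(weights[move] for move in moves) / len(moves)
```

A session with the moves partial, irrelevant, relevant, relevant scores (0.05 + 0 + 1 + 1) / 4, roughly 0.51. Passing a different `weights` dictionary makes it easy to test other weighting schemes.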

Thirty-six percent initially browsed subject categories, while the rest entered single- or multi-word concepts. In subsequent moves the children used keywords and, in some cases, natural language, even though Yahooligans! does not support natural language search. Later activity mixed browsing with term searching. Looping and backtracking were very common, but no one used the Go button to reach the search history links. Most children scrolled, but seldom through the complete page. Half were successful, but all were inefficient.

Ethnomethodologically Informed Ethnography and Information System Design
Andy Crabtree, David M. Nichols, Jon O'Brien, Mark Rouncefield, and Michael B. Twidale

Crabtree et al. object to traditional ethnographic analysis as applied to information problems, on the grounds that applying predefined rules and procedures organizes the observed activity from the analyst's point of view rather than the participants'. Such a "constructive analysis" approach does not describe the actual activities; in the name of objectivity, it imposes a structure that obscures the real-world practices through which subjects make sense of their surroundings and produce information.

Ethnomethodology (EM) emphasizes rigorous thick description of local practices, assembling concrete cases of performed activity as the direct units of analysis. EM analysis attempts to describe in great detail how the described activity could be reproduced in and through the same practices. Such description gives systems designers a sense of the real-world aspects of a socially organized setting, and so reveals the exceptions, contradictions, and contingencies of the activities that might otherwise not be evident. Practitioners of ethnography and of computer system design have quite different cultures, but communication between them can lead to far better design practices.


BOOK REVIEWS

 

Annual Review of Information Science and Technology, Vol. 33, 1998, by Martha E. Williams
Birger Hjorland

IT Investment in Developing Countries: An Assessment and Practical Guideline, by Sam Lubbe
Queen Esther Booker

Information Brokering, by Florence M. Mason and Chris Dobson
James J. Sempsey

Information Management for the Intelligent Organization: The Art of Scanning the Environment, by Chun Wei Choo
Donald R. Smith

CALL FOR PAPERS


 NOTE TO READERS

 

The JASIS home page <http://www.asis.org/Publications/JASIS/tocs.html> contains the Table of Contents and brief abstracts as above from January 1993 (Volume 44) to date.

The John Wiley Interscience site <http://www.interscience.wiley.com> includes issues from 1986 (Volume 37) to date.  Guests have access only to tables of contents and abstracts.  Registered users of the interscience site have access to the full text of these issues and to preprints. 

JASIS is now available online to all current ASIS student and individual members who have selected this option.

 


© 2000, Association for Information Science