Journal of the Association for Information Science and Technology

Table of Contents

Volume 54  Issue 2



In This Issue

95-96

In this issue
Bert Boyce
Published Online:
3 Jan 2003

Research Article

97-114

Informational Environments: Organizational Contexts of Online Information Use
Roberta Lamb, John Leslie King, Rob Kling
Published Online: 6 Dec 2002

In this issue we begin with Lamb, King, and Kling, who are interested in the effect of the industry environment on information-gathering practices, particularly those involving information and communication technologies such as online searching. They studied the use of online services in 26 widely differing California firms operating in law, real estate, or biotechnology over a 17-month period. Data were gathered through semi-structured on-site interviews. Five influences on online usage were identified: interaction with regulatory agencies; demonstration of competence to clients; client expectations for timely, cost-effective information; the possibility of shifting information responsibilities outside the organization; and the existence of industry-wide infrastructures as information sources. The institutional and technical environment of a firm consistently circumscribes the domain within which its employees choose online resources. Firms that operate in highly technical and institutional environments have more incentive to gather information than those in low-tech, unregulated industries.

115-123

NLPIR: A Theoretical Framework for Applying Natural Language Processing to Information Retrieval
Lina Zhou, Dongsong Zhang
Published Online: 6 Dec 2002

Zhou and Zhang believe that for the potential of natural language processing (NLP) to be realized in information retrieval, a framework should be in place to guide the effort. They provide a graphic model that identifies different levels of NLP effort in the query-document matching process. A direct matching approach uses little NLP; an expansion approach, with thesauri, a little more; but an extraction approach will often use a variety of NLP techniques as well as statistical methods. A transformation approach, which creates intermediate representations of documents and queries, is a step higher in NLP usage, and a uniform approach, which relies on a body of knowledge beyond that of the documents and queries to provide inference and sense-making prior to matching, would require a maximal NLP effort.
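To make the lowest levels of the framework concrete, here is a minimal sketch contrasting direct matching with thesaurus-based expansion; the toy documents, thesaurus, and function names are illustrative assumptions, not material from the paper.

```python
# Direct matching vs. thesaurus expansion: the two lowest NLP-effort
# levels in the NLPIR framework (toy data; illustrative only).

def direct_match(query_terms, documents):
    """Direct approach: rank documents by raw term overlap (little NLP)."""
    scores = {doc_id: len(set(query_terms) & set(text.lower().split()))
              for doc_id, text in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)

def expanded_match(query_terms, documents, thesaurus):
    """Expansion approach: add thesaurus synonyms before matching."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded.update(thesaurus.get(term, []))
    return direct_match(expanded, documents)

docs = {1: "automobile repair manual", 2: "car engine maintenance"}
thesaurus = {"car": ["automobile", "auto"]}
print(direct_match(["car"], docs))               # only doc 2 scores
print(expanded_match(["car"], docs, thesaurus))  # doc 1 now scores as well
```

The extraction, transformation, and uniform approaches would replace the simple token sets above with NLP-derived phrases, intermediate representations, and external knowledge, respectively.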

124-139

Investigating How Individuals Conceptually and Physically Structure File Folders for Electronic Bookmarks: The Example of the Financial Services Industry
Lisa Gottlieb, Juris Dilevko
Published Online: 6 Dec 2002

Gottlieb and Dilevko asked 15 graduate accounting and business management students to create electronic folders for 60 URLs and to file bookmarks in those folders. Questionnaires tailored to each individual's work were then administered to determine the rationale for the decisions made, both on folder scheme design and on URL assignment. Any category was counted as a folder, whether it contained URLs or only sub-categories. On average 15.73 folders were created, with a range from 6 to 31. Eight of the participants used superordinate folders containing primary folders with URLs. The average number of primary folders was 12.93, with a range of 6 to 24. Seven participants created sub-folders, i.e., folders containing both URLs and other primary folders. Subject categories differed widely, as did assignment criteria. It appears that intended use and relevance to current projects were strong influences.

140-151

GeoSearcher: Location-based Ranking of Search Engine Results
Carolyn Watters, Ghada Amoudi
Published Online: 6 Dec 2002

Watters and Amoudi describe GeoSearcher, a prototype ranking program that arranges search engine results along a geospatial dimension without requiring geospatial meta-tags or geospatial feature extraction. GeoSearcher uses URL analysis, IptoLL, Whois, and the Getty Thesaurus of Geographic Names to determine site location. It accepts the first 200 sites returned by a search engine, identifies their coordinates, calculates their distance from a reference point, and ranks them in ascending order by this value. For any retrieved site, the system first checks whether it has already been located in the current session, then sends the domain name to Whois, which returns a two-letter country code and an area code. If this fails, the name is stripped one level and resent. If that also fails, the top-level domain is tested to see whether it is itself a country code. Any remaining unmatched names go to IptoLL. Distance is calculated using the center point of the geographic area and a provided reference location. A test run on a set of 100 URLs from a search successfully located 90 sites. Eighty-three pages could be found manually, and 68 had sufficient information to verify the location determination; of these, 65 (95%) had been assigned reasonably correct geographic locations. A random set of URLs used instead of a search result yielded 80% success.
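A hypothetical sketch of that fallback chain and the distance ranking follows; the stub lookup tables stand in for the Whois, IptoLL, and Getty Thesaurus services, so none of this is the authors' actual code.

```python
# GeoSearcher-style location resolution and ranking (illustrative stubs).
import math

COUNTRY_TLDS = {"us", "uk", "ca", "de", "jp"}           # illustrative subset
WHOIS_STUB = {"berkeley.edu": "us", "example.co.uk": "uk"}
GETTY_STUB = {"us": (39.8, -98.6), "uk": (54.0, -2.0)}  # area center points
_session_cache = {}                                     # sites located so far

def whois_country(name):
    return WHOIS_STUB.get(name)        # two-letter country code, or None

def iptoll_coords(domain):
    return None                        # last-resort IP-based lookup (stub)

def locate(domain):
    """Resolve a domain to (lat, lon), mirroring the described fallbacks."""
    if domain in _session_cache:       # already located this session
        return _session_cache[domain]
    name = domain
    while name:
        code = whois_country(name)     # send the domain name to Whois
        if code:
            _session_cache[domain] = GETTY_STUB.get(code)
            return _session_cache[domain]
        name = name.partition(".")[2]  # strip one level and resend
    tld = domain.rsplit(".", 1)[-1]
    if tld in COUNTRY_TLDS:            # test the TLD as a country code
        _session_cache[domain] = GETTY_STUB.get(tld)
        return _session_cache[domain]
    return iptoll_coords(domain)       # remaining unmatched names to IptoLL

def distance_km(p, q):
    """Great-circle distance between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))

def rank_by_distance(domains, reference):
    """Rank located sites in ascending order of distance from the reference."""
    located = {d: locate(d) for d in domains}
    return sorted((d for d in located if located[d]),
                  key=lambda d: distance_km(located[d], reference))

print(rank_by_distance(["berkeley.edu", "example.co.uk"], (37.8, -122.3)))
```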

152-160

Order-based Fitness Functions for Genetic Algorithms Applied to Relevance Feedback
Cristina López-Pujalte, Vicente P. Guerrero-Bote, Félix de Moya-Anegón
Published Online: 9 Dec 2002

López-Pujalte, Guerrero-Bote, and de Moya-Anegón test a relevance feedback genetic algorithm while varying its order-based fitness functions, generating a function based upon the Ide dec-hi method as a baseline. Using as genes the non-zero weighted term types assigned to the query and to the initially retrieved set of documents, a chromosome of equal length is created for each. The algorithm is provided with the chromosomes for judged relevant documents, for judged irrelevant documents, and for the irrelevant documents with their terms negated. The algorithm uses random selection over all possible genes but gives greater likelihood to those with higher fitness values. When the fittest chromosome of the previous population would be eliminated, it is restored, and the least fit of the new population is eliminated in its stead. A crossover probability of 0.8 and a mutation probability of 0.2 were used, with 20 generations. Three fitness functions were tested: the Horng and Yeh function, which takes into account the position of relevant documents, and two new functions, one based on accumulating the cosine similarity of retrieved documents, the other on stored fixed-recall-interval precisions.
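The following sketch illustrates those mechanics with the stated probabilities and an accumulated-cosine-similarity fitness; the chromosome representation and toy data are simplified assumptions, not the authors' implementation.

```python
# Relevance-feedback GA with elitism (illustrative sketch).
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fitness(chrom, relevant_vecs):
    """Accumulated cosine similarity to the judged-relevant documents."""
    return sum(cosine(chrom, d) for d in relevant_vecs)

def evolve(population, relevant_vecs, generations=20, p_cross=0.8, p_mut=0.2):
    for _ in range(generations):
        scores = [fitness(c, relevant_vecs) for c in population]
        elite = population[scores.index(max(scores))]   # remember the fittest
        weights = [s + 1e-9 for s in scores]            # roulette selection
        new_pop = []
        while len(new_pop) < len(population):
            a, b = random.choices(population, weights=weights, k=2)
            child = list(a)
            if random.random() < p_cross:               # one-point crossover
                cut = random.randrange(1, len(child))
                child = child[:cut] + list(b)[cut:]
            if random.random() < p_mut:                 # mutate one gene
                child[random.randrange(len(child))] = random.random()
            new_pop.append(child)
        # restore the previous best in place of the new population's weakest
        new_scores = [fitness(c, relevant_vecs) for c in new_pop]
        new_pop[new_scores.index(min(new_scores))] = elite
        population = new_pop
    return max(population, key=lambda c: fitness(c, relevant_vecs))

random.seed(1)
relevant = [[0.9, 0.1, 0.0, 0.4], [0.8, 0.0, 0.1, 0.5]]  # fabricated vectors
pop = [[random.random() for _ in range(4)] for _ in range(10)]
print([round(w, 2) for w in evolve(pop, relevant)])
```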

The Cranfield collection was used, with the first 15 documents retrieved from 33 queries chosen to have at least 3 relevant documents in the first 15 and at least 5 relevant documents not initially retrieved. Precision was calculated at fixed recall levels using the residual collection method, which removes viewed documents. One of the three functions improved on the original retrieval by 127 percent, while the Ide dec-hi method provided a 120 percent improvement.

161-168

Predicting Information Flows in Network Traffic
Melvin J. Hinich, Robert E. Molyneux
Published Online: 9 Dec 2002

Hinich and Molyneux review the literature of Internet measurement and note three results consistently found in network traffic studies: "self-similarity"; "long-range dependence," by which is meant that events at one time are correlated with events at a previous time, and remain so over longer periods than expected; and "heavy tails," by which they mean many small connections with low byte counts and a few long connections with large byte counts. The literature also suggests that conventional time series analysis is not helpful for network analysis. Using a single day's traffic at the Berkeley National Labs web server, cumulated TCP flows were collected and log-transformed, with 0.01 added to all values to allow log transforms of the zero values and to provide a distribution that overcomes the heavy-tail problem. However, Hinich's bicorrelation test for nonlinearity, using overlapping moving windows, found strong evidence of nonlinear structure. Time series analysis assumes linear systems theory, and thus additivity and scalability. Spectral analysis should show large peaks at the lowest frequencies if long-range dependence is present, since the power spectrum would go to infinity as the frequency goes to zero. This does not occur, and so long-range dependence must be questioned, at least until it is determined what effect other OSI layers may have on the TCP data.
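The offset log transform and the low-frequency spectral check can be sketched as follows on simulated counts; the real flow data, the windowing, and Hinich's bicorrelation test itself are not reproduced here.

```python
# Offset log transform and a crude spectral check for long-range dependence.
import numpy as np

rng = np.random.default_rng(0)
flows = rng.poisson(5, 4096).astype(float)  # stand-in for cumulated TCP flows
logged = np.log(flows + 0.01)               # +0.01 admits the zero values

# Long-range dependence implies the power spectrum blows up as the
# frequency approaches zero; a flat low end argues against it.
x = logged - logged.mean()
power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
freqs = np.fft.rfftfreq(len(x))
for f, p in zip(freqs[1:8], power[1:8]):
    print(f"freq {f:.4f}: power {p:.2f}")
```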

169-178

Open Source Software Development and Lotka's Law: Bibliometric Patterns in Programming
Gregory B. Newby, Jane Greenberg, Paul Jones
Published Online: 10 Dec 2002

Newby, Greenberg, and Jones analyze the programming productivity of open source software developers by counting registered developers' contributions found in the Linux Software Map (LSM) and in SourceForge. Seven years of data from a subset of the Linux directory tree provided 4,503 LSM files with 3,341 unique author names. The distribution follows Lotka's Law with an exponent of 2.82, as verified by the Kolmogorov-Smirnov (K-S) one-sample goodness-of-fit test. SourceForge data are broken into developers and administrators; when both were counted as authors, a Lotka distribution exponent of 2.55 produced the lowest error. This would not be significant by the K-S test, but the 3.54% maximum error would indicate a fit, calling into question the appropriateness of the K-S test for large populations of authors.
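As a rough illustration of the fitting procedure, the sketch below estimates the Lotka exponent by a log-log least-squares fit and computes a one-sample K-S statistic against the fitted distribution; the author counts are fabricated, and the paper's exact method may differ.

```python
# Fitting Lotka's law f(k) = C / k^n and checking goodness of fit.
import numpy as np

# papers[i] = number of contributions by author i (fabricated toy data)
papers = np.array([1]*2500 + [2]*420 + [3]*160 + [4]*80 + [5]*45 + [10]*6)

ks = np.arange(1, papers.max() + 1)
observed = np.array([(papers == k).sum() for k in ks], dtype=float)
mask = observed > 0

# log f(k) = log C - n log k, so a line fit on log-log axes gives -n
slope, intercept = np.polyfit(np.log(ks[mask]), np.log(observed[mask]), 1)
print(f"estimated Lotka exponent: {-slope:.2f}")

# One-sample K-S statistic: max gap between empirical and fitted CDFs
fitted = np.exp(intercept) * ks[mask] ** slope
ecdf = np.cumsum(observed[mask]) / observed[mask].sum()
fcdf = np.cumsum(fitted) / fitted.sum()
print(f"K-S statistic: {np.abs(ecdf - fcdf).max():.4f}")
```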

Brief Communication

179-181

A Profile of Faculty Reading and Information-Use Behaviors on the Cusp of the Electronic Age
Helen Belefant-Miller, Donald W. King
Published Online: 10 Dec 2002

Finally, Belefant-Miller and King analyze the demographic portion of a survey of faculty and staff at the University of Tennessee to determine reading and information-use behavior. Faculty each read an average of 384 documents per year for their work, including an average of 161 journal articles. They funded 84% of their own subscriptions and averaged 4.2 subscriptions per person. Personal computer access was available to 91.5%, and 95% made some use of it. About half access e-mail more than once a day, spending an average of 24 minutes a day. Browsing remains a very important means of document discovery despite the use of universal bibliographic databases. Paper remains the preferred reading interface, with electronic reading about one quarter of paper readings. Self-reported publication rates were three journal articles per year.

Book Reviews

182-184

Kasper Graarup
Published Online: 3 Jan 2003

_Knowledge and Knowing in Library and Information Science: A Philosophical Framework_. John M. Budd. Lanham, MD: Scarecrow; 2001; 361 pp. Price: $38.50. (ISBN: 0-8108-4025-1.)

184-185

Ina Fourie
Published Online: 6 Dec 2002

_Information Management for the Intelligent Organization: The Art of Scanning the Environment_, 3rd edition. Chun Wei Choo. Medford, NJ: Information Today; 2002; 325 pp. Price: $39.50. (ISBN: 1-57387-125-7.)



Association for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810 | E-mail: asis@asis.org

Copyright © 2003, Association for Information Science and Technology