EDITORIAL
In This Issue
Bert R. Boyce
605
RESEARCH
Mooers' Law: In and out of Context
Brice Austin
Published online 26 April 2001
In this issue we begin with "Mooers' Law: In and Out of Context." Austin points out that Mooers meant that having information was not always considered a good thing by a user, since making use of it required the expenditure of effort. He did not mean that a system might go unused because using it was itself an expenditure of extra effort. While the latter may be a principle of retrieval system usage, it is not the one stated by Mooers. This leads to the suggestion that system use depends upon the level of desire for information in the user's environment: if that level is high, any IR system will be used; if it is low, no IR system will be used.
607
Author Inflation Leads to a Breakdown of Lotka's Law
Hildrun Kretschmer and Ronald Rousseau
Published online 27 April 2001
Fractional counting of the authors of multi-authored papers has been shown to lead to a breakdown of Lotka's Law, despite the law's robustness under most circumstances. Kretschmer and Rousseau use normal counting, which gives full credit to each author, on two five-year bibliographies from each of 13 Dutch physics institutes where heavy co-authorship is common. Kolmogorov-Smirnov tests were performed to see whether the Lotka distribution fit the data. All bibliographies in which no paper has more than 40 authors fit acceptably; no bibliography containing a paper with more than 100 authors fits the distribution. The traditional "success breeds success" mechanism underlying the law assumes that new items arrive one at a time, but Egghe's generalized model would still account for the process. It seems unlikely that Lotka's Law will hold in an environment of heavy co-authorship.
610
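Normal counting and a Kolmogorov-Smirnov check against Lotka's Law can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the inverse-square form of the law (exponent 2), normalizes the constant over the observed range of productivity, and uses the common large-sample critical value 1.63/sqrt(N) for the 0.01 level. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def normal_counts(papers):
    """Normal count: every co-author of a paper receives full credit for it."""
    credit = Counter()
    for authors in papers:
        for author in set(authors):  # ignore a name repeated within one paper
            credit[author] += 1
    return credit


def ks_lotka(papers, alpha=2.0):
    """Return (D, N): the K-S statistic against Lotka's Law and the author count."""
    credit = normal_counts(papers)
    freq = Counter(credit.values())  # n -> number of authors with exactly n papers
    n_authors = sum(freq.values())
    n_max = max(freq)
    # Assumption: f(n) = C / n**alpha, with C chosen so that the expected
    # proportions sum to 1 over n = 1..n_max.
    c = 1.0 / sum(1.0 / n ** alpha for n in range(1, n_max + 1))
    d = observed_cum = expected_cum = 0.0
    for n in range(1, n_max + 1):
        observed_cum += freq.get(n, 0) / n_authors
        expected_cum += c / n ** alpha
        d = max(d, abs(observed_cum - expected_cum))
    return d, n_authors


def fits_lotka(papers, alpha=2.0):
    """Accept the fit if D stays below the approximate 0.01 critical value."""
    d, n_authors = ks_lotka(papers, alpha)
    return d <= 1.63 / sqrt(n_authors)
```

Under normal counting, one paper with more than 100 authors adds more than 100 credited authors at once, which is the kind of author inflation the study examines.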
Visualization of Term Discrimination Analysis
Jin Zhang and Dietmar Wolfram
Published online 26 April 2001
Zhang and Wolfram compute the discrimination value of a term as the difference between the density of the document space, measured against its centroid, with all terms included and that density with the term in question removed. They suggest that terms be selected by comparing density changes in a visualization tool. The Distance Angle Retrieval Environment (DARE) visually projects a document or term space by presenting distance similarity on the X axis and angular similarity on the Y axis. Thus a document icon appearing close to the X axis would be relevant to reference points in terms of a distance similarity measure, while one close to the Y axis would be relevant to reference points in terms of an angle-based measure. In a collection of 450 Associated Press news reports indexed by 44 distinct terms, removing the term "Yeltsin" causes the cluster to fall on the Y axis, indicating that it is a good discriminator. For an angular measure such as the cosine, movement to the left along the X axis signals good discrimination, while movement to the right signals poor discrimination. A term density space could also be used. Most terms are shown to be indifferent discriminators. Different measures identify different terms as good and poor discriminators, as does the use of a term space rather than a document space. The visualization approach is clearly feasible and provides some insights not available from computing a discrimination value alone.
615
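The discrimination value and DARE-style coordinates can be sketched in Python under stated assumptions. Space density is taken, as in Salton's formulation, to be the average cosine similarity of documents to the centroid. The sketch subtracts the all-terms density from the density without the term, so a positive value (the space becomes denser when the term is dropped) marks a good discriminator. The DARE coordinates are assumed to be Euclidean distance and cosine angle from a single reference point. All names are hypothetical, and this is not the authors' DARE tool.

```python
import math


def cosine(u, v):
    """Cosine similarity of two term-weight vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(docs):
    n = len(docs)
    return [sum(col) / n for col in zip(*docs)]


def density(docs):
    """Average cosine similarity of each document to the space centroid."""
    c = centroid(docs)
    return sum(cosine(d, c) for d in docs) / len(docs)


def discrimination_value(docs, term):
    """Density with column `term` removed minus density with all terms.

    Positive: removing the term packs documents closer together, so the
    term spreads them apart and is a good discriminator.
    """
    without = [d[:term] + d[term + 1:] for d in docs]
    return density(without) - density(docs)


def dare_point(doc, ref):
    """(x, y) = (Euclidean distance, angle in radians) from a reference point."""
    x = math.dist(doc, ref)
    cos = max(-1.0, min(1.0, cosine(doc, ref)))  # clamp rounding error before acos
    return x, math.acos(cos)
```

In a toy space such as `[[1, 0, 1], [0, 1, 1], [1, 1, 1]]`, the last term, present in every document, gets a negative value (a poor discriminator), while the first term gets a positive one.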
Scholarly Use of Internet-Based Electronic Resources
Yin Zhang
Published online 11 April 2001
By Internet resources Zhang means any electronic file accessible by any Internet protocol. Their use is gauged by examining the citations to such sources in a nine-year sample of four print and four electronic LIS journals, by a survey of the editors of these journals, and by a survey of scholars with "in press" papers in these journals. Citations were gathered from the Social Sciences Citation Index and manually classified as e-sources on the basis of their format. All authors with "in press" papers were asked about their use and opinion of Internet sources and for any suggestions for improvement. Use of electronic sources is heavy, and access to them is very high. Access and ability explain most of the use, while satisfaction was not a significant factor. Citation of e-journals increases across the years sampled. Authors report under-citing e-journals in favor of their print equivalents. Traditional reasons are given for citing and not citing, but additional reasons also appear for e-journals.
628
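Zhang classified citations manually. Purely as an illustration, a crude automated stand-in could flag a cited reference as an Internet-based source when it contains an address under an Internet protocol scheme, then tally the share of such citations per year. The scheme list, the regular expression, and the function names are assumptions of this sketch, not part of the study.

```python
import re
from collections import defaultdict

# Hypothetical heuristic: a citation that gives an address under one of these
# schemes is treated as an Internet-based electronic source.
E_SOURCE = re.compile(r"\b(?:https?|ftp|gopher|telnet)://", re.IGNORECASE)


def is_e_source(citation: str) -> bool:
    return E_SOURCE.search(citation) is not None


def e_source_share_by_year(citations):
    """citations: iterable of (year, citation_text). Returns {year: share}."""
    totals = defaultdict(int)
    electronic = defaultdict(int)
    for year, text in citations:
        totals[year] += 1
        if is_e_source(text):
            electronic[year] += 1
    return {year: electronic[year] / totals[year] for year in sorted(totals)}
```

A format-based rule like this misses e-journals cited in print style, which is one reason manual classification, as in the study, is more reliable.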
FEATURES
Real-Time Adaptive Feature and Document Learning for Web Search
Zhixiang Chen, Xiannong Meng, Richard H. Fowler, and Binhai Zhu
Published online 27 April 2001
Chen et al. report on the design of FEATURES, a web search engine with adaptive features based on minimal relevance feedback. Rather than building user profiles from previous searcher activity at either the server or the client, or updating indexes after a search is completed, FEATURES updates its index and user characterization files during query modification, working on documents retrieved from a general-purpose search engine. The indexing terms relevant to a query are defined as the union of all terms assigned to the documents retrieved by the initial search run, and are used to build a vector space model of this retrieved set. The ten most heavily weighted terms are presented to the user for relevant/non-relevant judgments, which are used to modify the term weights. Documents are chosen if their summed term weights exceed some threshold. Judging a document among the top ten ranked as non-relevant decreases the weights of its terms, and a positive judgment increases them. The retrieved set is then reordered, generating new display lists of terms and documents. In a test on AltaVista searches, precision improves.
655
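The feedback loop described above can be sketched as follows. This is a minimal illustration, not the FEATURES implementation. It assumes binary document vectors, all weights starting at 1, a multiplicative update (weights of terms judged relevant are multiplied by a promotion factor, those judged non-relevant are divided by it), and a fixed threshold on summed weights. The class name, the factor of 2, and the threshold of 1 are hypothetical.

```python
class FeedbackSession:
    """Hypothetical sketch of term/document feedback over a retrieved set."""

    def __init__(self, retrieved, factor=2.0, threshold=1.0):
        # retrieved: {doc_id: set of index terms} from the initial search run.
        self.docs = retrieved
        # Vocabulary = union of the terms of all retrieved documents.
        self.weights = {t: 1.0 for terms in retrieved.values() for t in terms}
        self.factor = factor
        self.threshold = threshold

    def top_terms(self, k=10):
        """Terms to display for judgment, heaviest first (ties by name)."""
        return sorted(self.weights, key=lambda t: (-self.weights[t], t))[:k]

    def score(self, doc_id):
        return sum(self.weights[t] for t in self.docs[doc_id])

    def ranked_docs(self, k=10):
        """Documents whose summed term weight exceeds the threshold, best first."""
        chosen = [d for d in self.docs if self.score(d) > self.threshold]
        return sorted(chosen, key=lambda d: (-self.score(d), d))[:k]

    def judge_term(self, term, relevant):
        multiplier = self.factor if relevant else 1.0 / self.factor
        self.weights[term] *= multiplier

    def judge_doc(self, doc_id, relevant):
        # A document judgment adjusts the weights of all of its terms.
        for term in self.docs[doc_id]:
            self.judge_term(term, relevant)
```

After each judgment, calling `ranked_docs()` and `top_terms()` again yields the reordered display lists. A multiplicative update keeps weights positive and lets a few judgments move a term sharply, which suits feedback that is deliberately minimal.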
An Empirical Comparison of Visualization Tools to Assist Information Retrieval on the Web
Misook Heo and Stephen C. Hirtle
Published online 26 April 2001
To make maximum use of a hypertext document in a web environment, the reader must visualize the overall structure of the paths through the document as well as the document space. Graphic displays of this space, produced to assist navigation, are classified into four groups, and Heo and Hirtle compare the effectiveness of three of them. Distortion displays expand regions of interest while relatively diminishing the detail of the remaining regions, so they show both local detail and global structure. Zoom techniques use a series of increasingly focused displays of smaller and smaller areas; they can reduce cognitive overload but do not make it easy to move to other parts of the total space. Expanding outline displays use a tree structure to allow movement through a hierarchy of documents, but if the organization has a wide horizontal structure or is not particularly hierarchical, such displays can break down. Three-dimensional layouts, which are not evaluated here, place objects by location in three-dimensional space, providing more information and freedom. However, the space must be rendered in two dimensions, which makes it difficult to judge depth, size, and position visually. Ten students were assigned to each of eight groups, formed by crossing four conditions (the three techniques and an unassisted control) with two web spaces, a large one (583 selected pages) and a small one (50 selected pages). Sets of 10 questions, designed to elicit the use of a visualization tool, were provided for each space. Accuracy and time spent were extracted from a log file, and users' views were surveyed after completion. ANOVA shows significant differences in accuracy and time depending on the visualization tool in use. A Tukey test shows zoom accuracy to be significantly lower than that of the expanding outline, and zoom time to be significantly greater than that of both the outline and control groups. Size significantly affected accuracy and time but did not interact with tool type. While the expanding outline class outperformed zoom and distortion, its performance was not significantly different from that of the control group.
666
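The core of the analysis of variance can be sketched from its definition: the F ratio divides the between-group mean square by the within-group mean square. This simplified one-way version covers a single factor (tool group) and omits the second factor (web space size), the interaction term, and the Tukey follow-up. The function name is hypothetical, and the scores in the usage line are made up, not the study's data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # → (27.0, 2, 6)
```

The p-value then comes from the F distribution with those degrees of freedom, for example via `scipy.stats.f.sf(f, df_between, df_within)`. SciPy's `scipy.stats.f_oneway` performs the whole one-way test directly.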
Use of Relevance Criteria across Stages of Document Evaluation: On the Complementarity of Experimental and Naturalistic Studies
Rong Tang and Paul Solomon
Published online 26 April 2001
Based on their review of the history of topical and non-topical criteria in relevance evaluation, Tang and Solomon examine a two-stage model in which judgments are made first on surrogate records and then on full document text. The aim is to determine whether a shift in criteria takes place and, if so, in what manner and to what degree. Both a controlled experiment and a naturalistic study were used to study the staging of relevance judgment criteria. In the controlled experiment, 90 undergraduate psychology students were asked to choose, from 20 preselected papers on a broader topic that included the assigned one, the papers that would help them complete an assignment. They first selected on the basis of citation and abstract, then read the papers, and at each of the two stages filled out a questionnaire on the importance of each of 15 criteria. In the naturalistic study, 9 Ph.D. psychology students conducted literature searches to support their own research and were asked to think aloud while making decisions from retrieved surrogates. They later filled out a questionnaire while reading the materials they had selected, and were interviewed at the end of the process. Understandability appears to be important at both stages. Importance increased at stage two. Cognitive criteria do not all follow the same pattern across stages. The controlled group considered quality of information most important at stage one and topicality most important at stage two. In the naturalistic study, topicality was the most frequent criterion at stage one and research structure at stage two. The authors suggest that classifying criteria by function would be a better approach: first dividing them according to whether a criterion is objectively associated with the document or subjectively associated with a person's expectations, and then according to whether it has primary (essential) or secondary (assisting) status.
676
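Detecting a shift in criteria between the two stages comes down to comparing each criterion's mean importance rating at stage one with its mean at stage two. This is a minimal sketch that assumes numeric ratings (for example, on a Likert scale) stored per criterion and stage. The data layout and function names are hypothetical, and the study's own statistical tests are not reproduced.

```python
from statistics import mean


def criteria_shift(ratings):
    """ratings: {criterion: {1: [stage-one scores], 2: [stage-two scores]}}.

    Returns (criterion, stage-one mean, stage-two mean, change) tuples,
    sorted by the size of the change, largest first.
    """
    rows = []
    for criterion, by_stage in ratings.items():
        m1, m2 = mean(by_stage[1]), mean(by_stage[2])
        rows.append((criterion, m1, m2, m2 - m1))
    return sorted(rows, key=lambda r: abs(r[3]), reverse=True)


def most_important(ratings, stage):
    """Criterion with the highest mean rating at the given stage."""
    return max(ratings, key=lambda c: mean(ratings[c][stage]))
```

Ranking by the absolute change brings out the criteria whose importance moves most between surrogate and full-text judgment, whichever direction they move.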
Multimedia Exploratory Data Analysis for Geospatial Data Mining: The Case for Augmented Seriation
Myke Gluck
Published online 1 May 2001
To prevent Type I error, statisticians tend to accept the possibility of Type II error, which leads to the rejection of research hypotheses later shown to be true. In both exploratory data analysis (EDA) and data mining, the emphasis is more appropriately on eliminating Type II error. Thus EDA methods, including EDA visualization tools, may be appropriate for data mining. Seriation creates a matrix of observations and variables whose cells contain an icon sized to represent the cell's value, and permits rows and columns to be moved so that patterns can be discerned visually. Augmented seriation, a method of data mining, adds computer graphics, sound, color, and extra dimensions to the matrix so that the analyst has different modalities for observing patterns. Gluck has developed software for such analysis.
686
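Plain seriation, which permutes the rows and columns of the observation-by-variable matrix so that similar values cluster, can be sketched with one simple heuristic: repeatedly sort rows by the value-weighted mean position of their entries across the columns, then sort columns the same way (a barycentric ordering). This stand-in is chosen for brevity and is not Gluck's software or method. It assumes non-negative cell values and omits the icons, sound, and color that augmented seriation adds. The function names are hypothetical.

```python
def barycenter_order(matrix, iterations=10):
    """Return (row_order, col_order) that tends to pull large values toward the diagonal."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(iterations):
        col_pos = {c: i for i, c in enumerate(cols)}

        def row_key(r):
            # Weighted mean column position of row r's values.
            total = sum(matrix[r])
            if total == 0:
                return 0.0
            return sum(matrix[r][c] * col_pos[c] for c in cols) / total

        rows.sort(key=row_key)
        row_pos = {r: i for i, r in enumerate(rows)}

        def col_key(c):
            # Weighted mean row position of column c's values.
            total = sum(matrix[r][c] for r in rows)
            if total == 0:
                return 0.0
            return sum(matrix[r][c] * row_pos[r] for r in rows) / total

        cols.sort(key=col_key)
    return rows, cols


def permute(matrix, rows, cols):
    """Rebuild the matrix in the given row and column order."""
    return [[matrix[r][c] for c in cols] for r in rows]
```

Applied to the scrambled band matrix `[[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0]]`, the reordering recovers the diagonal band, the kind of pattern an analyst would look for in a seriation display.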
LETTERS TO THE EDITOR
697