In This Issue
471
In This Issue, by Bert Boyce

Research Article
473
Web Search Strategies and Approaches to Studying
Nigel Ford, David Miller, and Nicola Moss
Published online 25 February 2003
In this issue Ford, Miller, and Moss recruit 68 volunteers from a population of 250 Master's students to complete three web search tasks, each with a clear, fact-based goal and three or fewer facets. One task required broadening the search concepts beyond those given; a second supplied specific terminology for one facet but required a second facet whose terms had to be translated; and the third required a general-to-specific transformation. The students completed Entwistle's Revised Inventory of Approaches to Studying, which provides scores on ten study variables, and rated their experience with the Internet, with AltaVista, and with Boolean searching. Searches were run on AltaVista using Netscape Navigator 4, with participants free to choose and switch among Boolean, best-match, or combined search modes at will, while a front-end script recorded all submitted searches and help accesses. Search-related variables were extracted from Boolean-only queries, best-match-only queries, and combined queries, and factor analyses were conducted on all variables for each search mode in each task. In task 1, Boolean searching is differentiated from best-match searching by shared high loadings on active interest, intention to reproduce, fear of failure, and relating ideas; combined searching is grouped with best-match searching, with low active interest, low intention to reproduce, and low fear of failure. In task 2, Boolean searching is differentiated from best-match searching by high loadings on intention to reproduce and low loadings on intention to understand, while best-match searching loads positively on intention to understand and negatively on intention to reproduce; combined searching is linked with both good and poor time management. In task 3 the loadings mirror those of task 1. Boolean searching thus appears consistently linked to a reproducing rather than a meaning-seeking approach, but also to high levels of interest and fear of failure; best-match searching is associated with the converse of these measures.
489
Three Target Document Range Metrics for University Web Sites
Mike Thelwall and David Wilkinson
Published online 25 February 2003
Thelwall and Wilkinson use crawls of university web sites in the UK, Australia, and New Zealand to extract all links that target university web sites in the same country, and use these links to build a graph structure for study. Using Broder's study as a model, they identify a strongly connected component (SCC), in which one could start at any page and reach every other page in the set, and an OUT component, whose pages can be reached from every page in the SCC but provide no link back to it. The other components of Broder's model cannot be identified without access to a major search engine's database. When plotted on logarithmic axes, the inlink and outlink counts for all three university systems, in both the OUT and SCC components, display the linear form that would indicate that power laws, and a success-breeds-success phenomenon, are generally in effect. However, automatically generated pages, non-HTML web pages, and large resource-driven sites were all associated with anomalies in this pattern.
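Broder's SCC and OUT components can be computed from a crawled link list with breadth-first searches: a page's SCC is every page it can reach that can also reach it back, and the OUT component is everything reachable from the SCC minus the SCC itself. The Python below is a minimal sketch under stated assumptions, not the authors' procedure: the link list is hypothetical, and the quadratic search for the largest SCC only suits small graphs (a linear-time method such as Tarjan's algorithm would be used on a real crawl).

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(links: Iterable[Tuple[str, str]]) -> Tuple[Graph, Graph]:
    """Build forward and reverse adjacency sets from (source, target) page links."""
    fwd: Graph = {}
    rev: Graph = {}
    for src, dst in links:
        for page in (src, dst):
            fwd.setdefault(page, set())
            rev.setdefault(page, set())
        fwd[src].add(dst)
        rev[dst].add(src)
    return fwd, rev


def reachable(graph: Graph, starts: Iterable[str]) -> Set[str]:
    """Every page reachable from `starts` (starts included), by breadth-first search."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        page = queue.popleft()
        for nxt in graph[page]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def largest_scc(fwd: Graph, rev: Graph) -> Set[str]:
    """Largest strongly connected component.

    A page's SCC is forward reachability intersected with reverse
    reachability: the pages it reaches that can also reach it.
    """
    assigned: Set[str] = set()
    best: Set[str] = set()
    for page in fwd:
        if page in assigned:
            continue
        scc = reachable(fwd, [page]) & reachable(rev, [page])
        assigned |= scc
        if len(scc) > len(best):
            best = scc
    return best


def out_component(fwd: Graph, scc: Set[str]) -> Set[str]:
    """Pages reachable from the SCC but not in it; because the SCC is
    maximal, none of them has a path back into it."""
    return reachable(fwd, scc) - scc


# Hypothetical same-country links: a, b, c form a cycle, c links on to d,
# d links to e, and f only links in to a.
links = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e"), ("f", "a")]
fwd, rev = build_graph(links)
core = largest_scc(fwd, rev)
print(sorted(core), sorted(out_component(fwd, core)))
```

Page f, which links into the core but cannot be reached from it, belongs to neither set; in Broder's model it would fall in the IN component, which a crawl that follows only outgoing links from known sites generally cannot enumerate.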
497
Searching for Images: The Analysis of Users' Queries for Image Retrieval in American History
Youngok Choi and Edie M. Rasmussen
Published online 25 February 2003
Choi and Rasmussen collect queries to the Library of Congress's American Memory photo archive from 48 scholars in American history by means of interviews and pre- and post-search questionnaires. Their interest is in the types of information need common in the visual domain, and in the categories of terms most often used, or indicated as appropriate, for describing image content. Each search returned 20 items for the searcher to evaluate. Terms in queries and in acceptable retrieved items were categorized using a who/what/when/where faceted classification, and queries were classified into four need categories: specific, general, abstract, and subjective. For all 38 requests, at least two of the three analysts assigned the request to the same category, and in 19 cases all three agreed. General/nameable needs accounted for 60.5% of requests, specific needs for 26.3%, general/abstract needs for 7.9%, and subjective needs for 5.3%. The facet analysis indicated that most content took the form of a person/thing or an event/condition, limited by geography or time.
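The four need-category percentages are consistent with whole-number counts out of the 38 requests. The short Python check below recovers them; the counts (23, 10, 3, and 2) are inferred from the percentages and are not stated in the summary.

```python
# Counts inferred from the reported percentages of the 38 requests;
# the summary itself gives only the percentages.
counts = {
    "general/nameable": 23,
    "specific": 10,
    "general/abstract": 3,
    "subjective": 2,
}
total = sum(counts.values())
shares = {need: round(100 * n / total, 1) for need, n in counts.items()}
full_agreement = 19 / total  # share of requests on which all three analysts agreed

print(total, shares, full_agreement)
```

The 19 requests with full three-way agreement are exactly half of the 38.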
511
Information as Commodity and Economic Sector: Its Emergence in the Discourse of Industrial Classification
Cheryl Knott Malone and Fernando Elichirigoity
Published online 25 February 2003
Malone and Elichirigoity review the concept of "information" as it appears in the North American Industry Classification System (NAICS), implemented in 1997, which is the current scheme for organizing government data about the economies of the U.S., Canada, and Mexico. The term designates one of 20 major economic sectors, defined by processes of production, on which data may be reported. It also denotes a measurable commodity based on the concept of copyright. A review of the background studies and reports that document the development of NAICS shows a desire for a single underlying principle (similarity of production processes rather than a marketing approach) and the construction of the information sector within the context of globalization and the Internet. The three nations agreed in 1996 that the information sector should consist of industries engaged in the "transformation of information into a commodity that is produced, manipulated and distributed...," or, as the NAICS manual states, industries that "primarily create and disseminate a product subject to copyright." However, industries that transfer or transport such products are also included, which seems inconsistent with the production principle. In 2002 the sector was modified to separate Internet publishing and broadcasting from these subcategories and to create an Internet services category.
511
A Method for the Comparative Analysis of Concentration of Author Productivity, Giving Consideration to the Effect of Sample Size Dependency of Statistical Measures
Fuyuki Yoshikane, Kyo Kageura, and Keita Tsuji
Published online 25 February 2003
Measures of the concentration of author productivity based on counts of papers by individual authors change systematically with sample size. Yoshikane, Kageura, and Tsuji seek a statistical framework that avoids this scale-effect problem. Using the number of authors in a field as an absolute concentration measure and Gini's index as a relative concentration measure, they describe four literatures from both viewpoints, with measures that are insensitive to one another. Both measures increase with sample size. They then plot profiles of the two measures based on a Monte Carlo simulation of 1,000 trials for 20 equally spaced intervals and compare the characteristics of the literatures. Using data from conferences hosted by four academic societies between 1992 and 1997, they find a coefficient of loss exceeding 0.15, indicating that the measures depend strongly on sample size. The simulation shows that a larger sample size leads to lower absolute concentration and higher relative concentration. Comparisons made at the same sample size give results quite different from those based on the original data and allow direct comparison of population characteristics.
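The scale effect on Gini's index can be illustrated with a small Monte Carlo sketch: draw random subsamples of papers of increasing size, compute the Gini index of papers per author in each, and average over trials. This is a simplified illustration, not the authors' framework. It assumes single-author papers, sampling without replacement, and a hypothetical toy population, and it profiles only the relative measure.

```python
import random
from collections import Counter
from typing import List, Sequence


def gini(values: Sequence[float]) -> float:
    """Gini index of non-negative values: 0 for perfect equality,
    approaching 1 as the total concentrates in a single value."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over ascending values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def gini_profile(papers: Sequence[str], sizes: Sequence[int],
                 trials: int = 1000, seed: int = 0) -> List[float]:
    """Mean Gini index of papers-per-author over random subsamples of each size.

    `papers` holds one author identifier per paper (single authorship is
    assumed); subsamples are drawn without replacement.
    """
    rng = random.Random(seed)
    pool = list(papers)
    profile = []
    for size in sizes:
        acc = 0.0
        for _ in range(trials):
            sample = rng.sample(pool, size)
            acc += gini(list(Counter(sample).values()))
        profile.append(acc / trials)
    return profile


# Hypothetical population: one author with 100 papers, 100 authors with one each.
papers = ["prolific"] * 100 + [f"author{i}" for i in range(100)]
print(gini_profile(papers, sizes=[2, 50, 100, 150, 200], trials=200))
```

With a sample of two papers the index is always 0, since two papers yield either two authors with one paper each or one author with two; with all 200 papers it reaches about 0.49. The same population therefore looks more concentrated the more of it is sampled, which is why comparisons are made at equal sample sizes.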
528
Incorporating User Search Behavior into Relevance Feedback
Ian Ruthven, Mounia Lalmas, and Keith van Rijsbergen
Published online 25 February 2003
Ruthven, Lalmas, and van Rijsbergen rank and select terms for query expansion using information gathered on how searchers evaluate documents. Using the TREC Financial Times and Los Angeles Times collections and TREC-6 search topics placed in simulated work situations, six student subjects each performed three searches on an experimental system and three on a control system, with instructions to search using natural language expressions in any way they found comfortable. Searching was analyzed for behavioral differences between the experimental and control conditions, and for effectiveness and user perceptions. Three experiments used paired t-tests as the analysis tool; the controls were a system without relevance feedback, a system using standard term ranking for automatic expansion, and a system using standard term ranking for interactive expansion, while the experimental systems based term ranking on user information about temporal relevance and partial relevance. Two further experiments examine, first, using searcher behavior (the number of documents assessed relevant and the similarity of the relevant documents) to choose a query expansion technique, compared against a non-selective technique, and, finally, the effect of providing the user with knowledge of the process. When partial relevance and time-of-assessment data were incorporated into term ranking, more relevant documents were retrieved in fewer iterations; however, overall retrieval effectiveness did not improve. The subjects nonetheless rated the suggested terms as more useful and used them more heavily. Explanations of what the feedback techniques were doing led to greater use of the techniques.
549
Requirements for a Cocitation Similarity Measure, with Special Reference to Pearson's Correlation Coefficient
Per Ahlgren, Bo Jarneving, and Ronald Rousseau
Published online 25 February 2003
Ahlgren, Jarneving, and Rousseau review accepted procedures for author co-citation analysis. They first point out that, because the row and column values of the raw data matrix are identical (each cell being the co-citation count of two authors), there is no clear choice for the diagonal values. They suggest using the number of times an author has been co-cited with himself, excluding self-citation, rather than the common practice of treating the diagonal as zeros or as missing values. When the raw matrix is converted to a similarity matrix, the usual procedure is to compute Pearson's r between the authors' data vectors. Ranking by r, by co-citation frequency, and by intuition can easily yield three different orders. A reasonable requirement is that adding zeros to the matrix should not affect the values or the relative order of the similarity measures, but the authors show that Pearson's r does not meet it. Using 913 bibliographic records from the Web of Science for articles in JASIS and Scientometrics, they extracted and edited author names and selected 12 information retrieval authors and 12 bibliometric authors, each from among the 100 most cited. Co-citation and r-value matrices (diagonal elements treated as missing) were constructed, and then reconstructed in expanded form. Adding zeros can change both the r values and the ordering of the authors based on them. A chi-squared distance measure would not violate these requirements, nor would the cosine coefficient. The authors also argue that co-citation data are ordinal, since there is no assurance of an absolute zero number of co-citations, so Pearson's r is not appropriate. The number of ties in co-citation data makes use of the Spearman rank-order coefficient problematic.
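The zero-padding argument can be checked numerically. The Python sketch below uses three hypothetical authors' co-citation profiles, not data from the study, and appends the same number of zeros to every profile, standing in for added authors co-cited with none of them. Pearson's r changes and the ranking of the two pairs reverses, while the cosine coefficient is unaffected.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r between two equal-length co-citation profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine coefficient; zeros appended to both vectors change neither
    the dot product nor the norms."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def pad(v: Sequence[float], k: int) -> List[float]:
    """Append k zeros (authors co-cited with neither of the pair compared)."""
    return list(v) + [0] * k


# Hypothetical co-citation counts of authors A, B, C with three other authors.
A, B, C = [1, 2, 3], [2, 3, 4], [3, 6, 8]
for k in (0, 1):
    a, b, c = pad(A, k), pad(B, k), pad(C, k)
    print(k, round(pearson(a, b), 4), round(pearson(a, c), 4),
          round(cosine(a, b), 4), round(cosine(a, c), 4))
```

Before padding, r ranks the pair (A, B) above (A, C), at 1.0 against about 0.993; after a single zero is appended, (A, C) is ranked above (A, B), at about 0.996 against 0.983. The cosine values, and hence the cosine ordering, are identical in both cases.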
569
Modeling the Information-Seeking Behavior of Social Scientists: Ellis's Study Revisited
Lokman I. Meho and Helen R. Tibbo
Published online 25 February 2003
Meho and Tibbo show that Ellis's model of information seeking applies in a web environment by replicating his study, in this case using the behavior of social science faculty who study stateless nations, a group diverse in skills, origins, and research specialities. Data were collected through e-mail interviews. Material on stateless nations was limited to English-language papers on social science topics published between 1998 and 2000. Of these, 251 papers, by 212 unique authors identified as academic scholars, contained sufficient information to search for the authors' e-mail addresses. Of the 139 authors whose addresses were located, 9 who were physically close were reserved for face-to-face interviews; of the remainder, 60 agreed to participate and responded to the 25-question open-ended interview. Follow-up questions drew a 75% response. Of the nine candidates for face-to-face interviews, five agreed to participate and provided 26,000 words, compared with 69,000 from the 45 e-mail participants. The activities of Ellis's model are confirmed, but four additional activities are also identified: accessing, i.e., finding the material identified in indirect sources of information; networking, or maintaining close contacts with a wide range of colleagues and other human sources; verifying, i.e., checking the accuracy of new information; and information managing, the filing and organizing of collected information. All activities are grouped into four stages: searching, accessing, processing, and ending.
Book Reviews
487
Electronic Collection Development: A Practical Guide, by Stuart D. Lee
Reviewed by Marianne Afifi
Published online 25 February 2003
588
Beyond Our Control? Confronting the Limits of Our Legal System in the Age of Cyberspace, by Stuart Biegel
Reviewed by Kenneth Einar Himma
Published online 25 February 2003
591
Economic Growth in the Information Age, by Dale W. Jorgenson
Reviewed by John Cullen
Published online 25 February 2003