|
EDITORIAL |
| |
In This Issue Bert R. Boyce |
1195
|
RESEARCH |
| |
Recollections of Irving H. Sher 1924-1996: Polymath/Information Scientist Extraordinaire Eugene Garfield Published online 18 October 2001
In this issue we begin with Garfield's
recollections of Irving H. Sher, in which he reviews the contributions of Sher, both to his personal work and to the Institute for Scientific Information. Sher bore great responsibility for the development of the
Permuterm Subject Index, and the Automated Subject Citation Alert System. He was the lead author on the first report of the citation characteristics of Nobel prize winners, and a key collaborator in the design of the
impact factor metric.
|
1197 |
| |
Known-Item Online Searches Employed by Scholars Using Surname plus First, or Last, or First and Last Title Words Frederick G. Kilgour Published online 16 October 2001
Kilgour repeats an earlier experiment which
supported the efficacy of the two or three word search for known items, by now limiting his searches to the MARC 100 and 245 fields in the University of Michigan online catalog, and testing the surname and two title
word search's ability to produce a single 20 line minicat. Every book citation with a personal author found in the bibliographies of 10 scholarly monographs, each chosen from one of Dewey's ten classes, was searched by
surname plus first title word, surname plus last title word and surname plus both words. The surname plus both words produced a 20-line minicat in 98.9% of the cases, and in 54.8% of searches only the desired record was
displayed. Use of a third title word increased the percentage to 99.4%. In 13.1% of the cases the title was reported as not in the database. Of these, 216 were verified as not in the database, 114 were citation errors
that were verified and researched, 40 could not be verified, and 81 were discarded after verification as having corporate authors. This argues for combining the found-item display with an availability display.
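The three query forms described above can be sketched as follows. This is a minimal illustration only, not Kilgour's procedure: the function names, the record layout, and the plain whitespace tokenization (no special handling of leading articles or punctuation) are all assumptions made for the example.

```python
def search_keys(surname, title):
    """Build the three known-item queries (hypothetical helper).

    Assumes the title is split on whitespace and that the first and last
    tokens are used as-is; the original study may have treated articles
    or punctuation differently.
    """
    words = title.lower().split()
    first, last = words[0], words[-1]
    s = surname.lower()
    return {
        "surname+first": (s, first),
        "surname+last": (s, last),
        "surname+both": (s, first, last),
    }


def matches(record, key):
    """True if the author field holds the surname and the title field
    holds every title word in the key (fields stand in for MARC 100/245)."""
    surname, *title_words = key
    author_tokens = record["author"].lower().replace(",", " ").split()
    title_tokens = record["title"].lower().split()
    return surname in author_tokens and all(w in title_tokens for w in title_words)


def search(catalog, key):
    """Return every record in the toy catalog that satisfies the key."""
    return [r for r in catalog if matches(r, key)]
```

With a small invented catalog, the surname plus both title words would be expected to return fewer records than either single-word form, which is the effect the abstract reports.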
|
1203 |
| |
An Interpretive and Situated Approach to an Evaluation of Perseus Digital Libraries Shu Ching Yang Published online 31 October 2001
Yang studies Perseus, a hypermedia digital library on classical studies, as a learning tool for classroom instruction. Data was collected on the experiences of five undergraduates taking a course in classical Greek studies who used the system exclusively and
regularly in class and in their assignments. The class assignment and problem solving techniques using the system were discussed. As the students worked, their verbalized thought processes, decisions, and body language
were recorded. The multiplicity of available links led some students to cognitive overload, while others complained of fragmented instruction. Several complained about the unevenness of the material and the limitations
of the system's path making tool.
|
1210 |
| |
Ranked Retrieval with Semantic Networks and Vector Spaces Vladimir A. Kulyukin and Amber Settle Published online 26 October 2001
Kulyukin and Settle create a logical model that generalizes and rigorously formalizes
both the semantic network spreading activation model and the dot product version of the vector space model of retrieval systems, demonstrating that the two models are equivalent under ranked retrieval, by specifying
algorithms to construct each model from the other. This suggests that tests comparing the two models may really be comparing differences in data and relevance judgments.
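For readers unfamiliar with the dot product version of the vector space model mentioned above, a minimal sketch of ranked retrieval under that model follows. It does not reproduce the authors' formal model or their construction algorithms; the sparse-dictionary representation and the function names are assumptions chosen for brevity.

```python
def dot(u, v):
    """Dot product of two sparse term-weight vectors stored as dicts."""
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


def rank(query, docs):
    """Return document ids ordered by decreasing dot product with the query.

    Ties are broken by document id so the ordering is deterministic.
    """
    scores = {doc_id: dot(query, vec) for doc_id, vec in docs.items()}
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```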
|
1224
|
| |
Reduction of the Dimension of a Document Space Using the Fuzzified Output of a Kohonen Network Vicente P. Guerrero and Felix de Moya Anegon Published online 31 October 2001
Guerrero and de
Moya Anegon extracted 7758 unique words from the abstracts of the last 954 records in the Summer 1996 issue of Library and Information Science Abstracts, reduced the set to 7577 with a stop list, and to 5052 using the
Porter Stemmer, then chose the 1200 with the greatest discrimination values. The terms were then weighted by the product of their occurrence in the document and the log of the total number of terms in the database over
the number of terms in the document. The set was further reduced to 400 dimensions by using only vectors for documents that had been assigned one of seven subject terms. The documents and the terms were then clustered
using a fuzzy value Kohonen Self-organizing Map technique and the clusters evaluated as to the grouping of LISA assigned terms. Degrees of membership for each item in each cluster are thus available.
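The term weighting described above can be written out as a short sketch. It follows the wording of this summary literally (occurrences in the document multiplied by the log of database terms over document terms); the function and parameter names are hypothetical, and the natural logarithm is an assumption, since the summary does not state the base.

```python
import math


def term_weight(occurrences_in_doc, terms_in_database, terms_in_doc):
    """Weight = occurrences * log(terms in database / terms in document).

    Literal reading of the summary; the log base (natural) is assumed.
    """
    return occurrences_in_doc * math.log(terms_in_database / terms_in_doc)


def weight_document(term_counts, terms_in_database):
    """Apply term_weight to every term of one document (dict of counts)."""
    terms_in_doc = len(term_counts)
    return {
        term: term_weight(count, terms_in_database, terms_in_doc)
        for term, count in term_counts.items()
    }
```

Under this reading, a term absent from a document receives weight zero, and every term in a document shares the same logarithmic factor.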
|
1234
|
| |
The Scatter of Documents Over Databases in Different Subject Domains: How Many Databases Are Needed? William W. Hood and Concepcion S. Wilson Published online 26 October 2001
To investigate cross-database subject scatter, Hood and Wilson created 14 queries of 240 characters or less in different topical areas that would retrieve fewer than 5000 records a year from DIALOG databases. The terms were searched in
title and abstract fields from 1994 to 1998. Files in the DIALINDEX category ALL, excluding newspapers and ONTAP files as well as Current Contents Search, were used, providing 373 databases. Fourteen DIALINDEX searches
were then run, and databases not supporting duplicate detection were removed from the results before the remainder were sorted in decreasing frequency order. The top database was run against the query separately for each of
the five years and the "remove duplicates" command issued. Then the top and second database were searched together, duplicates removed, and the process repeated until a cumulative frequency distribution of the
non-duplicated records was created. Considerable scatter occurs and in the worst case the most productive database provides only 19% of the citations. The distributions are hyperbolic with identifiable
cores. However, the degree of scatter appears to be subject dependent with high concentration in one database and 80% coverage in five to eight files for four of the searches; moderate concentration in one database and
80% coverage in seven to ten files for five of the searches; and low concentration in one database and 16 to 19 files to get 80% coverage in the other four.
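The cumulative procedure described above (rank databases by hits, add them one at a time, remove duplicates, and track coverage) can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' DIALOG workflow: duplicate records are assumed to share an identical identifier, and the function names and data layout are hypothetical.

```python
def cumulative_coverage(results):
    """Cumulative share of unique records as databases are added.

    `results` maps a database name to the set of record ids it returned.
    Databases are taken in decreasing order of raw hit count (ties broken
    by name); duplicates are removed by set union.
    """
    order = sorted(results, key=lambda db: (-len(results[db]), db))
    total = len(set().union(*results.values()))
    seen = set()
    curve = []
    for db in order:
        seen |= results[db]
        curve.append((db, len(seen) / total))
    return curve


def databases_needed(curve, target=0.8):
    """Number of databases required to reach the target coverage."""
    for count, (_, fraction) in enumerate(curve, start=1):
        if fraction >= target:
            return count
    return None
```

Applied to each query's result sets, `databases_needed` gives the kind of figure the abstract reports, such as the number of files required for 80% coverage.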
|
1242 |
| |
Effects of Link Annotations on Search Performance in Layered and Unlayered Hierarchically Organized Information Spaces Landon Fraser and Craig Locatis Published online 26 October 2001
Fraser and Locatis investigate the effect
of adding annotations to hyperlinks both in shallow and deep link structures and the effect of annotations on searches with low word correspondence with the words in links, which they would term difficult searches.
Accuracy was measured as reaching the correct document section; efficiency as the number of clicks necessary to do so, as well as elapsed search time. A 25-page document with obscure subject matter was supplemented with three
additional documents to provide roughly equal treatment of four internal topics in 48 separate sub-sections. A three-layered version was created, with a fourth set of links to primary content. An unlayered version with
48 links from a table of contents with the same nested structure was also prepared, then both versions were supplemented with annotations to create two additional versions. Easy questions contained a word appearing in a
link; medium questions contained such a word or a synonym but appearing in multiple links; difficult questions had no direct word correspondence. High school students, twenty assigned
to each of the four treatments, were given three consecutive questions. Link annotations had minimal effect on performance. Shallow spaces were easier and quicker to navigate. Attention seems to be paid to link wording rather than annotation wording.
|
1255
|
| |
The Self-Organization of the European Information Society: The Case of "Biotechnology" Loet Leydesdorff and Gaston Heimeriks Published online 26 October 2001
Leydesdorff and Heimeriks
select 711 biotechnology papers supplying 787 institutional addresses in Europe, Japan, or the United States in order to determine if this literature can be used to illuminate the interaction between the reorganization
of the European science community through transnational collaboration and the transition to so-called Mode-2 research with its emphasis on government, university and industry collaboration. The top 10% of title words
yields 245 unique types, and nine documents that included none of these words were excluded, yielding a 778 by 245 document-term matrix. Factor analysis of the words provides 99 factors with an eigenvalue above unity and
thus reveals no inherent structure and leaves the choice of number of word groupings arbitrary. Discriminant analysis will successfully classify 77.6% of the papers geographically but this level is not statistically
significant. The word frequency lists for these geographical sets can be correlated with the forced factor loadings on ten or fewer dimensions. A three-factor solution, which suggests a geographically separate literature,
makes the American set more correlated. A four-factor solution, which would consider an international factor as well as the three geographical factors, shows us American and European correlations at almost equal strength
to different factors. The European set is correctly classified by discriminant analysis in 86.5% of cases. Its disaggregation shows differences and thus seems to be a part of a global phenomenon, not national
interactions. The European vocabulary factor disappears after disaggregation.
|
1262 |
BRIEF COMMUNICATION |
| |
Natural Language Processing: Word Recognition without Segmentation
Khalid Saeed and Agnieszka Dardzinska Published online 25 October 2001
To recognize Arabic script, Saeed and Dardzinska bitmap a word without the dot
components, find the cusps of letters, calculate the length of the vectors from the origin to these points, and the differences in length between successive vectors as values in a matrix. They use the lowest eigenvalues
for a first feature vector and then calculate the angles of the original vectors and use the difference in their successive tangents to create a second matrix whose lowest eigenvalues will provide a second feature
vector. These shapes will provide an initial classification. The width of the linear segments between the cusps and the number and position of dots are also calculated, and this information is used if the initial shapes
have not defined an existing word. The process was successfully tested on several script fonts. |
1275
|
| |
CALLS FOR PAPERS |
1280 |
|