In This Issue
1-2
In this issue
Bert Boyce
Published Online: 3 Dec 2002

Research Article
3-15
Matchsimile: a Flexible Approximate Matching Tool for Searching Proper Names
Gonzalo Navarro, Ricardo Baeza-Yates, João Marcelo Azevedo Arcoverde
Published Online: 27 Nov 2002
In this issue we begin with a description by Navarro et al. of
Matchsimile, a software application designed to search simultaneously for thousands of distinct personal and corporate names in text where authority control cannot be assumed and duplication, abbreviation, omission,
insertion, and transposition are present. It was designed for the Portuguese language, and utilizes Brazilian cultural rules for name formation. The user is permitted to establish weights for various transformations
needed to transform the text word to the pattern word and threshold numbers for acceptance. Matchsimile normalizes text words, recognizes their similarity with target words stored in a trie structure, links
them to the patterns containing those words, and finally recognizes phrase patterns. The cost of processing M patterns on N megabytes of text was N (2.05 + 5.96 M^0.6 + 0.01 M) on a Sun UltraSparc-1 with
Solaris 2.5.1.
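To give a feel for how the reported cost expression grows with the number of patterns, the following minimal sketch evaluates it for a few values of M. It assumes the expression reads N (2.05 + 5.96 M^0.6 + 0.01 M), with 0.6 as an exponent on M. The summary does not state the unit of the result, so the value is returned unitless, and `matchsimile_cost` is a hypothetical helper name, not part of Matchsimile.

```python
def matchsimile_cost(n_megabytes: float, m_patterns: float) -> float:
    """Evaluate N * (2.05 + 5.96 * M**0.6 + 0.01 * M).

    Assumption: 0.6 is an exponent on M. The unit of the result is not
    given in the summary, so none is attached here.
    """
    if n_megabytes < 0 or m_patterns < 0:
        raise ValueError("N and M must be non-negative")
    return n_megabytes * (2.05 + 5.96 * m_patterns ** 0.6 + 0.01 * m_patterns)


if __name__ == "__main__":
    # Cost per megabyte of text for a growing pattern set.
    for m in (100, 1000, 10000):
        print(m, round(matchsimile_cost(1.0, m), 2))
```

Under this reading the cost is linear in the text size N and, for moderate M, dominated by the sublinear M^0.6 term.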

16-28
Browsing-based Conceptual Information Retrieval Incorporating Dictionary Term Relations, Keyword Association, and a User's Interest
Makoto Nakashima, Keizo Sato, Yanhua Qu, Tetsuro Ito
Published Online: 27 Nov 2002
Nakashima et al. consider that an initial search engine
query on the World Wide Web creates a personal digital library containing the desired documents, which must then be browsed by the user for a precision search. The traditional top-down browse may not be the most
effective strategy, given that the query terms on which the engine's ranking algorithm is based may not adequately reflect the user's desires. They use standard controlled vocabularies to replace and expand query terms,
but also create a local dictionary as the personal digital library is browsed by storing documents judged relevant and not relevant in separate sets and using their terminology to suggest the most likely remaining
documents to view. The standard thesauri terms replace any word they include, and closeness for document ranking is measured by finding the most specific generic term for both document and query terms. The initial
display organization is maintained, but when documents have been judged relevant or not, those yet-to-be-judged documents with several terms in common with documents judged relevant are considered to be worth examining,
and suggested to the user. Experiments on the CACM, MED, and CISI collections using the ACM Classification, MeSH, and the ASIS Thesaurus were performed using the provided relevance judgements, with ranking
compared to a ranking by cosine function. Both the global dictionary expansion and the local dictionary feedback system were statistically superior in average precision for 10 recall points. The combination of the two
techniques improved performance over cosine ranking by 66% in CACM, 34% in MED, and 33% in CISI.

29-38
Scholarly Use of the Web: What Are the Key Inducers of Links to Journal Web Sites?
Liwen Vaughan, Mike Thelwall
Published Online: 27 Nov 2002
Vaughan and Thelwall find that both the age of a journal's Web site and the extent of its provided
content are positively correlated to the ratio of the journal's site's link counts to the journal's impact factor. Age was determined by use of the WayBack Machine, an interface to a database of The Internet Archive
that will report the date of each time a site has been crawled by a search engine. Because sites can deny access to crawlers, URLs change, and many sites will not be crawled immediately after their
initiation, if at all, the method is imperfect but powerful. Alta Vista was used for link counts during low-use periods. These counts will vary based on how busy the servers are, because they are based upon an estimate
from a sample count whose size is dependent upon available resources. Content extent was categorized as one of four levels: basic journal description, access to table of contents, access to abstracts, or full-text access. Data was
collected on 38 library and information science journals and 88 law journals. To control for the effect of journal impact factors on link counts, the link/impact factor ratio was used rather than simple link counts. The
Kruskal-Wallis test gave significant differences among the content groups, and age correlated with linkage using the Spearman coefficient in both disciplines.
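The normalization and rank correlation described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`link_impact_ratio`, `average_ranks`, `spearman`) are hypothetical, tied values receive averaged ranks, and the sample numbers are invented for the demonstration and are not data from the study.

```python
import math
from typing import List, Sequence


def link_impact_ratio(link_count: float, impact_factor: float) -> float:
    """Link count divided by impact factor, to control for journal impact."""
    if impact_factor <= 0:
        raise ValueError("impact factor must be positive")
    return link_count / impact_factor


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented illustrative values: site age in months, (links, impact factor).
    ages = [12, 30, 48, 60]
    ratios = [link_impact_ratio(l, f) for l, f in [(40, 0.8), (90, 1.0), (70, 0.5), (300, 1.2)]]
    print(spearman(ages, ratios))
```

A rank-based coefficient suits this setting because link counts are heavily skewed and only their ordering is compared.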

39-46
Using the User's Mental Model to Guide the Integration of Information Space into Information Need
Charles Cole, John E. Leide
Published Online: 27 Nov 2002
Cole and Leide consider whether visualization will assist undergraduate students in the success of
their search for information for essay writing. To determine if undergraduates could produce a mental representation of the relations between concepts related to their topics, 33 students were interviewed prior
to presenting their topic to their instructor. Two groups were randomly chosen: the first, the precision group, was asked to supply four or five key words or phrases outlining their topic; the second, the recall-visualization
group, was asked to take a further step and convert these terms into a visualization by representing them as circles whose size represents importance and whose closeness indicates closeness of concept
association. The students then joined the authors for an automated search of history databases, with the recall group finding a desired citation and then adding terms to expand to about 200 citations. The precision group
did the same but expanded by staying close to the original search terms to generate about 10 citations. The high recall group received an objective visualization, based on term counts and the authors' view
of relationships, and was asked to compare it to their own. Using t-tests, the marks received by the students on their essays were not statistically different among the two groups and a third that was graded but did
not participate in the study. Thus, there is no evidence to support visualization as a technique for improving essay grades. However, the high recall nature of the visualization searches did not adversely affect grading.

47-67
A Bit More to It: Scholarly Communication Forums as Socio-technical Interaction Networks
Rob Kling, Geoffrey McKim, Adam King
Published Online: 27 Nov 2002
Kling, McKim, and King focus on what they term Scholarly Communication Forums (SCF), entities
that permit communication among scholars, and which have been proliferating, particularly in the medium of computer-mediated exchange of information (e-SCF). They present a Socio-Technical Interaction Network (STIN)
model of e-SCFs that fits better than what they term the standard model. The STIN model includes people, equipment, data, resources, documents, legal mechanisms, and resource flows with their social, economic, and
political interactions. The standard model holds that an actor's behavior is motivated by the information processing features of an e-SCF, and that such an actor individually chooses to use or not use the system. Thus, the
focus is on the individual and the features of the technology and not the characteristics of the groups and organizations involved. A STIN modeler will identify a population of interactors, core interactor groups,
incentives, undesired interactions, existing communication forums, points of architectural choice, and resource flows prior to determining a configuration. Both standard and STIN modeling techniques are applied to
arXiv.org/SPIRES-HEP, FlyBase, ISWORLD, and HEP to illustrate the method.

68-80
Rotation and Scale Invariant Wavelet Feature for Content-based Texture Image Retrieval
Moon-Chuen Lee, Chi-Man Pun
Published Online: 27 Nov 2002
Lee and Pun want to do content-based image retrieval that will be insensitive to changes in image scale
and orientation. They convert the image into a log-polar image that does not vary under rotation and is nearly invariant under scale change. The transform is of O(n) complexity, where n is the number of pixels. The
process causes an undesirable row shift effect, but a wavelet packet transform, which is row-shift invariant, will resolve this difficulty. However, the number of wavelet coefficients produced is large, and can be
reduced by computing an energy signature for each subband of wavelet coefficients, sorting them and choosing only the most dominant. The sort complexity will be O(n log n), the computation of the signatures O(n), and
thus overall O(n log n). Using 25 natural textures from the Brodatz texture album, Euclidean distance is computed between the query image and the database image after the above process is completed.
The system outperforms the traditional wavelet packet signature feature.
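The signature reduction and distance comparison described above can be sketched as follows. This is not the authors' implementation: it assumes the wavelet packet subbands have already been computed, takes the energy signature to be the mean squared coefficient of a subband (the summary does not give the exact definition), and uses hypothetical function names throughout.

```python
import math
from typing import List, Sequence


def energy_signature(coeffs: Sequence[float]) -> float:
    """Mean squared coefficient of one subband (an assumed definition)."""
    if not coeffs:
        raise ValueError("a subband must contain at least one coefficient")
    return sum(c * c for c in coeffs) / len(coeffs)


def dominant_signatures(subbands: Sequence[Sequence[float]], k: int) -> List[float]:
    """Compute one signature per subband, sort descending, keep the k largest."""
    if k < 1 or k > len(subbands):
        raise ValueError("k must be between 1 and the number of subbands")
    energies = sorted((energy_signature(s) for s in subbands), reverse=True)
    return energies[:k]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Toy subbands standing in for real wavelet packet coefficients.
    query = dominant_signatures([[1, -1], [3, 3], [0, 0]], k=2)
    stored = dominant_signatures([[2, 2], [0, 1], [4, 0]], k=2)
    print(query, stored, euclidean_distance(query, stored))
```

Sorting the signatures is the O(n log n) step mentioned above, while computing each signature is a single linear pass over the coefficients.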

81-86
Information Science Research Agenda in Slovakia: History and Emerging Vision
Jela Steinerová
Published Online: 27 Nov 2002
Steinerová provides a summary of library and information science education and research in Slovakia since 1990. New
curricula have been developed and the European Credit Transfer Scheme adopted. A terminologic and encyclopedic dictionary of library and information science has been produced. Research interests are moving beyond
automation toward social and human contexts.

Brief Communication
87-90
Empirical Evidence of Self-organization?
Peter van den Besselaar
Published Online: 27 Nov 2002
Finally, Van den Besselaar comments on Leydesdorff and Heimeriks' JASIST article that found a relationship between words in titles and the region of origin of documents by
discriminant analysis using title word sets to predict the region of production in 78% of the cases in a Biotechnology literature. Van den Besselaar repeats the method with an information science and a science and
technology literature finding an even higher percentage of correct classification. He then points out that the data used do not meet the conditions for discriminant analysis in that the independent variables are nominal
rather than on an interval scale, which has the effect, because of the near uniqueness of the words, of making it trivial to find a relationship with any classification. Indeed, random groupings of the Biotechnology
documents can equally well be predicted. Testing random splits of the database yields strong prediction for the first half, but less than the a priori probabilities on the second half, implying every test results in a
different model and the relation of region to word use needs to be rejected.

Book Review
91-92
Book review
Lisa A. Ennis
Published Online: 3 Dec 2002
Historical Information Science: An Emerging Unidiscipline. Lawrence J. McCrank. Medford, NJ: Information Today, 2001; 1192 pp. Price $149.95 (ISBN: 1-57387-071-0)

Letter to the Editor
93
Reaction to a book review
Ron Day
Published Online: 27 Nov 2002