

Evaluation of Thesauri for automatic query expansion and searching within document structure

P. Bryan Heidorn, Jing Zhang and Hongyan Sun

Presented at ASIST 2004 Annual Meeting; "Managing and Enhancing Information: Cultures and Conflicts" (ASIST AM 04), Providence, Rhode Island, November 13 - 18, 2004


Abstract

Scientific documentation is being moved to electronic format to increase accessibility and reduce distribution costs. As in other areas of electronic publishing and scholarly access to material, this migration initially replicates on the web the structure and access methods of the paper documents. Few projects take advantage of the added functionality that electronic access could afford. In this paper we explore the effectiveness of two types of enhanced access to scientific documents: automatic query expansion through a semantically encoded thesaurus and document subsection searching through XML indexing. The study included 60 subjects who performed 1189 queries to find a botanical description that would allow them to positively identify a plant. The study and the analysis that follows shed light on some of the requirements for future implementations of these types of access. Statistical analysis of the query-document relationship and statistical measures of the thesaurus-document relationship can inform thesaurus improvement. An automatically calculated weighting mechanism can be incorporated to make optimal use of the thesauri. Another finding was that, when using full-text searching in semi-structured documents, it is not sufficient to give users search access at the paragraph level of detail. Statistical term analysis demonstrated that this finding is attributable to the distribution of terms in the documents. To increase retrieval effectiveness in these scientific documents, it is necessary to mark the topicality of individual sentences.


  