ASIS&T 2005 START Conference Manager

MARTT: Using Induced Knowledge Base to Automatically Mark up Plant Taxonomic Descriptions with XML

Hong Cui

Sparking Synergies: Bringing Research and Practice Together @ ASIST '05 (ASIS&T 2005)
Westin Charlotte, Charlotte, North Carolina, October 28 - November 2, 2005


Abstract

Despite the sub-language nature of taxonomic descriptions of plants, researchers have warned of large variations in information content and representation among different description collections. These variations pose a serious challenge to the development of automatic tools for structuring large volumes of text-based descriptions. This paper presents a new approach to automatically marking up different collections of taxonomic descriptions with XML. The effectiveness of the approach is demonstrated with experiments on three contemporary floras. The markup system, MARTT, is based on machine learning methods and is enhanced by machine-learned association rules that represent certain types of domain knowledge and conventions. Experiments show that our simple and efficient machine learning algorithms outperform some general-purpose algorithms across different floras. More importantly, domain knowledge learned from one flora can be used when marking up a second flora and helps improve markup performance, especially for elements that have sparse training examples. This paper reports the system design and the evaluation of the markup algorithms; the effectiveness of induced domain knowledge in improving markup performance will be reported in a separate paper. Finally, the paper discusses common practices of flora authors and the potential of the MARTT system to improve the efficiency and effectiveness of the creation, organization, and utilization of plant descriptions.
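The abstract does not say what form MARTT's association rules take. As a purely illustrative sketch, assume that one kind of authoring "convention" is the order in which description elements appear. The Python below then mines rules of the form "element A precedes element B" using the standard support and confidence measures. The element names, the training sequences, the function name `ordering_rules`, and the thresholds are all hypothetical. This is not the paper's algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical training data: the order of XML elements in each marked-up description.
descriptions = [
    ["habit", "stem", "leaf", "flower", "fruit"],
    ["habit", "leaf", "flower", "fruit"],
    ["stem", "leaf", "fruit", "flower"],
    ["habit", "stem", "leaf", "flower"],
]


def ordering_rules(seqs, min_support=0.5, min_confidence=0.8):
    """Mine hypothetical 'A precedes B' rules from element sequences.

    support    = descriptions where A precedes B / all descriptions
    confidence = descriptions where A precedes B / descriptions containing both A and B
    """
    n = len(seqs)
    before = Counter()    # (a, b) -> number of descriptions in which a comes before b
    together = Counter()  # (a, b) with a < b alphabetically -> number containing both
    for seq in seqs:
        pos = {tag: i for i, tag in enumerate(seq)}
        for a, b in combinations(sorted(pos), 2):
            together[(a, b)] += 1
            if pos[a] < pos[b]:
                before[(a, b)] += 1
            else:
                before[(b, a)] += 1
    rules = []
    for (a, b), n_before in before.items():
        support = n_before / n
        confidence = n_before / together[tuple(sorted((a, b)))]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


rules = ordering_rules(descriptions)
for a, b, support, confidence in rules:
    print(f"{a} -> precedes {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, "leaf precedes flower" holds in every description. "flower precedes fruit" is dropped because one description reverses the order, which gives a confidence of 2/3. A rule of this kind could, in principle, help place an element that has few training examples.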

