
Designing and Developing an Automatic Interactive Keyphrase Extraction System with Unified Modeling Language (UML)

Min Song Il-Yeol Song Xiaohua Hu

Presented at ASIST 2004 Annual Meeting; "Managing and Enhancing Information: Cultures and Conflicts" (ASIST AM 04), Providence, Rhode Island, November 13 - 18, 2004


Designing and developing a system that helps users digest and understand the information available to them has been a difficult challenge. In this paper, we design and develop an automatic interactive keyphrase extraction system, called KPSpotter, capable of processing data in various formats, such as XML, HTML, and plain text, over the Internet. KPSpotter combines the Information Gain data mining measure with several Natural Language Processing techniques, such as the Part of Speech (POS) technique and First Occurrence of Term. To improve extraction accuracy, WordNet is incorporated into KPSpotter. We use the Unified Modeling Language (UML) to design and develop KPSpotter. UML modeling helps formalize the preliminary analysis model and supports iterative system design and development. We also test system performance experimentally by comparing the keyphrases extracted by KPSpotter with those extracted by KEA, a well-known naïve Bayesian keyphrase extraction system. The experiments show that KPSpotter outperforms KEA in most test cases.
