
An Approach to Protein Name Extraction using Heuristics and a Dictionary

Kazuhiro Seki, Javed Mostafa

Presented at ASIST 2003 Annual Meeting -- Humanizing Information Technology: From Ideas to Bits and Back (ASIST AM 03 2003), Westin Long Beach, Long Beach, California, October 20 - 23, 2003


This paper proposes a method for extracting protein names from biological texts. The method combines hand-crafted heuristic rules with a dictionary of protein names. In contrast to previously proposed methods, our approach avoids natural language processing tools such as part-of-speech taggers and syntactic parsers in order to improve processing speed. We implemented a prototype system based on this method and conducted evaluation experiments. The results showed that our system performs comparably to the state-of-the-art protein name extraction system on multiple corpora.
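
As a rough illustration of the general idea (surface heuristics plus dictionary lookup, with no part-of-speech tagging or parsing), a minimal sketch might look like the following. The abstract does not give the actual rules or the dictionary, so everything here is an assumption made for illustration: the toy dictionary entries, the function names looks_like_protein and extract_protein_names, and the two surface heuristics (a token that mixes letters and digits, or one with an internal capital letter).

```python
import re

# Hypothetical toy dictionary; the dictionary used in the paper is not given here.
PROTEIN_DICTIONARY = {"interleukin-2", "protein kinase c"}

# A token is a run of letters, digits, and hyphens starting with a letter or digit.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def looks_like_protein(token):
    """Assumed surface heuristic, not the paper's actual rule set."""
    has_letter = any(c.isalpha() for c in token)
    has_digit = any(c.isdigit() for c in token)
    inner_upper = any(c.isupper() for c in token[1:])
    has_lower = any(c.islower() for c in token)
    # Letters mixed with digits ("p53") or an internal capital ("RhoA").
    return (has_letter and has_digit) or (inner_upper and has_lower)


def extract_protein_names(text, dictionary=PROTEIN_DICTIONARY):
    spans = set()
    # Step 1: case-insensitive dictionary lookup on whole-name boundaries.
    for name in dictionary:
        pattern = r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end()))
    # Step 2: heuristic candidates from surface clues alone.
    for match in TOKEN.finditer(text):
        if looks_like_protein(match.group()):
            spans.add((match.start(), match.end()))
    # Step 3: drop any span fully contained in a longer one.
    kept = [
        s for s in spans
        if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)
    ]
    return [text[start:end] for start, end in sorted(kept)]


sentence = "Protein kinase C phosphorylates p53 and RhoA, but not interleukin-2."
print(extract_protein_names(sentence))
```

Because both steps work on character patterns only, the sketch needs a single pass of regular-expression matching per sentence, which is the kind of speed advantage the approach aims for. A real rule set would need many more heuristics to keep precision acceptable.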

