SOASIST ...on the move

Southern Ohio Chapter of the Association for Information Science & Technology



Mark Wasson, LexisNexis

On "Data Mining and Text-based Information"

WHEN: Tuesday, August 27, 2002. Dinner at 6 PM; speaker at 7 PM in Auditorium 3 (Three)

WHERE: LexisNexis, Dayton, OH

COST: Presentation and dinner: $10.00 for members and non-members; $5.00 for student or retired members. PREPAYMENT REQUIRED TO GUARANTEE CATERER.

COST: Presentation only: free, but registration is required to guarantee seating.


"Metadata" is defined as data about data. For a text document, there are at least two types of metadata. Structural or formatting metadata describes a document's layout on the page, and can include information on fonts, spacing, indentation and so on. Content-based metadata captures information found in a document and organizes it for further use. This may include controlled-vocabulary index terms, extracted terms and summaries that have been created and assigned to the document. A list of proper names or citations extracted from a document and highlighted is another source of content-based metadata.

Using such metadata in combination with, or as an alternative to, full-text search can help people find and retrieve relevant documents more easily, both by simplifying the search and by making it more precise about what it specifies. Picklist-based document retrieval using controlled-vocabulary index terms is one way of exploiting content-based metadata.

By examining metadata from across a collection of documents, and combining it with data from other sources such as stock prices, corporate financial data and economic data, one can begin to discover information that is not available in any single document. Knowledge Discovery in Databases, also known as data mining, is a relatively new area of artificial intelligence that has had considerable early success with numerical and other structured data, in areas such as consumer behavior and purchasing analysis, fraud detection and business forecasting.

Free text, however, lacks the structure that many knowledge discovery processes need. Metadata provides a means of representing the information found in free text in a structured form suitable for many of these processes. Applying data mining techniques to text is a steadily growing focus area within the knowledge discovery domain.

In his talk, Mark will give a general overview of knowledge discovery and data mining, discuss how this technology can be applied to text, review some applications and related technology, and provide links to resources for more information.

Mark Wasson is a Senior Architect/Research Scientist who has been with LexisNexis since 1986. He led the research projects behind Term-based Topic Identification, the Term Mapping System, the NEXIS Company Indexing and NetOwl Indexing technologies behind SmartIndexing, Searchable LEAD, and the Fact Extraction Tool Kit. He also conceived Company Dossiers and Trend Analysis. He has collaborated with researchers at the University of Pennsylvania on two projects. His current research interests include applying knowledge discovery and data mining technologies to text-based content, question answering technology and automatic summarization. Mark also scouts new and emerging technologies at numerous conferences and workshops and among third-party technology companies (more than 100 companies in 2001 alone).

Mark has authored or co-authored a number of papers and presentations for technical conferences on topics including document categorization and indexing, summarization, information extraction, knowledge discovery, shallow vs. deep text processing approaches and academic-industry relations. He has also participated in two panel discussions (including one at ASIS&T 2001) and served on three conference and workshop program committees. Mark received a Bachelor of Science degree in Computer Science and both Bachelor of Arts and Master of Arts degrees in Linguistics, all from the University of Iowa.

DIRECTIONS to LexisNexis from Cincinnati: Take I-75 North to Dayton. Take Exit 44 to S.R. 725 (Centerville/Miamisburg Rd.). Go east on S.R. 725, then south on S.R. 741 (Springboro Pike). LexisNexis is approximately 1 mile ahead on the right side of the road. Turn right at the Spring Valley Road entrance (the 6th light from the S.R. 725/S.R. 741 intersection); the LexisNexis sign reads 9443-9595. Drive underneath the skyway connecting the buildings. Turn right, go over the speed bump, and park in the lot next to the covered entrance. Enter Building 4 (9443 Springboro Pike) and wait at the guard station for someone to escort you.

DIRECTIONS to LexisNexis from Columbus: Take I-70 West, then I-675 toward Cincinnati. Take Exit 2 (Centerville/Miamisburg) off I-675. Turn left on Yankee Rd., right on Lyons Rd., and left on S.R. 741 (Springboro Pike). LexisNexis is about 0.5 mile ahead on the right. Turn right at the Spring Valley Road entrance (the 2nd light from the Lyons Rd./S.R. 741 intersection); the LexisNexis sign reads 9443-9595. Drive underneath the skyway connecting the buildings. Turn right, go over the speed bump, and park in the lot next to the covered entrance. Enter Building 4 (9443 Springboro Pike) and wait for an escort.


Optional dinner: hot buffet of baked chicken with fine herb sauce, braised beef tips burgundy, vegetarian lasagna, wild rice pilaf, green beans almondine, tossed salad, pasta salad, assorted fresh dinner rolls, dessert choices, coffee, and iced tea.

PREPAYMENT REQUIRED by check made out to "SOASIS" by 5 p.m. on Friday, August 23, 2002, sent to Patricia Carter, B6F1 room 82, LexisNexis, 9595 Springboro Pike, Miamisburg, OH 45342. Please indicate (1) your association affiliation, noting student or retired status as appropriate, and (2) the name of your employer. Questions may be directed to (937) 865-6800 x6099.

LexisNexis is underwriting the beverages and a portion of the dinner cost.