|
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within
them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. In this
course, we will present the general theory of Text Mining and will demonstrate several systems that use these principles to enable interactive exploration of large textual collections. We will describe generic
techniques for text categorization and information extraction that are used in these systems. The systems to be presented are:
- KDT - Knowledge Discovery in Texts
- FACT - discovers associations amongst keywords labeling the items in a collection of textual documents
- Document Explorer - provides a high-level language for interactive exploration of textual collections
We will present a general architecture for text mining and will outline the algorithms and data structures behind the systems. We will give special emphasis to incremental algorithms and to efficient data
structures. The course will cover the state of the art in this rapidly growing area of research. |