Special Section

From "Storage and Retrieval Systems" to "Search Engines": Text Retrieval in Evolution

by Irene Travis

Text retrieval was once a specialized niche in computing; however, in the last decade it has become a common, mainstream application. Concurrently, we have moved from stand-alone packages that primarily processed bibliographic data contained within the system to text processing packages designed to process external text files and be integrated with other applications, such as imaging or database management systems. This transition has been reflected in a change in terminology from "storage and retrieval systems" to "text search engines."

A number of factors have pushed these developments. The tremendous decrease in the cost of storage and computing has clearly been the major enabling change. However, another critical component has been the widespread availability of text generated in electronic form, which has reduced the cost of making electronic text available for search. Finally, the ability to distribute text applications to a wide audience cheaply and quickly has had a major impact, particularly in the last few years. The World Wide Web, graphical interfaces, and many other standards and improvements in communications and in the interoperability of systems and software have supported the exponential growth of text retrieval as an application.

This commercial growth and investment has led to rapid diversification in the marketplace in the last 10 to 15 years. Yet, as Trudi Bellardo Hahn discusses in her article, "Text Retrieval Online: Historical Perspective on Web Search Engines," much of the research on which new systems are based was conducted as early as the 1960s.

The ability of developers to test their systems on large (multi-gigabyte) databases has also spurred improvements and fine-tuning in performance, as Donna Harman of the National Institute of Standards and Technology discusses in "The Text REtrieval Conferences (TRECs): Providing a Test-Bed for Information Retrieval Systems."

The variety of packages has increased, with probabilistic algorithms and linguistic enhancements being marketed instead of or in addition to the traditional Boolean search capabilities. Elizabeth Liddy discusses progress in the application of natural language processing (NLP) to text retrieval in her article, "Enhanced Text Retrieval Using Natural Language Processing."

It has also become more likely that the text retrieval system will be a part of some other application, such as document management or groupware. Indeed, many organizations now acquire text retrieval capability as part of such packages. Modularization of functions and standardization of interfaces have made the "search engine" an independent component that can be used to retrieve text from many different kinds of files in different locations. Seen from the other direction, a particular file may be searched by the same or different users using completely different text retrieval systems.

One aspect of this change is that many interfaces for text retrieval systems are customized for particular applications. At the same time, among users there is a growing contingent of "casual users": not just "end-users," but users who access the system infrequently. There are still hard and poorly resolved trade-offs between ease of use and power in computing systems. In the final article of this section, "Interface Design Concepts in the Development of a Web-Based Information Retrieval System," Rebecca Dunning, Marie Shuttleworth and Phil Smith discuss principles of good interface design for IR applications.

We hope to discuss other aspects of these developments in future issues.

Irene Travis
Bulletin of the American Society for Information Science