The Text REtrieval Conference (TREC) workshop series encourages research in information retrieval from large text collections by providing a large test collection, uniform scoring procedures and a forum for organizations interested in comparing their results. Now in its seventh year, the conference has become the major experimental effort in the field. Participants in the TREC conferences have examined a wide variety of retrieval techniques, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback and advanced pattern matching. The TREC conference series is co-sponsored by the National Institute of Standards and Technology (NIST) and the Information Technology Office of the Defense Advanced Research Projects Agency (DARPA).
In early 1992, the 25 adventurous research groups participating in TREC-1 undertook to scale their prototype retrieval systems from searching two megabytes of text to searching two gigabytes. Large disk drives were scarce in 1992, typical research computers were far slower than today's, and most groups made Herculean efforts to finish the task. The conference itself was enlivened by the participants' stories of what it took to get there. But a truly momentous event had occurred: it had been shown that the statistical methods used by these groups were capable of handling operational amounts of text, and that research on such large test collections could lead to new insights in text retrieval.
Since then there have been five more TREC conferences, co-sponsored by NIST and DARPA, with the latest one (TREC-6) taking place in November of 1997. The number of participating groups has grown from 25 in TREC-1 to 51 in TREC-6, including participants from 12 different countries, 21 companies and most of the universities doing research in text retrieval. The diversity of the participating groups has ensured that TREC represents many different approaches to text retrieval, while the emphasis on individual experiments evaluated in a common setting has proven to be a major strength of TREC.
All of the TREC conferences have centered around two main tasks based on traditional information retrieval modes: a "routing" task and an "ad hoc" task. In the routing task it is assumed that the same questions are always being asked, but that new data is being searched. This task is similar to that done by news clipping services or by library profiling systems. In the ad hoc task, it is assumed that new questions are being asked against a static set of data. This task is similar to how a researcher might use a library, where the collection is known but the questions likely to be asked are unknown.
In TREC the routing task is accomplished by using known topics with known "right answers" (relevant documents) for those topics, but then using new data for testing. The topics consist of natural language text describing a user's information need (see The Test Collections below for a sample topic). The participants use the training data to produce the "best" set of queries (the actual input to the retrieval system), and these queries are then tested using new data.
The ad hoc task uses the known document collection, but new topics are created for testing. For both the ad hoc and routing tasks, the participating groups run 50 test topics against the test documents and submit the top-ranked 1000 documents for each topic. These results are then evaluated at NIST, with appropriate performance measures (mainly recall and precision) used to compare system results.
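As a concrete illustration of the scoring step, the sketch below shows how recall and precision can be computed for a single topic from a ranked result list and a set of relevance judgments. It is a simplified stand-in for the official evaluation software, and the document IDs shown are hypothetical.

    # Minimal sketch of TREC-style scoring for one topic: given a ranked list of
    # document IDs returned by a system and the set of documents judged relevant,
    # compute precision and recall at a rank cutoff k. This is an illustrative
    # stand-in, not the official evaluation software.

    def precision_recall_at_k(ranked_docs, relevant_docs, k):
        """Precision and recall over the top-k ranked documents."""
        top_k = ranked_docs[:k]
        retrieved_relevant = sum(1 for doc_id in top_k if doc_id in relevant_docs)
        precision = retrieved_relevant / k
        recall = retrieved_relevant / len(relevant_docs) if relevant_docs else 0.0
        return precision, recall

    if __name__ == "__main__":
        # Hypothetical ranked run for one topic (TREC systems submit up to 1000 documents)
        run = ["DOC-0042", "DOC-0007", "DOC-0913", "DOC-0101"]
        # Hypothetical relevance judgments ("right answers") for the same topic
        qrels = {"DOC-0007", "DOC-0913", "DOC-0555"}
        for k in (1, 2, 4):
            p, r = precision_recall_at_k(run, qrels, k)
            print(f"P@{k} = {p:.2f}, R@{k} = {r:.2f}")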
The documents in the current test collections were selected from 11 different sources: the Wall Street Journal, AP newswires, articles from Computer Select disks (Ziff-Davis Publishing), the Federal Register, short abstracts from DOE publications, the San Jose Mercury News, U.S. Patents, the Financial Times, the Congressional Record, the Los Angeles Times and the Foreign Broadcast Information Service. There are currently five CD-ROMs, each containing approximately one gigabyte of text, with only two of them used in any given TREC; that is, only two gigabytes of data have generally been used in the testing.
The topics used in TREC have consistently been the most difficult part of the test collection to control. In designing the TREC task, a conscious decision was made to provide "user need" statements rather than more traditional queries. Starting in TREC-3, different lengths (and component parts) of topics were used in each TREC to explore the effects of topic length, such as the use of short titles vs. sentence-length descriptions vs. full user narratives.
A typical TREC-6 topic, for example, consists of a short title, a one-sentence description of the information need, and a longer narrative spelling out the criteria used to judge a document relevant.
In addition to the two main tasks, TREC includes a set of secondary tracks that focus on particular retrieval problems, and groups can participate in some or all of them. Almost all of the tracks had at least 10 participating groups, with new groups joining TREC specifically to tackle particular tracks.
The impact of TREC on text retrieval can be seen in three separate areas:
The system results in TREC itself show both a steady progression to more complex retrieval techniques and the resulting higher performance. Established research groups (such as the group behind the Cornell SMART system) report a doubling in performance over the six years of TREC, whereas systems new to TREC typically double their performance in their first year as they bring their techniques up to the current state of the art. The conference itself encourages the transfer of new methods across many different types of basic search techniques. For example, in TREC-2 the OKAPI system from City University, London, introduced new term weighting methods. By TREC-4 these methods had been picked up by several groups, including the INQUERY system and a modified version of the Cornell SMART system. These groups in turn added to the methodology, and by TREC-6 most of the other groups had incorporated these superior weighting techniques into their own systems.
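For readers unfamiliar with this line of work, the sketch below illustrates the general form of the Okapi-style (BM25) term weighting that grew out of the City University experiments; the constants k1 and b are conventional defaults, and the exact formulation used in any particular TREC run may differ.

    import math

    # Simplified sketch of Okapi-style (BM25) term weighting: the weight of one
    # query term in one document combines an inverse document frequency factor
    # with a length-normalized term frequency. Constants k1 and b are
    # conventional defaults, not the values used in any specific TREC run.

    def bm25_weight(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
        """tf: term frequency in the document; df: documents containing the term;
        doc_len / avg_doc_len: document length and collection average;
        num_docs: total number of documents in the collection."""
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm_tf = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        return idf * norm_tf

    if __name__ == "__main__":
        # A common term in a long document vs. a rare term in a short one
        print(bm25_weight(tf=10, df=50000, doc_len=2000, avg_doc_len=500, num_docs=1000000))
        print(bm25_weight(tf=2, df=50, doc_len=300, avg_doc_len=500, num_docs=1000000))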
The introduction of the tracks has led to research in new areas of text retrieval. The Chinese track and the earlier Spanish track were the first large-scale formal evaluations of retrieval systems in languages other than English. The Spoken Document track has brought the speech recognition community together with the text retrieval community. The Cross-Language track, just started in TREC-6, builds on the current high interest in cross-language retrieval and serves as a testing platform in both the United States and Europe.
TREC continues to be successful in advancing the state of the art in text retrieval, providing a forum for cross-system evaluation using common data and evaluation methods, and acting as a focal point for discussion of how retrieval evaluation should be conducted. TREC-7 is currently underway.
This site also contains online versions of the proceedings from past conferences and pointers to sources of hard-copy versions.