
Bulletin of The American Society for Information Science

Vol. 26, No. 2

December / January 2000


Annual Meeting Coverage

Track 3:

Information Retrieval


by Matthew Koll

The Information Retrieval track at the 1999 ASIS Annual Meeting was informative and lively. It started with my overview of the field, intended to provide a backdrop against which attendees might view the papers and sessions to follow. In this report I'll try to recapture a sense of that backdrop and touch on some of the highlights of the sessions.

Information Retrieval Backdrop

Information retrieval is the science and practice of trying to show people the document they would want to see next, if they had total knowledge and hindsight.

The field used to be divided between the information retrieval research community and the business world. In recent years we have seen a growing split (along with increased communication) among researchers who focus on users versus those who focus on retrieval systems. An even bigger schism has developed within the search industry, between the traditional, professional information services and the consumer-oriented search services. Yes, the Web has changed everything.

One way to grasp the wide scope of our field and some of the changes is to dig into the "needle in a haystack" metaphor. Searching is like finding a needle in a haystack, but not all searches are the same. "Finding a needle in a haystack" can mean

  • a known needle in a known haystack;
  • a known needle in an unknown haystack;
  • an unknown needle in an unknown haystack;
  • any needle in a haystack;
  • the sharpest needle in a haystack;
  • most of the sharpest needles in a haystack;
  • all the needles in a haystack;
  • affirmation of no needles in the haystack;
  • things like needles in any haystack;
  • let me know whenever a new needle shows up;
  • where are the haystacks?; and
  • needles, haystacks, whatever.

The point is that people come to search systems with a variety of needs. Systems do pretty well finding a specific document in a specific collection. But often, users don't find what they want because they're looking in the wrong place. Also, users sometimes want to know that they have found all the relevant documents (high recall) or be confident that they have not missed any important documents. This task is difficult and tends to be neglected, in large part because people don't know what they're missing until and unless they find it.
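The high-recall case is easier to reason about with the two standard measures written out. Here is a minimal sketch, assuming we somehow already know the full set of relevant documents (which, as noted above, is exactly what searchers usually lack). The function name and the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the system returned
    relevant:  document ids judged relevant (assumed to be fully known here)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: what share of the returned documents were relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what share of the relevant documents were returned?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 of them relevant,
# while 6 relevant documents exist in the collection.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(p, r)  # 0.75 0.5
```

The searcher sees only the four returned documents, so precision is visible to them while the three missed documents that drag recall down are not.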

The "needles, haystacks, whatever" line started off as a light-hearted poke at Gen-X searchers, but with the massive growth in consumer online searching, this now represents a legitimate viewpoint. Casual searchers don't have time for a lot of interaction and aren't going to give the system a lot of words to work with; they want some good information back fast, and if they don't get it they're going to take their business elsewhere.

Despite recent progress, as seen in TREC results and in commercial systems, providing good search results continues to be a difficult problem. The main reasons are

  • language is inherently imprecise;
  • when users do use logic, they misuse and overuse it;
  • users provide very few explicit clues to what they want;
  • there is limited opportunity for interaction; users want to find what they're looking for and get on with their lives;
  • people don't know where to look;
  • many retrieval methods do not scale, especially to the very large collections now emerging;
  • the limits of aboutness; knowing the topic of a document is not sufficient to predict its relevance; and
  • "I'll know it when I see it." It's hard to describe what you don't know.

To a large degree, the papers and panel sessions addressed these issues in an engaging and constructive way. Here are some key questions I had coming into the conference and my answers as of today.

Q: As the Web gets bigger, and queries don't get longer fast enough, won't precision be terrible?

A: No, precision is the top priority of the research and commercial communities.

Q: Is recall dead?

A: No, but it is in need of attention.

Q: Will commercial imperatives kill off information science entirely?

A: No. Library and information science professionals have never been in higher demand.

Information Retrieval at ASIS '99: Themes and Observations

Classification of Tracks. The first observation I'd make about information retrieval at ASIS '99 was that it wasn't limited to the information retrieval track. Sessions dealing with searching, navigating, agents and visualization spread out across various tracks. Perhaps for future conferences we'd be better off not classifying sessions at all, but just letting the attendees do full text searching of the program.

The User, Time and the Search Process. There was an increased emphasis on the process of searching. This is manifest in papers such as those by Choo, Detlor and Turnbull, which examined the modes and stages users pass through in searching, and the Kantor, Boros, Melamed and Menkov paper describing Information Quests. Papers like these mark more than a return of attention to the role of the user in the search process; they also reveal the impact of recent advances in technology for tracking what people actually do when searching. Kantor's Ant World project at Rutgers, which captures, organizes and finds other people's relevant information quests, is a fascinating way of moving search from a private activity to more of a community activity, where people can improve the effectiveness of their searches by learning from other people with similar questions. In a similar but slightly different vein, Lankes' work with the National Digital Reference System and the AskA project is designed to help searchers find people who actually know the answers to their questions or who can guide them in their quest.

Larger Task Context. The Watson search agent, described by Budzik and Hammond, takes this renewed focus on the user even further. Watson tries to understand the context in which a user's need for information is arising and to anticipate, or at least augment, search requests by utilizing knowledge of the user's larger task, of which searching is just a part. Budzik reported on an experiment in which Watson outperformed human experts.

Relevance. Projects that involve the user more deeply are a hopeful sign for search systems. Developers are striving to overcome the problem that the relevance decision (the process by which a user decides whether a document meets his needs or not) is driven by much more than what the document is about. Getting beyond aboutness is essential. Toward that end, Schamber and Bateman described an ongoing project to determine the factors that influence users' relevance decisions. They've identified factors such as novelty, availability and source characteristics. Maybe we can train search assistants to start keying into these variables as well as topicality.

Off-the-Page Indicators. Another trend in our field is the growing use of "off-the-page" indicators of relevance. These indicators include, for example,

  • relevance ratings provided by other people, known as "collaborative filtering" to some, or as a variant of the time-honored "relevance feedback" to others;
  • popularity of items, as indicated by analyzing user clicks; and
  • analyzing the references made to a document by other documents, that is, the hypertext links and citations to a document, as well as the text surrounding those references in the citing documents.

Bradshaw and Hammond described the Rosetta system, which makes innovative use of the context of citing documents to describe the cited document. Their goal is worth repeating: "to provide precise results for simple queries." Given that most queries are very simple (still averaging just 2-4 words), this is an important goal. Several people expressed a concern that reliance on the judgments and links of others could lead to a loss of serendipity and individuality in search results.
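To make the idea of describing a document by the text of its citers concrete, here is a minimal sketch. It is not the Rosetta implementation, only an illustration of the general approach under simple assumptions: each citing context is a short string, words are split on whitespace, and a cited document gets one vote per citing context that uses a query word. All names and sample data are hypothetical.

```python
from collections import defaultdict


def build_citation_index(citations):
    """Map each word to the cited documents it was used to describe.

    citations: iterable of (cited_doc_id, context_text) pairs, where
    context_text is the text surrounding a link or citation in the
    citing document.
    """
    index = defaultdict(lambda: defaultdict(int))
    for cited_doc, context in citations:
        # Count each word once per citing context.
        for word in set(context.lower().split()):
            index[word][cited_doc] += 1
    return index


def search(index, query):
    """Rank cited documents by how many citing contexts match the query words."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for doc, votes in index.get(word, {}).items():
            scores[doc] += votes
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical citing contexts.
citations = [
    ("paperA", "a classic survey of relevance feedback methods"),
    ("paperA", "relevance feedback was introduced in"),
    ("paperB", "a survey of speech recognition"),
]
index = build_citation_index(citations)
print(search(index, "relevance feedback"))  # ['paperA']
```

Note that the cited document's own text is never consulted; a two-word query matches because other authors chose those words to describe it.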

Integration. Though not in the Information Retrieval track, Lawrence provided an overview of a search system developed at NEC that is notable for its inclusion of a wide array of search and relevance ranking techniques, including citation mining. The NEC system is an example of a trend toward the integration of multiple methods. Several authors discussed integrated combinations of searching and browsing. Another interesting integration involves combining catalogs (collections of items carefully selected, described and classified into a taxonomy) with large full-text collections such as the whole Web. I'll have to indulge in a mention of the new AOL search product here, which features just that kind of integration. A user's search runs against not just the human-created descriptions of documents and the taxonomy into which they were classified (based on the Netscape Open Directory), but also the full text of those documents, as well as the collected aggregated searches conducted by other users. For display, users can easily navigate among the documents themselves, their descriptions and the taxonomy.
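One simple way to picture this kind of integration is a weighted sum of scores from several evidence sources. The sketch below is an assumption for illustration only; it does not describe how the NEC or AOL systems actually combine evidence, and the source names, scores and weights are invented.

```python
def combine_scores(score_lists, weights):
    """Rank documents by a weighted sum of per-source scores.

    score_lists: {source_name: {doc_id: score}}
    weights:     {source_name: weight}; sources without a weight count as 0
    """
    combined = {}
    for source, scores in score_lists.items():
        w = weights.get(source, 0.0)
        for doc, s in scores.items():
            combined[doc] = combined.get(doc, 0.0) + w * s
    # Highest combined score first; ties broken by document id.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical scores from three kinds of evidence.
scores = {
    "catalog_description": {"doc1": 1.0, "doc2": 0.5},
    "full_text": {"doc2": 1.0, "doc3": 0.5},
    "past_searches": {"doc2": 0.5},
}
weights = {"catalog_description": 2.0, "full_text": 1.0, "past_searches": 1.0}
print(combine_scores(scores, weights))  # ['doc2', 'doc1', 'doc3']
```

Here doc2 wins because it collects moderate support from all three sources, even though doc1 has the strongest match in any single one.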

Multimedia. Several speakers described their efforts at multimedia search. This includes both describing and retrieving relevant graphic, audio and video materials, and using visual and audio tools in the user interaction. On the search side, one of the most enlightening sessions was the panel on Information Retrieval from Speech. Siegler addressed how speech recognition for the purpose of preparing for searchable access is different from speech recognition for the purpose of preparing a transcript. He noted that instead of only retaining the most likely interpretation of a speech fragment, his system kept several of the less likely but possible interpretations. By keeping these additional words and phrases as part of the document description, search results were improved substantially.
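The effect of keeping alternative interpretations can be sketched in a few lines. This is not Siegler's system, just an illustration assuming the recognizer returns a scored list of candidate texts per fragment; the function name, the `keep` parameter and the sample hypotheses are all hypothetical.

```python
def index_terms(hypotheses, keep=3):
    """Collect index terms from the top `keep` recognizer hypotheses.

    hypotheses: list of (text, score) pairs, higher score = more likely.
    keep=1 mimics a transcript-style index that stores only the best guess.
    """
    best = sorted(hypotheses, key=lambda h: h[1], reverse=True)[:keep]
    terms = set()
    for text, _score in best:
        terms.update(text.lower().split())
    return terms


# Hypothetical recognizer output for one spoken fragment.
fragment = [
    ("recognize speech", 0.55),
    ("wreck a nice beach", 0.30),
    ("recognized peach", 0.15),
]
transcript_only = index_terms(fragment, keep=1)
with_alternatives = index_terms(fragment, keep=3)
print("beach" in transcript_only, "beach" in with_alternatives)  # False True
```

If the speaker really did say "beach", a search for that word finds the fragment only in the index that kept the less likely interpretations. The cost is a noisier index, which is acceptable for search in a way it would not be for a published transcript.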

User Interface and Visualization. The self-anointed highlight of the conference was a panel chaired by Hildreth and featuring Bates, Marchionini, Hjerppe and Rorvig. The panel explored issues ranging from personalization and user choices versus complexity to learning and social search contexts and inexorable technology trends. Rorvig described his "Big Sky" project, which aims to visualize a huge collection (all the information in the world) on a huge surface: specifically, a planetarium. Hjerppe reminded us of some of the fundamental paradoxes underlying information retrieval, of which the most profound is probably that the user must describe that which he does not know in order to find it. Another significant point that arose during this and other sessions was the importance of getting new systems out into the real world for testing. This is a shift toward a product-design-with-rapid-iterations philosophy as opposed to the other extreme of relatively big science projects.

More. Of course, the underlying theme of the information retrieval track was "more": more content, more searches, more kinds of things to be found, and much more attention to our field.

Matthew Koll is an AOL Fellow at America Online. He can be reached by mail at America Online, CC2, 44900 Prentice Dr., Dulles, VA 21066. He can be reached by phone at 703/265-1766 or by e-mail at mkoll@aol.com.


© 2000, American Society for Information Science