Categories, Photographs & Predicaments:
by Brian C. O'Connor and Mary Keeney O'Connor
We can represent the predicament of representing photographs, the horns of the dilemma, with two illustrations. On the one horn, indexer application of verbal descriptors for the topics of images, especially if they are applied with the level of generality typically used in libraries to describe verbal documents, fails to account for native elements of picture documents. Moreover, such descriptors often seem pale in comparison with verbal descriptions made by actual users of the documents.
Note, for example, in Figure 1, the difference between the LCSH Cowboys and Rodeos and the two sets of descriptors derived from actual users. Similarly, note in Figure 2 the differences between Boys and Ocean [or even Outdoor Recreation for Children] and the reactions of several actual viewers of the particular picture.
On the other horn, representing the native elements of photographs can remove ambiguity of interpretation at one level, yet may provide its own set of difficulties. For example, as in Figure 3, if I am seeking a portrait of Homer in a WWW collection of images to illustrate an article on verbal representation and come across one image using a word search but find it not quite right, I might be tempted to click on Visually Similar. However, it is only by a large stretch of the imagination that most of the resulting images are similar in an ordinary and useful sense. Similar textures, color values, luminance values and spatial arrangement do not necessarily yield another statue of Homer.
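The gap between low-level similarity and useful similarity can be illustrated with a minimal sketch. It assumes that a "visually similar" search compares luminance histograms, which is only one of many possible techniques and not necessarily what any particular Web collection uses. The toy pixel lists and the names `luminance_histogram` and `histogram_intersection` are hypothetical, chosen for illustration.

```python
from collections import Counter

def luminance_histogram(pixels, bins=4):
    """Return a normalized histogram of 0-255 luminance values in equal-width bins."""
    counts = Counter(min(value * bins // 256, bins - 1) for value in pixels)
    total = len(pixels)
    return [counts.get(b, 0) / total for b in range(bins)]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the two histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy "images" as flat lists of luminance values (invented data, not real pictures).
statue = [200, 210, 190, 60, 50, 205, 55, 195]
seashell = [55, 195, 205, 50, 60, 190, 210, 200]   # same values, different subject
night_street = [10, 20, 15, 30, 240, 12, 18, 25]

statue_vs_seashell = histogram_intersection(
    luminance_histogram(statue), luminance_histogram(seashell))
statue_vs_street = histogram_intersection(
    luminance_histogram(statue), luminance_histogram(night_street))

print(statue_vs_seashell)  # 1.0: a perfect match on luminance alone
print(statue_vs_street)    # 0.5
```

In this sketch the seashell scores as a perfect match for the statue because the measure sees only the distribution of light and dark values, not what the picture is a picture of.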
In order to approach some resolution of the dilemma of representing pictures, we went back to basics, to the Oxford English Dictionary and to Aristotle, in hope of a fresh approach. Category is defined in the OED "as predication or assertion given to general classes of terms, things, or notions," along with the caveat "the use being very different with different authors." There is also an expansion of the term predicament within the definition of category: "a class to which a certain predication or assertion applies." This is further refined under the definition of predicament as a "class about which a particular statement is made."
So what? Categories, we know, help us by reducing search time. There are fewer categories than there are individual items. Yet if categories are formed by assertions, what can we say about the assertion process in the context of describing and searching for image documents? Considering the assertion process we asked:
As an initial response to these questions about assertions, we proposed categories of question states. Each question state has two elements of an information-seeking event: a function to be accomplished; a document (or set of documents) that would aid in accomplishing the function. The three categories of question states are as follows:
Ordinarily, in document collections, categories are formed according to attributes of the documents themselves. The expectation, then, is that queries will be made in terms of document attributes. In those cases where users cannot articulate specific document attributes, perhaps we can, at least, make use of what they can say about what they want to accomplish. We can suggest areas of the collection not likely to be useful, we can suggest methods of navigation and evaluation, and we can gather use or function descriptions made by previous users.
Generating Functional Descriptions
We, therefore, set about exploring users' functional descriptions of pictures. While our users would be seeing the pictures and making responses, we felt that eliciting descriptors beyond the topical might generate an access tool of some utility for those searchers who could express what they wanted to accomplish but could not be specific about what picture would work. For example, if a searcher wants an illustration for the concept of rugged determination, the rodeo cowboy photograph in Figure 1 might be appropriate, because several previous users have used rugged and determined to describe that image. We were seeking a way of eliciting emotive, evocative and associative descriptors for the pictures, as these would be the primary means of searching when specific assertions about the document could not be made.
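One way to picture such an access tool is as an index from user-supplied descriptors to the images they were applied to. The following is a minimal sketch under stated assumptions, not a description of any system we built: the function names, the image identifiers and the sample responses are all hypothetical, with only the rugged and determined descriptors echoing the example above.

```python
from collections import defaultdict

def build_descriptor_index(responses):
    """Map each descriptor to the images it was applied to, counting repeat uses."""
    index = defaultdict(lambda: defaultdict(int))
    for image_id, descriptor in responses:
        index[descriptor.strip().lower()][image_id] += 1
    return index

def search_by_function(index, wanted):
    """Rank images by how often previous users applied any of the wanted descriptors."""
    scores = defaultdict(int)
    for term in wanted:
        for image_id, count in index.get(term.strip().lower(), {}).items():
            scores[image_id] += count
    # Highest score first; ties broken alphabetically by image id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Invented (image, descriptor) pairs standing in for collected user responses.
responses = [
    ("rodeo_cowboy", "rugged"),
    ("rodeo_cowboy", "determined"),
    ("rodeo_cowboy", "Rugged"),
    ("jumping_boy", "joyful"),
    ("jumping_boy", "determined"),
]

index = build_descriptor_index(responses)
results = search_by_function(index, ["rugged", "determined"])
print(results)  # [('rodeo_cowboy', 3), ('jumping_boy', 1)]
```

Counting repeat uses, rather than recording mere presence, lets an image that many viewers called rugged outrank one that a single viewer described that way.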
First, we tried asking users simply to make up descriptions of each of a dozen images. Everybody who did this constructed a topical phrase resembling a Library of Congress Subject Heading. All the participants were librarians or library school students and seemed constrained to describe in the library way. Subsequently we had test users write captions, responses ("how did this picture make you feel?") and lists of items recorded in the picture. (See Figure 4.)
The first level of analysis consisted of gathering the adjectives and adjectival phrases that describe users' reactions to the images. These fall into categories making
Analysis of the Functional Descriptions
We then mounted all responses in an ACCESS database and did a rough content analysis to see what categories would emerge. The categories that emerged from what actual users said about pictures are
It was gratifying to see that we did, indeed, pick up descriptors that were other than topical. There was a wide range of narrative and emotive descriptors. In addition, there were two striking results, one we had anticipated and one we had not. We had anticipated antonyms would be present in some considerable number, and they were. One person would describe an image as "lovely"; another person would describe it as "depressing." One person would describe an image as "makes me happy"; another would describe it as "this makes me really homesick, I wish I hadn't looked at this picture."
Geographic attribution is a form of description we had not anticipated. Many people felt compelled to locate the image. Note that the rodeo cowboy image in Figure 1 is attributed to Kansas, although there are no geographic cues other than the cowboy attire; the image was actually made in Davis, California. The ocean in the jumping boy picture in Figure 2 is attributed to California and is evocative of Padre Island; it is in New Hampshire.
Consequences for Retrieval System Design
Seeing that we could elicit functional descriptors of different sorts, we set about asking what might be some of the consequences for image retrieval systems. First, we noted Fidel's statement in The image retrieval task: Implications for the design and evaluation of image databases (The New Review of Hypermedia and Multimedia, pp. 181-199, 1997) that ". . . classes that represent interpretive attributes are more difficult to assign in indexing than in searching." This makes sense because of the inherent difficulties with a single source generating representations for varying uses.
However, on the basis of our exploration, we conclude that one can gather user assertions about interactions with pictures, that these form a richer descriptive palette than ordinary indexing and that such assertions can form a basis for constructing categories that can enhance subsequent use. We feel that the kind of descriptions we obtained could help users whose needs are defined primarily in terms of function.
The fact that different users make wildly differing assessments of particular (image) documents need not rule out the use of the descriptors, but the system would have to caution users that such descriptors vary from viewer to viewer. Profiling of users might bring results more quickly, especially as more uses were made of the system, though, of course, in any heterogeneous group, profiling raises certain ethical issues. It may be worth investigating whether the degree of "disagreement" over the extra-topical attributes of an image document speaks in some more general way to the utility of such a document. That is, if an image of a trailer in Idaho makes some people very anxious and others especially tranquil, does the image have some higher quotient of comfort/discomfort evocation than an image on which most users agree?
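As a hedged illustration of how such "disagreement" might be quantified, the sketch below assumes that each viewer's reaction has already been coded as +1 (comfort) or -1 (discomfort) and takes the population standard deviation of those codes as the score. Both the coding scheme and the choice of measure are assumptions made for illustration; we have not settled on or tested any particular measure, and the sample reactions are invented.

```python
from statistics import pstdev

def disagreement(valences):
    """Spread of viewer reactions: 0.0 means full agreement, 1.0 an even split.

    Assumes each reaction is coded +1 (comfort) or -1 (discomfort).
    """
    if not valences:
        raise ValueError("at least one reaction is required")
    return pstdev(valences)

# Invented reaction codes for two hypothetical images.
trailer_reactions = [1, 1, -1, -1]   # half tranquil, half anxious
consensus_reactions = [1, 1, 1, 1]   # everyone reacts the same way

print(disagreement(trailer_reactions))    # 1.0
print(disagreement(consensus_reactions))  # 0.0
```

A score of this kind would only flag images that divide viewers; whether a high score signals greater utility remains the open question raised above.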
Finally, we began to lay out a set of relations between types of assertions and seeking behaviors. We have not elaborated this into a particular interface or interface module at this time, though we are investigating the possibilities. Relating the specificity or type of assertion to the question state of the user yields:
We have suggested a set of relations that may seem quite obvious on the face of it, yet pose difficulties in the absence of verbal representations of image characteristics. We have tried to look at the image retrieval problem through the eyes of producers and image users. We have a sense that a robust model of image retrieval must account not only for the professional user of an image archive or the professional intermediary, but also for the occasional, "naïve" user who has several photo albums or who logs onto the Web. Such a model should probably also be able to address all the out-of-focus and poorly exposed pictures people keep because they serve some purpose.
It is our hope that the thoughts presented here will further the dialog and the examination of image representation issues. It is, likewise, our hope that taking some of the assertion-making responsibility off the shoulders of cataloguers and putting it into the hands of users of the system will generate a more dynamic system that is more richly representative of both the images and the user requirements.
Brian C. O'Connor is affiliated with the Interdisciplinary Ph.D. Program in Information Science in the School of Library & Information Sciences at the University of North Texas, Denton, TX 76203. He can be reached by phone at 940/565-2347 or by e-mail at firstname.lastname@example.org.
Mary Keeney O'Connor is assistant project manager of the IMLS Digital Imaging Grant in the School of Library & Information Sciences at the University of North Texas.