Bulletin of the American Society for Information Science


Volume 25, No. 6


August / September 1999



Categories, Photographs & Predicaments:
Exploratory Research on Representing Pictures for Access

by Brian C. O'Connor and Mary Keeney O'Connor

We can represent the predicament of representing photographs, the horns of the dilemma, with two illustrations. On the one horn, indexer application of verbal descriptors for the topics of images, especially if they are applied with the level of generality typically used in libraries to describe verbal documents, fails to account for native elements of picture documents. Moreover, such descriptors often seem pale in comparison with verbal descriptions made by actual users of the documents.


Figure 1. Rodeo cowboy photograph: two sets of descriptors from actual users

 - Let's see if I can improve my record.
 - All right, Widowmaker, where are ya?
 - Bandage on knee suggests rough ride; belt buckle still gleams proudly!
 - I'm gonna whip that doggie – Cowboy with bandage on knee walking with resolute determination.
 - Get me out of Kansas.
 - He makes me feel strong in order to meet any challenge.
 - Young man with purpose.

 - work clothes
 - needs a bath
 - face a challenge
 - another era
 - rural scene
 - tough & rugged

Note, for example, in Figure 1, the difference between the LCSH headings "Cowboys" and "Rodeos" and the two sets of descriptors derived from actual users. Similarly, note in Figure 2 the difference between "Boys" and "Ocean" [or even "Outdoor Recreation for Children"] and the reactions of several actual viewers of that picture.

Figure 2. Boy jumping in the waves: reactions of actual viewers

 - Makes me homesick for the ocean.
 - Whee!
 - Yahoo!
 - Cute, fun, potential
 - Up and over
 - Don't get wet
 - I wish I were a kid again
 - Reminds me of a vacation my family and I took to Padre Island
 - I miss the ocean – California here I come
 - Homesick for California
 - I wish I was that kid jumping in the waves


On the other horn, representing the native elements of photographs can remove ambiguity of interpretation at one level, yet it may introduce difficulties of its own. For example, as in Figure 3, if I am seeking a portrait of Homer in a WWW collection of images to illustrate an article on verbal representation, and I find one image through a word search that is not quite right, I might be tempted to click on "Visually Similar." However, only by a large stretch of the imagination are most of the resulting images similar in any ordinary and useful sense. Similar textures, color values, luminance values and spatial arrangement do not necessarily yield another statue of Homer.
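To see why "Visually Similar" can return a landscape when one wanted a statue, consider a minimal sketch of one content-based feature of the kind named above: a color histogram compared by histogram intersection. This is an assumed, representative technique, not the method used by the collection in Figure 3; the pixel values and image names are hypothetical. Because the histogram counts colors without regard to where they fall or what they depict, two very different scenes built from the same tones score as identical.

```python
from typing import Dict, Sequence, Tuple

Pixel = Tuple[int, int, int]                 # (R, G, B), each channel 0-255
Histogram = Dict[Tuple[int, int, int], float]

def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> Histogram:
    """Quantize each channel into equal-width bins; return the fraction of pixels per bin."""
    width = 256 // bins_per_channel
    hist: Histogram = {}
    for r, g, b in pixels:
        key = (r // width, g // width, b // width)
        hist[key] = hist.get(key, 0.0) + 1.0
    return {k: v / len(pixels) for k, v in hist.items()}

def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """1.0 means identical color distributions; 0.0 means no colors in common."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in h1.keys() | h2.keys())

GREY, DARK, BLUE = (200, 200, 200), (40, 40, 40), (20, 60, 200)

# Hypothetical 4x2 "images", flattened row by row.
statue_bust = [DARK, GREY, GREY, DARK,
               GREY, GREY, GREY, GREY]   # pale bust on a dark ground
snow_at_dusk = [GREY, GREY, GREY, GREY,
                GREY, GREY, DARK, DARK]  # pale field with a dark strip of trees
open_sea = [BLUE] * 8

bust_h = color_histogram(statue_bust)
print(histogram_intersection(bust_h, color_histogram(snow_at_dusk)))  # 1.0
print(histogram_intersection(bust_h, color_histogram(open_sea)))      # 0.0
```

Adding texture, luminance or layout features changes the scores, but every such feature still describes the surface of the picture rather than what it depicts.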



In order to approach some resolution of the dilemma of representing pictures, we went back to basics, to the Oxford English Dictionary and to Aristotle, in the hope of a fresh approach. Category is defined in the OED "as predication or assertion given to general classes of terms, things, or notions," along with the caveat "the use being very different with different authors." Within the definition of category there is also an expansion of the term predicament: "a class to which a certain predication or assertion applies." This is further refined under the definition of predicament itself as a "class about which a particular statement is made."

Question States

So what? Categories, we know, help us by reducing search time. There are fewer categories than there are individual items. Yet if categories are formed by assertions, what can we say about the assertion process in the context of describing and searching for image documents? Considering the assertion process we asked:

  • What constitutes an assertion?
  • Does it have to be words?
  • If the assertion need not be words, what might it be?
  • Need it be made only when the document is formally indexed?
  • Might it be made ad hoc during a search or within the retrieval environment?
  • Who might make the assertion?
  • If users are making assertions, then we can propose categories useful for searching based on the kind of assertions they make, given the knowledge they bring to the search.

As an initial response to these questions about assertions, we proposed categories of question states. Each question state has two elements of an information-seeking event: a function to be accomplished; a document (or set of documents) that would aid in accomplishing the function. The three categories of question states are as follows:

  • Articulated queries:  user can state function AND make specific assertions about the document or category of documents that would satisfy the function. Classical Aristotelian membership would hold (the documents in the category would have the specified properties in an unambiguous way), and there would be little dispute over membership.
  • Vague awareness queries: user can state function BUT CANNOT make specific assertions about the category of documents that would satisfy the need. General attributes, guesses about what might or might not be appropriate, AND descriptors (category membership assertions) made by previous users could segment the collection, though greater engagement with the documents for closer examination than in an articulated query would likely be required. Socrates is credited with the rhetorical question that describes this sort of query: How is it that a man knows not what he seeks, yet knows it when he sees it?
  • Shaking up the knowledge store: user CANNOT make assertions about attributes for category membership of the documents BECAUSE user CANNOT make specific assertions about the function or category of functions to be accomplished. This is the form of query we typically call browsing. Studies of browsing suggest that it is important to scholars because their work depends on making new connections. A typical aspect of browsing is searching in areas not obviously relevant. Of course, interests, concerns, abilities, etc., will come into play when the user engages a document, but the whole idea of such an enterprise is to step outside the known and remain open to new connections.
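The three question states follow from two yes/no conditions: can the user state the function, and can the user assert attributes of the documents that would serve it? The sketch below encodes that decision; the names `QuestionState` and `question_state` are hypothetical. The article names no state for a user who can assert document attributes without a function, so the sketch rejects that combination rather than guessing.

```python
from enum import Enum

class QuestionState(Enum):
    ARTICULATED = "articulated query"
    VAGUE_AWARENESS = "vague awareness query"
    SHAKING_UP = "shaking up the knowledge store"  # i.e., browsing

def question_state(can_state_function: bool,
                   can_assert_document_attributes: bool) -> QuestionState:
    """Classify an information-seeking event by what the user can assert."""
    if can_state_function and can_assert_document_attributes:
        return QuestionState.ARTICULATED
    if can_state_function:
        # Function known, suitable documents not: lean on general attributes
        # and on descriptors left by previous users.
        return QuestionState.VAGUE_AWARENESS
    if not can_assert_document_attributes:
        # Neither function nor document attributes can be asserted: browsing.
        return QuestionState.SHAKING_UP
    raise ValueError("document attributes without a function: not one of the three states")

print(question_state(True, True).value)    # articulated query
print(question_state(True, False).value)   # vague awareness query
print(question_state(False, False).value)  # shaking up the knowledge store
```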

Ordinarily in document collections categories are formed according to attributes of the documents themselves. The expectation, then, is that queries will be made in terms of document attributes. In those cases where users cannot articulate specific document attributes, perhaps we can, at least, make use of what they can say about what they want to accomplish. We can suggest areas of the collection not likely to be useful, we can suggest methods of navigation and evaluation and we can gather use or function descriptions made by previous users.

Generating Functional Descriptions

We, therefore, set about exploring users' functional descriptions of pictures. While our users would be seeing the pictures and making responses, we felt that eliciting descriptors beyond the topical might generate an access tool of some utility for those searchers who could express what they wanted to accomplish but could not be specific about what picture would work. For example, if a searcher wants an illustration for the concept of rugged determination, the rodeo cowboy photograph in Figure 1 might be appropriate, because several previous users have used "rugged" and "determined" to describe that image. We were seeking a way of eliciting emotive, evocative and associative descriptors for the pictures, as these would be the primary means of searching when specific assertions about the document could not be made.
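The rugged-determination example implies a simple access tool: an inverted index from the words previous users wrote to the images they wrote them about. Below is a minimal sketch, assuming exact word matching with no stemming or thesaurus (so "determined" would not match "determination"). The image identifiers and function names are hypothetical; the descriptor strings are taken from Figures 1 and 2.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def build_descriptor_index(descriptions: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each lower-cased word users wrote to the set of images they wrote it about."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for image_id, phrases in descriptions.items():
        for phrase in phrases:
            for word in phrase.lower().replace("&", " ").split():
                index[word.strip(".,!?;:")].add(image_id)
    return dict(index)

def search_by_function(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Rank images by how many distinct query words previous users applied to them."""
    hits: Dict[str, int] = defaultdict(int)
    for word in set(query.lower().split()):
        for image_id in index.get(word, set()):
            hits[image_id] += 1
    return sorted(hits, key=lambda image_id: (-hits[image_id], image_id))

descriptions = {
    "fig1_rodeo_cowboy": [
        "Young man with purpose.",
        "Cowboy with bandage on knee walking with resolute determination.",
        "tough & rugged",
        "face a challenge",
    ],
    "fig2_jumping_boy": [
        "Whee!",
        "I wish I were a kid again",
        "Homesick for California",
    ],
}
index = build_descriptor_index(descriptions)
print(search_by_function(index, "rugged determination"))  # ['fig1_rodeo_cowboy']
print(search_by_function(index, "homesick"))              # ['fig2_jumping_boy']
```

Because a word says more about an image when several users apply it independently, a fuller version might weight each word by the number of different users who used it.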

First, we simply asked users to make up descriptions of each of a dozen images. Everyone who did so constructed a topical phrase resembling a Library of Congress Subject Heading. All the participants were librarians or library school students, and they seemed constrained to describe images in the library way. Subsequently we had test users write captions, responses (how did this picture make you feel?) and lists of items recorded in the picture (see Figure 4).


Figure 4. Captions, item lists and responses written by test users for one photograph. Annotations on the figure point out:
  * comments on the physically present photographic document;
  * adjectives attributing states or emotions to the subject of the photograph. Note the anonymous nature of the adjectives: admiration, relaxed, angry, contemplative, resigned to fate, lonely, despondent, bored,
Note also, in the Caption column, the variety of assumptions about "place": numerous locations in a school setting, an apartment, and several metaphorical uses of the window for enlightenment or escape (e.g., "open window to the world"; "Prince Charming, take me away!").

Caption column:
  - Escape from hot air
  - A young Mary greets the. . .
  - What can you tell
  - An open window to the world
  - Mary in a quiet moment
  - Anger waits.
  - It's bright outside!
  - Prince Charming, take me away!
  - What's next?
  - Someone by the window
  - Thoughtful afternoon
  - School days are back

Items column:
  - open window
  - young girl
  - young woman, open window
  - girl, city, window
  - girl, window
  - woman, window
  - female at window
  - teenage girl, windows, traffic

Response column:
  - She must be a student.
  - Sharp contrast light and dark
  - a relaxed Mary
  - looks angry
  - Reminds me of the early 70s.
  - nice shot!
  - Attractive backlit scene
  - Who is this person?
  - She looks resigned to her fate

The first level of analysis consisted of gathering the adjectives and adjectival phrases that describe users' reactions to the images. These fall into three categories:

  • a direct statement about the picture: "makes me feel happy"
  • a nominal state attributed to the image: serenity, disgust, pride
  • a reference to the physical characteristics of the image: "this is a bright picture," "dark and moody."

Analysis of the Functional Descriptions

We then mounted all responses in a Microsoft Access database and did a rough content analysis to see what categories would emerge. The categories that emerged from what actual users said about pictures are:

  • Narrative & Emotive Descriptors
    • Introductory phrases (reminds me of. . .; looks like)
    • Narrative paragraph (little stories)
    • Emotive terms (e.g., nostalgia, good memories)
    • Allusions to literature (Sleeping Beauty, Terminator)
    • Associative memories
  • Antonyms
  • Geography
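The authors used an Access database for this step. As a rough stand-in, the sketch below uses Python's built-in `sqlite3` module to store each response with the category an analyst assigns it, then tallies responses per category. The table layout, image identifiers and category codes are assumptions made for illustration, not the authors' coding; the response texts come from Figures 1 and 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE response (
        id       INTEGER PRIMARY KEY,
        image_id TEXT NOT NULL,
        text     TEXT NOT NULL,
        category TEXT NOT NULL  -- code assigned by the analyst during content analysis
    )
""")
coded_responses = [
    ("fig1_rodeo_cowboy", "Get me out of Kansas.", "Geography"),
    ("fig1_rodeo_cowboy", "All right, Widowmaker, where are ya?", "Narrative paragraph"),
    ("fig2_jumping_boy", "Reminds me of a vacation my family and I took to Padre Island",
     "Introductory phrase"),
    ("fig2_jumping_boy", "Homesick for California", "Geography"),
    ("fig2_jumping_boy", "I wish I were a kid again", "Associative memory"),
]
conn.executemany(
    "INSERT INTO response (image_id, text, category) VALUES (?, ?, ?)", coded_responses
)

# Which categories occur, and how often? Ties are listed alphabetically.
rows = conn.execute("""
    SELECT category, COUNT(*) AS n
    FROM response
    GROUP BY category
    ORDER BY n DESC, category
""").fetchall()
for category, n in rows:
    print(f"{category}: {n}")
```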

It was gratifying to see that we did, indeed, pick up descriptors other than topical ones. There was a wide range of narrative and emotive descriptors. In addition, there were two striking results: one we had anticipated and one we had not. We had anticipated that antonyms would be present in considerable number, and they were. One person would describe an image as "lovely"; another would describe it as "depressing." One person would describe an image as "makes me happy"; another would describe it as "this makes me really homesick, I wish I hadn't looked at this picture."

Geographic attribution is a form of description we had not anticipated. Many people felt compelled to locate the image. The rodeo cowboy image in Figure 1 is attributed to Kansas, although there are no geographic cues other than the cowboy attire; the image was actually made in Davis, California. The ocean in the jumping boy picture in Figure 2 is attributed to California and is evocative of Padre Island, yet the picture was made in New Hampshire.



Figure 5. Lewis & Marguerite Clark

This compulsion leads to an interesting representation issue we might term "functionality and wrongness." Approximately 75% of the people who dealt with the image in Figure 5 wrote something about location, and half of those said that it was in the Oklahoma Panhandle or some other Dust Bowl area. It was actually made on a farm on the Canadian border in Maine in 1913. We must then ask whether an image that has none of the defining attributes of having been made in a particular place or time might still be an appropriate response to some sorts of queries for images related to a specific place or period.
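Taken together, the two approximate proportions mean that a little over a third of everyone who saw the Figure 5 image placed it in the Dust Bowl:

```latex
\[
\text{share of all respondents} \approx
\underbrace{0.75}_{\text{wrote about location}} \times
\underbrace{0.5}_{\text{of those, Dust Bowl}} = 0.375,
\]
```

that is, roughly three respondents in eight.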

Consequences for Retrieval System Design

Seeing that we could elicit functional descriptors of different sorts, we set about asking what some of the consequences for image retrieval systems might be. First, we noted Fidel's statement in "The image retrieval task: Implications for the design and evaluation of image databases" (The New Review of Hypermedia and Multimedia, pp. 181-199, 1997) that ". . . classes that represent interpretive attributes are more difficult to assign in indexing than in searching." This makes sense given the inherent difficulty of a single source generating representations for varied uses.

However, on the basis of our exploration, we conclude that one can gather user assertions about interactions with pictures, that these form a richer descriptive palette than ordinary indexing and that such assertions can form a basis for constructing categories that can enhance subsequent use. We feel that the kind of descriptions we obtained could help users whose needs are defined primarily in terms of function.

The fact that different users make wildly differing assessments of particular (image) documents need not rule out the use of the descriptors, but the system would have to caution users. Profiling of users might bring results more quickly, especially as more uses were made of the system – though, of course, in any heterogeneous group, profiling raises certain ethical issues. It may be worth investigating whether the degree of "disagreement" over image document extra-topical attributes speaks in some more general way to the utility of such a document. That is, if an image of a trailer in Idaho makes some people very anxious and others especially tranquil, does the image have some higher quotient of comfort/discomfort evocation than an image on which most users agree?
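One way to make the closing question testable is to code each user's reaction to an image as comforting (+1), discomforting (-1) or neutral (0) and compute two numbers: how many users reacted at all (intensity) and how evenly the reactions split (disagreement). These measures are hypothetical, offered only as a sketch of how such a "quotient" might be operationalized; the valence codes in the example are invented, not data from the study.

```python
from typing import Sequence, Tuple

def evocation_profile(valences: Sequence[int]) -> Tuple[float, float]:
    """Return (intensity, disagreement) for per-user valence codes.

    Codes: +1 comforting, -1 discomforting, 0 neutral (a hypothetical scheme).
    intensity    = share of users who reacted one way or the other
    disagreement = 0.0 when all reacting users agree, 1.0 when they split evenly
    """
    pos = sum(1 for v in valences if v > 0)
    neg = sum(1 for v in valences if v < 0)
    reacting = pos + neg
    if reacting == 0:
        return 0.0, 0.0
    intensity = reacting / len(valences)
    disagreement = 1.0 - abs(pos - neg) / reacting
    return intensity, disagreement

trailer_in_idaho = [+1, -1, -1, +1, +1, -1]  # hypothetical split reactions
agreed_image = [+1, +1, +1, 0]               # hypothetical near-consensus
print(evocation_profile(trailer_in_idaho))   # (1.0, 1.0)
print(evocation_profile(agreed_image))       # (0.75, 0.0)
```

Whether high disagreement really signals stronger evocation is exactly the open question; the sketch only makes both quantities measurable so that the relation could be examined across many images.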

Finally, we began to lay out a set of relations between types of assertions and seeking behaviors. We have not elaborated this into a particular interface or interface module at this time, though we are investigating the possibilities. Relating the specificity or type of assertion to the question state of the user yields:

  • Document Assertion – requires limited user engagement with documents
    • Document attributes directly mapped to function and asserted.
    • Members of category fill need.
  • Function Assertion – requires significant user engagement with documents
    • Document attributes are not directly mapped to function, thus not asserted.
    • Previous function descriptors (emotive, evocative, affective) map to desired function.
    • Categories based on previous function descriptors segment collection for initial examination.
    • Members of categories may fill need.
  • No a priori Assertion – requires user immersion into the collection
    • No document attributes asserted.
    • No function attributes asserted.
    • Single category comprised of all documents provides opportunity for discovery of new connections.
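The three relations can be read as three ways of forming the initial candidate set. Below is a minimal sketch, assuming each image record carries indexer-assigned attributes plus descriptors left by previous users; the record type, field names and sample values are hypothetical (the sample terms echo Figures 1 and 2).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class ImageRecord:
    image_id: str
    attributes: Set[str]  # document attributes, e.g. subject headings
    function_descriptors: Set[str] = field(default_factory=set)  # from previous users

def candidate_set(collection: List[ImageRecord],
                  document_assertion: Optional[Set[str]] = None,
                  function_assertion: Optional[Set[str]] = None) -> List[ImageRecord]:
    """Choose the initial set of images according to what the user can assert."""
    if document_assertion:
        # Document Assertion: category members carry every asserted attribute.
        return [r for r in collection if document_assertion <= r.attributes]
    if function_assertion:
        # Function Assertion: previous users' descriptors segment the collection;
        # members only *may* fill the need, so the user still examines them.
        return [r for r in collection if function_assertion & r.function_descriptors]
    # No a priori assertion: one category holding every document, for browsing.
    return list(collection)

collection = [
    ImageRecord("fig1_rodeo_cowboy", {"Cowboys", "Rodeos"}, {"rugged", "another era"}),
    ImageRecord("fig2_jumping_boy", {"Boys", "Ocean"}, {"homesick", "fun"}),
]
print([r.image_id for r in candidate_set(collection, document_assertion={"Rodeos"})])
print([r.image_id for r in candidate_set(collection, function_assertion={"homesick"})])
print([r.image_id for r in candidate_set(collection)])
```

When both kinds of assertion are supplied, this sketch lets the document assertion take precedence because it requires the least engagement; that ordering is a design choice, not something the relations above specify.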

We have suggested a set of relations that may seem quite obvious on the face of it, yet pose difficulties in the absence of verbal representations of image characteristics. We have tried to look at the image retrieval problem through the eyes of producers and image users. We have a sense that a robust model of image retrieval must account not only for the professional user of an image archive or the professional intermediary, but also for the occasional, "naïve" user who has several photo albums or who logs onto the Web. Such a model should probably also be able to address all the out-of-focus and poorly exposed pictures people keep because they serve some purpose.

It is our hope that the thoughts presented here will further the dialog and the examination of image representation issues. It is, likewise, our hope that taking some of the assertion-making responsibility off the shoulders of cataloguers and putting it into the hands of users of the system will generate a more dynamic system that is more richly representative of both the images and the user requirements.

Brian C. O'Connor is affiliated with the Interdisciplinary Ph.D. Program in Information Science in the School of Library & Information Sciences at the University of North Texas, Denton, TX 76203. He can be reached by phone at 940/565-2347 or by e-mail at boconnor@lis.admin.unt.edu.

Mary Keeney O'Connor is assistant project manager of the IMLS Digital Imaging Grant in the School of Library & Information Sciences at the University of North Texas.




© 1999, American Society for Information Science