
Bulletin, June/July 2008

Why Information Has Shape

by Andrew Dillon

Andrew Dillon is dean of the School of Information at the University of Texas. He can be reached at adillon <at>

Language, in both written and spoken forms, contains many subtleties that a member of a speech community slowly learns to decode and use. Grammatical rules and vocabulary occupy most attention early on, but as we become more and more experienced, we learn to use language constructively and in patterns that serve as a bridge between sender and receiver. In written language, where the normal restrictions of working memory capacity can be overcome through cueing, headings and the ability to retrace previous utterances, humans have evolved a range of genres that serve as patterns for communication over lengthy discourse. 

Genre studies can become very technical, but the basic functions of genre are not difficult to grasp. As stylized patterns of communication adopted within groups, genres serve to establish the parameters of a communication process and in so doing prime the participants to deliver and anticipate receiving certain types of information. Conforming to these expectations enables easier processing of the information flow and also signals progress through the communicative passage to participants, both temporally and semantically. 

These naturally emerging regularities of language that we term genre influence human information processing in a regular, repeatable manner. Humans are natural pattern extractors – our brains seem to seek regularity in complex data, and we are known to impose patterns where they don’t even exist. I used to regularly demonstrate this phenomenon to classes by presenting them with truly random letter or number sequences and asking students to guess the next two elements in the series. Given sufficient time, students would invariably produce answers, sometimes with tortuous arguments for why their sequence logically worked, and some would not accept that the sequence had no underlying logic or pattern. Such pattern-seeking dispositions help us organize a world of data coming at our senses continuously. In this way, it is not hard to understand the importance of genre in information activities. When attuned to the existence of genres, a reader, a listener or a viewer has a mental framework into which she can put incoming data, providing context and cues for comprehension, priming her to expect further data of a certain type and drawing attention to components of the communicative process where data may be missing.

We have studied how people make sense of informational forms when those forms conform to or violate such expectations of genre. For example, in one study, we developed an online newspaper that contained real text from a daily newspaper, grouped into typical story types (current affairs, local, national, sports, etc.), but presented either in ways that reflected typical newspaper layout and form or in ways that violated such expectations. We asked people to interact with these digital newspapers continuously for several days, observed their navigation and checked their comprehension of the content. Our results showed that readers’ initial response to the genre-violating form was quite negative, and their performance revealed greater navigational difficulties and a poorer ability to locate specific information. Over the course of several days, their performance improved, but they never performed as well as readers exposed only to the more typical, genre-conforming newspaper.

It seems, therefore, that people respond to structural regularities in language that create a sense of shape within an information space, and that they employ these cues, often without conscious awareness, to guide their exploration and comprehension. Since this appears to be a natural component of information use, we should not be surprised that our routine communications evolve conventional forms. At one level, the forms give rise to visual cues and structures (headings, layouts, access mechanisms) that are easily understood. However, not all conventions lend themselves to instantiation so simply, and one finds the cues buried within the text in the form of language employed, the high-level order of content and narrative, and the form of information provided at various parts of an extended document or exchange.

It follows that the perception of genre conventions and the ability to interpret their uses depend on the experience of a user. Given the subtle nature of many genre conventions, multiple and repeated exposure to a form is often required to understand its application. Conversely, understanding and using genre conventions appropriately might be considered a defining characteristic of expertise in a domain. This observation is certainly true of advanced communicative forms such as academic articles, scholarly books and technical reports. In a study of scientific article users, we showed that experts could read an isolated paragraph of text from a journal and estimate with almost perfect reliability in which section of the journal article this text belonged. Novice readers of this literature were unable to do so. However, it is also likely that we can discriminate users of various digital environments on the basis of their familiarity with genre.

Given the obvious potency of genre conventions in information use, it might seem natural to consider genre as a cue for retrieval, but I am less convinced this technique has real value. Certainly we can tag documents with genre labels, rendering them identifiable on this basis, but it is not clear that such a label would add significant information to an already tagged set of documents. Knowing that a journal article appears in a scholarly publication is important, but this determination can come from many existing sources independent of a formal genre descriptor. Genre attributes can, however, add significant value as navigation aids within a document, and if we were able to determine a finer grain of genre attributes than those typically employed, it might be possible to use these as guides for information seekers. For this to work, we would need a way of describing aspects of the information shape that reflect underlying narrative and semantic flow, allowing us to use these, relatively unambiguously, at the document level so as to support refined searches within a document type.

Such analyses offer clear research targets, and we have attempted to do something of this form, at least at the visual level, with a search interface for poetry that presents a template to the user in the form of a verse structure, with a series of blank lines representing the layout of a title and verse. The user can input any term in any space and the search interface interprets accordingly to find the term in the title, the first line, other than first line but somewhere in the verse, etc. Poems have relatively regular physical structures that allow this, and the user response is generally positive. But extended narratives offer a more complex challenge given their more uniform physical formats and the need for a more semantic reflection of their shape.
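
The template idea can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a description of the actual interface: the slot names (title, first_line, other_lines), the Poem record, the toy poems and the simple case-insensitive substring matching are all hypothetical choices made for the example. It shows how a term typed into a given blank of the verse template might be interpreted as a constraint on that part of the poem.

```python
from dataclasses import dataclass


@dataclass
class Poem:
    title: str
    lines: list  # verse lines, in order


def contains(text, term):
    # Case-insensitive substring match; a real system would likely tokenize.
    return term.lower() in text.lower()


def matches(poem, template):
    """Return True if every filled slot of the template is satisfied.

    Hypothetical slots: "title", "first_line", "other_lines".
    An empty or missing slot places no constraint on the poem.
    """
    title_term = template.get("title", "")
    first_term = template.get("first_line", "")
    other_term = template.get("other_lines", "")

    if title_term and not contains(poem.title, title_term):
        return False
    if first_term and not (poem.lines and contains(poem.lines[0], first_term)):
        return False
    # "other_lines" means anywhere in the verse except the first line.
    if other_term and not any(contains(line, other_term) for line in poem.lines[1:]):
        return False
    return True


def search(poems, template):
    # Keep only the poems that satisfy all filled slots.
    return [p for p in poems if matches(p, template)]


# Toy data invented for this example.
poems = [
    Poem("Morning Rain", ["Rain falls on the quiet road", "and the river carries light"]),
    Poem("The River", ["Light moves over the water", "rain waits in the hills"]),
]

first_line_hits = [p.title for p in search(poems, {"first_line": "rain"})]
other_line_hits = [p.title for p in search(poems, {"other_lines": "rain"})]
```

The same term gives different results depending on the slot it is typed into, which is the point of letting the physical shape of the verse carry part of the query.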

You can appreciate the importance of semantics by reflecting even on your own use of information space. While significant research efforts are expended on retrieval and navigation, people do not retrieve documents for the purpose of navigating them. They retrieve them to use, to be informed, to understand, to solve a problem or for similar reasons, and they navigate these documents to gather meaningful content for such tasks. Thus, retrieval and navigation are elements of a task, but they are not the complete task, and I often wonder if we have lost sight of this reality in our research efforts. There seems to be a whole literature given over to these task elements without an equivalent level of effort to understand how meaning is extracted or how people make sense of the documents they are navigating through. Information use such as reading, comprehending and comparing involves both spatial and semantic processing interchangeably if not simultaneously, as people create working models of the space in which they are working. People learn to recognize regularities in the spatial form (layout, headings, length and so forth) and in the semantic form (introduction, method, discussion, analysis and so forth) and then employ both elements to guide their process. In the field of human-computer interaction there have been decades of work on the spatial aspects of designing the best interface mechanisms to aid navigation, but precious little work addressing the semantic elements, which can actually aid a user in comprehending the documents being used. This is an imbalance in research that should be corrected in the years ahead. 

In all this we really ought to recognize the incredible sophistication of cognitive processing that underlies the evolution of our world’s information space. At this moment we sit on inherited forms of documents, refined continually by creators and users, members of speech and practice communities. That refinement has resulted in forms that allow experienced users to convey and extract meaning rapidly or to leverage properties of the document in support of comprehensive processes that would exhaust our limited cognitive architecture were we to rely on real-time communication only. With new digital forms possible, we can witness in our world now the costs of violating genre and the process of enabling new genre forms. As we have shown in our research, the genre of a web home page took hold very quickly in the 1990s – in a matter of years, when most scholars of the paper world were convinced genre only emerged over decades.

The apparent chaos of digital communications is superficial and may reflect a bias in perception by those schooled in the paper domain. It is not enough to apply genre conventions from paper to digital space; the transfer is not always neat, and we should realistically expect native digital genres to evolve through the community of users most actively communicating through these channels. Such genres will evolve, and as they do, we may lose genre forms from the pre-digital era. This transformation is the shape of information, and the continuous tension between fixed and fluid is part of the natural order.