Bulletin, June/July 2009

Reconstructing Bellour: Automating the Semiotic Analysis of Film

by Richard L. Anderson and Brian C. O’Connor

Richard L. Anderson and Brian C. O’Connor are with the Visual Thinking Laboratory in the College of Information, Library Science, & Technologies at the University of North Texas. Richard can be reached at rich.anderson<at>>. Brian’s email is boconnor<at>

In 1981, film theorist Bertrand Augst asked (personal communication), “Why can’t we use a computer to measure and speak of filmic structure in the same way we can for verbal text?” Augst’s question arose in a conversation on the difficulties for film studies that arise from the “literary metaphor.” This is not to say there is no discourse mechanism at work in films; it is that attempts at one-to-one correspondence between the frame and the word or the shot and the sentence or similar impositions of the verbal form onto the image form failed. Films are not textual documents. Films do not have a rigidly defined grammatical structure. Images are not words. Shots are not sentences [1, p.224]. Films are generally viewed at a set rate of presentation and linearity. The technology used in the production and viewing of film has changed considerably since Augst posed his original question; however, there has been little change or advancement in film theory as a result of better and more efficient technology [2]. 

The Structure of Moving Image Documents
It has been common in both film description and film analysis to use the “shot” as the base or minimum unit. However, there is no definition of shot that specifies any specific set of parameters for any particular attribute – no specific number of frames or type of content. Bonitzer [3] refers to definitions of shot as “endlessly bifurcated.” Similarly, the terms close up (CU), medium shot (MS) and long shot (LS) are used in film production textbooks and film analyses; however, there is no specification of how much frame real estate is occupied by some object or portion of object in the frame to be a CU rather than MS, for example. For our purposes, we use the frame and measurable attributes of the frame in order to speak specifically and to avoid the difficulties presented by “endless bifurcation.”

The signal or the information of a film is presented in small units – frames – that are in themselves self-contained signals. In many instances they are even used as messages – for example, an individual frame may become a movie poster. However, the film and other time varying signal sets such as music and dance are signal sets of their given sort precisely because of their temporality. We see or hear the signal set (document) as a set of changes over time.

It could be said that one can stare at a painting or sculpture for a few seconds or an hour from differing viewpoints, thus making the viewing a time-varying experience of the signal set. It could probably be argued that artists of various sorts construct signal sets that demand attention for a long time in order to see all the intended variations in the signal set. It can even be argued (and we have so argued) that the digital environment gives viewers reader-like control over temporality and depth of penetration into films. However, it remains the case that the majority of film produced for commercial consumption assumes playback at a standard rate and linearity.

Much of what is taught in film schools and much of what has transpired in film analysis relates to variation in the temporal aspect of the film. Eisenstein [4] and Vertov [5] and some others spoke eloquently of time and its relation to structure. Structural commentary from reviewers tends to be less precise. For example, LaSalle [6] describes The Legend of Zorro as a “130-minute adventure movie that overstays its welcome by about 80 minutes,” and Addiego [7] describes Domino as “[a] psychedelic action picture that hammers away at the audience with a barrage of editing tics and tricks.”

We are seeking a way to speak of the structure of a film precisely in order to enable a more productive examination of the meanings of the message for various viewers under various circumstances. In looking to previous work on the examination of the filmic message or signal set, we noted Augst’s 1980 [9] comment on Bellour’s analysis [10] of Hitchcock’s The Birds [11]:

It remains exemplary in the rigor and precision of the analysis performed and, to date, it is still the best example of what a genuine structural analysis of a filmic text could attempt to do. One must turn to Jakobson or Ruwet to find anything comparable in literary studies.

A comment by Augst [8] on Bellour’s response to criticism of his work as pseudo-scientific and not sufficiently in touch with aesthetic aspects of film analysis addressed our particular concerns with devising an accurate and transferable means of describing the signal set: “[criticisms] continue to be leveled at any procedure that in any way exposes the gratuitousness and arbitrariness of impressionistic criticism.”

Bellour’s work elaborated on Metz’s semiotic notions of film [11], particularly the concept of syntagmas, by introducing levels of segmentation greater and lesser than Metz’s. This enabled structural analysis of filmic signal sets of any length and, eventually, of any sort, not simply the set, say, of classic American Hollywood features.

Difficulties for Bellour
We identified two difficulties with Bellour’s signal set analysis. The first was the time-consuming nature of its practice. Simply locating the proper portions of film, timing them, re-photographing frames for analysis and publication, to say nothing of commentary or analysis, took days and weeks. 

The second is that Bellour conducted his work too early. For all the remarkable precision of his analysis, without digital technology he did not have a precise system of description at the frame level. He could write of the contents of the frame and of relationships holding among frames, but not with deep precision; he could not, for example, specify the shades of various colors and their changes from frame to frame.

The digital environment enables us to address both issues. Grabbing all the individual frames from a digital version of a film requires only seconds, not days. Also, pixels enable addressable analysis of the red, green, blue and luminance components of any point in the frame, as well as comparisons of values at the same point or set of points across time. The mechanics of the practice of film analysis which once would have required enormous resources of time, funding and technology are today essentially trivial.
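As a minimal sketch of what addressable pixels make possible, the Python below samples the same point across a series of frames. It assumes, purely for illustration, that a frame is a nested list of (R, G, B) tuples; the function names `luminance` and `track_point` are hypothetical, and the luminance weights are the common Rec. 601 luma weights, chosen here only as one conventional definition.

```python
def luminance(rgb):
    """Approximate luminance of one pixel using Rec. 601 weights (an assumed convention)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def track_point(frames, x, y):
    """Return (R, G, B, luminance) at the same (x, y) point in every frame."""
    samples = []
    for frame in frames:
        r, g, b = frame[y][x]  # frames are indexed by row first, then column
        samples.append((r, g, b, luminance((r, g, b))))
    return samples


# Two tiny 2x2 "frames" (made-up data): the top-left pixel goes from black to white.
frame_a = [[(0, 0, 0), (10, 20, 30)], [(255, 0, 0), (0, 255, 0)]]
frame_b = [[(255, 255, 255), (10, 20, 30)], [(255, 0, 0), (0, 255, 0)]]

for sample in track_point([frame_a, frame_b], x=0, y=0):
    print(sample)
```

The same loop extends to a set of points by calling `track_point` once per coordinate.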

However, the technical ability to address and measure points within and across frames does not address Augst’s earlier question; nor does it, in itself, provide a “genuine structural analysis of filmic texts.” We have the technology – but what should we do with it? Techniques for analyzing the structure of moving image documents are well known and mature. In 1995 Dailianas, Allen and England [12] reviewed a number of techniques for the segmentation of video including techniques for measuring the absolute difference between successive frames, several histogram-based methods, as well as the measurement of objects within frames. These techniques proved to be robust when compared against human observers; however, all techniques were prone to false positives. They note that 

[b]ecause all the methods studied here have high false-identification rates, they should be thought of as providing suggestions to human observers and not as an ultimate standard of performance. [p. 12]
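The two simplest families of technique mentioned above, raw frame differences and histogram differences, can be sketched generically as follows. These are textbook forms, not the specific algorithms evaluated in [12]; the function names and the threshold are hypothetical, and frames are assumed to be flat lists of (R, G, B) tuples.

```python
def raw_difference(frame_a, frame_b):
    """Sum of absolute per-channel differences between two equally sized frames."""
    return sum(
        abs(ca - cb)
        for pixel_a, pixel_b in zip(frame_a, frame_b)
        for ca, cb in zip(pixel_a, pixel_b)
    )


def histogram_difference(hist_a, hist_b):
    """Sum of absolute bin-by-bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def suggest_boundaries(scores, threshold):
    """Indices whose difference score exceeds a hypothetical threshold.

    In the spirit of the quotation above, the result is a list of
    suggestions for a human observer, not a final segmentation.
    """
    return [i for i, score in enumerate(scores) if score > threshold]


print(raw_difference([(0, 0, 0)], [(1, 2, 3)]))
print(histogram_difference([1, 2, 3], [3, 2, 1]))
print(suggest_boundaries([0, 5, 1, 9], threshold=4))
```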

Structure and function have a complementary, but independent relationship. In order to advance the state of both structural and theoretical analysis, the relationship between structure and function must be taken into account. In other words, an analysis that takes both structure and function into account is greater than the sum of its parts. Kearns and O’Connor [13] provide a strong example of this approach in their demonstration of the relationship between the entropic structure of television programs and the preferences of a group of viewers. 

The approach taken here combines an algorithmic structural analysis of the Bodega Bay sequence of Hitchcock’s The Birds with the expert analysis of Bellour. Our hope is that a heuristic will emerge that will lead toward a solution to the problems identified both by film theorists and those who wish to analyze moving image documents for the purposes of indexing and retrieval. 

Binary Systems of Structure and Function
We are using the technical definition of information posited by Claude Shannon, and we strongly support Warren Weaver’s comment in his introduction to Shannon’s Mathematical Theory of Communication [14]:

The word information, in this theory, is used in a special sense that must not be confused with its ordinary usage. In particular information must not be confused with meaning. [p. 8]

The concept of information developed in this theory at first seems disappointing and bizarre – disappointing because it has nothing to do with meaning, and bizarre because it deals not with a single message but rather with the statistical character of a whole ensemble of messages, bizarre also because in these statistical terms the two words information and uncertainty find themselves to be partners. [p. 27]

However, it is the very distinction between information and meaning that provides a theory base and descriptive tool-kit for the description and analysis of film. For Shannon, information is the amount of freedom of choice in the construction of a message. This concept was ordinarily expressed as a logarithmic function of the number of choices. What is important is Shannon’s assertion that the semantic aspects of communication have no relevance to the engineering aspects; however, the engineering aspects are not necessarily irrelevant to the semantic aspects.
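A small worked example may help fix the idea of information as a logarithmic function of the number of choices. The sketch below uses base-2 logarithms, so the unit is the bit; the function names are hypothetical, and the `entropy` function is the standard generalization to unequal probabilities, included here as an illustration and not drawn from the passage above.

```python
import math


def choice_information(num_choices):
    """Bits of information when selecting among equally likely choices."""
    return math.log2(num_choices)


def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution (zero terms are skipped)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


print(choice_information(2))    # a single binary choice
print(choice_information(256))  # one of 256 equally likely values
print(entropy([0.5, 0.5]))      # two equally likely outcomes
print(entropy([0.9, 0.1]))      # a lopsided choice carries less information
```

Nothing in these calculations refers to what any message means, which is the distinction Weaver draws.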

Shannon’s notion of information is a binary system. Message and meaning are separate, but complementary notions. This system bears a strong resemblance to the distinction between signifier and signified in semiotic theory, as well as the separation of topography and function in the behavior analytic theory of verbal behavior (see Skinner [15] and Catania, [16]) and Wittgenstein’s notion of a language game [17] [18]. 

Our model for analysis assumes such a binary relationship. The structural analysis was conducted by measuring the changes in color palette across frames in the Bodega Bay sequence of Hitchcock’s The Birds. The functional analysis comes from Bellour’s analysis of the same sequence of the film. 

Functional Analysis “System of a Fragment,” Bellour
Behavior analysis is an empirical and functional way to examine questions involving human behavior. Skinner [19] describes the logic of a functional analysis:

The external variables of which behavior is a function provide for what may be called a causal or functional analysis. We undertake to predict and control the behavior of an individual organism. This is our “dependent variable” – the effect for which we are to find the cause. Our “independent variables” – the causes of behavior – are the external conditions of which behavior is a function. Relations between the two – the “cause-and-effect relationships” in behavior are the laws of a science. A synthesis of these laws expressed in quantitative terms yields a comprehensive picture of the organism as a behaving system. (p.35)

Why is this logic important to our seeking a conceptual framework and set of tools for structural analysis of film? Our question concerns the relationship between the physical structure of the Bodega Bay sequence of The Birds and Bellour’s description of the structure of the sequence. In other words, what physical attributes of the sequence prompted Bellour to make the statements he made about the film?

The notion of a binary system is so fundamental that we restate an earlier point: a behavior analytic account of verbal behavior is a binary system. The structure or topography of a particular instance of verbal behavior has a complementary, but separate, relationship with the function or meaning of that particular instance. The behavior analytic account is similar in many respects to the separation of message and meaning in Shannon’s work as well as semiotic theories of meaning. Behavior analysis provides an analytical language and framework that is appropriate for the problem at hand.

Catania [16] defines a tact as “a verbal response occasioned by a discriminative stimulus.” A discriminative stimulus is a stimulus that occasions a particular response and is correlated with reinforcement. In this particular case, the tacts or verbal responses of interest are the statements about the Bodega Bay sequence made by Bellour in The Analysis of Film [20]. The discriminative stimuli are the physical dimensions of the film that prompted Bellour to make the statements he did in The Analysis of Film. The reinforcement in this case is assumed on the grounds that The Analysis of Film is considered to be a seminal work in the film theory community and Bellour and others applied the same types of analysis to other films.

Functional Analysis of Bellour’s Verbal Behavior
We sought a means of structural analysis in turning to the expertise of Raymond Bellour. We selected a piece of his rigorous analysis, System of a Fragment: On The Birds (originally “les Oiseaux: analyse d’une séquence” [9]) using it as a record of his engagement with the signal set of a portion of the Hitchcock film. We captured the frames from the sequence for a data set of 12,803 frames. We then decided to determine how much of Bellour’s response could be accounted for by one element of the data – the distribution of color across each and every frame. That is, we did not account for sound, for edge detection or for previous knowledge. 

The sequence is, on the face of it, rather simple. A young woman, Melanie Daniels, sets out in a small motorboat with a pair of lovebirds in a cage. She crosses Bodega Bay to leave the birds as a gift to catch the attention of a young man, Mitch Brenner. She enters the Brenners’ house, leaves the birds and returns to the boat to go back across the bay. Mitch spots Melanie crossing the bay. Mitch drives around the bay to the pier where Melanie will be arriving. A sea gull strikes Melanie and cuts her head before she reaches the pier. Mitch helps Melanie out of the boat and they walk toward a shop to tend to the wound.

When Melanie is on the bay, Bellour points out, we are presented with a classic Hollywood form, alternation – we see on the screen Melanie looking, then that at which she looks, then Melanie again. This form continues until she arrives at the house. While she is in the house we simply observe her behavior, except for a brief look out the window at the barn. Bellour sees this scene in the house as a “hinge” in the design of the film. It disrupts the pattern of alternation, while it also takes Melanie off the water. 

As Melanie returns to the boat, we see what looks rather like the beginning of her trip – she is getting into the boat and heading off. However, Mitch sees her; then she and Mitch acknowledge one another. Bellour refers to the scene in the house (the hinge) and the double act of seeing as the “two centers” of the Bodega Bay sequence. 

As an integral portion of his analytic writing, Bellour includes photographic frames from the Bodega Bay sequence – key frames. Ordinarily, these are the first frames of each shot in the sequence. However, this is not always the case. The difficulties of defining “shots” seem to be manifested here. We will discuss this point at greater length; for now, “shot” is ordinarily understood to be a mechanical unit – all the frames from camera original film (or a working copy) left in by an editor. Thus, all the beginning frames, where the camera comes up to speed, the director shouts, “action” and the miscues before usable footage is available, are cut out. Then a set of frames – each a still image representing approximately 1/30th of a second – shows the portion of the action desired by the director. Then a cut – in film, an actual mechanical cut; in video, still a cessation of a particular stream of data – is made and another shot appended. The process is repeated until the end of the film.

Ordinarily, especially in older films, there is a close correlation between the mechanical cuts and the data within the shot. However, there is a problem here for the definition of shot: data may change even in one run of the camera or one stream of frames between cuts. The camera may remain still while various objects come and go in front of it; the camera may move and present different views of the same objects or even different objects; the camera may remain still, but have the length of its lens changed during a shot; or various combinations of these may take place. For the viewer, whether several objects or views are shown in different shots or one shot may be of little overt consequence. However, in attempting to do critical analysis, one is faced with finding a unit of meaning or, at least, a unit of address and measure that provides precision of description.

In our analysis, we operate at the level of the individual frame (29.97 frames per second). We refer to Bellour’s shot numbers and to his two primary divisions: “A” for Melanie’s trip across the bay, her time in the house and her return to the boat; “B” for her return trip in the boat.
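Working at the frame level means that any frame number can be converted to a running time by simple arithmetic. The sketch below shows the conversion at 29.97 frames per second; the function names are hypothetical, and numbering frames from zero is an assumption made here for illustration.

```python
FRAME_RATE = 29.97  # frames per second, as stated in the text


def frame_to_seconds(frame_number, fps=FRAME_RATE):
    """Elapsed seconds at the start of a frame (numbering assumed to start at 0)."""
    return frame_number / fps


def format_timecode(seconds):
    """Format a duration in seconds as M:SS.ss for easier reading."""
    minutes, remainder = divmod(seconds, 60)
    return f"{int(minutes)}:{remainder:05.2f}"


# Running time covered by a 12,803-frame sequence.
print(format_timecode(frame_to_seconds(12803)))
```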

According to Bellour’s analysis and textual description of the Bodega Bay sequence, we should then expect the physical document to contain the stimuli for the following tacts (verbal responses to the film): key frames and key frame sets, alternation, and two centers (the “hinge” sequence and a second center).

In summary, Bellour identified the following features in the physical document: key frames and key frame sets, alternation, two centers – the “hinge” sequence and a second center when Melanie and Mitch see each other. The question is: Can we identify elements in the physical structure of the film that could have stimulated his verbal responses (tacts)? 

Structural Analysis of the Bodega Bay Sequence
There are several approaches that could be applied to the structural analysis of a film. Salt [21] advocates an approach based on the notion of the “shot” and the statistical character and distribution of “shots” within a moving image document. O’Connor [22] and Kearns & O’Connor [13] employed an information theoretic approach to the analysis of film. O’Connor [22] used a technique that measured the change of the size and position of objects or, more accurately, pixel clusters within a film. Dailianas, Allen and England [12] reviewed a number of automated techniques for the automatic segmentation of films that included the analysis of raw image differences between frames, a number of histogram based techniques and an edge detection based approach.

In choosing a technique for structural analysis of a film, the nature of the question one hopes to answer must be taken into account. An information theory approach such as that taken by Kearns and O’Connor [13] measures the structure of an entire film or message in Shannon’s [14] terms. Bellour described the Bodega Bay sequence in fairly microscopic detail. An information theoretic approach would not be granular enough to adequately match Bellour’s description. It should be noted that Kearns’ concept of “entropic bursts” [23] might provide a finer-grained information theoretic approach appropriate for the task at hand. Salt’s [21] statistical approach based on the analysis of shots is limited in a number of respects. The previously discussed conceptual problems with the “shot” as a unit of analysis make Salt’s approach untenable. In addition, Salt’s analysis examines the statistical character and description of shots over the course of a complete film or collection of moving image documents. Like the information theoretic approach, Salt’s approach is macroscopic. Finally, the phenomena addressed by Salt’s methods are not congruent with elements of the moving image document that Bellour addresses in his analysis. The segmentation techniques reviewed by Dailianas, Allen and England [12] provide the level of detail necessary for the detection of key frames and frame sets in Bellour’s analysis; however, they would not be appropriate for detecting alternation or detecting the centers within the sequence as identified by Bellour.

Our ultimate goal in analyzing the structure of the Bodega Bay sequence was to find the elements of the physical structure of the moving image document that prompted Bellour to make the statements (tacts) he did about the film. To accomplish this task, it was necessary to look at the structure of the segment on at least two levels. First, Bellour breaks the sequence into “shots” or frame sets and selects key frames. This requires an examination of individual frames. Second, Bellour describes alternation between the frame sets, the unique character of the “hinge,” the two centers and the gull strike. These tacts are descriptions of the relationship between frame sets.

We sought precise, repeatable, numeric and graphical representations of the signal that would enable discussion of filmic structure – the message, in the terms of Shannon and Weaver. We sought the means by which we might discuss message structure, so that discussions of meaning would have a significant touchstone. It might be said that we sought a method of fingerprinting the frames.

In standard digital images each and every color is composed of a certain amount of red, a certain amount of green and a certain amount of blue, with black being the absence of any red, green or blue and white being the maximum of each. In the frame images we captured there is a possibility of 256 shades of red, 256 shades of green and 256 shades of blue for a possible palette of over 16 million colors. Deriving a histogram of each of the RGB components or the aggregated values distributed across an X-axis of 255 points (the zero origin being the 256th) yields a fingerprint, a color distribution map, of each frame.
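The fingerprint described above can be sketched in a few lines of Python. The example assumes a frame is available as a flat list of (R, G, B) tuples with values from 0 to 255, and the function names are hypothetical. The Python Imaging Library's `Image.histogram()` method returns comparable per-band counts for an image file; plain lists are used here so the sketch needs no image on disk.

```python
def rgb_histograms(pixels):
    """Count how many pixels take each of the 256 levels in each channel."""
    red, green, blue = [0] * 256, [0] * 256, [0] * 256
    for r, g, b in pixels:
        red[r] += 1
        green[g] += 1
        blue[b] += 1
    return red, green, blue


def aggregated_histogram(pixels):
    """Sum the three channel histograms into a single 256-bin fingerprint."""
    red, green, blue = rgb_histograms(pixels)
    return [r + g + b for r, g, b in zip(red, green, blue)]


# A made-up four-pixel "frame": two black pixels, one white, one pure red.
frame = [(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 0, 0)]
red, green, blue = rgb_histograms(frame)
print(red[0], red[255])
print(aggregated_histogram(frame)[0], aggregated_histogram(frame)[255])
print(256 ** 3)  # size of the possible palette
```

Each channel histogram always sums to the number of pixels in the frame, which gives a quick sanity check when running this over real frames.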

Perhaps one of the most appealing aspects of mapping color distribution is that it is an entirely software-based process. There is no necessity for human intervention to determine and mark what is to be considered the “subject” or how many pixels (what percentage of the frame area) make up some viewer-selected object. Not that these are not useful for some sorts of analysis, but using just the color palette enables an essentially judgment-free analytic process.

Structural analysis.
We converted the Bodega Bay sequence to an AVI file and then extracted the individual frames to 12,803 JPG image files. We generated RGB histograms for each of the 12,803 frames using the Python Imaging Library. A Lorenz transformation was then performed on each histogram. We calculated a Gini coefficient for each frame to generate a scalar value representing the color distribution of each frame. The Gini coefficient compares a perfectly even distribution of RGB against the actual distribution in each frame. We used the differences in Gini coefficients between successive frames as a measure of change across frames.
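The Lorenz and Gini steps can be illustrated with a short sketch. The text does not specify the exact formula used, nor whether the coefficient was computed per channel or on aggregated values, so the code below is one standard formulation applied to a list of histogram bin counts; the function names are hypothetical.

```python
def lorenz_curve(counts):
    """Cumulative share of the total held by the smallest bins, from 0.0 to 1.0."""
    ordered = sorted(counts)
    total = sum(ordered)
    if total == 0:
        raise ValueError("histogram is empty")
    curve, running = [0.0], 0
    for value in ordered:
        running += value
        curve.append(running / total)
    return curve


def gini_coefficient(counts):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    curve = lorenz_curve(counts)
    n = len(counts)
    # Trapezoid rule over n equal-width steps along the horizontal axis.
    area = sum((curve[i] + curve[i + 1]) / 2 for i in range(n)) / n
    return 1 - 2 * area


print(gini_coefficient([1, 1, 1, 1]))  # perfectly even distribution
print(gini_coefficient([0, 0, 0, 4]))  # everything concentrated in one bin
```

Under this formulation a perfectly even histogram scores 0, and a histogram with all pixels in a single bin approaches 1 as the number of bins grows.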

Codifying Bellour’s analysis. Bellour’s analysis does not include precise times or frame numbers to either select key frames or delineate frame sets; however, he includes photographs of the key frames. The frame numbers for Bellour’s key frames and frame set boundaries were selected using visual comparison between the photographs from Bellour’s article and the extracted frames. Frame sets were composed of all the frames between successively identified key frames and tagged using Bellour’s numbering convention. Bellour grouped frame sets into higher-level groups. The frame sets were arranged into higher level groups using Bellour’s description.
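Once key frames have been matched to frame numbers, building the frame sets is mechanical. The following sketch assumes each frame set begins at its key frame and runs to the frame before the next key frame; the function name is hypothetical, and the labels and frame numbers in the example are invented, not values from Bellour or from the study.

```python
def build_frame_sets(key_frames, total_frames):
    """Map each labelled key frame to the range of frames up to the next key frame.

    key_frames: list of (label, first_frame_number) pairs, in any order.
    Returns a list of (label, first_frame, last_frame) with inclusive bounds.
    """
    ordered = sorted(key_frames, key=lambda pair: pair[1])
    frame_sets = []
    for index, (label, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][1] - 1
        else:
            end = total_frames - 1  # the last set runs to the end of the sequence
        frame_sets.append((label, start, end))
    return frame_sets


# Hypothetical shot labels and frame numbers, for illustration only.
example = build_frame_sets([("3", 0), ("4", 120), ("5", 310)], total_frames=500)
for label, first, last in example:
    print(label, first, last)
```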

Due to the differences in precision between Bellour’s analysis and the structural analysis, we believed that visual analysis would be the most appropriate option for the task at hand. Bellour’s analysis began with shot number 3 of the segment and continued to shot 84. Bellour includes two groups of shots that have little bearing on his analysis of the sequence: Melanie’s acquisition and boarding of the boat (3-12) and Melanie’s arrival at the dock following her trip and the gull strike (84a-84f). These sets do not play into Bellour’s analysis and appear to function only to demarcate the segment within the larger document – the entire film of The Birds.

Detection of key frames and frame sets. Figure 1 shows the absolute value of the difference between the Gini value of a particular frame of the Bodega Bay sequence of The Birds and the previous frame. The mean difference between frames for all frames in the sequence is 0.003826, which is represented on the graph by the green (lower) horizontal line. The mean difference between frames identified as key frames by Bellour was 0.075678. The difference values fall into a bimodal distribution. The difference values between key frames and their preceding frames were an order of magnitude higher than the difference values between frames that were not identified as key frames. Figure 2 shows the Gini coefficients for each frame broken down into individual frame sets as identified by Bellour. Within shots, the Gini coefficients remain stable for most shots and trend in a linear manner. Notable exceptions to this pattern include the group of frame sets that make up Bellour’s “hinge” sequence (25-43); the gull strike (77); and Melanie’s arrival at the dock following the gull strike (84a-84f).
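The order-of-magnitude gap suggests a simple way to flag candidate key frames automatically. The sketch below is an illustration under stated assumptions, not the procedure used in the study: it takes a list of per-frame Gini values, and it flags any frame whose change from its predecessor exceeds a multiple of the mean change. The function names and the default factor of 10 are hypothetical.

```python
def successive_differences(gini_values):
    """Absolute change in Gini value from each frame to the next one."""
    return [abs(b - a) for a, b in zip(gini_values, gini_values[1:])]


def detect_key_frames(gini_values, factor=10.0):
    """Flag frames whose change from the previous frame is far above the mean.

    The factor is an assumed threshold, loosely motivated by the
    order-of-magnitude gap reported in the text.
    """
    diffs = successive_differences(gini_values)
    if not diffs:
        return []
    mean_diff = sum(diffs) / len(diffs)
    # diffs[i] compares frame i with frame i + 1, so report index i + 1.
    return [i + 1 for i, d in enumerate(diffs) if d > factor * mean_diff]


# Made-up values: twenty stable frames, then a jump to a new stable level.
values = [0.50] * 20 + [0.70] * 20
print(detect_key_frames(values))
```

As with the segmentation methods reviewed in [12], flagged frames are best treated as suggestions to be checked against the images themselves.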

Figure 1. Differences in the Gini values between successive frames in the Bodega Bay sequence

Analysis of frame sets. Figure 2 shows the Gini coefficients of each frame of the segment broken down by shot number, presenting the flow of the color distributions across the time of the film sequence. We might construct a tact map by over-layering indicators for some of the key elements mapped by the data in Figure 2, as in Figure 3 (below). Once Melanie is actually underway on her trip to the Brenner house, we have almost uninterrupted alternation. We are presented with Melanie in the boat, then the Brenner house as she sees it – Bellour’s shots 15 through 22. Then we are presented with Melanie paddling the boat and seeing the dock – 23-24; then walking on the dock and seeing the barn – 25-31. That is, shots 15 through 31 present Melanie, what she sees, Melanie, what she sees, and so on. The latter portion is more distinct in the graph, though the entire sequence of shots clearly shows alternation.

Figure 2. Gini coefficients of each frame broken down by shot number.

We should note that the RGB graph does not necessarily indicate that there is alternation in the sense of Melanie/dock/Melanie/dock/Melanie. However, one would still be able to say that there is alternation of the RGB palettes, regardless of whether a human viewer would say that the same objects were in front of the lens. Such an RGB alternation might have its own discursive power.
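One way to put a number on alternation of this purely palette-based kind is to ask how often a shot resembles the shot two positions back more than the shot immediately before it. The measure below is an illustrative assumption, not one defined by Bellour or used in the study, and the function names and sample values are hypothetical.

```python
def shot_means(gini_by_shot):
    """Mean Gini value for each shot, given a list of per-shot value lists."""
    return [sum(values) / len(values) for values in gini_by_shot]


def alternation_score(means):
    """Fraction of shots that resemble the shot two back more than the one before.

    A score near 1.0 suggests an A/B/A/B pattern in the color distribution.
    """
    if len(means) < 3:
        return 0.0
    hits = sum(
        1
        for i in range(2, len(means))
        if abs(means[i] - means[i - 2]) < abs(means[i] - means[i - 1])
    )
    return hits / (len(means) - 2)


alternating = [0.40, 0.60, 0.41, 0.61, 0.39, 0.60]  # made-up per-shot means
drifting = [0.10, 0.20, 0.30, 0.40, 0.50]           # made-up per-shot means
print(alternation_score(alternating))
print(alternation_score(drifting))
```

The score says nothing about what is in front of the lens; it registers only that two palettes take turns.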

Bellour’s hinge sequence runs from frame number 5219 to frame number 6447 – Bellour’s shot numbers 32-36 (A3). Bellour also refers to this sequence as the first of the two centers. It would make some sense, then, that it would be in the vicinity of the center and the final frame number 6447 is very near the center of 12,803 frames. More significant is the distribution of the Gini values – they are clustered more closely to the .5 line and they display much less variation than we see in most of the rest of the graph. Given the different form of the distributions on either side of the first center it is not untenable to assert the graphic appearance of a hinge (Figure 3).
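The two properties noted for the hinge, closeness to the .5 line and low variation, can each be reduced to a number per frame set. The sketch below does so with the standard library; the function name, the dictionary keys and the sample values are hypothetical, and the numbers are invented for illustration.

```python
import statistics


def summarize_frame_set(gini_values, reference=0.5):
    """Spread and distance from a reference line for one frame set's Gini values."""
    mean = statistics.mean(gini_values)
    return {
        "mean": mean,
        "stdev": statistics.pstdev(gini_values),  # population standard deviation
        "distance_from_reference": abs(mean - reference),
    }


tight = summarize_frame_set([0.49, 0.50, 0.51, 0.50])  # hinge-like: clustered near 0.5
loose = summarize_frame_set([0.30, 0.70, 0.35, 0.65])  # widely spread values
print(tight)
print(loose)
```

Comparing these summaries across frame sets gives a numerical counterpart to the visual impression of a hinge in the graph.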

Figure 3. “Tact map” showing Bellour’s hinge sequence and other key features.

What is not so immediately evident graphically is the second center – that point in the sequence when Mitch sees Melanie – a second center in that it breaks up the rhyme of the trip out and the trip back for a second time. That is, Melanie has exited the house and heads back to the dock and the boat. It seems that after having been in the house – the first center – Melanie will simply head back; however, Mitch’s discovery of Melanie and the eventual uniting of “hero and heroine for the first time in the ironic and ravishing complicity of an exchange” (p. 53) interrupt the return. 

Though Bellour suggests that the second center “stands out less starkly,” it does nonetheless stand out. Shot 43, whose large number of Gini values suggests both its length and the varying data set, is where Melanie moves along the dock and into the boat. Shots 44 and 45 begin the pattern of displacement along the Gini value that was typical in the earlier alternation. This alternation pattern develops strongly between 48 and 54 – alternating Gini values remain almost fixed in place along the Gini axis and they occupy a narrow band of the axis. At 55, the shot crosses the 0.5 boundary and the subsequent Gini values suggest alternation again, though of a more widely distributed sort. It is during this fragment that Melanie has watched Mitch, then, at 54 Mitch runs to the house and at 55 Melanie stands up and tries to start the motor. The second center displays a form of alternation, but this takes place in a manner that presents almost a mirror image of the alternation in the trip out – the alternation here hanging below the .5 line. As the second center closes, the alternation repeats the pattern of the trip out – all the Gini values arcing above .5. 

Closing Thoughts
The order of magnitude difference between the mean differences for key frames and non-key frames presents a numerical representation of the key frame tact. We have a precise, numerical way of speaking of the key frames identified by Bellour, as well as an automated way of detecting those frames. 

The clustering of Gini coefficients in the “on water” sequences with distinctly different and separated patterns presents a numerical representation of the alternation tact. Melanie’s Brenner house sequence presents a distinctly different numerical and graphical representation, giving us the hinge tact. The numerical and graphical “bunching up” in the representation of Mitch’s discovery of Melanie and their double seeing alternation, presents us with the second center and a means for speaking precisely of the two-centers tact.

Bellour does not speak to any significant degree about the gull strike on Melanie, though the strike is often mentioned in other discussions of the Bodega Bay sequence. The entire strike is approximately one second of running time and may have been too microscopic for Bellour to address in his analysis. However, the numerical analysis and graphical presentation present a striking data set. Almost every frame presents a Gini value significantly different from its predecessor. This is a very high entropy portion of the sequence – several rapid changes in the data stream in less than a second of running time is a very different pattern from that of any other portion of the film. We might suggest that digital frame-by-frame precision might have enabled Bellour to speak of this brief fragment.

In some sense, the hardest thing about what we are doing is seeing what is actually computable only from the physically present data. That is, film criticism and analysis have so long depended on human engagement with the physical document that the data stream of the document and the contribution of the viewer’s prior knowledge of what is represented remain difficult to tease apart. So we can easily cluster shots with roughly similar RGB patterns. However, going from an MS of Melanie in the boat to an LS of the Brenners’ house, while it shows us an RGB change, does not show us anything that would definitively indicate MS to LS. Also, one could imagine a change from MS to LS (say, a cityscape of one or two building fronts to an LS of several buildings) in which the RGB would remain fairly constant. Within any one film or one director’s body of work we might be able to make some calculations that would describe or predict CU, MS and LS changes, but there is nothing inherent in the data alone that makes that a widespread property. This problem does not diminish either Bellour’s analysis or the digital analysis; it simply speaks to the complexity of understanding filmic documents and even simply describing them accurately. Indeed, this demonstrates one of our initial assertions: that the engineering of the message structure and the semantic meaning are separate, complementary notions.

That said, the close correlation between the frame-to-frame analysis and Bellour’s writing suggests that our use of an expert analyst’s response to The Birds indeed demonstrates the validity of this approach to numerical and graphical representation of filmic structure. Perhaps one of the most significant consequences of the close correlation is the availability of a vocabulary for description and analysis. A fundamental problem with previous systems of analysis has been the reliance on words to describe visual, time-varying documents. Being able to represent visual attributes and time-varying states of the attributes at the pixel, frame, frame set (“shot”), sequence and document level with the same processes and terms should enable deeper and more fruitful analysis.

At the same time, the techniques provide means for discovering structural elements. It would be too facile to suggest that we now have a robust mechanism for automated description of filmic structure; however, we do at least have a robust automated means for mapping the structure. We could run any film through a frame-by-frame comparison of RGB and be able to state that certain portions remain stable for some time and then change, and that at some points rapid changes take place. The points of change, the points of discontinuity in the data stream, represent points where something different is happening.
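
The mapping described here, stable stretches separated by points of discontinuity, can be sketched as a simple threshold on the difference between consecutive frames. This is an illustration under stated assumptions: each frame is assumed to be summarized by a single number (for example a Gini value), and the threshold, function names and sample values are hypothetical rather than taken from the study.

```python
from typing import List, Sequence, Tuple


def find_discontinuities(series: Sequence[float], threshold: float) -> List[int]:
    """Indices i where frame i differs from frame i-1 by more than threshold."""
    return [
        i for i in range(1, len(series))
        if abs(series[i] - series[i - 1]) > threshold
    ]


def stable_segments(series: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Split frame indices into (start, end) runs, end inclusive, between discontinuities."""
    if not series:
        return []
    cuts = find_discontinuities(series, threshold)
    starts = [0] + cuts
    ends = [c - 1 for c in cuts] + [len(series) - 1]
    return list(zip(starts, ends))


# Hypothetical per-frame values: two stable stretches, then rapid change.
values = [0.40, 0.41, 0.40, 0.72, 0.71, 0.73, 0.72, 0.15, 0.55, 0.20]
cuts = find_discontinuities(values, threshold=0.1)
segments = stable_segments(values, threshold=0.1)
print(cuts)      # [3, 7, 8, 9]
print(segments)  # [(0, 2), (3, 6), (7, 7), (8, 8), (9, 9)]
```

Long segments mark portions that remain stable; a cluster of one-frame segments marks a burst of rapid change. The sketch says only that something changed at those points, not what changed.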

Perhaps even more intriguing and a likely avenue of rewarding research would be the use of RGB fingerprints in classification. Do all of Hitchcock’s films, or at least those from a particular period, share the same fingerprint patterns? If De Palma is the heir to Hitchcock, do his films actually bear a numerical similarity to Hitchcock’s films? Do music videos and early Russian documentaries (for example, Vertov’s Man with the Movie Camera [24]), films with very different structures from the classic Hollywood films studied by Bellour, yield useful numerical descriptions?
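
One hypothetical way to pose such questions numerically is sketched below. The article does not define how a fingerprint would be computed or compared, so everything here is an assumption made for illustration: a film is reduced to the distribution of its frame-to-frame change magnitudes, and two such distributions are compared by cosine similarity. The sample series are invented and do not describe any real film.

```python
import math
from typing import List, Sequence


def change_fingerprint(series: Sequence[float], bins: int = 5,
                       max_change: float = 1.0) -> List[float]:
    """Proportion of frame-to-frame changes in each of `bins` equal-width magnitude bins."""
    counts = [0] * bins
    changes = [abs(b - a) for a, b in zip(series, series[1:])]
    for c in changes:
        index = min(int(c / max_change * bins), bins - 1)
        counts[index] += 1
    total = len(changes)
    return [n / total for n in counts] if total else [0.0] * bins


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Invented per-frame values: A and B are mostly stable with occasional jumps;
# C changes sharply at almost every frame.
film_a = [0.40, 0.41, 0.40, 0.80, 0.79, 0.80, 0.30, 0.31]
film_b = [0.20, 0.21, 0.22, 0.70, 0.69, 0.70, 0.25, 0.26]
film_c = [0.10, 0.60, 0.15, 0.70, 0.20, 0.65, 0.10, 0.75]

sim_ab = cosine_similarity(change_fingerprint(film_a), change_fingerprint(film_b))
sim_ac = cosine_similarity(change_fingerprint(film_a), change_fingerprint(film_c))
print(sim_ab > sim_ac)
```

Whether a measure of this kind actually separates directors, periods or genres is the open research question the paragraph raises; the sketch shows only that the comparison can be stated in computable terms.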

Of course, most moving image documents are made up of more than simply RGB data. Multiple sound tracks for voice, narration, sound effects and music significantly increase the amount of data available for analysis; however, there is no reason that these time-varying data could not be described using a similar numerical and graphical technique.

As we have demonstrated here, the data available for analysis is not limited to the signals available in the physically present document. Bellour’s analysis of The Birds, in essence, becomes another signal or memetic attribute of the document. The commentary of other critics on The Birds, or viewer reactions to the piece, could be analyzed in the same manner that we have applied to Bellour’s work. Every person who interacts with a document and commits some permanent behavioral product of that interaction contributes to the document’s signal set for subsequent uses.

Considered from our perspective, this contribution becomes a fundamental aspect of the setting for considering the relationship between the document/message structure and the semantic meaning. The additional signal, for example a review, can have a significant impact on whether a document is accessed and on how it is evaluated for fitness to a given information need. The document is not necessarily static with the same impact on any given user; rather, it is an evolutionary process. The concept of document as evolutionary process receives more discussion in Anderson [25] and Wilson [26].

Bellour sought means to explore and represent moving image documents with the precision already applied to verbal documents at the micro and macro levels. He sought means to go beyond what Augst [8] termed the “gratuitousness and arbitrariness of impressionistic criticism.” The digital environment offers the opportunity to do so; to enable speaking directly of the native elements such as the RGB components and their changes across time; and, to paraphrase Godard, to confront vague ideas with precise images.

Resources Mentioned in the Article
[1] Pryluck, C. (1976). Sources of meaning in motion pictures and television. Manchester, NH: Arno Press.

[2] Augst, B. & O'Connor, B. C. (1999). No longer a shot in the dark: Engineering a robust environment for film study. Computers and the Humanities, 33, 345-63.

[3] Bonitzer, P. (1977). Here: The notion of the shot and the subject of cinema. Cahiers du cinéma, 273.

[4] Eisenstein, S. (1969). Film form: Essays in film theory. New York: Harvest Books.

[5] Vertov, D. (1984). Kino-eye: The writings of Dziga Vertov. Berkeley, CA: University of California Press.

[6] LaSalle, M. (2005, October 28). This guy just can't hang up his mask. SFGate. Retrieved April 14, 2009, from

[7] Addiego, W. (2005, October 14). Domino quit modeling for the glamour of guns. SFGate. Retrieved April 14, 2009, from

[8] Augst, B. (1980). Instructor’s course notes on Bellour’s “Les Oiseaux: Analyse d’une sequence.”

[9] Bellour, R. (1969, October). Les Oiseaux: Analyse d'une séquence. Cahiers du cinéma, 216.

[10] Hitchcock, A. (Director). (2000). The Birds [Motion picture]. (The Alfred Hitchcock collection). Universal City, CA: Universal Studios Home Video.

[11] Metz, C. (1974). Film language: A semiotics of the cinema. (M. Taylor, Trans.). Chicago, IL: University of Chicago Press.

[12] Dailianas, A., Allen, R. B., & England, P. (1995, October). Comparison of automatic video segmentation algorithms. Paper presented at SPIE Photonics East’95: Integration Issues in Large Commercial Media Delivery Systems. Retrieved April 14, 2009, from and other locations.

[13] Kearns, J. & O'Connor, B. C. (2004). Dancing with entropy: Form attributes, children, and representation. Journal of Documentation, 60(2), 144-63.

[14] Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana, IL: University of Illinois Press.

[15] Skinner, B. F. (1957). Verbal behavior. New York, NY: Appleton-Century-Crofts.

[16] Catania, A. C. (1998). Learning (4th ed.). Upper Saddle River, NJ: Prentice Hall.

[17] Wittgenstein, L. (1953). Philosophical investigations. New York, NY: Macmillan.

[18] Day, W. F., & Leigland, S. (1992). Radical behaviorism: Willard Day on psychology and philosophy. Reno, NV: Context Press.

[19] Skinner, B. F. (1953). Science and human behavior. New York, NY: Macmillan.

[20] Bellour, R., & Penley, C. (Ed.). (2002). The analysis of film. Bloomington, IN: Indiana University Press.

[21] Salt, B. (2003). Film style and technology: History and analysis (2nd expanded ed.). London: Starword.

[22] O’Connor, B. (1991). Selecting key frames of moving image documents: A digital environment for analysis and navigation. Microcomputers for Information Management, 8(2).

[23] Kearns, J. (2005, October 7-9). Clownpants in the classroom? Entropy, humor, and distraction in multimedia instructional materials. Paper presented at DOCAM 05, Document Academy, 2005.

[24] Vertov, D. (2002). Man with the movie camera [Chelovek kino-apparatom (1929)] [motion picture]. Chatsworth, CA: Image Entertainment. 

[25] Anderson, R. (2006). Functional ontology modeling: A pragmatic approach to addressing problems concerning the individual and the informing environment. Unpublished doctoral dissertation. University of North Texas.

[26] Wilson, P. (1968). Two kinds of power: An essay in bibliographic control. Berkeley, CA: University of California Press.