The results of a follow-up study (Turner 1995) indicate that there is a high degree of correspondence between the terms participants gave when asked to supply words and phrases that could be used for later retrieval of the shots they saw on the research tapes and terms assigned to those same shots by professional indexers. In addition, the study found that the most popular terms supplied by users also appear in the running descriptions for these shots almost all the time. These findings strongly suggest that the process of indexing moving images at the shot level can be successfully automated, using the textual metadata as the source for generating the index to the shots. Several approaches to this are possible, and studies to explore some of the possibilities have been planned (Turner 1996).
In order to consolidate the theoretical basis for this ongoing work, three studies are presently underway. The first is a replication of the original study with an important difference: the data for the original study were collected from English-speakers, while those collected for the study reported here are collected from French-speakers. The two other studies in this series will be undertaken using a new sample from the same database to collect data in English in one case, and in French in the other. In addition, the new sample will be completely random, whereas the shots used in the original research tapes constituted a selection of 50 shots based on agreement on the subject matter of the material from a random sample of 200 shots. The goal of the study reported here is to determine to what extent there is a correspondence between the data in English and French in terms of language and culture in the naming activity which formed the basis of the task of the participants. The long-term purpose of the study is to gather empirical evidence to form the theoretical basis for building bilingual and multilingual information systems for moving-image materials.
The study is based on previous work, and a detailed description of the methodology used is given elsewhere (Turner 1994). This section discusses changes required to accommodate the particularities of the present study. These involve producing French-language versions of the data collection materials, which can be classed in two categories, printed materials and the research tapes. Changes made to the printed materials were straightforward, in that copies of the electronic files used to create the English-language forms were made, and the English text replaced by a French translation. In this way, the same appearance was maintained throughout. The printed materials include an information questionnaire used to categorize the participants, an instruction sheet explaining the task to be performed, a consent form each participant was required to sign, and the response sheets used for recording the terms each participant wished to associate with each shot on the research tapes.
Verbal information on the research tapes is in the form of text and a voiceover. None of the shots on the tapes has sound, but each shot is preceded by an announcement of its identifying number, and followed by a repetition of this number and the indication to participants that it is time to inscribe the terms they wish to assign to the shot on the response sheet. This information appears in written form on the screen and is accompanied by a voice reading the text, in order to reinforce it and to alert participants who may still be writing their responses that the next shot is about to be shown. Most of the participants providing French-language data were expected to be native French speakers. However, since most of these could be expected to understand English as well because of the environment in which they would be recruited (i.e. the greater Montreal area), it was reasonable to assume that the original research tapes could be used. Nevertheless, the possibility of introducing interference in the data collection process because of the constant cognitive processing required on the part of participants, and the accompanying possibility of confusion in the data were worrisome. Thus, in order to remove these obstacles which might influence the quality of the data, a French-language version of the original research tapes was prepared.
Better equipment was available for generating the text component that appeared on screen and for the audio inserts than had been available for the original study, so that there is some improvement in the French-language version in the form of easier-to-read titles and a digitized voice track. Since the same titles and voice-over inserts are repeated throughout the tape (the only change between them being the shot numbers as the tape rolls), participants can quickly relegate them to background consciousness; thus it is not thought that the improved quality of these components in the French-language version of the tapes has any influence on the quality of the data collected. What is significant is the preparation time of the tapes, considerably reduced from that of the original study. All aspects of the shots used on each of the two research tapes (identity, order, playing time, etc.) are identical to those of the tapes used in the original study. As with the tapes used in the 1994 study, both tapes contain the same shots, a mix of still and moving images, shown in the same order. The difference between the two tapes is that the shots in moving form on one tape are in still form on the other.
Overall, the patterns in the data collected from French speakers seem very similar to those in the data gathered from English speakers in the 1994 study. Table 1 compares results for the shots classified as simple (i.e. with very few significant objects or events to name in the picture), in still and moving formats, as compiled from data collected from both English-speaking and French-speaking students classified in the "nonvisual" category (i.e. who are not registered in a program in which the focus is on construction or analysis of images). In the table heads, "NS" means "Nonvisual students", "SS" means "Simple still shots", "SM" means "Simple moving shots", "T1" means "Research Tape 1", and "T2" means "Research Tape 2". For each shot, "Term" represents the term which achieved the highest score among the data collected from these participants (i.e. the term which was named the most often), "Share" represents the share of the total number of occurrences of terms provided for the shot by the participants in the category that the top-scoring term achieved, represented as a percentage, and "Named" represents the percentage of participants in the category who named the top-scoring term. Since participants were permitted to name up to five terms for each shot, once a stem is recorded as a score for any given participant, any further occurrences of that stem in the participant's responses to the same shot are disregarded in the calculation of the metrics reported here.
| Shot number | NS/SS/T1, N=25 English-speaking participants (Term, Share, Named) | NS/SS/T1, N=29 French-speaking participants (Term, Share, Named) | NS/SM/T2, N=19 English-speaking participants (Term, Share, Named) | NS/SM/T2, N=24 French-speaking participants (Term, Share, Named) |
|---|---|---|---|---|
| 01 | flag, 59%, 96% | drapeau, 57%, 100% | flag, 36%, 63% | drapeau, 57%, 100% |
| 07 | chicken, 41%, 92% | poule, 36%, 90% | chicken, 32%, 74% | poule, 36%, 100% |
| 13 | forest/mountain/train, 17%, 28% | montagne, 18%, 41% | train, 18%, 37% | train, 33%, 46% |
| 15 | bird, 36%, 64% | oiseau, 36%, 69% | bird, 41%, 79% | envol, 22%, 46% |
| 16 | train, 38%, 72% | train, 38%, 90% | train, 38%, 63% | train, 49%, 92% |
| 18 | geography/lake, 12%, 20% | lac, 18%, 41% | lake, 18%, 37% | vue, 24%, 50% |
| 24 | ship, 26%, 52% | bateau, 22%, 59% | ship, 19%, 42% | bateau/traversier, 15%, 38% |
| 26 | woodpecker, 40%, 76% | oiseau, 32%, 69% | woodpecker, 31%, 53% | oiseau, 33%, 75% |
| 28 | building, 37%, 64% | édifice, 16%, 34% | building, 25%, 47% | édifice, 21%, 42% |
| 33 | moose, 32%, 56% | forêt, 18%, 34% | moose, 31%, 58% | orignal, 37%, 88% |
| 35 | sun, 63%, 96% | coucher, 33%, 76% | sun, 37%, 84% | coucher, 33%, 83% |
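The "Share" and "Named" figures in Table 1 can be illustrated with a minimal sketch. It assumes that responses have already been reduced to stems (no stemmer is implemented here), and it reads the definition given above as counting each stem at most once per participant before any totals are computed. The function name `top_term_metrics` and the sample responses are hypothetical and are not taken from the study data.

```python
from collections import Counter


def top_term_metrics(responses):
    """Compute the top term(s), Share, and Named for one shot.

    responses: one list of stems per participant (assumed already stemmed).
    Returns (top_terms, share, named), where top_terms lists every stem
    tied for the highest score, and share and named are percentages.
    """
    counts = Counter()
    for stems in responses:
        # A stem scores at most once per participant for a given shot.
        counts.update(set(stems))
    if not counts:
        return [], 0.0, 0.0
    total_occurrences = sum(counts.values())
    top_score = max(counts.values())
    top_terms = sorted(t for t, c in counts.items() if c == top_score)
    share = 100.0 * top_score / total_occurrences
    named = 100.0 * top_score / len(responses)
    return top_terms, share, named


# Made-up responses from four participants, for illustration only.
sample = [["flag", "pole"], ["flag", "flag", "sky"], ["flag"], ["sky"]]
print(top_term_metrics(sample))  # (['flag'], 50.0, 75.0)
```

Returning a list of top terms rather than a single term keeps ties visible, as in the entries of Table 1 that show two or three terms separated by slashes.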
Shot 35 (a sunrise) is interesting for two reasons. First, it is not clear to most participants whether it is a sunrise or a sunset. In the English-language study, responses were divided equally between these two concepts, and it is noteworthy that in a practical application the shot could perhaps be used to represent either. In the data collected thus far, French-speakers exhibit the same ambiguity, most participants either giving the equivalent of both "sunrise" ["lever du soleil"] and "sunset" ["coucher du soleil"] or expressing in some other way that they are not sure which it is. Second, since the concepts as expressed in French are inverted in relation to the English expressions, both of the latter starting with "sun", the seemingly disparate results are actually very similar. Thus, for those who saw the still version of the shot, "coucher du soleil" ["sunset"], stemmed to "coucher", is the top-scoring term, "soleil" ["sun"] is the first runner-up, and "lever du soleil" ["sunrise"] is the second runner-up. For those who saw the moving version, "coucher du soleil", stemmed to "coucher", is the top-scoring term, "paysage" ["landscape"] is the first runner-up, and "soleil" ["sun"] and "lever du soleil" ["sunrise"] are tied for second runner-up.
Among the French-speaking participants in this category, shot number 33 shows an interesting pattern. "Forêt" ["forest"] is the top term in the still version, and "orignal" ["moose"] in the moving version. This is likely due to the fact that the moose is rather camouflaged by the foliage in the image, and the motion cues in the moving version of the shot helped participants distinguish the presence of the moose.
The Collins-Robert French-English, English-French Dictionary (Collins 1993), a widely used and respected bilingual dictionary, is used as a basis of comparison to determine whether there is equivalence between the terms given in each language for each shot. If the term in one language appears as a possible translation of the term in the other language, then the terms are deemed to be equivalent. On this basis, there is a direct correspondence between the two languages among the top terms in at least one version of the shot for ten of the eleven shots in the category reported here.
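The equivalence rule described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the dictionary is modelled as two plain mappings from a term to a set of possible translations, the function names are hypothetical, and the toy entries below are illustrative rather than quoted from the Collins-Robert dictionary.

```python
def are_equivalent(term_en, term_fr, en_to_fr, fr_to_en):
    """True if either term is listed as a possible translation of the other."""
    return (term_fr in en_to_fr.get(term_en, set())
            or term_en in fr_to_en.get(term_fr, set()))


def shot_has_correspondence(top_en, top_fr, en_to_fr, fr_to_en):
    """True if any English top term matches any French top term.

    top_en and top_fr hold the top terms from all versions (still and
    moving) of one shot, so a match in at least one version is enough.
    """
    return any(are_equivalent(e, f, en_to_fr, fr_to_en)
               for e in top_en for f in top_fr)


# Toy dictionary entries (assumed for illustration only).
en_to_fr = {"moose": {"orignal", "élan"}, "woodpecker": {"pic"}, "bird": {"oiseau"}}
fr_to_en = {"orignal": {"moose"}, "pic": {"woodpecker"},
            "oiseau": {"bird"}, "forêt": {"forest"}}

# Shot 33: "moose" in English; "forêt" (still) and "orignal" (moving) in French.
print(shot_has_correspondence({"moose"}, {"forêt", "orignal"}, en_to_fr, fr_to_en))
# Shot 26: "woodpecker" in English; "oiseau" in French.
print(shot_has_correspondence({"woodpecker"}, {"oiseau"}, en_to_fr, fr_to_en))
```

Checking both directions reflects the wording of the rule, under which a term in either language may appear as a translation of the term in the other.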
The remaining shot, number 26, reflects a more general indexing level among the French-speaking participants, the top term among whom was "oiseau" ["bird"], whereas with the English-speaking participants the more specific "woodpecker" scored highest. It is noteworthy that among the French-speaking participants, "pic" ["woodpecker"] was the first runner-up. Possibly when all the data for the study is analyzed the overall results will absorb these differences in scores in a single category of participants.
In both languages in the data set reported here, a few shots have terms tied for the position of top-scoring term. In English, shot number 13 has a three-way tie and shot number 18 a two-way tie. In French, shot number 24 has a two-way tie. Again, little significance should be attached to this until all the data is compiled and analysed.
A cursory glance at the percentages given in Table 1 indicates a great deal of similarity, in the majority of cases, among the percentages calculated for the top terms supplied by English-speaking participants and those supplied by French-speaking participants for the corresponding shots screened in the same conditions. Although the calculations are preliminary and may change somewhat when all the data collected have been analysed, it is likely that the patterns which occur here will prevail.
If the results reported here are indeed representative of those of the completed study, then for the type of material that is the object of this research, namely everyday non-art images in still and moving form, the transfer of visual representations of objects and events to verbal expressions of these objects and events takes place in the same way for French-speakers as it does for English-speakers. This suggests that shot-level indexing of such material in either of these two languages could be transferred to the other language using automated techniques to filter the indexing terms through a bilingual controlled vocabulary database. In the context of the overall research agenda and in light of the results of the studies completed so far, it seems clear that the development of automated techniques for subject indexing of moving-image production materials rests on solid foundations.
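One way to picture filtering indexing terms through a bilingual controlled vocabulary is sketched below. It is not a description of an existing system: the vocabulary is assumed to map each source-language term to a language-neutral concept identifier and each identifier to a preferred target-language term, and the function name, the identifiers, and the sample entries are all hypothetical.

```python
def transfer_index_terms(terms, source_to_concept, concept_to_target):
    """Map index terms to the other language via shared concept identifiers.

    Returns (transferred, unmatched): target-language terms in input order,
    and source terms with no entry in the vocabulary, kept for human review.
    """
    transferred, unmatched = [], []
    for term in terms:
        concept = source_to_concept.get(term)
        target = concept_to_target.get(concept) if concept is not None else None
        if target is None:
            unmatched.append(term)
        else:
            transferred.append(target)
    return transferred, unmatched


# Hypothetical vocabulary entries, for illustration only.
en_to_concept = {"train": "C001", "mountain": "C002"}
concept_to_fr = {"C001": "train", "C002": "montagne"}

print(transfer_index_terms(["train", "mountain", "geography"],
                           en_to_concept, concept_to_fr))
# (['train', 'montagne'], ['geography'])
```

Collecting unmatched terms separately, rather than dropping them, leaves room for an indexer to extend the vocabulary where the two languages do not line up.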
Collins-Robert French-English, English-French dictionary (1993). Beryl T. Atkins et al., eds. 3d. ed. Paris: Harper Collins Publishers and Dictionnaires Le Robert.
Furnas, G.W., T.K. Landauer, L.M. Gomez, and S.T. Dumais (1987). The vocabulary problem in human-system communication. Communications of the ACM 30, no. 11 (November): 964-71.
Furnas, G.W., T.K. Landauer, L.M. Gomez, and S.T. Dumais (1983). Statistical semantics: analysis of the potential performance of key-word information systems. The Bell System Technical Journal 62, no. 6 (July-August): 1753-1806.
O'Connor, Brian C. (1996). Pictures, aboutness, and user-generated descriptors. The SIG VIS News 1, no. 2 (spring).
http://www.unt.edu/~aag0001/oconnor.html
Turner, James M. (1996). Storage and retrieval of moving images: a research agenda. Annual conference of the Association for the Study of Canadian Radio and Television, Brock University, Saint Catharines, ON, 1996 05 28.
http://tornade.ere.umontreal.ca/~turner/ASCRT96.html
Turner, James M. (1995). Comparing user-assigned terms with indexer-assigned terms for storage and retrieval of moving images: research results. Proceedings of the 58th ASIS Annual Meeting, Chicago, Illinois, October 9-12, 1995, vol. 32, 9-12.
Turner, James Ian Marc (1994). Determining the subject content of still and moving image documents for storage and retrieval: an experimental investigation. PhD thesis, University of Toronto.