Bulletin, December/January 2010


Facts and Frameworks in Paul Otlet’s and Julius Otto Kaiser’s Theories of Knowledge Organization

by Thomas M. Dousa

Thomas M. Dousa is a doctoral student at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. His email address is tdousa2<at>illinois.edu.

The late 19th and early 20th centuries witnessed numerous developments in the domain of classificatory and indexing activities known today as knowledge organization (KO). Among the most striking of these was the emergence of the idea that documents could be decomposed not only into smaller bibliographical units (for example, a periodical into articles or a book into chapters), but also into yet smaller information units (such as the concepts or facts discussed in discrete passages within a text) and that, once identified, these information units could be reconfigured in new arrangements that would facilitate their retrieval [1, p. 223; 2, pp. 221–222]. This idea, which I term information analysis, would have a long and influential career in information science (IS) and continues to influence IS theory and practice to this day [see sidebar].

The notion of information analysis may be traced back to two pioneers of KO, Paul Otlet (1868–1944) and Julius Otto Kaiser (1868–1927). The Belgian Otlet occupies a prominent place in the history of IS. A lawyer by training and a bibliographer by vocation, he made a number of fundamental contributions, both theoretical and practical, to the development of IS. Originator of the European documentalist movement, initiator or charter member of a number of organizations devoted to international cooperation in bibliography and (co-)creator of the Universal Decimal Classification (UDC), he developed a comprehensive vision of how information was to be organized and mobilized that prefigured, in many respects, the hypertextual systems that are in vogue today [3]. 

Kaiser, by contrast, holds a somewhat lower profile in the annals of IS. Born in Germany, he spent most of his life in Great Britain and the United States and served as an indexer and bibliographer for various commercial institutions, industrial enterprises and technical libraries. Today he is best known for his method of “systematic indexing,” which is considered to be a forerunner of facet analysis [4]. Although Otlet and Kaiser worked independently of one another, each developed, at the turn of the 20th century, a well-articulated account of information analysis that deeply informed his theory of KO. 

In this article, I sketch Otlet’s and Kaiser’s ideas about information analysis and compare the types of knowledge organization systems (KOSs) that they constructed on the basis of these ideas. As we shall see, Otlet and Kaiser held very similar views about the possibility – and desirability – of disaggregating documents into information units and organizing the latter into indexed information files. Both men also agreed on the technological means to implement their information-analytic approach. Despite these points of convergence, Otlet and Kaiser envisioned vastly different kinds of contextual frameworks for organizing the informational units that they sought to disengage from their documentary trappings; as a consequence, they developed contrasting theories of what the scope and inner articulation of a KOS should be. The striking differences between the types of KOSs that Otlet and Kaiser constructed on the basis of their views of information analysis can be explained, in large measure, by their differing personal philosophies and by the different professional cultures within which they operated.

From Knowledge to Documents to Facts: Otlet and Kaiser on Information Analysis
At the root of Otlet’s and Kaiser’s notions of information analysis lay their views of the relationship of documents to knowledge. According to Otlet, knowledge encompasses “[e]verything which we know about objects in the external world or from our own thinking, physical objects (natural or artificial), non-physical objects (laws, thoughts, sentiments)” and its primary elements are “facts” and “ideas” [5, p. 73]. Knowledge, in his view, is recorded in the form of documents, wherein authors embed facts and ideas into conceptual structures that reflect their personal understanding of the phenomena that they are discussing or representing. As Otlet put it, “every document is an exposition of data, facts, and ideas … more or less well ordered, clearly formulated, strongly stylized” [6, p. 97]. Users of documents, he observed, have “recourse to documents in order to extract facts and information from them for the acquisition of knowledge, for study or for scientific research” [7, p. 105]; thus, if the new field of documentation was to support this “documentary method” of research, it would have to offer resources for providing access to the individual facts and ideas ensconced within documents. 

When we turn to Kaiser, we see that a similar story prevails. In his view, documents – or, to use his preferred term, pieces of literature – constitute “descriptive record[s],” set down in language, of what individual human beings “observe” in and “reason out” about the world [8, §§ 52–53]. Kaiser held that “[t]he subjects of our observing and reasoning are things in general, real or imaginary, and the conditions attaching to them” [8, § 52]: the former he called concretes; the latter, processes. If knowledge can be reduced to knowledge of concretes and processes, and literature represents an encoding of knowledge into written language, it stands to reason that it must be possible to decompose a document into statements about concretes and processes. This was precisely Kaiser’s view, which he put thus: “from the standpoint of knowledge literature is confined to the description of concretes and of the conditions attaching to them[;] … for our purposes literature may be analyzed into terms of concretes and terms of processes” [8, § 298]. Because, Kaiser argued, men of affairs are interested in the information contained in documents rather than in the documents themselves, “we must try to dissociate information from literature” and so render it more accessible to its potential users [8, § 83].

In addition to agreeing on the possibility and desirability of information analysis, Otlet and Kaiser held similar views about what it would achieve. For Otlet, the goal of the documentalist – who, in his view, should also be a subject specialist – is to identify information units within the document and create individual records for each one. Once individual information units have been separated from their original bibliographic contexts, they can “be set out in a quite analytical way” into “encyclopedic repertories” – index files in which units of information will be organized in such a way as “to link together materials and elements scattered in all relevant publications” [5, pp. 84, 83]. According to Otlet, such repertories will comprise “inventories of facts, catalogues of ideas, and the nomenclature of systems and of theories” [5, p. 83]. Just as bibliographic repertories provide access to (information about) documents, so will encyclopedic repertories provide access to information itself. 

Kaiser’s view on the matter was virtually identical. For him, the role of the indexer in a special library was to analyze documents, decompose them into individual informational units, record those units and organize them “on a uniform plan applicable to all the information incorporated in the index” [8, § 295]. In this way, he wrote, “[w]e take our literature to pieces and re-arrange the pieces systematically so as to answer best our object in view” [8, § 16]. The resultant index, so Kaiser averred, will “give an analytic statement of the information, for it has been cut up into pieces, specific facts or opinions, and rearranged in more suitable form” for use by the clientele for whom the index has been constructed [8, § 297]. Kaiser’s systematic index, no less than Otlet’s encyclopedic repertories, was designed to provide immediate access to information. 

The Card Index System: Technological Substructure for Otlet’s and Kaiser’s Vision of Information Analysis
As we have seen, Otlet and Kaiser conceptualized documents as recorded expressions of knowledge made up of constellations of informational units (facts, ideas and opinions), and both men advocated extracting these units from documents and reorganizing them into index files in order to make them more accessible to researchers. Their agreement on the theoretical fundamentals of information analysis for KO was reinforced by a shared commitment to the cutting-edge information technology of their time: the card system.

In today’s digitally inflected culture, card indexes housed in cabinets are often considered to be an antiquated and obsolete mode of storing information. In the last decade of the 19th century and the opening decades of the 20th century, however, information systems based on “[c]ards of a uniform size, on which standardized data were transcribed, housed physically in card drawers and related furniture and organized conceptually by classification of schemes of various kinds … epitomized a new ‘modernist’ technology” that was considered to have enormous practical advantages over the previously regnant system of recording information in bound ledgers [9, p. 12; 10]. Finding increasing use not only in libraries (where card files were first employed) but also in business offices and government bureaus, card index systems were especially prized for the flexibility in filing that they afforded: not only could cards bearing superannuated information be easily removed and ones bearing new information be added, but files could be easily rearranged if needed. Otlet and Kaiser were well aware of these advantages and enthusiastically advocated the use of the card index model [6, p. 384; 11, §§ 71–72].

The underlying reason for the flexibility of the card index system was its segmentation into a number of physically discrete, modular records – cards. Because, in such a system, “[e]ach card is a unit record representing an item of information” [10, p. 405], cards are an ideal tool for registering the results of information analysis. This technology was, indeed, precisely what Otlet and Kaiser envisioned for the KO systems that they designed. Both men held that, in an information index file, each individual card – or, in some documentary contexts, each sheet of loose-leaf paper (Otlet) – should serve as the bearer of a single unit of information extracted from a document. As if to emphasize the one-to-one correspondence between card and information unit, Otlet termed this methodological tenet the “monographic principle” [3, p. 238]. 

Once the information to be indexed had been entered upon cards in accordance with the monographic principle, it was necessary to organize the index file as a whole. Both Otlet and Kaiser envisioned that each card would be indexed by subject and that different cards containing information about the same subject would be collocated within the index file. Special divisionary, or guide, cards, distinguished from the others by size, shape or color, would mark the place of any given subject entry within the card file and indicate any subdivisions of main entries, as well as give cross-references to related subjects [3, p. 242; 8, §§ 399–416]. Card index files thus had the resources to represent individual pieces of information (by means of subject-indexed unit cards) as well as to indicate the general structure of the card file within which the unit cards were gathered (by means of guide cards) and point out connections between subjects (also by means of guide cards). 

In Otlet’s and Kaiser’s eyes, then, the card system was an ideal mechanism for gathering together information units gleaned from many different documentary sources, organizing them according to their intellectual content and guiding users to cards containing information on the particular subject of their concern. As such, it was a sine qua non for their vision of KO based on information analysis. 

Classified vs. Alphabetical Order: Otlet’s and Kaiser’s Divergent Views of KO Frameworks
Whereas Otlet and Kaiser were in substantial agreement on both the desirability of information analysis and its technological implementation in the form of the card system, they parted company on the question of how index files were to be organized. Both men favored organizing information units by subject, but differed as to the type of KO framework that should govern file sequence: Otlet favored filing according to the classificatory order of the UDC, whereas Kaiser favored filing according to the alphabetical order of the terms used to denote subjects. It is instructive to examine why each thought the way he did on this important point. 

To understand Kaiser’s preference for alphabetical order, it is necessary briefly to consider the main points of his indexing methodology, which may be summarized as follows. The indexer was to identify key terms within the documents he was processing, extract them and assign each term to one of three fundamental categories – concretes (i.e., entities), countries and processes (i.e., the action that a thing does or undergoes) [8, §§ 73, 299–301]. Once terms had been assigned to their respective categories, they could be joined together in one of three basic combinations, which constituted subject “statements” about the information being indexed [8, § 302]: 

Concrete–Process
Country–Process
Concrete–Country–Process 
[var., Country–Concrete–Process]

Within the file, main entry terms would always denote concretes or countries, whereas terms denoting processes would always serve as subdivisions. The main entry terms were to be arranged in alphabetical order, with the same being done for subdivisions under each main entry [8, §§ 389–390]. Although the filing order was alphabetical, Kaiser provided what he called a “logical key” to the index [8, § 389]. He did so by stipulating that the indexer indicate on the guide card for each main entry term a list of the other main entry terms with which it stood in semantic relation: the latter included synonyms, broader terms, narrower terms and related terms [8, §§ 415, 423]. The “logical key” served as the syndetic structure of the index, indicating a web of conceptual relations otherwise unexpressed by the alphabetical structure of the index file.
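
Kaiser’s filing rules lend themselves to a short sketch. The Python below is a minimal, hypothetical model, not Kaiser’s own procedure: each `Statement` stands for one unit card under the monographic principle, the first term of a statement serves as the main entry, the remaining terms form the subdivision, and a hand-made `logical_key` dictionary plays the part of the related-term list on each guide card. The class and function names and all sample terms are invented for illustration, and the Country–Concrete–Process variant is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """One unit card: a subject statement plus the information unit it indexes.

    Covers the patterns Concrete-Process, Country-Process and
    Concrete-Country-Process (the Country-Concrete variant is omitted here).
    """
    process: str
    concrete: Optional[str] = None
    country: Optional[str] = None
    info: str = ""

    def __post_init__(self):
        # Every statement needs a concrete or a country to act as main entry.
        if self.concrete is None and self.country is None:
            raise ValueError("a statement needs a concrete or a country")

    def terms(self) -> list:
        # Concrete (if any), then country (if any); the process always comes last.
        named = [t for t in (self.concrete, self.country) if t is not None]
        return named + [self.process]


def build_index(cards):
    """Group cards by main entry term, then by subdivision (the remaining terms)."""
    index = defaultdict(lambda: defaultdict(list))
    for card in cards:
        main, *rest = card.terms()
        index[main][" - ".join(rest)].append(card.info)
    return index


def render(index, logical_key):
    """File main entries and subdivisions alphabetically, with a guide line per entry."""
    lines = []
    for main in sorted(index, key=str.casefold):
        lines.append(f"[GUIDE] {main}")
        related = logical_key.get(main, [])
        if related:
            # The "logical key": related main entries listed on the guide card.
            lines.append("  see also: " + ", ".join(sorted(related, key=str.casefold)))
        for sub in sorted(index[main], key=str.casefold):
            for info in index[main][sub]:
                lines.append(f"  {main} - {sub}: {info}")
    return lines


# Hypothetical sample cards and logical key.
cards = [
    Statement(concrete="Rubber", country="Brazil", process="Export", info="card A"),
    Statement(concrete="Coal", process="Prices", info="card B"),
    Statement(country="Brazil", process="Tariffs", info="card C"),
    Statement(concrete="Coal", process="Mining", info="card D"),
]
logical_key = {"Coal": ["Fuel", "Coke"]}

for line in render(build_index(cards), logical_key):
    print(line)
```

With these sample cards the main entries file as Brazil, Coal, Rubber, and the card on Brazilian rubber exports lands under Rubber with the subdivision “Brazil - Export.” The sort keys use only the spelling of the terms, never their meaning, which mirrors Kaiser’s reasoning; any semantic relations live solely in the separate `logical_key`.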

Given that Kaiser acknowledged the utility of indicating semantic relations between index terms, why did he prefer alphabetical to classified order for filing? The answer lies in his view of language. Kaiser considered words in natural language to be imprecise expressions of the concepts that they are intended to convey. In addition, he held that there is little agreement among users of a language as to the precise definition of individual terms [8, §§ 60–61, 112]. Such semantic indeterminacy, in his opinion, makes it difficult to determine precisely what the authors of documents mean by the words they are using, and any attempt by an indexer to substitute preferred terms for the author’s own words runs the risk of misinterpreting the meaning of the words in the original document. It was for this reason that he preferred using index terms extracted from the document itself [8, § 114]. 

Kaiser’s perception of the semantic lability of language also led him to distrust classified order, because such an order presupposed universal agreement about the precise definition of words – something that he held to be impossible. The alphabetical approach circumvents this problem by organizing information files by the formal characteristics, rather than the meaning, of subject terms: it thus reduces the possibility of misinterpretation on the part of the indexer [8, §§ 114, 178]. Furthermore, it makes use of an organizing principle that forms part of the common knowledge of indexer and user alike, whereas a classed organization based on the meaning of its index terms would require special knowledge of the principles underlying that classification [8, §§ 131–132]. For Kaiser, then, alphabetical order represented the interpretatively safest and most user-friendly way of organizing information units by their subject terms. 

Otlet, by contrast, was strongly opposed to organizing information units by the alphabetical order of their index terms. In his view, such a mode of organization “scatters the [subject] matter under rubrics that have been classed arbitrarily in the order of letters and not at all in the order of ideas” and so obscures the conceptual relationships between them [6, p. 380]. By the same token, it removes individual terms from the intellectual contexts that help to determine their meaning, and this isolation makes it difficult for users of an alphabetical index file to “handle the complex expressions that one finds in the modern terminology of discipline[s] such as medicine, technology, the social sciences” [6, p. 381]. A further difficulty, claimed Otlet, is the tendency of individual alphabetical indexes to have their own particular sets of index terms: this “arbitrariness in the choice of words” gave them “a personal character” that hindered standardization and, hence, bibliographical cooperation across different information centers [6, p. 381]. Finally, because the subject terms in an alphabetically arranged system must be expressed in a certain language, such indexes would of necessity be monolingual and so usable only by members of a single language community. This restriction, in Otlet’s view, would be an impediment to international cooperation in the organization of information [6, pp. 380, 381].

Otlet’s preferred framework for organizing information units was the classified approach, as embodied in his UDC. One reason lay in the UDC’s capacity for providing intellectual context for individual subject entries: unlike alphabetical indexes, it assured that “all related subjects are grouped together” and so was likely, in Otlet’s opinion, to be “preferred by scientific men” [12, p. 99]. A more fundamental reason, in Otlet’s view, was that the UDC was comprehensive and systematic in its subject coverage and eminently suitable for use in an international setting. Not only did the decimal classification’s general outline “embrace the universe of knowledge” but its use of a numerically based decimal notation to designate classes and their subdivisions expressed in a transparent way “the location of each subject, no matter how specific it may be, in the whole corpus of knowledge” [13, pp. 31, 34, 36]. No less important, the numerical notation served to “translate ideas” into “universally understood signs,” namely numbers [13, p. 34]. This feature would permit users from different linguistic communities to use the classification despite the language barriers that might otherwise separate them. For Otlet, then, the UDC provided a comprehensive and systematized codification of knowledge whose notation allowed it to be used on an international scale, and such a system would be necessary if the results of information analysis were to be made available to the broadest range of users. 
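
The contrast between the two filing principles can likewise be sketched. The Python below uses a handful of hypothetical class numbers and captions, chosen for illustration rather than quoted from the UDC schedules, and it ignores the UDC’s auxiliary signs and punctuation. It assumes only two properties described above: each subdivision extends the number of its parent class, so that a number’s prefixes spell out its location in the hierarchy, and notations file as decimal fractions. The names `SCHEDULE`, `path`, `classified_order` and `alphabetical_order` are invented for this example.

```python
# Hypothetical class numbers and captions, for illustration only;
# not quoted from the UDC schedules.
SCHEDULE = {
    "6": "Applied sciences",
    "61": "Medicine",
    "611": "Anatomy",
    "617": "Surgery",
    "62": "Engineering",
    "621": "Mechanical engineering",
}


def path(notation: str) -> list:
    """Captions from the broadest class down to `notation`, read off its prefixes."""
    prefixes = (notation[:i] for i in range(1, len(notation) + 1))
    return [SCHEDULE[p] for p in prefixes if p in SCHEDULE]


def classified_order(notations):
    # Reading the digits as a decimal fraction (.6 < .61 < .611 < .617 < .62)
    # gives the same order as comparing the digit strings character by character.
    return sorted(notations)


def alphabetical_order(notations):
    # Filing by caption uses spelling only and ignores the hierarchy.
    return sorted(notations, key=lambda n: SCHEDULE[n].casefold())


codes = list(SCHEDULE)
print("Classified:  ", [SCHEDULE[n] for n in classified_order(codes)])
print("Alphabetical:", [SCHEDULE[n] for n in alphabetical_order(codes)])
print("Location of 617:", " > ".join(path("617")))
```

In the classified sequence, Anatomy and Surgery follow Medicine directly; in the alphabetical sequence, Anatomy files first, separated from Medicine by Applied sciences and the two engineering classes, which is the kind of scattering Otlet objected to. Only the captions are tied to a particular language; the numbers themselves, and hence the filing order, stay the same whatever language the captions are written in.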

Universalism vs. Localism: Otlet’s and Kaiser’s Differing Views on Framework Scope 
For Otlet, one of the major advantages of the UDC over alphabetical systems was that “every alphabetical filing scheme has, through the arbitrariness of the choice of words, a personal character, whereas the [U]DC has an impersonal and universal character” [6, p. 380]. His preference for impersonality and universality in KO must be viewed in light of the larger project of which the UDC was a key element. Otlet’s initial purpose in creating the UDC was to establish a KO framework for a bibliographic card catalog of universal scope, which he dubbed the Universal Bibliographic Repertory (RBU) and developed under the auspices of one of the institutions that he founded, the International Institute of Bibliography (IIB) [3, p. 238]. As he elaborated his idea of information analysis, he developed the idea of a “universal encyclopedia,” which was a series of interrelated, discipline-based card (or loose-leaf sheet) indexes synthesizing the totality of information relating to the different branches of knowledge [5, pp. 83–84; 6, p. 409]. These comprehensive collections of facts and ideas were likewise to be indexed and organized in accordance with the UDC and their development was to be overseen by the Office of International Bibliography (OIB), the branch of the IIB responsible for administering the dossiers of information that Otlet was creating. Otlet’s intent, then, was that the UDC would be a classification of universal scope that would serve as an international standard for organizing information.

In stark opposition to Otlet’s insistence that an ideal KOS be impersonal and universal, Kaiser firmly held to the view that, ideally, KOSs should be constructed to meet the needs of the particular organizations for which they are being created. For example, with regard to the use of card indexes in business enterprises, he asserted that “[e]ach business, each office has its individual character and individual requirements, and its individual organization. Its system must do justice to this individual character” [11, § 76]. Because different businesses have different information requirements, “each office must devise its system in accordance with its own requirements, and it should itself be the best judge of what these requirements are” [11, § 76]. The inevitable corollary to this was that “it is impossible to devise a system that could be applied universally” [11, § 76]. 

What applied to the business office applied to the library as well. Kaiser deprecated the use of universal classifications in special libraries and information bureaus, claiming that prefabricated general schemas for organizing documents or information could not take into account the particular information needs of a given organization [8, §§ 246–247]. Furthermore, he rejected the idea that an index should aspire to comprehensive coverage of a universe of knowledge. Rather, he maintained, it should include index terms only for those subjects that actually pertain to the information needs of the organization for which it was designed [8, §§ 309, 311]. In the same vein he recommended that the relationships between main entry terms established in the syndetic structure of an index should reflect the particular point of view of the organization using it [8, § 425]. Kaiser’s insistence on designing KOSs in accordance with the particular needs of a given organization stands in sharp contrast to Otlet’s belief that such a system should be universal in scope, standardized and sufficiently impersonal to be useful to a large number of users from a host of different backgrounds; the tension between these two positions continues to be a topic of discussion about KO within IS to this day [14].

Conclusion: Worldview and Professional Culture as Determinants of Frameworks for Facts 
We have seen that Otlet and Kaiser believed that documents could be disarticulated into information units (namely, statements of facts and ideas), that these units could be recorded on unit records in the form of cards and that they could be filed in card indexes serving as repositories of information directly accessible to users. However, they held almost diametrically opposed views about the appropriate KO framework for organizing information units: Otlet advocated the “grand narrative” of a universal classification system applicable across different contexts and encompassing the whole universe of human knowledge, while Kaiser supported the creation of numerous, local “micro-narratives” in the form of alphabetical indexes customized to the specific needs of individual institutions. It remains to consider why, despite their agreement on the fundamentals of information analysis, their ideas of what constitutes the ideal type of KOS were so divergent. 

Two factors explain Otlet’s and Kaiser’s different visions of KO: their personal worldviews and the professional cultures that they inhabited. Strongly influenced by the Comtean and Spencerian versions of positivism that he had imbibed in his youth, Otlet was concerned with the progress of the sciences (in the widest sense of the term), believing that a holistic integration of human knowledge into a single, well-articulated system of sciences could form the intellectual basis for the universal amelioration of human life [15, pp. 20, 26–29, 354]. This integrative, universalist conception of human knowledge was of a piece with Otlet’s passionate devotion to the internationalist cause of fostering social unity on a planetary scale and his lifelong commitment to the building of institutions for international cooperation [15]. For him, information analysis provided a mechanism for identifying and isolating those facts and ideas that belonged to the general fund of human knowledge and organizing them in a universally codified fashion that would allow researchers to leverage this information more easily for the benefit of humankind as a whole: in this way, he hoped, the better organization of information produced by documentation would contribute to the progress of humanity. 

In contradistinction to Otlet’s universalist perspective, Kaiser’s worldview was rooted in an ethos of individualist particularism. Holding to the conviction that “[o]ur individuality is our greatest asset” and that “[one] cannot standardize the human intellect,” he took the goal of information analysis to be to create indexes of information units that would provide persons working within a particular organization with access to just those facts and ideas that would be useful to them in their work for that organization [8, §§ 23, 57]. Whatever the intellectual sources of Kaiser’s fervent individualism may have been, his worldview was perfectly consonant with the world of commercial and technical libraries within which he developed and refined his method of systematic indexing, a world that placed a premium on the customization of classification and indexing to the specific needs of the enterprise for which the librarian was working [16, p. 184].

Ultimately, the seemingly paradoxical differences between Otlet’s and Kaiser’s views of how KOSs based on information analysis should be elaborated must be explained by cultural differences, both personal and professional. Both men lived and worked during an epoch when cultural modernism was at its height [17]. It is unsurprising, then, that the approach to information analysis that they developed is consonant with the modernist impulse towards the systematization, organization and rationalization of cultural processes. Yet if Otlet and Kaiser were both products of a modernist milieu, they also inhabited different worlds within that milieu. Motivated by a universalist vision of benefiting humankind as a whole, Otlet elaborated his KO theory within the world of international organizations and universal bibliography, while Kaiser developed his within the world of commercial and industrial libraries, where pragmatic concerns strictly tied to particular organizational needs were of primary importance. The role of culture in shaping Otlet’s and Kaiser’s differing views on KO frameworks for organizing facts derived from information analysis is well worth keeping in mind for proponents of information-analytic approaches to IS today. 

Author’s Note
I thank the members of the Research Writing Group at the Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign, for their helpful critique of an earlier draft of this article.

Resources Cited in This Article
[1] Metcalfe, J. (1957). Information indexing and subject cataloging. Alphabetical: classified: coordinate: mechanical. New York: Scarecrow Press.

[2] Frarey, C. J. (1953). Developments in subject cataloging. Library Trends, 2(2), 217-235.

[3] Rayward, W. B. (1994). Visions of Xanadu: Paul Otlet (1868-1944) and hypertext. Journal of the American Society for Information Science, 45(4), 237-250. 

[4] Svenonius, E. (1978). Facet definition: A case study. International Classification, 5(3), 131-141.

[5] Otlet, P. (1903). The science of bibliography and documentation. In W. B. Rayward (Ed. & trans.), International organization and dissemination of knowledge: Selected essays of Paul Otlet (pp. 71-86). Amsterdam: Elsevier, 1990.

[6] Otlet, P. (1934). Traité de documentation. Le livre sur le livre: Théorie et pratique. Bruxelles: Editiones Mundaneum.

[7] Otlet, P. (1907). The systematic organization of documentation and the development of the International Institute of Bibliography. In W. B. Rayward (Ed. & trans.), International organization and dissemination of knowledge: Selected essays of Paul Otlet (pp. 105-111). Amsterdam: Elsevier, 1990. 

[8] Kaiser, J. (1911). Systematic indexing. London: Isaac Pitman & Sons.

[9] Rayward, W. B. (2008). European modernism and the information society: Introduction. In W. B. Rayward (Ed.), European modernism and the information society: Informing the present, understanding the past (pp.1-12). Aldershot, UK: Ashgate. 

[10] Flanzreich, C. (1993). The Library Bureau and office technology. Libraries and Culture, 28(4), 403-429.

[11] Kaiser, J. (1908). The card system at the office. London: McCorquodale & Co. 

[12] Otlet, P., & Vandeveld, E. (1906). The reform of national bibliographies and their use in universal bibliography. In W. B. Rayward (Ed. & trans.), International organization and dissemination of knowledge: Selected essays of Paul Otlet (pp. 96-104). Amsterdam: Elsevier, 1990.

[13] La Fontaine, H., & Otlet, P. (1895–1896). Creation of a universal bibliography: A preliminary note. In W. B. Rayward (Ed. & trans.), International organization and dissemination of knowledge: Selected essays of Paul Otlet (pp. 25-50). Amsterdam: Elsevier, 1990. 

[14] Mai, J.-E. (2008, June 10). Small, medium and big IOPs. On his Organizing Stuff [blog]. Retrieved October 11, 2009, from http://organizingstuff.blogspot.com.

[15] Rayward, W. B. (1975). The universe of information: The work of Paul Otlet for documentation and international organization. Moscow: VINITI. 

[16] Black, A. (2007). Enterprise and intelligence. In A. Black, D. Muddiman, & H. Plant, The early information society: Information management in Britain before the computer (pp. 149-185). Aldershot, UK: Ashgate.

[17] Buckland, M. (2008). On the cultural and intellectual context of European documentation in the early twentieth century. In W. B. Rayward (Ed.), European modernism and the information society: Informing the present, understanding the past (pp.45-57). Aldershot, UK: Ashgate. 

The Enduring Legacy of Information Analysis in IS
The idea that documents can be decomposed into smaller information units to which direct access might be provided has been historically influential in IS theory and practice, as the following three examples indicate: 

  • In the first half of the 20th century information analysis served as a theoretical fracture point in the professional cleavage between general librarianship and documentation/special librarianship. Whereas general librarians focused on providing subject access to books by means of subject headings that characterized the contents of the document as a whole, documentalists and special librarians sought to provide intellectual access to specific information within documents through detailed indexing of all pertinent information units that they contained. Information analysis thus became a professional marker of documentation and special librarianship. 
  • In the 1950s and 1960s, information retrieval (IR) theorists drew a distinction between “document retrieval systems” and “fact retrieval systems.” The former were intended to retrieve, in response to a user’s query, all documents that might contain information pertinent to answering that query, while the latter were to lead the user directly to specific pieces of information – facts – embedded within the documents being searched that would answer his or her question. The idea of information analysis clearly provided the theoretical impetus for fact retrieval (aka question-answering) systems.
  • From the 1990s through today, the extraction of information units from digital documents has become a focal point of research within IS. XML-based markup languages are being used to identify and isolate chunks of information within digital documents, and systems are being designed to retrieve and collocate these pieces of information. Text-mining techniques are being developed to identify, retrieve and collocate information units from digital documents with unstructured text. The idea of information analysis is the basis of such work.