Indexing Integrated Manual Sets at Northern Telecom
by Frank Exner, Little Bear
Technical documentation is often poorly indexed. One major source of customer dissatisfaction is the difficulty of finding specific information within large documentation sets. Informal customer feedback at Nortel's (Northern Telecom's) switch division indicated that over 43% of complaints about documentation related to difficulty accessing information. While significant improvements have been made in the accuracy, completeness and usability of the technical information within the manual sets, customers often have trouble locating specific information in a timely manner. We expected that easier access to information would significantly improve customer satisfaction with our documentation.
Several years ago, we began a project designed to address new approaches to subject indexing for our documentation sets. At the time our project began, several indexing models existed, but none were adequate for accessing information in large technical manual sets.
Applying Indexing Models to Technical Documentation
Each of the existing indexing models -- systems that provide single document access versus those that provide document collection access; permuted indexes; encyclopedia indexes; and single versus multiple back-of-the-book indexes -- has one or more characteristics that make it unsuitable for indexing multiple-book manual sets. For example, indexing systems can be separated into those that provide single document access and those that provide document collection access. The two kinds of systems have different purposes and, therefore, different models. Back-of-the-book (single document) indexes often use a language dictated by the document and offer access to an ordered, detailed analysis of a unified object.
By contrast, collection indexes cannot be developed using languages derived from each constituent document. After all, there will always be another document added to the collection, and it may well use a different set of synonyms. Consequently, at least in terms of traditional information systems, access must be guided by a controlled language or string searching. A collection index offers access to whole documents rather than to concepts within any one document. Therefore, it uses relatively broad, inclusive terms rather than the detailed, discrete terms of a back-of-the-book index.
Traditionally, technical manuals have had back-of-the-book indexes. These may be good or bad, manual or automated, full-subject or permuted. They may be created by writers, assistants or indexers. Among telephone equipment manufacturers, permuted indexes are the norm. It is relatively easy to produce them for large documents, and this is an industry replete with very large documents (for example, one Nortel switch is supported by 120,000 pages of material). The size alone would make a single back-of-the-book index covering all of the pages unusable.
As a policy matter, Nortel decided against permuted indexes. Such indexes do not allow synonym control, so users must find information using the words the author used. There are many competing manufacturers of telephone equipment, often using different terms for the same thing or the same term for different things. And industry-standard terminology may differ from both. The ideal subject index would include a given manufacturer's terminology, competitors' terminology and industry-standard terminology. Permuted indexes cannot provide this.
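A permuted index simply rotates each significant word of a heading into filing position, which is why access is limited to the author's own vocabulary. The idea can be sketched as follows; the titles and the stopword list are invented for illustration, and a real system would read section titles from the manual set:

```python
# Minimal permuted-index sketch. Every entry key comes directly
# from the words the author chose, so a reader searching under a
# synonym finds nothing -- the weakness discussed above.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in"}

def permuted_index(titles):
    """Return sorted (keyword, title) pairs, one per significant word."""
    entries = []
    for title in titles:
        for word in title.split():
            key = word.strip(",.").lower()
            if key not in STOPWORDS:
                entries.append((key, title))
    return sorted(entries)

titles = [
    "Replacing the Line Card",
    "Trunk Maintenance Procedures",
]
for key, title in permuted_index(titles):
    print(f"{key}: {title}")
```

Because the transformation is purely mechanical, it scales easily to very large documents, which is exactly why it is common in this industry despite its limitations.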
Software documentation is often written on a one-book, one-use basis: a programming language requires a description manual; a PC requires an installation manual. But a telephone switch requires many manuals for several audiences, which may be produced at different times and released in multi-book sets. Most multi-book sets use the encyclopedia model, but all of the volumes in an encyclopedia are published at the same time, whereas the volumes in a technical support set may be published at many different times. Therefore, some elements of the model work for technical manual sets and some don't. For example, the locator form (volume followed by page) works; the tight integration of the index with the rest of the set does not (because of the differing publication schedules).
There are single indexes and multiple indexes. Both cover the complete document or document set. Multiple indexes divide different kinds of information into separate lists, operating on the principle that users know what kind of information they need (for example, first-line, author, title, etc.). Single indexes use typographic or other methods to combine the lists without losing usability. Both single and multiple indexes analyze the full contents of a document or collection, acting on the assumption that any user may need access to all of the material indexed.
The users of Nortel's manuals have different needs. They do different jobs at different times under different circumstances. Even in very small telephone companies where one person does many jobs, an employee knows what job she or he is doing at any one time. This author's department writes documentation for four basic audiences: basic maintenance, advanced maintenance, translations, and a miscellaneous category for non-standard documents.
As a result of these varying user needs, we developed the Ballard-form index and Ballard-form set (named for Dr. Robert M. Ballard, professor of library and information sciences at North Carolina Central University), which divides the informational content by audience.
The desired model was provided by domain-analysis, which allows Nortel's four basic audiences to be seen as separate discourse communities or working groups and given separate indexes. As long as people know what jobs they are doing, and as long as each document is created to meet the needs of one audience, audience-based indexes should work. This was the basis for Nortel's audience study.
In the audience study, we wanted to compare user reactions to a combination of the proposed Ballard-form indexes and traditional back-of-the-book indexes so that all their needs could be met. The working hypothesis was that back-of-the-book indexes should be placed at the end of each volume binder and that Ballard-form (audience-based) indexes, combining volume and page numbers, should be placed at the beginning of the volumes for each audience. Then, if a user knows what volume contains the information of interest, she or he can use a shorter index. If the user only knows what kind of information he wants, the Ballard-form index will lead to volume as well as page.
The pilot survey was based on a test set made up of the documents supporting two products authored at separate sites (Nortel has authoring centers in several cities). The products were chosen to fit the following rules:
Each product's document set addressed all standard audiences (basic maintenance, advanced maintenance, translations and a miscellaneous category for non-standard documents).
Back-of-the-book indexes were created for each of the documents. These were then used to create audience-based indexes for the four standard audiences. Back-of-the-book indexes tell users where to find information in a book if the reader knows what book contains the desired information. Audience-based indexes pull together the information located in several documents and intended for a single audience into a separate document that tells users the volume number and page of the desired information. Then, if desired, all of the audience-based indexes can be concatenated in one volume. Each audience-based index (called a Ballard-form index) is designed to meet the needs of one user group; the concatenated volume (called the Ballard-form set) provides complete coverage of the manual set's information.
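The merge described above can be sketched in a few lines. The volume names, headings and pages below are invented; in the study, the real input came from the back-of-the-book indexes created for each document:

```python
# Sketch of building an audience-based (Ballard-form) index by
# merging per-book indexes into locators that carry volume as
# well as page.
def ballard_form(book_indexes):
    """book_indexes: {volume: {heading: [pages]}}.

    Returns {heading: [(volume, page), ...]} with headings in
    alphabetical order and locators ordered by volume, then page.
    """
    merged = {}
    for volume, index in book_indexes.items():
        for heading, pages in index.items():
            merged.setdefault(heading, []).extend(
                (volume, page) for page in pages
            )
    return {h: sorted(locs) for h, locs in sorted(merged.items())}

books = {
    "Volume 1": {"line card": [12, 40]},
    "Volume 2": {"line card": [7], "trunk group": [3]},
}
audience_index = ballard_form(books)
```

Concatenating the outputs of this step for all four audiences yields the Ballard-form set, since each run covers exactly one audience's documents.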
A 12-question survey was developed to measure customer needs and attitudes toward the test indexes. The depth and structure of indexing were special foci. Questions one through seven required scaled or "yes-no" objective answers. Questions eight through twelve were open ended and offered respondents the opportunity to suggest new ideas to us.
Test packages consisting of documents with back-of-the-book indexes, audience-based indexes, the survey and a cover memo were assembled and sent to 59 customers who volunteered to examine them. The sample, clearly far from random, was selected from Regional Bell Operating Companies, independent telephone companies and Bellcore. By the deadline for questionnaire return, we had a 28.8% response rate. The results, however, met the expectations established by customer complaints and other anecdotal evidence, so the reliability was considered high enough to support a pilot study. See the sidebar for an analysis of the data for each question except the last, for which general suggestions were invited.
Survey Summary and Proposed Indexing Plan
The indexing-specific effects of different writing centers were minimal, but the need for a common set of naming conventions and a common set of writing tools became very clear. The software identifies each document in a Ballard-form index by printing the name of the volume and the pages within it, so naming conventions and their application become a critical issue. Memory also caused difficulties in the pilot study: to accommodate it, documents were stored three or four times in memory. If the study's procedures were the model for the final indexing procedure, system memory would be a major concern.
This very inefficient use of resources was balanced by the need for speed and independence for the index designer. However, efficient use of memory must be a primary design element in the Nortel indexing plan.
Our authoring platform produces indexes from tags inserted in a document's text. (Cross-references get their own tags.) When a tag is created, a menu opens for entering the headings and locator information for the index. Up to six levels of headings are allowed. Creating and populating these tags can be done most efficiently and effectively by a document's author applying a predefined style sheet. (An automation tool implementing the style sheet is planned for the future.) After the document has been tagged, one command creates an index, which, with some massaging, is ready for editing. Since tagging 120,000 pages is a massive effort, the plan proposed that each technical writer place index tags in documents as they are written.
Index editing requires special training. An indexing specialist should examine each document for the following characteristics, among others: adequacy of cross-references; depth of indexing; breadth of coverage; proper entry phrasing; audience needs; effective information placement; training of new writers; common look and feel; and overall quality.
A major question was what kind of document should first be indexed. Our customers indicate that each product should be indexed as it is documented. Indexing then becomes a positive marketing feature of our new product lines.
On the average, a writer can index 10 pages an hour. The cost of entering index tags in modules can be determined by dividing the number of planned pages by 10 and multiplying the result by the average hourly cost.
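The rule of thumb above translates into a one-line estimate; the page count and hourly rate in the example are invented figures:

```python
# Cost estimate for index tagging, as stated in the text:
# (planned pages / 10 pages per hour) * average hourly cost.
def tagging_cost(planned_pages, hourly_cost, pages_per_hour=10):
    """Return the estimated cost of tagging the planned pages."""
    return planned_pages / pages_per_hour * hourly_cost

# Example: 1,200 planned pages at $50/hour is 120 hours of work.
print(tagging_cost(1200, 50))  # -> 6000.0
```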
Index Tagging and Automation
Information in Nortel's documents follows five basic patterns: alphanumerically organized reference manuals (the commands directory, for example); manuals consisting of maintenance procedures (for maintenance technicians); manuals consisting of maintenance procedures and explanatory information (for maintenance experts); manuals detailing the database mechanisms that direct the switch to set up a circuit from the calling telephone number to the called number; and miscellaneous documents (manuals created to meet product-specific needs). Each of these document patterns presented a different indexing audience and problem.
We decided that the first release of indexed documentation would not include the reference documents because they were already organized alphanumerically, which itself provides a basic information access tool. Since resources for the indexing project were limited, we focused on manuals that are consulted in emergencies and those with no access tool.
Pure procedural maintenance documentation presented a particular problem. The audience for these manuals is basic maintenance technicians with widely varying years of experience. They have to work with switches made by several different companies, and with several models from each manufacturer, all in the same shift. In fact, a technician is likely to work with several software releases of each model. Finally, these technicians have to fix switches that are not working; they are, therefore, under great stress.
To serve this audience's needs, the manuals were designed as a series of specific procedures responding to specified problems. A linguistic analogy would be, "IF problem Q exists, THEN DO procedure Q." A traditional index is a random-access mechanism, allowing the user to restructure the contents according to individual needs. But, in a stimulus-response environment, the number of entry points into text must be limited; entering a procedure from the middle could cause a major problem. As a result, each problem-procedure pair was given entries that allowed easy access to the unit.
Manuals for maintenance experts have all of the procedures available to technicians plus descriptive information about the switch's internal workings and possible problem causes. For this audience, a random-access mechanism is highly desirable. They work on unusual problems and may need to order a manual's contents in unique ways. As a result, detailed indexing is desirable.
Translations manuals, which detail the database mechanisms that direct the switch to set up a circuit from the calling telephone number to the called number, have a very complex structure. They are organized by feature (three-way calling, for example) and, within each feature, by table. There are descriptions of feature interactions, feature operation and the order in which to fill data tables. Members of the translations audience may need a high-level overview now and a single detail an hour from now. Therefore, as with maintenance experts, a very detailed index that allows the user to construct an individual information flow is necessary.
Miscellaneous documents, manuals created to meet product specific needs, vary in indexing requirements, depending on audience needs. Document analysis proceeds by considering the users served and planning the manual index to meet their specific needs.
Converting the above analysis into a set of algorithms was made possible by some unique characteristics of technical manuals. These documents disassemble their information content into separate pieces, title each substantial piece, and gather related pieces under higher-level titles. There is a minimum of subject interlocking within an information chunk. Thus, titles can be taken to indicate reasonable subject chunks and used as a basis for indexing. At Nortel, manuals are created by entering the desired information into detailed, audience-based templates. These templates specify title formats, so the level of detail required to specify algorithms is readily available.
Indexing rarely gets the resources that the process requires. Nortel's indexing project was no exception. In an exceptionally tight economic time, this researcher was assigned to find a way to index the full manual set (except reference books) -- a total of 85,000 pages -- with a minimum of additional resources. After the hysterical laughing and crying died down, we sought a solution.
The Nortel indexing project decided to use technical writers to create the indexes as they wrote the manuals. This created several problems: consistency, accuracy, usability, completeness and (most of all) training. Our writers are not trained indexers.
To ease the situation, the previously discussed indexing algorithms were converted into detailed instructions so that writers could place index tokens in their material quickly. (The publication software creates the final indexes by gathering all of the tokens on demand and ordering their contents.) Applying these instruction sets not only reduced the time demanded of writers but also reduced the problems mentioned above.
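The gather-and-order step the publication software performs can be sketched simply. The token fields and values here are invented for illustration; the real tokens carry the heading levels and locator information described earlier:

```python
# Sketch of the final-index build: gather all index tokens,
# deduplicate their locators, and order everything.
from collections import defaultdict

def build_index(tokens):
    """tokens: iterable of (heading, page) pairs.

    Returns {heading: [pages]} with headings alphabetized and
    each heading's pages sorted and deduplicated.
    """
    gathered = defaultdict(set)
    for heading, page in tokens:
        gathered[heading].add(page)
    return {h: sorted(pages) for h, pages in sorted(gathered.items())}

tokens = [("trunk group", 14), ("line card", 3), ("trunk group", 2)]
print(build_index(tokens))  # -> {'line card': [3], 'trunk group': [2, 14]}
```

Because the ordering is fully mechanical, writers only need to place tokens correctly; the software handles consistency of the final presentation.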
While writers were busy indexing their documents, management realized that automating the indexing algorithms would further improve and speed the project. Working with Todd Jones, a documentation software and LISP expert at Nortel, we converted the algorithm set into an easy-to-use, menu-driven tool that any writer could apply quickly. Since writers in Ontario, Canada, North Carolina and Texas were expected to apply the automation tool and check the resulting indexes, a detailed support and training package, including the following sections, was created:
Indexing theory - This material gave writers an idea of the background behind the algorithms and how Nortel approaches indexing.
With the indexing system in place, Nortel began its routine index program.
Frank Exner, Little Bear, is senior documentation engineer at NORTEL (Northern Telecom) in Research Triangle Park, North Carolina.
Northern Telecom Audience Study
Analysis of the Data
For each of the objective questions, this chart includes the possible answers and the number of respondents giving each answer. For the open-ended questions, sample answers are provided.
Responding customers want document indexes very much. This may be because the respondents were those most interested in indexes or it might be because our customer community had been demanding indexes for several years.
Our customers would use back-of-the-book indexes.
Our customers would use audience-based indexes.
Many of our customers would probably support additional information access methods.
Five respondents said both were equally useful. Our customers want both audience-based and individual book indexes.
All of the respondents who want more detail are from the translations audience. Our customers strongly believe that the level of detail represented by the test packs was correct. The translations audience, however, may want additional detail.
The seven respondents who answered that the audience-based indexes should be more detailed are from the translations and maintenance audiences. One respondent from the translations audience answered that the audience-based indexes should be less detailed. Our customers believe that the level of detail represented by the test packs was correct. The translations audience, however, may want additional detail.
Six of the respondents said all documents should be indexed. Two of the respondents said all of the documents except Data Schema [already ordered alphabetically] should be indexed.