Maintaining Web-based Bibliographies:
a Case Study of Iter, the Bibliography of Renaissance Europe

Tracy Castell
Project Manager, Iter, Faculty of Information Studies, University of Toronto, Toronto, Canada


Abstract

This paper introduces Iter [1], a newly formed, non-profit research project dedicated to increasing access to all published materials pertaining to the Renaissance and, eventually, the Middle Ages. It discusses issues related to building and maintaining Iter's first Web-based bibliography, the "Iter Bibliography," focusing exclusively on printed secondary materials from the journal literature. These information management issues fall into two groups: those related to managing the information in the Iter records, and those related to managing the records themselves in the Iter database. Tools used to help manage and enable access to the information in the Iter records include: the Machine-Readable Cataloging (MARC) record format; the Anglo-American Cataloguing Rules (AACR2R); Library of Congress Subject Headings (LCSH); the Abridged Dewey Decimal Classification system (DDC); and automated authority files maintained on a DRA cataloguing system. A special record status code assigned to each record helps manage the records in the Iter database. The code identifies all records from a particular journal at a specific step in the work flow, which makes it easy to move those records from one step in the work flow to the next. Careful consideration was given to the tools selected to help maintain the Iter Bibliography. In the end, the information management decisions made for the Iter Bibliography will have a pronounced effect on the way users query the Bibliography and view search results. Lessons learned will help structure how new bibliographies are added to Iter's portfolio of online services.

INTRODUCTION

There are many humanities bibliographies available today. Some, like the Bibliographie internationale de l'Humanisme et de la Renaissance (BIHR), are still published in their traditional print form. Others, like the International Medieval Bibliography (IMB), are published in both print and CD-ROM versions. Most recently, a "next generation" of bibliographies is being offered to end-users over the World Wide Web. Examples include the Modern Language Association International Bibliography (MLA) [2] and Humanities Abstracts [3], which were developed from their CD-ROM equivalents, and the Bibliography of the History of Art (BHA) [4], which has been released directly to the Internet.

The nature of the World Wide Web makes it an attractive option for publishing new bibliographies. Many, including the case study discussed in this paper, provide access to previously uncontrolled or more obscure literature. Two advantages of Web-based bibliographies are: 1) they can be accessed from anywhere in the world, and 2) they can be updated and maintained on a daily basis. However, as information professionals, we are also concerned with what standards, if any, are being used to collect, process, store, and make information accessible. These issues have implications for the quality of bibliographies as well as the reliability of the information within them.

This paper provides an overview of the information management issues that arose during the initial development of the Iter Bibliography, a service developed especially for the World Wide Web to increase access to information about the Renaissance and, eventually, the Middle Ages. The Iter Bibliography will ultimately include many different types of materials, but this paper focuses on the information management issues involved in creating and maintaining records for materials published in the journal literature.

Following an overview of the Iter project, the goals and objectives of the Iter Bibliography are presented. Next, information management issues related to the management of the information in the records, and the issues related to the management of these records in the database, are discussed. In each case the tools used to help achieve information management goals are reviewed. Finally, the implications of the information management decisions made for the Iter Bibliography are summarized. It is hoped that the information presented will help others learn from Iter's experiences.

WHAT IS ITER?

Iter, meaning 'a journey' or 'a path' in Latin, is a newly formed, non-profit research project with partners in Toronto (the headquarters), New York City, and Tempe, Arizona. The goal of Iter is to increase access to all published materials pertaining to the Renaissance (1300-1700) and, eventually, to the Middle Ages (400-1500), by creating online bibliographies.

Conceived in September 1995, Iter started as a collaboration between the Renaissance Society of America (RSA) and the Centre for Reformation and Renaissance Studies (CRRS) at the University of Toronto. Iter has since grown into a major cooperative venture with the addition of three new partners: the Arizona Center for Medieval and Renaissance Studies (ACMRS) at Arizona State University in Tempe, in February 1996, and the Faculty of Information Studies (FIS) at the University of Toronto and the University of Toronto Library, in November 1996. Policy for Iter is set by an Executive Committee composed of the project Director and one representative from each of the partners. While the Director oversees Iter, day-to-day operations are the responsibility of the Project Manager.

In addition to the support from the partners and their host institutions, Iter has received grants of $125,000 (US) from the Andrew W. Mellon Foundation and $15,000 (US) from the Gladys Krieble Delmas Foundation. This funding, together with other in-kind and research support, is expected to sustain the project during its formative years. At the time of writing, Iter is preparing to market and sell individual subscriptions and site licenses for its online services. The revenue will help cover ongoing costs and allow reinvestment in the project.

THE ITER BIBLIOGRAPHY

The Iter Bibliography, Iter's first searchable Web-based service, presently contains more than 60,000 records for secondary material published in over 180 journals from 1700 to the present. The Bibliography will gradually expand to provide full descriptions, with subject access, of materials from entire runs of journals in many languages, including articles, notes, review articles, catalogues, book reviews, editions, and bibliographies. It will also have records for books, conference proceedings, collections of articles, dissertations, artworks, and music.

The Iter Bibliography has a distinctive interface that is designed to meet the needs of humanities scholars. Users will be able to search by keyword, browse alphabetical indexes, and search by discipline based on the Dewey Decimal Classification system. Subject access is provided by Library of Congress subject headings. More refined searching is possible by adding multiple search terms using Boolean operators, and in the near future, specialized limits will be available for type of work, language, modern region, and time period. In addition, subsequent searches can be initiated from links on the full record results page. Through simple requests, users can also review their searching history and get a list of records they have selected for downloading.

The University of Toronto Library has taken on Iter, beginning with the Iter Bibliography, as a case study for maintaining digital collections. In the near future, other Web-based services will be added to the Iter collection, such as databases of Renaissance and Middle Ages scholars, research projects, organizations, and other online resources.

GOALS AND OBJECTIVES OF THE ITER BIBLIOGRAPHY

From the outset of Iter, the Director has set high expectations for the Iter Bibliography. The goal is to develop an authoritative, reliable, high-quality Web-based service for humanities scholars.

To help achieve this goal, three objectives were defined. First, employ international record and cataloguing standards. Second, provide comprehensive coverage of print and electronic publications. Third, involve graduate students in Renaissance Studies and Information Studies (Library and Information Science) in the creation and maintenance of the records.

INFORMATION MANAGEMENT ISSUES

Collectively, the objectives set out for the Iter Bibliography have shaped the way information is managed. The first objective has determined what kinds of information are stored in the records, and how that information is stored. The second and third objectives have affected how the records themselves are managed within the database. This paper is concerned only with the issues that arise in creating and maintaining records for materials published in the journal literature.

Managing the Information in the Records

To meet the Iter Bibliography's first objective, internationally accepted record and cataloguing standards were utilized to help create records for the Bibliography and provide subject access to these records. International standards are important because the Bibliography has a potential world-wide audience.

The next sections outline five standards that Iter is using to help manage the information that resides in the records in the Iter Bibliography. Since the standards were originally developed for monographs, special modifications were sometimes required to adapt them for articles and similar materials. The tools used to help apply the standards are also discussed.

Machine-Readable Cataloging Record Format: The Machine-Readable Cataloging (MARC) record format was chosen as the standard for the Iter Bibliography for several reasons. First, MARC is an international standard and the format most widely used for academic library catalogues. This was an important consideration for integrating the materials currently in the Iter Bibliography with other materials, such as monographs and music, in the future. Second, the MARC format, which already accommodates many different types of materials, could be modified for journal articles and similar materials for which there is no current standard. For the Iter Bibliography, slight modifications were made to the MARC record format for monographs to accommodate specific information. For example, the source of a journal article (i.e., serial name, numbering information, etc.) is entered in the 440 field, which is normally used for the monograph series statement. Finally, the MARC record provides for local notes (590 field), which are used to record special information about individual items. Notes are also used to facilitate communication between data inputters and cataloguers, and to store descriptive cataloguing information that has no other place in the MARC record. The 590 field is not displayed to users.

Conversion programs, written especially for Iter using Data Magician software [5], were used to help create USMARC records for the Iter Bibliography. They convert bibliographic information in text files to a tagged USMARC format according to the USMARC Format for Bibliographic Data, 1994 Edition.
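The conversion step can be illustrated with a short Python sketch. It is not Iter's Data Magician program: the template labels, the FIELD_MAP table and the convert function are hypothetical, indicators are simply left blank (written ##), and the output is a readable one-line-per-field display rather than a true USMARC communications record. What it does show is the essential idea: each labelled element in a text file is assigned a tag, with Iter's journal source placed in the 440 field and inputter notes in the 590 field.

```python
# Hypothetical mapping from data-entry template labels to (USMARC tag, subfield).
# The labels are illustrative; Iter's actual templates are not reproduced here.
FIELD_MAP = {
    "author": ("100", "a"),  # main entry: personal name
    "title": ("245", "a"),   # title statement
    "source": ("440", "a"),  # Iter's adaptation: journal name and numbering
    "note": ("590", "a"),    # local note, not displayed to users
}


def convert(text: str) -> list[str]:
    """Convert 'label: value' lines into tagged lines such as '=245  ##$aTitle'.

    Indicators are written as '##' (blank) throughout; a real converter
    would set them as the USMARC format prescribes.
    """
    fields: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between fields
        label, sep, value = line.partition(":")  # split at the first colon only
        key = label.strip().lower()
        if not sep or key not in FIELD_MAP:
            raise ValueError(f"unrecognized template line: {line!r}")
        tag, code = FIELD_MAP[key]
        fields.append((tag, f"={tag}  ##${code}{value.strip()}"))
    # MARC fields appear in ascending tag order; sorted() keeps repeats in input order.
    return [tagged for _, tagged in sorted(fields, key=lambda f: f[0])]


if __name__ == "__main__":
    sample = """title: Getting past 1492: The Renaissance in recent Portuguese and Spanish publications
author: Sider, S.
source: Renaissance Quarterly, 47 (1994), 141-148
note: Hypothetical note to the cataloguer"""
    for tagged in convert(sample):
        print(tagged)
```

Splitting only at the first colon keeps subtitles such as "Getting past 1492: ..." intact.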

Anglo-American Cataloguing Rules: The Iter Bibliography meets the descriptive cataloguing standards set out in the Anglo-American Cataloguing Rules, Second Edition, Revised 1988, 1993 Amendments (AACR2R). To help meet this standard, special templates were designed, along with a set of guidelines for data inputters, to assist in the collection of bibliographic information. The templates and guidelines ensure that all the necessary descriptive information for a particular work is entered into a temporary database. Where possible, a single template is used for groups of like materials. AACR2R is also used for verifying descriptive cataloguing and establishing name authorities for the Iter Bibliography.
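A template can be thought of as a list of required descriptive elements for one group of like materials. The sketch below assumes hypothetical element names and a hypothetical missing_elements helper; it flags any element an inputter left blank before the record moves on for conversion.

```python
# Hypothetical templates: the descriptive elements a data inputter must supply
# for each group of like materials. Element names are illustrative only.
TEMPLATES: dict[str, list[str]] = {
    "article": ["author", "title", "source", "pages", "language"],
    "book review": ["reviewer", "reviewed_work", "source", "pages", "language"],
}


def missing_elements(entry: dict[str, str], template: str) -> list[str]:
    """Return required elements that are absent or blank, in template order."""
    if template not in TEMPLATES:
        raise KeyError(f"no template for {template!r}")
    return [name for name in TEMPLATES[template] if not entry.get(name, "").strip()]


if __name__ == "__main__":
    entry = {
        "author": "Wallace, W.E.",
        "title": "Miscellanea curiositae Michelangelae",
        "source": "Renaissance Quarterly, 47",
        "pages": "",
    }
    print(missing_elements(entry, "article"))  # ['pages', 'language']
```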

Library of Congress Subject Headings: Subject access to the materials in the Iter Bibliography is available through Library of Congress Subject Headings, 19th Edition (LCSH). However, changes in the way subject headings are assigned were necessary to meet the needs of a historical bibliography and its users. For instance, chronological (time period) subdivisions, which are normally added to the subject heading, are not used by Iter cataloguers. Instead, the time periods associated with the item described in the bibliographic record are entered in the Time Period of Content (045) field. This allows cataloguers to describe the time period at exactly the level of specificity each item requires.

For example, the time period for a subject in an article spanning 1529 to 1545 is represented once by 045 $b d1529 $b d1545, instead of coding the subfields $x History $y 16th century for every applicable subject heading. The 045 field can also take the place of two or more chronological subdivisions to express a lengthy time period. For an article spanning 1455 to 1650, for example, the subfields that would otherwise be added to every applicable subject heading ($x History $y 15th century, $x History $y 16th century, and $x History $y 17th century) are represented once by 045 $b d1455 $b d1650. This method not only adds value to the record by making date searching more flexible, but also dramatically reduces the work involved in assigning LCSH. In instances where only a narrower time period applies to a particular subject heading, the appropriate chronological subdivision is added to that heading, in addition to coding the time period for the whole item in the 045 field.
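The saving described above can be checked with a little arithmetic. The Python sketch below (all function names are hypothetical) produces the single 045 coding, in the indicator-free notation used in the text, and for comparison the century subdivisions it replaces. It assumes C.E. dates only and counts 1500 to 1599 as the 16th century. For the 1455 to 1650 article it yields three subdivisions, so an item with three applicable headings would otherwise carry nine.

```python
def century(year: int) -> int:
    """Century of a C.E. year, assuming 1500-1599 counts as the 16th century."""
    return year // 100 + 1


def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 16 -> '16th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def field_045(start: int, end: int) -> str:
    """Code a C.E. date range once; 'd' marks a C.E. year, indicators omitted."""
    if start > end:
        raise ValueError("start year must not follow end year")
    return f"045 $b d{start:04d} $b d{end:04d}"


def century_subdivisions(start: int, end: int) -> list[str]:
    """Subdivisions that would otherwise be repeated under every heading."""
    return [f"$x History $y {ordinal(c)} century"
            for c in range(century(start), century(end) + 1)]


if __name__ == "__main__":
    headings = 3  # hypothetical number of applicable subject headings
    subs = century_subdivisions(1455, 1650)
    print(field_045(1455, 1650))
    print(len(subs) * headings, "subdivisions replaced by one 045 field")
```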

As will be discussed in more detail under Managing the Records in the Database, graduate research assistants in Renaissance Studies (hereafter 'Subject Specialists') describe the contents of the materials added to the Iter Bibliography. To assist them in their subject analysis work, a condensed list of free-floating subdivisions (based on LCSH) was produced. This manual lists the most common aspects that can be brought out for general topics, persons, institutions, places, literatures, etc. It is accompanied by an instructional guide and data entry scheme to help ensure the consistency of recorded information.

Graduate research assistants in Information Studies (hereafter 'Information Specialists') use several tools to help them assign appropriate subject headings according to LCSH. Using the subject analysis report created by the Subject Specialists as a guide, they search for appropriate subject headings using Classification Plus on CD-ROM, and the Library of Congress Subject Cataloguing Manual: Subject Headings, 5th Edition.

Dewey Decimal Classification System: To provide an alternative way of searching for materials in the Iter Bibliography, a search by discipline based on the Abridged Dewey Decimal Classification, Edition 12 (DDC), is provided. Its application has been modified to agree with other cataloguing policy decisions. Because the aspects of time period and geography are represented elsewhere in the MARC record, and because they can be made part of the search statement as limits, there is no need to repeat this information in the classification notation. Time period is described in the Time Period of Content (045) field, and geography is brought out both in LCSH and as a MARC geographic area code (043 field). As a result, most notations are built to only one or two digits past the decimal point.

Since materials do not need to be physically located (on a shelf, for instance), they do not need to be uniquely identified, and more than one Abridged DDC notation can be assigned where applicable. This allows users to reach interdisciplinary materials, and materials with more than one perspective, from various starting points. For example, an article that discusses three unrelated topics regarding Michelangelo in the early 16th century [6] is assigned three Abridged DDC notations corresponding to each of the subject areas:

Additionally, it is not necessary to choose between options set out in the DDC schedules. An article describing various books published about Spain and Portugal in the Renaissance [7] is assigned two notations:

Normally, a cataloguing agency must choose between keeping all bibliographies together and dividing bibliographies by subject, so that each stays with the other works in its discipline.
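Because a record may carry several notations, a search by discipline only needs to ask whether any of them falls under the chosen class. The sketch below assumes the hierarchy is searched by notation prefix (so class 700 matches every notation beginning with 7) and also shows notations being cut back to two digits after the decimal point. The record IDs and notations are placeholders chosen to show the matching, not Iter's actual assignments, and the function names are hypothetical.

```python
def truncate(notation: str, places: int = 2) -> str:
    """Keep at most `places` digits after the decimal point ('945.0512' -> '945.05')."""
    whole, _, fraction = notation.partition(".")
    fraction = fraction[:places]
    return f"{whole}.{fraction}" if fraction else whole


def hierarchy_prefix(ddc_class: str) -> str:
    """Prefix shared by everything under a class: '700' -> '7', '740' -> '74'."""
    if "." in ddc_class:
        return ddc_class
    return ddc_class.rstrip("0") or "0"  # '000' -> '0', not the empty string


def discipline_search(records: dict[str, list[str]], ddc_class: str) -> list[str]:
    """IDs of records with at least one notation under `ddc_class`."""
    prefix = hierarchy_prefix(ddc_class)
    return sorted(rid for rid, notations in records.items()
                  if any(n.startswith(prefix) for n in notations))


if __name__ == "__main__":
    records = {  # placeholder notations only
        "rec-a": ["709.02", "382.4"],
        "rec-b": ["016.94"],
        "rec-c": [truncate("945.0512")],
    }
    for cls in ("700", "300", "000", "940"):
        print(cls, discipline_search(records, cls))
```

The same record (rec-a) is reached from both the 700 and 300 classes, which is the point of assigning more than one notation.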

Library of Congress Authority File: To help create a consistent bibliography, Iter keeps its own automated authority file for the Iter Bibliography. Whenever possible, name and subject authorities are copied from the Library of Congress authority files, to which Iter has a direct connection via the University of Toronto Library. The series authority for each journal included in the Iter Bibliography is based on the Library of Congress MARC bibliographic record for that journal.
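The purpose of the authority file is that every variant form of a name resolves to one authorized heading. A minimal sketch, assuming a hypothetical AuthorityFile class that matches forms while ignoring case, spacing and punctuation. The Michelangelo entry is illustrative only; in practice the authorized form would be copied from the Library of Congress file.

```python
from typing import Optional


class AuthorityFile:
    """Minimal local authority file: every recorded form resolves to one heading."""

    def __init__(self) -> None:
        self._index: dict[str, str] = {}

    @staticmethod
    def _key(form: str) -> str:
        # Ignore case, spacing and punctuation when matching forms.
        return "".join(ch for ch in form.casefold() if ch.isalnum())

    def add(self, authorized: str, variants: tuple[str, ...] = ()) -> None:
        """Record an authorized heading together with its variant forms."""
        for form in (authorized, *variants):
            self._index[self._key(form)] = authorized

    def resolve(self, heading: str) -> Optional[str]:
        """Authorized form for `heading`, or None if no authority is established yet."""
        return self._index.get(self._key(heading))


if __name__ == "__main__":
    af = AuthorityFile()
    af.add("Michelangelo Buonarroti, 1475-1564",  # illustrative entry
           ("Buonarroti, Michelangelo, 1475-1564",))
    print(af.resolve("buonarroti, michelangelo 1475-1564"))
    print(af.resolve("Raphael"))  # None: a new authority record is needed
```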

Managing the Records in the Database

The Information Technology Services group (ITS) of the University of Toronto Library maintains the Iter Bibliography as a separate database on the library's DRA system [8]. Library staff are responsible for maintaining the computer system and search engine for the Iter Bibliography, and for uploading and indexing records in the database. They also support OCLC's WebZ interface [9] which is used to query the Bibliography and display search results.

To fulfill the Iter Bibliography's second and third objectives, i.e., to provide comprehensive coverage of print and electronic publications, and to involve both Renaissance Studies and Information Studies graduate students in the creation and maintenance of the records, a special methodology was developed for managing the records in the database.

First, two work flows were designed (Figure 1). Stream 1 secures comprehensive coverage of the materials added to the Bibliography. All materials from selected journals published up to the end of December 1996 follow this work flow; initially, however, they can only be partially catalogued (Stream 1a). As resources become available, subject access to these materials is also provided (Stream 1b).

Figure 1. Iter Bibliography work flows

At the same time, Stream 2 makes it possible to add fully catalogued materials published since January 1997 to the database. The combination of the Stream 1 and Stream 2 work flows makes it possible to add all materials published in a journal since its inception to the Bibliography, and to provide subject access to these materials beginning with the most recent literature.
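The routing rule between the two streams depends only on publication date, and the order in which the Stream 1 backlog receives subject access favours recent material. A sketch under those assumptions, with hypothetical function names; it does not model the selection of core journals.

```python
from datetime import date

STREAM_2_START = date(1997, 1, 1)  # materials published from January 1997 onward


def assign_stream(published: date) -> str:
    """Stream 2 for current materials, Stream 1 for everything published earlier."""
    return "Stream 2" if published >= STREAM_2_START else "Stream 1"


def subject_access_queue(items: list[tuple[str, date]]) -> list[str]:
    """Order Stream 1 items for subject access (Stream 1b), most recent first."""
    backlog = [(rid, pub) for rid, pub in items if assign_stream(pub) == "Stream 1"]
    return [rid for rid, _ in sorted(backlog, key=lambda item: item[1], reverse=True)]


if __name__ == "__main__":
    items = [("a", date(1950, 1, 1)), ("b", date(1996, 6, 1)),
             ("c", date(1997, 3, 1)), ("d", date(1980, 1, 1))]
    print(subject_access_queue(items))  # ['b', 'd', 'a']
```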

The work required to process records for each of the work flows was divided between Subject and Information Specialists into a series of steps (Figures 2 and 3). This methodology takes advantage of the subject expertise of graduate research assistants in Renaissance Studies, plus the technical expertise of graduate research assistants in Information Studies. Additionally, the Project Manager has increased flexibility when scheduling record processing for individual journals.

Steps in the Stream 1 Work Flow: Stream 1 is divided into five steps (Figure 2). First, Subject Specialists (SS) use their knowledge of the subject area to select appropriate materials for inclusion in the Iter Bibliography. Under the supervision of faculty and staff at the Renaissance centres, they enter bibliographic data for each item into a temporary database (Step 1). Next, the temporary database records are converted from a text format to a USMARC format at the Faculty of Information Studies, and sent to the University of Toronto Library to be uploaded to the permanent database (Step 2). Third, Information Specialists (IS), also under the supervision of faculty and staff, verify descriptive cataloguing and establish name authorities for each record (Step 3). At this point, users have access to authoritative bibliographic records with full descriptive cataloguing.

At a later date, when resources are available, subject access is provided to materials beginning with selected core journals and the most recent materials. Subject Specialists return to the original items and provide subject analysis for each item (Step 4). Subject analysis includes a descriptive sentence summarizing what the work is about, the time period for the subject of the article, and a list of topics that constitute at least 30% of the work. The result is a subject analysis report which is passed on to the Information Specialists to assist them with subject cataloguing (Step 5).
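The contents of a subject analysis report map naturally onto a small record type. The sketch below is hypothetical: it assumes the Subject Specialist records an estimated share of the work for each candidate topic, and it keeps only topics at or above the 30% threshold named in the text. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SubjectAnalysis:
    """One subject analysis report (hypothetical structure)."""
    summary: str                    # descriptive sentence on what the work is about
    start_year: int                 # time period of the subject
    end_year: int
    topic_shares: dict[str, float]  # estimated fraction of the work per topic

    def __post_init__(self) -> None:
        if self.start_year > self.end_year:
            raise ValueError("start_year must not follow end_year")
        if any(not 0.0 <= s <= 1.0 for s in self.topic_shares.values()):
            raise ValueError("topic shares must lie between 0 and 1")

    def reported_topics(self, threshold: float = 0.30) -> list[str]:
        """Topics making up at least `threshold` of the work, in entry order."""
        return [t for t, s in self.topic_shares.items() if s >= threshold]


if __name__ == "__main__":
    report = SubjectAnalysis("Placeholder summary sentence.", 1500, 1520,
                             {"Topic A": 0.5, "Topic B": 0.3, "Topic C": 0.2})
    print(report.reported_topics())  # ['Topic A', 'Topic B']
```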

Figure 2. Stream 1 work flow for materials published up to the end of December 1996

Steps in the Stream 2 Work Flow: Stream 2 (Figure 3) is a condensed version of Stream 1. Steps 1 and 4, and Steps 3 and 5, of the Stream 1 work flow have been combined so that Subject Specialists and Information Specialists do their respective work in a single step. First, Subject Specialists select materials for inclusion, enter bibliographic information, and provide subject analysis for each item (a combination of Steps 1 and 4 in Figure 2). For materials in the Stream 2 work flow, subject analysis is entered directly into the temporary database record, but cannot be viewed by users. As in Stream 1, the records are then converted and uploaded to the permanent database residing at the University of Toronto Library (Step 2). Finally, Information Specialists verify descriptive cataloguing and complete subject cataloguing for each item, then establish name and subject authorities for the Iter Bibliography (a combination of Steps 3 and 5 in Figure 2). The result is a set of fully catalogued records that can be added to the Bibliography shortly after the journal issue is published.

Figure 3. Stream 2 work flow for materials published beginning in January 1997

Tools Used to Help Manage the Work Flows: To work efficiently with records from different journals, and with records at different steps in the work flow, a record status code was developed especially for the Iter Bibliography. Consider, as an example, the record status code "sub1ok00344338". The first part of the code (sub1ok) indicates that the first part of Subject Specialist work (Step 1) is complete and, following conversion, that the item is "ok" to move on to the next step in the work flow. The second part of the record status code is the International Standard Serial Number (ISSN) for the source of the item (in this case 0034-4338 for Renaissance Quarterly), or a mnemonic code if no ISSN is available (e.g., 'revhisp' for Revue Hispanique). A complete list of record status codes used to help manage the work flow, and consequently the records in the database, is shown in Table 1.

Table 1. Record status codes used to manage the work flow for the Iter Bibliography

Record Status Code Description
sub1ok<ISSN> For records that have been converted to USMARC and uploaded to the permanent database (bibliographic information only)
sub2ok<ISSN> For records that have been converted to USMARC and uploaded to the permanent database (bibliographic information and subject analysis information)
info1ok<ISSN> For records with verified descriptive cataloguing and completed name authorities
info1help<ISSN> For records which cannot be completed (descriptive cataloguing or name authority problems)
info2ok<ISSN> For records with descriptive and subject cataloguing, as well as name and subject authorities
info2help<ISSN> For records which cannot be completed (descriptive/subject cataloguing, or name/subject authority problems)
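Because each code is simply a status prefix from Table 1 joined to an ISSN or mnemonic, it can be built and taken apart mechanically. The Python sketch below assumes the six prefixes in Table 1 are the only ones in use; the function names are hypothetical. It also applies the standard ISSN mod-11 check digit, which is one way to tell an ISSN such as 0034-4338 from a mnemonic such as 'revhisp'.

```python
import re

# Status prefixes from Table 1.
PREFIXES = ("sub1ok", "sub2ok", "info1ok", "info1help", "info2ok", "info2help")


def issn_valid(issn: str) -> bool:
    """Check an 8-character ISSN (no hyphen) against its mod-11 check digit."""
    if not re.fullmatch(r"\d{7}[\dX]", issn):
        return False
    # Weights 8, 7, ..., 2 apply to the first seven digits.
    total = sum(int(d) * w for d, w in zip(issn[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return issn[7] == ("X" if check == 10 else str(check))


def build_code(status: str, source: str) -> str:
    """Join a status prefix to an ISSN (hyphen dropped) or a mnemonic."""
    if status not in PREFIXES:
        raise ValueError(f"unknown status {status!r}")
    return status + source.replace("-", "")


def parse_code(code: str) -> tuple[str, str, bool]:
    """Split a code into (status, source, source_is_issn)."""
    for prefix in PREFIXES:
        if code.startswith(prefix) and len(code) > len(prefix):
            source = code[len(prefix):]
            return prefix, source, issn_valid(source)
    raise ValueError(f"unrecognized record status code {code!r}")


if __name__ == "__main__":
    print(parse_code("sub1ok00344338"))    # ('sub1ok', '00344338', True)
    print(parse_code("info1helprevhisp"))  # ('info1help', 'revhisp', False)
```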

The following example illustrates the use of several record status codes for managing the Stream 1 work flow for Renaissance Quarterly. Once records are converted and uploaded to the permanent database (Step 2), the Information Specialist assigned to Step 3 work can query the database for all records with the record status code "sub1ok00344338", and retrieve a list of appropriate records. If at any time the Information Specialist cannot complete descriptive cataloguing for a record, the record status code can be temporarily changed to "info1help00344338" until the problem is resolved. Once the problem is resolved, the record status code is changed to "info1ok00344338", indicating that the record is now ready for Step 4 work.
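The Renaissance Quarterly example amounts to a query followed by a series of permitted status changes. A sketch with hypothetical names, assuming the transitions listed below; the step from info1ok to info2ok is inferred, since Table 1 defines no separate code for the end of Stream 1 Step 4.

```python
from dataclasses import dataclass

# Permitted status changes. info1ok -> info2ok/info2help is an assumption:
# Table 1 lists no separate code for the end of Step 4 in Stream 1.
TRANSITIONS: dict[str, set[str]] = {
    "sub1ok": {"info1ok", "info1help"},
    "info1help": {"info1ok"},
    "info1ok": {"info2ok", "info2help"},
    "sub2ok": {"info2ok", "info2help"},
    "info2help": {"info2ok"},
}


@dataclass
class Record:
    record_id: str
    status: str   # e.g. "sub1ok"
    source: str   # ISSN without hyphen, or a mnemonic

    @property
    def code(self) -> str:
        return self.status + self.source


def query(records: list[Record], code: str) -> list[str]:
    """IDs of all records carrying exactly this record status code."""
    return [r.record_id for r in records if r.code == code]


def change_status(record: Record, new_status: str) -> None:
    """Move a record to its next status, rejecting steps the work flow does not allow."""
    if new_status not in TRANSITIONS.get(record.status, set()):
        raise ValueError(f"{record.status} -> {new_status} is not a permitted step")
    record.status = new_status


if __name__ == "__main__":
    db = [Record("r1", "sub1ok", "00344338"), Record("r2", "sub1ok", "revhisp")]
    print(query(db, "sub1ok00344338"))   # ['r1']
    change_status(db[0], "info1help")    # problem found during Step 3
    change_status(db[0], "info1ok")      # problem resolved; ready for Step 4
    print(query(db, "info1ok00344338"))  # ['r1']
```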

IMPLICATIONS OF INFORMATION MANAGEMENT DECISIONS

The information management decisions made for the Iter Bibliography directly affect the way users query the Bibliography and view search results. Modifications made to the cataloguing standards adapted for the Iter Bibliography have had implications for the management of the information that resides in the records. For example, modifications made to the USMARC record format (that allow Iter to code journal articles) have determined which fields are indexed. Consequently, this has an impact on the kinds of searches that are possible and how information is labeled in the results pages. Because the Iter Bibliography will include many types of materials, the field labels in the full record display have to be appropriate for all materials.

Changes to the way LCSH and DDC notations are made for materials in the Iter Bibliography also affect the way users search the Bibliography. Users must be made aware of the separate limit by time period that has been created to provide a more flexible and specific query for dates. They must also be familiarized with the search by discipline, which allows users to scan the literature from a number of different angles, and informed that a more refined search can be created by adding specific date or geographic limits to results from a DDC hierarchy search.
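A time period limit built on the 045 field reduces to an interval-overlap test. The sketch below assumes C.E. dates coded as in the 045 examples earlier in the paper and uses hypothetical function names. Records with no 045 field are skipped, which is why such limits reach only subject-catalogued records.

```python
import re
from typing import Optional


def parse_045(field: str) -> tuple[int, int]:
    """'045 $b d1529 $b d1545' -> (1529, 1545); a single $b gives (y, y). C.E. only."""
    years = [int(y) for y in re.findall(r"\$b d(\d{4})", field)]
    if not years:
        raise ValueError(f"no C.E. dates in {field!r}")
    return min(years), max(years)


def limit_by_period(records: dict[str, Optional[str]], start: int, end: int) -> list[str]:
    """IDs of records whose coded period overlaps [start, end].

    Records without an 045 field (not yet subject catalogued) are never returned.
    """
    hits = []
    for rid, field in records.items():
        if field is None:
            continue
        lo, hi = parse_045(field)
        if lo <= end and hi >= start:  # the two ranges overlap
            hits.append(rid)
    return sorted(hits)


if __name__ == "__main__":
    records = {"a": "045 $b d1529 $b d1545", "b": "045 $b d1455 $b d1650", "c": None}
    print(limit_by_period(records, 1600, 1610))  # ['b']
```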

Regarding the management of records in the database, dividing the work required to process records for the Iter Bibliography into a series of steps means that records in the Bibliography are at various stages of completion. This also has implications for the way in which users search the Bibliography. For example, until everything in the Bibliography has been subject catalogued, searching by subject or subject keyword, limiting by language, modern region or time period, and searching by discipline, will only retrieve items that have been subject catalogued. The challenge is to convey this information to the users so that they do not unknowingly restrict their search to only those items that have been subject catalogued.
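One way to convey this to users is to report, alongside any subject-based search, how much of the database such searches can reach. A sketch, assuming that records whose status code begins with info2ok (Table 1) are the subject-catalogued ones; the function name and the wording of the notice are hypothetical.

```python
def subject_limit_notice(statuses: list[str]) -> str:
    """Tell users how much of the database subject-based searches can reach.

    Counts records whose status code begins with 'info2ok' (subject cataloguing
    complete, per Table 1) as subject catalogued.
    """
    if not statuses:
        return "The database is empty."
    done = sum(1 for s in statuses if s.startswith("info2ok"))
    pct = 100 * done / len(statuses)
    return (f"Subject, discipline, language, region and time period searches "
            f"currently cover {done:,} of {len(statuses):,} records ({pct:.0f}%).")


if __name__ == "__main__":
    statuses = ["info2ok00344338"] + ["info1ok00344338"] * 3
    print(subject_limit_notice(statuses))
```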

Finally, the collaboration between Subject and Information Specialists in subject cataloguing is expected to lead users to more promising and relevant materials. The Subject Specialists bring their extensive knowledge of the subject area plus an understanding of many languages. The Information Specialists bring their professional training with international cataloguing standards. Working together, they create a superior product for end-users. This methodology relies on proper training in subject analysis and cataloguing to reflect Iter's policies, as well as efficient lines of communication between the Subject and Information Specialists.

CONCLUSION

This paper provides an overview of the information management issues that Iter encountered during the initial development of its first searchable Web-based service, the Iter Bibliography. The paper presents the issues associated with the management of the information in the Iter records, and those related to the management of these records in the Iter database. Iter has made a number of information management decisions that have implications for its users in addition to its developers. These policies have helped Iter achieve its objectives and, ultimately, fulfill its goal of becoming an authoritative, reliable, high-quality Web-based service for humanities scholars.

In the near future, Iter will face new information management challenges as it enhances the Iter Bibliography and adds new Web-based services to its collection. These issues will require policy decisions, which will in turn have implications for those who maintain Iter, and for users who subscribe to the service. Anticipated questions include:

Iter looks forward to the challenges it faces in the months and years to come. Feel free to monitor our progress and make any comments by visiting our Web site and guest database.

NOTES

1 The Iter Bibliography may be found on the Internet at: Iter: The bibliography of Renaissance Europe (1300-1700) [WWW document]. URL http://www.library.utoronto.ca/iter/.

2 Further information about the MLA International Bibliography is available from: SilverPlatter products available over the Internet [WWW document]. URL http://www.silverplatter.com/erlrmote.htm (accessed June 9, 1997).

3 Further information about the Wilson Humanities Abstracts and Humanities Index is available from: H. W. Wilson Indexes: Humanities Index [WWW document]. URL http://www.hwwilson.com/humani.html (accessed June 11, 1997).

4 Further information about the Bibliography of the History of Art is available from: The Research Libraries Group: Bibliography of the history of art [WWW document]. URL http://www.rlg.org/cit-bha.html (accessed June 9, 1997).

5 Data Magician, version 1.4 was used. Further information about this software is available from: Folland Software Services, 6 Chartwell Crescent, Guelph, Ontario, Canada, N1G 2T7, lfolland@ca.dynix.com.

6 Example from Wallace, W.E. (1994). Miscellanea curiositae Michelangelae: A steep tariff, a half dozen horses, and yards of taffeta. Renaissance Quarterly, 47, 330-350.

7 Example from Sider, S. (1994). Getting past 1492: The Renaissance in recent Portuguese and Spanish publications. Renaissance Quarterly, 47, 141-148.

8 Further information about Data Research Associates (DRA) products and services may be found on the Internet at: DRA "Web Page" [WWW document]. URL http://www.dra.com (accessed June 9, 1997).

9 Further information about OCLC's WebZ interface may be found on the Internet at: OCLC Online Computer Library Center, Inc. Home Page [WWW document]. URL http://www.oclc.org (accessed June 9, 1997).

REFERENCES

Dewey, M. (1990). Abridged Dewey decimal classification and relative index (12th ed., J.P. Comaromi, J. Beall, W.E. Matthews, Jr., & G.R. New, Eds.). Albany, NY: Forest Press.

Fédération internationale des sociétés et instituts pour l'étude de la Renaissance. (1965-). Bibliographie internationale de l'humanisme et de la Renaissance. Geneva: Librairie Droz.

Gorman, M. & Winkler, P.W. (Eds.). (1988). Anglo-American cataloguing rules (2nd ed., 1988 revision, with 1993 amendments). Ottawa: Canadian Library Association.

Library of Congress. Cataloging Distribution Service. (1996-). Classification Plus [CD-ROM]. Washington, D.C.: Author.

Library of Congress. Cataloging Policy and Support Office. (1996). Library of Congress subject headings (19th ed., Vols. 1-4). Washington, D.C.: Library of Congress, Cataloging Distribution Service.

Library of Congress. Cataloging Policy and Support Office. (1996). Subject cataloging manual: Subject headings (5th ed., Vols. 1-4). Washington, D.C.: Library of Congress, Cataloging Distribution Service.

Library of Congress. Network Development and MARC Standards Office. (1994). USMARC format for bibliographic data: Including guidelines for content designation (Vols. 1-2). Washington, D.C.: Library of Congress, Cataloging Distribution Service.

University of Leeds. International Medieval Institute. (1995-). International medieval bibliography [CD-ROM]. Leeds, U.K.: Author; Turnhout, Belgium: Brepols Publishers.