Toward a Unified Docuverse:
Standardizing Document Markup and Access Without Procrustean Bargains

Nancy M. Ide
Vassar College, Poughkeepsie, New York

C. M. Sperberg-McQueen
University of Illinois at Chicago, Chicago, Illinois

Abstract

One reason to form a collection of any type is to provide simpler access, both physical and intellectual, to the material collected. In the case of digital collections of text, providing a simple, unified user interface faces several challenges, among them an unabating tension between intellectual adequacy and simplicity of access. Access is simpler if all texts are encoded consistently and can be searched using the same simple textual model. But serious work with texts often requires painstaking attention to the unusual, anomalous, and unique features of a text, many of which cannot be captured at all using a simple monolithic markup scheme like HTML. Some texts may be worth marking up in much more detail than is possible for the entire collection. How can we allow cross-collection searching and display while preserving, not falsifying, the variation in our texts and the resultant variation in their markup? This paper discusses the current state of the art on these problems and points to outstanding questions that remain to be resolved.