Bulletin of the American Society for Information Science

Volume 25, No. 3

February / March 1999




Letter to the Editor

To the Editor:

I read with great interest Bella Weinberg's report of the panel I had the privilege to moderate at the last ASIS meeting (Bulletin of the American Society for Information Science, "Improved Internet Access. . ." pp. 26-29, December/January 1999). She is an insightful and eloquent speaker and writer. Either I misspoke/miswrote or she has me saying something I don't "hold to":

"Koehler suggested that the Web is unindexable by traditional methods because of the huge number of sites on it."

Rather, I argue that the Web is difficult to index (perhaps closing in on impossible), but not because of the large number of sites. (Bella clearly demonstrates that a few million is not so many when compared to print.) What I do argue is that a very large number of Web sites (almost 100%) and Web pages (almost 98%) will undergo some kind of change over the period of a year. Change may range from profound (site or page demise, or major change in information content) to trivial. My question is: Can traditional cataloging cope with all that change? Well, maybe. But we need to explore that more.

Remember also that in the traditional world, once a publication is superseded, the earlier document almost always continues to exist, often alongside the new one. These may be identified as new editions or new imprints. This pattern is almost never followed on the Web. The new replaces and erases the old.

I also suggest that the Web offers attributes not found in print or not ordinarily cataloged that can be captured and used in the cataloging process. One obvious example is rate of change. Another is intermittence rates. Edu sites, for example, are more intermittent (come and go, disappear then reappear) than, say, com sites. Com sites, on the other hand, change more frequently than the others. These data can be captured and reported for individual sites and pages. We can also capture object mixes, size in bytes or number of objects, link structures and so on. All of these can be automated and reiterated as often as one chooses.
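
A minimal sketch of how two of these measures might be computed automatically, assuming a site is polled at regular intervals and each poll records either a hash of the retrieved content or None when the site could not be reached. The function names and the exact definitions of "change rate" and "intermittence rate" used here are illustrative assumptions, not the measures from any particular study.

```python
from typing import Optional, Sequence


def change_rate(observations: Sequence[Optional[str]]) -> float:
    """Fraction of consecutive successful polls whose content hash differs.

    Assumption: a poll is a content hash (str) or None if unreachable.
    """
    seen = [h for h in observations if h is not None]
    if len(seen) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(seen, seen[1:]) if prev != cur)
    return changes / (len(seen) - 1)


def intermittence_rate(observations: Sequence[Optional[str]]) -> float:
    """Fraction of polls at which the site was unreachable but later reappeared.

    Assumption: a trailing run of failures is treated as possible demise,
    not intermittence, so it is not counted.
    """
    if not observations:
        return 0.0
    # Index of the last successful poll, or -1 if the site was never reached.
    last_seen = max(
        (i for i, h in enumerate(observations) if h is not None), default=-1
    )
    gaps = sum(1 for h in observations[: last_seen + 1] if h is None)
    return gaps / len(observations)


# Hypothetical poll history: two identical captures, one outage,
# two captures of changed content, then a final outage.
polls = ["a1", "a1", None, "b2", "b2", None]
print(change_rate(polls), intermittence_rate(polls))
```

Run on a schedule against a list of sites, a routine of this kind could report both figures per site or per page alongside the other catalog data.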

Yes, as Bella has it, nothing is new under the sun. And I appreciate the cocitation with King Solomon. But sometimes the analogy between the old and the new isn't quite as straightforward as we would like.

    Wallace Koehler
    Assistant Professor, School of Library and Information Studies
    University of Oklahoma
    405/325-3921; fax: 405/325-7648

Bella Weinberg responds:

I appreciate Prof. Koehler's kind words about my ASIS '98 presentation.  He was an excellent moderator, and one can learn much from his paper (which is forthcoming in Journal of Librarianship and Information Science, March 1999).

My statement on Koehler's belief in the unindexability [by humans] of the Web was based on the compuscript he sent me before the conference.  The abstract begins: "The World Wide Web offers many challenges to those seeking to . . . index its content.  Given the sheer size of the Internet . . ., to efficiently capture that content, it is necessary that automated or near-automated methodologies be developed." Similar statements follow statistics on the number of Web sites in the concluding section of the paper.  In his presentation Wallace made the statement that "The Web is far too large and complex [for human indexing]"; perhaps my paraphrase was too strongly worded.  There is an anachronistic quality to Wallace's letter: alluding to my statistics on the number of printed documents cataloged to demonstrate that the number of Web sites is manageable.

Koehler maintains that his major focus is not the size of the Internet but change.  A substantial portion of my Bulletin paper addressed that facet of the Web.  New points in Koehler's letter can be answered in the same vein as prior ones.  The lack of preservation of early versions of Web sites is not a new problem: literary historians bemoan the invention of the word processor because it has largely eliminated draft manuscripts.  I disagree that old editions of printed works often sit alongside new ones: Most libraries discard them.

While the content of Web sites changes often, I believe that the general category headings assigned to them by Internet directory services remain valid.  Few of these services do analytical indexing; spiders deal with the different words in the updated sites. Moreover, I happen to believe that cataloging data for printed documents needs frequent reevaluation ("A Theory of Relativity for Catalogers," Cataloging Heresy, 1992).

When one compares the characteristics of Web sites with those of older media, there are obviously differences of degree.  My key point is that catalogers and indexers have experience with changing texts and multimedia that is relevant to the challenge of organizing the Internet.  If the rate of change of Web sites requires a great many human beings to index them, that is great news for the information professions.

    Bella Hass Weinberg
    Professor, Division of Library and Information Science
    St. Johns College



© 1999, American Society for Information Science
Last Update: February 21, 1999