Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works
Scott Nicholson
Published online 7 July 2003

Nicholson suggests using data mining techniques to discover the patterns in World Wide Web pages that are needed for automated collection development in academic digital libraries. Possible techniques include logistic regression, where the variable combinations that best predict classes are discovered and used to predict the class membership of new observations; memory-based reasoning, such as N-neighbor non-parametric analysis, where a distance function between new and existing observations allows a choice among pre-classified neighbors; decision/classification trees, where rules for dividing a large set are made on the basis of the best discriminating variable; and neural networks, where neurons accept 0-1 measurements for each variable and weigh and combine variables until the optimal weight combination for the training set is determined.
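As a rough illustration of the memory-based reasoning idea, the sketch below classifies a new page by majority vote among its nearest pre-classified neighbors. The two features (a reference count and a fraction of academic links), the sample values, and the choice of Euclidean distance with k = 3 are all assumptions made for the example; the abstract does not say which distance function or criteria values Nicholson used.

```python
import math
from collections import Counter

def classify(new_page, labeled_pages, k=3):
    """Majority vote among the k labeled pages closest to new_page."""
    # Sort the pre-classified pages by Euclidean distance to the new page.
    by_distance = sorted(labeled_pages, key=lambda item: math.dist(item[0], new_page))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical surrogate records: (reference count, fraction of academic links), label
training = [
    ((25, 0.8), "scholarly"),
    ((30, 0.9), "scholarly"),
    ((18, 0.7), "scholarly"),
    ((0, 0.1), "other"),
    ((1, 0.0), "other"),
    ((2, 0.2), "other"),
]

print(classify((20, 0.6), training))
```

In practice the features would likely need rescaling first, since an unscaled count dominates a 0-1 fraction in the distance.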

Forty-two librarians ranked selection criteria from the literature and suggested additional criteria. Low-ranked criteria were removed and new suggestions added, with iterations until consensus was reached. These criteria were made operational in a Perl program that analyzed web pages. 4500 scholarly pages were identified for use as a training set, and 500 from other sites as a test set. An additional 4500 non-scholarly pages were identified for the training set and 500 for the test set. Values were collected by the program for each criterion, creating surrogate records for the pages. Logistic regression correctly classified 463 scholarly pages and 473 random pages. N-neighbor non-parametric analysis correctly classified 438 scholarly pages and 475 random pages. The classification tree method correctly classified 478 scholarly pages and 480 random pages. Neural networks correctly classified 465 scholarly pages and 469 random pages. Accuracy (precision) varied between 93.75% and 96%, while return (recall) varied from 87.6% to 95.6%. While the classification tree method provided the highest values, all models were effective.
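The precision and recall ranges can be reproduced from the reported counts. The sketch below assumes the usual definitions: a random page that was not correctly classified counts as a false positive for the scholarly class, and recall is the share of the 500 scholarly test pages that were found. The function and variable names are illustrative only.

```python
# Correct classifications out of 500 scholarly and 500 random test pages,
# as reported in the abstract: (scholarly correct, random correct).
TEST_SIZE = 500
results = {
    "logistic regression": (463, 473),
    "N-neighbor analysis": (438, 475),
    "classification tree": (478, 480),
    "neural network": (465, 469),
}

def precision_recall(scholarly_correct, random_correct, n=TEST_SIZE):
    """Return (precision, recall) for the scholarly class."""
    true_pos = scholarly_correct
    false_pos = n - random_correct  # random pages mislabeled as scholarly
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / n
    return precision, recall

for name, (scholarly, random_pages) in results.items():
    p, r = precision_recall(scholarly, random_pages)
    print(f"{name}: precision {p:.2%}, recall {r:.2%}")
```

Under these assumptions the neural network gives the lowest precision (465/496 = 93.75%) and the classification tree the highest (478/498, about 96%), while recall runs from 438/500 = 87.6% to 478/500 = 95.6%.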









Overlap in Bibliographic Databases
William W. Hood and Concepcion S. Wilson
Published online 16 June 2003

From over 100 DIALOG databases, Hood and Wilson locate about 15,600 records on fuzzy set theory for the period from 1965 to 1993 by searching "fuzzy" and extracting by hand a list of pertinent records. The data were then cleaned and standardized, and a combination of two duplicate detection keys was used to locate overlapping records found in more than one database. The frequency distribution shows that no overlap occurs for 63.26% of the records, 12.29% were duplicated once, and 0.03% were duplicated 12 times, the highest rate. The distribution would appear to fit the inverse power law, but an exponential curve provides a better fit. Looking at the papers found in only one database, 42% of the 5815 found in SCISEARCH are unique and represent 15.7% of the total record set. Intra-database duplicates were found in 28 databases. MATHSCI, which retains originals when they are amended, had a 17.8% duplication rate in the fuzzy set literature. While PASCAL's double indexing accounted for its 0.5% duplication rate, the 0.4% rate in SCISEARCH resulted from new records with references being added when the original had previously been entered without references. Overall, intra-database duplication is quite low. Overlapping records correlate with overlapping DIALOG OneSearch categories.
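A minimal sketch of key-based duplicate detection follows. The abstract does not say which fields made up the two detection keys, so the title/year key and the author/year/page key here are hypothetical, as are the sample records. The sketch treats two records as the same work if they share either key, then tallies how many extra copies each work has.

```python
import re
from collections import Counter, defaultdict

def normalize(text):
    """Lowercase and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def keys_for(record):
    # Two hypothetical detection keys; the source does not name the real ones.
    title_key = ("t", normalize(record["title"])[:30], record["year"])
    author_key = ("a", normalize(record["author"]), record["year"], record["page"])
    return [title_key, author_key]

def group_duplicates(records):
    """Union records that share at least one key; return groups of indices."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for i, rec in enumerate(records):
        for key in keys_for(rec):
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return list(groups.values())

def overlap_distribution(records):
    """Map 'number of duplicates' to 'number of works' (0 means no overlap)."""
    return Counter(len(group) - 1 for group in group_duplicates(records))

# Hypothetical records from three databases.
records = [
    {"db": "A", "title": "A Fuzzy Example", "author": "Doe", "year": 1990, "page": 10},
    {"db": "B", "title": "a fuzzy example.", "author": "Doe, J.", "year": 1990, "page": 10},
    {"db": "C", "title": "Fuzzy example, A", "author": "Doe, J.", "year": 1990, "page": 10},
    {"db": "A", "title": "Another Fuzzy Paper", "author": "Roe", "year": 1991, "page": 5},
]
print(overlap_distribution(records))
```

Here the first and second records match on the title key and the second and third on the author key, so all three collapse into one work with two duplicates, while the fourth record has none.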









The Experience of Libraries Across Time: Thematic Analysis of Undergraduate Recollections of Library Experiences
Jacqueline Kracker and Howard R. Pollio
Published online 11 June 2003

Kracker and Pollio look at patrons' impressions of libraries by way of the qualitative research techniques of content analysis and phenomenological inquiry, in which one identifies recurring themes in recorded dialogs concerning a topic and the ground upon which they occur. Thus the meaning of the concept for an individual may be identified in terms of that individual's direct experience. One hundred and eighteen undergraduate students enrolled in a freshman psychology course volunteered as subjects. Each was asked to provide, along with basic demographic data, a short description of three specific incidents related to libraries, and a longer description of one of these incidents. The incidents were categorized into six school-level categories and five type-of-library categories, resulting in 708 coded events. With the self considered as the ground, themes having to do with atmosphere, size and abundance, organization/rules and their effect, what I do in a library, and memories were identified. This allows one to formulate a typical library experience for a 19-year-old college student, an experience that changes during different educational periods.









Intermediary's Information Seeking, Inquiring Minds, and Elicitation Styles
Mei-Mei Wu and Ying-Hsang Liu
Published online 18 July 2003

Wu and Liu are concerned with finding the linguistic styles used by intermediaries in their interactions with those with information needs, and with determining whether certain mind sets can be associated with such styles. Thirty patrons' interactions with one of five different intermediaries were video- and audiotaped while an observer kept notes. Participants responded to questionnaires on their perceptions of the process and general user satisfaction, and users were interviewed on audiotape after the search. Using seven categories of linguistic form, ten categories of elicitation purpose, and seven categories of communication function, the texts were analyzed, and a chi-square test showed differences in each among intermediaries and identified three styles termed situational (differing with user needs), functional (no functional differences), and stereotypical (purposes, functions, and forms are constant). The mind set of the intermediary, determined by analysis of discourse, led to three types: problem detection (focus on reexpressing and understanding the need), query formulation (focus on terminology), and database instruction (focus on proper selection and use of databases). No linkage between styles and mind sets was established.












Introduction and Overview: Chemistry Journals: The Transition From Paper to Electronic With Lessons for Other Disciplines
Loren D. Mendelsohn
Published online 18 July 2003

The articles in this Perspectives have been selected from papers presented at the Tri-Society Symposium, held on June 9, 2002, in Los Angeles, California. They discuss a broad spectrum of issues that have been raised as an increasing number of libraries convert from paper to online journal subscriptions, ranging from broad questions addressing the process of the changeover to studies of more specific issues. Taken together, they provide a useful overview of the process and contribute significantly to the scholarship in this field. Moreover, these articles have broader applications. The questions raised by the transition from print to electronic are not related solely to chemical information or even science and technology information; since scholarly journals in all disciplines are making the transition from print to electronic, similar questions can be raised with regard to all disciplines.






New Knowledge Management Systems: The Implications for Data Discovery, Collection Development, and the Changing Role of the Librarian
David Stern
Published online 18 July 2003

David Stern's introductory essay raises several questions concerned with the trend toward electronic journals. By highlighting such issues as complex differential pricing plans, the development of new and complex tools for data manipulation, and how these factors affect the role of the librarian, he provides a framework for reading and understanding many of the issues discussed in the subsequent articles.





Making the Transition From Print to Electronic Serial Collections: A New Model for Academic Chemistry Libraries?
Tina E. Chrzastowski
Published online 18 July 2003

In examining the feasibility of moving from paper to electronic journals in a particular library, Tina E. Chrzastowski proposes and evaluates a new model for the academic chemistry library. In so doing, she establishes a list of basic factors and criteria that must be evaluated by any institution considering this transition.






Changing Use Patterns of Print Journals in the Digital Age: Impacts of Electronic Equivalents on Print Chemistry Journal Use
K. T. L. Vaughan
Published online 18 July 2003

K.T.L. Vaughan examines the transition from a different perspective, focusing instead on how the use of paper copies of journals is affected by making available electronic copies of those same journals. By exploring this particular aspect of the question, she provides data that will help library administrators evaluate the utility of retaining paper copies in an increasingly electronic environment.






Linking of Errata: Current Practices in Online Physical Sciences Journals
Emily L. Poworoznek
Published online 18 July 2003

One of the central questions raised by the change from paper to electronic has to do with the nature of the copy of record. Emily L. Poworoznek examines the treatment of errata in electronic journals by a large group of commercial and professional society publishers, pointing out the significance of this issue for the integrity of the scientific record. She further compares these new approaches with the traditional manner of handling errata in printed journals and discusses indexing under both systems, recommending standards that will function under the electronic serials rubric.





Managing Tradeoffs in the Electronic Age
A. Ben Wagner
Published online 18 July 2003

A. Ben Wagner's historical analysis provides an excellent wrap-up, reviewing the introduction and development of electronic resources over the past three decades and analyzing the gains and losses involved in the transition. His paper provides a framework for decision-making in this area.


The Accidental Systems Librarian, by Rachel Singer Gordon
Lisa A. Ennis
Published online 7 July 2003



Library Information Systems: From Library Automation to Distributed Information Access Solutions, by Thomas R. Kochtanek and Joseph R. Matthews
Brenda Chawner
Published online 7 July 2003



Impact of Digital Technology on Library Collections and Resource Sharing, edited by Sul H. Lee
William J. Wheeler
Published online 7 July 2003



Persuasive Technology: Using Computers to Change What We Think and Do, by B. J. Fogg
Anastasis D. Petrou, Ph.D.
Published online 7 July 2003


Special Topic Issue of JASIST: Multilingual Information Systems
Published online 12 June 2003

