Digital Libraries Initiative – Phase 2: Fiscal Year 1999 Awards
by Stephen M. Griffin
The following pages contain performer and abstract information for awards made in Fiscal Year 1999 as part of the Digital Libraries Initiative – Phase 2 (DLI-2). Spring 1999 actions are listed first, followed by earlier awards made in Fall 1998.
DLI-2 consists of three major components: the Research, Testbeds and Applications component (www.nsf.gov/cgi-bin/getpub?nsf9863); an evolving Undergraduate Emphasis component (www.nsf.gov/cgi-in/getpub?nsf9863, plus updates at www.dli2.nsf.gov/under.html); and the International Digital Libraries Collaborative Research component (www.dli2.nsf.gov/intl.html).
There are no additional general calls for proposals planned at this time. Future competitions for special emphasis activities are anticipated as the initiative progresses.
Review panels scheduled for this summer and early fall may result in additional actions in this fiscal year. However, awards from proposals received for the May 17 deadline will be determined in fiscal year 2000, which begins October 1, 1999.
More complete information on the program, funded projects and related activities in the broader digital libraries community (including earlier and non-US efforts) can be found at the DLI-2 Web site: www.dli2.nsf.gov
DLI-2 is an interagency program sponsored by
The program operates in partnership with
Within the NSF, the initiative receives support from the Directorates for Computer and Information Science and Engineering; Social, Behavioral and Economic Sciences; and Education and Human Resources.
NSF serves as the administrative agent for the initiative's competitions and funded projects. Policy, planning and programmatic decisions are made by an interagency management group in which representatives of each sponsoring agency and the NSF directorate participate.
Spring 1999 Awards
A Patient Care Digital Library: Personalized Search and Summarization over Multimedia Information
§ Kathy McKeown, Principal Investigator, Computer Science Department
§ Shih-Fu Chang, Co-Principal Investigator, Department of Electrical Engineering
§ James J. Cimino, George Hripcsak, Co-Principal Investigators, Department of Medical Informatics
§ Judith L. Klavans, Co-Principal Investigator, Center for Research on Information Access
Healthcare consumers and providers both need quick and easy access to a wide range of online resources. The goal of this project is to provide personalized access to a distributed patient care digital library through the development of a system, PERSIVAL (PErsonalized Retrieval and Summarization of Image, Video And Language resources). PERSIVAL will tailor search, presentation and summarization of online medical literature and consumer health information to the end user, whether patient or healthcare provider. PERSIVAL will utilize the secure online patient records available at Columbia Presbyterian Medical Center (CPMC) as a sophisticated, pre-existing user model that can aid in predicting users' information needs and interests. Key features of the proposed work include personalized access to distributed, multimedia resources available both locally and over the Internet, fusion of repetitive information and identification of conflicting information from multiple relevant sources, and presentation of information in concise multimedia summaries that cross-link images, video and text. When the latest medical information is provided at the point of patient care, it can help practicing clinicians to avoid missed diagnoses and minimize impending complications. When expressed in understandable terms, it can empower patients to take charge of their healthcare.
Informedia-II: Integrated Video Information Extraction and Synthesis for Adaptive Presentation and Summarization from Distributed Libraries
§ Howard D. Wactlar, Principal Investigator, School of Computer Science
§ Takeo Kanade, Yihong Gong, Co-Principal Investigators, Robotics Institute
§ Christos Faloutsos, Alexander Hauptmann, Michael Christel, John Lafferty, Co-Principal Investigators, Computer Science Department
§ Yiming Yang, Co-Principal Investigator, Language Technology Institute & Computer Science Department
The Informedia-II Project continues the pursuit of search and discovery in the video medium. This phase will transform the paradigm for accessing digital video libraries through meaningful, changeable overviews of video document sets, multimodal queries and adaptive summarizations of very large amounts of video from heterogeneous distributed sources. Video information collages are the key technology in Informedia-II and will be built by advancing information visualization research to effectively deal with multiple video documents. A video information collage is a presentation of text, images, audio and video derived from multiple video sources in order to summarize, provide context and communicate aspects of the content for the originating set of sources. The collages to be investigated include chrono-collages emphasizing time, geo-collages emphasizing spatial relationships and auto-documentaries which preserve video's temporal nature. Users will be able to interact with the video collages to generate multimodal queries across time, space and sources. Together with external partners, the project will also create an accessible, lasting digital video archive of historical, political and scientific relevance. Vast collections of video and audio recordings have captured the events of the last century, yet these remain a largely untapped resource of historical and scientific value.
The Alexandria Digital Earth Prototype (ADEPT)
§ Terrence Smith, Principal Investigator, Computer Science Department, Geography Department
§ Mike Goodchild, Co-Principal Investigator, Geography Department
§ Anurag Acharya, Divyakant Agrawal, Co-Principal Investigators, Computer Science Department
§ James Frew, Co-Principal Investigator, Donald Bren School of Environmental Science and Management
§ Bangalore Manjunath, Co-Principal Investigator, Electrical and Computer Engineering Department
§ Richard Mayer, Co-Principal Investigator, Psychology Department
§ Christine Borgman, Co-Principal Investigator, Department of Information Studies, University of California at Los Angeles
§ Richard Lucier, Co-Principal Investigator, California Digital Library
§ Reagan Moore, Co-Principal Investigator, San Diego Supercomputer Center
§ Robert Nideffer, Co-Principal Investigator, Department of Sociology, University of California at Irvine
§ Amit Sheth, Co-Principal Investigator, Department of Computer Science, University of Georgia
This project is a component of a collaboration between the University of California at Berkeley, the University of California at Santa Barbara and Stanford University. The combined technologies will be demonstrated on the emerging California Digital Library (CDL), and on a testbed developed by the San Diego Supercomputer Center.
The Alexandria Digital Earth Prototype (ADEPT) Project will develop digital library environments and services that are based on the Digital Earth Metaphor. The services will support access to, and use of, heterogeneous digital information distributed across the Internet on the basis of georeference as well as other criteria. In particular, the system will support the construction and use of personalized digital information collections called Iscapes (Information Landscapes). A variety of services will be provided that allow Iscapes to be developed as information service layers in which diverse information resources can be organized, accessed and used. A characteristic feature of Iscapes is the creation of special meta-information resources indicating the joint usability of the items in the personalized collections. The project will focus on developing services that support the construction and use of Iscapes in learning contexts and for the creation of knowledge across a range of disciplines, including the arts, humanities, and social, physical and biological sciences. The project will focus specific attention on evaluating the effect of ADEPT services on learning in undergraduate classroom situations.
Stanford Digital Libraries Technologies
§ Hector Garcia-Molina, Principal Investigator
§ Terry Winograd, Dan Boneh, Co-Principal Investigators, Department of Computer Science
This project is a component of a collaboration between the University of California at Berkeley, the University of California at Santa Barbara and Stanford University. The combined technologies will be demonstrated on the emerging California Digital Library (CDL) and on a testbed developed by the San Diego Supercomputer Center.
The Stanford project will continue to develop base technologies to overcome critical barriers to effective digital libraries. These include heterogeneity of information and services; lack of powerful filtering mechanisms that let users find truly valuable information; insufficient availability of interfaces and tools that effectively operate on portable devices; and lack of a solid economic infrastructure that encourages providers to make information available and gives users privacy guarantees.
Re-inventing Scholarly Information Dissemination and Use
§ Robert Wilensky, Principal Investigator
§ David Forsyth, Co-Principal Investigator, Computer Science Division, School of Information Management and Systems
This project is a component of a collaboration between the University of California at Berkeley, the University of California at Santa Barbara and Stanford University. The combined technologies will be demonstrated on the emerging California Digital Library (CDL) and on a testbed developed by the San Diego Supercomputer Center.
The project will attempt to develop tools and technologies that support highly improved models of information dissemination and access. A goal is to facilitate moving from the current centralized, discrete publishing model to a distributed, continuous, self-publishing model, while at the same time preserving and enhancing the best aspects of the current model. In the envisioned model, information can be disseminated prior to publishing; it can be disseminated and composed continually; it will also have a significant non-textual data component. The model is consistent with the changing economics of academic publishing, yet has the potential to drastically alter the cost structure of scholarly information dissemination.
To promote such an improved paradigm, it is planned to (i) develop a set of enabling technologies, (ii) develop related technologies that exploit the paradigm to support functionality not readily available in the traditional model, (iii) experimentally develop publishing models and digital collections in line with the new paradigm, (iv) conduct studies on economic models of alternative information paradigms and (v) conduct user studies to help evaluate the impact of the work.
An Operational Social Science Digital Data Library
§ Sidney Verba and Gary King, Principal Investigators, Department of Government
§ Dale Flecker, Nancy M. Cline, Co-Principal Investigators, University Library
§ Micah Altman, Director and Co-Principal Investigator
This proposal is for developing a Virtual Data Center (VDC) for managing and sharing numerical social science data for teaching and research purposes across multiple institutions. This project will refine and extend the prototype data server developed by the Harvard-MIT Data Center and turn it into a free, portable software product that will integrate with other data centers and library databases by supporting a variety of communication and interoperation protocols.
The VDC will address some of the problems associated with electronic data including the length of time it can take to access online data-sets and the unavailability of the data that form the basis of many research publications. Data owners will be able to deposit data in many formats and set the terms of access to their data. Users will be able to search for and download data in many formats and will be able to request only the specific variables they need. The center will provide access to both public domain and proprietary data and will be a launch pad to statistical data stored all over the world.
Security and Reliability in Component-based Digital Libraries
§ Carl Lagoze, Principal Investigator
§ Kenneth P. Birman, Fred B. Schneider, Co-Principal Investigators, Computer Science Department
§ Anne Kenney, Sarah Thomas, Co-Principal Investigators, Cornell University Library
Before the advent of digital information, attention to information integrity was the charge of a number of institutions – among them research libraries, publishers and legal authorities. A major challenge in the digital age, and essential to the creation of digital libraries, is the creation of new mechanisms to ensure information integrity and new methods to administer those mechanisms. Information integrity has three major characteristics: 1) reliability, which ensures that information is available where and when people want it; 2) security, which protects both the privacy rights of users of information and the intellectual property rights of content creators; and 3) preservation, which ensures the longevity of intellectual content for use by future generations.
Failure to create these will inevitably threaten the viability of all institutions – government, business, education and defense – that rely on digital technology for their mission-critical information resources. The Cornell Digital Library Project will investigate and develop working prototypes of a digital library architecture with particular attention to supporting these integrity issues. The architecture will build on the notion of reusable components, which focus on the critical realities and benefits of the networked environment, global distribution, federation of content and services distributed among multiple administrative entities and extension – where new components and capabilities can be added to the architecture to suit community-specific requirements or in response to new technologies.
Founding a National Gallery of the Spoken Word
§ Mark Kornbluh, Principal Investigator, History Department
§ Jack Deller, Co-Principal Investigator, Department of Electrical and Computer Engineering
§ Joyce Grant, Co-Principal Investigator, Department of Teacher Education, College of Education
§ Michael Seadle, Co-Principal Investigator, Michigan State University Libraries
§ Douglas Greenberg, Co-Principal Investigator, Chicago Historical Society
§ John Hansen, Co-Principal Investigator, University of Colorado
§ Jerry Goldman, Co-Principal Investigator
From Thomas Edison's first cylinder recordings to the voices of Babe Ruth and Florence Nightingale and Studs Terkel's timeless interviews – the National Gallery of the Spoken Word (NGSW) will preserve and, within the limits of copyright law, make these and other historically significant voice recordings freely available and easily accessible via the Internet. The NGSW will create a significant, fully searchable, online database of spoken word collections that span the 20th century. A collaborative project among the humanities, engineering, education and library science, this gallery will provide the first large-scale repository of its kind.
By identifying and digitally preserving crucial materials in voice libraries throughout the United States, the NGSW will provide storage for these digital holdings and public exhibit "space" for the most evocative collections, not unlike physical museums. However, unlike a physical museum, the NGSW faces no space limitations and never needs to rotate items out of the exhibited collection. All exhibits in the NGSW will remain on display permanently, freely available to all visitors.
This endeavor provides an important opportunity for research and education to suit a range of fields and interests. While much work has been done to develop better methods for preserving text and graphical images, many critical technical problems remain unsolved when it comes to digitally preserving sound and delivering it via the WWW. Analog versions of speech resources suffer from machine noise, copying distortion, background sound and deterioration. And while there are a number of search techniques that work well for written text, such tools do not yet exist for large-scale collections of spoken materials. The NGSW will address all these concerns. Participants in this project include researchers who are recognized leaders in the development of aural search capabilities. The NGSW will also create a repository of high quality digital versions of key spoken material with standard bibliographic and metadata access, while developing a set of best practices for future development of sound on the Web, including methods for conversion, preservation, access and copyright compliance.
A Digital Library for the Humanities
§ Gregory Crane, Principal Investigator, Department of Classics
§ Robert Jacob, Co-Principal Investigator, Electrical Engineering and Computer Science Department
§ Holly Taylor, Co-Principal Investigator, Psychology Department
§ Ross Scaife, Co-Principal Investigator, Kentucky Classics, University of Kentucky
§ Nancy Allen, Co-Principal Investigator, Museum of Fine Arts, Boston
This project is focusing on developing the foundations of a scalable, broad-based, interdisciplinary digital library for the humanities. The principal investigators for this project include not only humanists but also specialists in computer-human interface design and in cognitive science. The goals will be both to improve the ways that humanists can perform their intellectual work and to design materials that are more accessible to the vastly expanded audience already reached by the World Wide Web. The Perseus Digital Library for the Humanities brings together specialists in the humanities, computer science and cognitive science to research methods and structures for building interdisciplinary humanities documents into components of scalable, integrated digital libraries. The project team will study the effect of new electronic publications on a wide range of audiences, ranging from the general public to scholars conducting research. The Perseus Project (www.perseus.tufts.edu) is an extensive digital library on Greco-Roman culture and will serve as a substantial laboratory for human-centered and technical research. Partners include the Max Planck Institute in Berlin, the Modern Language Association, the Museum of Fine Arts, Boston, and the Stoa electronic publishing consortium. Special collections at three libraries (Brandeis University, the University of Pennsylvania and Tufts University) will offer new content and allow development of new testbeds in areas that include ancient Egypt, the texts of Shakespeare and 19th century London.
A Software and Data Library for Experiments, Simulations and Archiving
§ David Willer, Principal Investigator, Department of Sociology
§ E. Elisabet Rutstrom, Co-Principal Investigator, Department of Economics
This proposal is to build, maintain and evaluate a software and data library for experiments, simulations and archiving primarily for the social and economic sciences. It will serve as a "Web-Lab Library" and multi-functional knowledge center. There will be a library of software for experiments at the Web site to support theoretically driven experimentation and a library of simulation programs for research and education. Data from current experiments will be recorded and automatically archived. The archiving format will be extensible to support inclusion of data from prior experiments. Innovative data retrieval and display systems will be developed.
The Web-Lab Library will be developed by a Hub at the University of South Carolina and two associated Collaboratories at the University of Iowa and Georgia State University. The Hub supports programmers with substantial knowledge and experience of social science research. The social scientists at the Hub and Collaboratories will develop designs for the Web-Lab Library. All will conduct experiments-at-a-distance to test software as it is developed.
Digital Workflow Management: Lester S. Levy Collection of Sheet Music
§ Sayeed Choudhury, Principal Investigator
§ Cynthia Requardt, Co-Principal Investigator, Digital Knowledge Center
This project will seek to enhance the use and usability of the Eisenhower Library's Lester S. Levy Collection of Sheet Music and similar collections located elsewhere. The Eisenhower Library previously digitized this collection of more than 29,000 pieces of American popular sheet music spanning the years 1780 to 1960. The sheet music in this collection provides a social commentary on American life and a distinctive record of its time.
The project will create sound renditions and enhanced search capabilities for the collection. Audio files and full-text lyrics are being created using optical music recognition software written by staff from the Peabody Conservatory at Hopkins. Workflow management tools will be developed to reduce and focus human labor. The activities will result in a tested process, framework and set of tools transferable for use with other large-scale digitization projects.
A Multi-tiered Extensible Digital Archive of Folk Literature
§ Samuel Armistead, Principal Investigator, Department of Spanish
§ Bruce Rosenstock, Co-Principal Investigator, Classics, Religious Studies
The Armistead-Silverman collection at the University of California at Davis contains 1500 "Judeo-Spanish" narrative ballads, together with other genres, including lyric poetry, folk tales, proverbs and riddles. The oral traditions preserved in the language, also known as "Ladino" but called "Judeo-Spanish" in this grant proposal, were gathered by Professors Armistead, Katz and Silverman during the years 1957-1980 from informants from Bosnia, Macedonia, Bulgaria, Greece, Turkey, Morocco, Israel, Spain and the United States. This material is the largest collection of Judeo-Spanish oral literature in North America and one of the three largest in the world. The Judeo-Spanish oral tradition preserves a cultural legacy for the study of Sephardic Jewry as well as for researchers in the history of pan-Hispanic and pan-European balladry. This oral tradition, with roots extending back into the Middle Ages, provides a unique matrix within which Hispanic written literature was created.
The technical goals of the project are to continue conversion of these materials to a multimedia digital corpus so that they can be made more widely available, with increased access and analytic capabilities. Textual transcriptions will be tagged using a number of markup methods, especially XML, and a digital audio database will be created. A variety of approaches will be tested to make the archive fully extensible. The project will build on earlier research products from other digital libraries projects, including the University of California, Berkeley digital libraries group.
The Digital Atheneum: New Techniques for Restoring, Searching and Editing Humanities Collections
§ William Brent Seales, Principal Investigator
§ James N. Griffioen, Co-Principal Investigator, Department of Computer Science
§ Kevin S. Kiernan, Co-Principal Investigator, Department of English
This work will develop new digital libraries from aging and damaged portions of the Cottonian Collection at the British Library, tailored to the requirements of scholars in the humanities. The result of this project will be state-of-the-art technical approaches, tools that incorporate those new approaches and a widely distributed digital library of restored, previously inaccessible manuscripts. In particular, the technical focus will encompass the following important research areas:
The project has strong support from IBM through the Shared University Research (SUR). Likewise, partnership with the British Library provides privileged access to high-quality collections, manuscript and curator expertise and digitization facilities.
§ Peter Buneman, Principal Investigator
§ Val Tannen, Susan B. Davidson, Chris Overton, Co-Principal Investigators, Department of Computer and Information Science
§ Mark Liberman, Co-Principal Investigator, Department of Linguistics
This project will address issues associated with data provenance. Provenance is concerned with how information has arrived at the form in which it appears – who produced it, who has corrected it, how old it is, how it was originally produced and so forth. Understanding provenance has occupied scientists, historians, textual critics and other scholars for centuries. The provenance of data in databases is a newer and larger problem, because one is interested in data at all levels of granularity – from a single pixel in a digital image to a whole database. Just as scholars comment on documents by attaching annotations (marginalia) to text, part of the solution to recording provenance is the attachment of annotations to components of databases.
Database researchers have recently considered loosely structured forms of data and have developed software systems for querying and storing such data. This work is closely related to new formats that have been developed for structured documents on the Web. It is expected that this technology will provide the substrate for recording and tracking provenance by advancing new data models, new query languages and new storage techniques.
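The idea of recording provenance as annotations attached to components of a database can be pictured with a small sketch. It is only an illustration under stated assumptions, not the project's design: the names `Note`, `Annotated` and `correct` are hypothetical, and a real system would need data models and query languages of the kind the abstract anticipates.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class Note:
    """One provenance annotation: who did what (hypothetical structure)."""
    agent: str
    action: str


@dataclass
class Annotated:
    """A value at any granularity (a pixel, a record, a whole table) plus its notes."""
    value: Any
    notes: List[Note] = field(default_factory=list)


def correct(item: Annotated, new_value: Any, agent: str) -> Annotated:
    """Return a corrected copy that keeps the earlier history and adds one note."""
    note = Note(agent=agent, action=f"corrected {item.value!r} to {new_value!r}")
    return Annotated(value=new_value, notes=item.notes + [note])


# Usage: a single field is produced, then corrected; the history travels with it.
original = Annotated(42, [Note("lab A", "produced")])
fixed = correct(original, 43, "curator B")
history = [n.agent for n in fixed.notes]
```

Because `correct` returns a new object instead of changing the old one, the earlier state and its annotations remain available for inspection.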
DL of Vertebrate Morphology Using a New High Resolution X-ray CT Scanning Facility
§ Timothy Rowe, Principal Investigator, Department of Geological Sciences
This project is an intensive application of high-resolution X-ray Computed Tomography scanning (X-ray CT) to the study of the vertebrate skeleton. These instruments are descendants of medical diagnostic CT scanners, and they enable the non-destructive inspection of tiny 3-dimensional objects in unprecedented detail. We will build a digital library of high-resolution X-ray CT images and 3-D models. The library will enable far more detailed and comprehensive analyses of vertebrate structure than was ever before possible by a global networked audience of researchers, educators and students. We will examine the skeleton in all of its forms, from fossils to embryos and adults of living species. We will survey a broad taxonomic diversity that includes important laboratory and research species and that samples the smallest four orders of vertebrate size-magnitudes.
We envision an interactive digital library that will accelerate education as it fosters fundamental new research discoveries in vertebrate structure, function, embryology, bioengineering and evolution. The library core will be distributed over the Web. We will also expand our partnership with distinguished academic publishers of books and journals to distribute selected high-resolution datasets on CD-ROM via established peer-reviewed mechanisms that reach large professional societies and educational audiences. We believe that our prototype library design will be readily exportable across the community of engineers, physicians and natural historians already using CT and other types of 3-D tomographic data.
This project is a collaboration among 24 researchers at leading research universities and natural history museums around the world. We believe that the digital library may eventually transform the study of vertebrate morphology. We expect it to foster fundamental new discoveries, accelerated communication and education, the formation of collaborations among widely distributed individuals and new digital alliances among engineers, scientists and publishers.
Using the Informedia Digital Video Library to Author Multimedia Material
§ Brad Myers, Principal Investigator, School of Computer Science
This project will create a comprehensive Intelligent Video Editor that will allow people without special training to author interesting compositions using digital video. In particular, the editor will support sophisticated interactive behaviors for the videos and for extra graphical drawings (called synthetic graphics) layered on top of the videos. For example, users might specify which objects in the video can be clicked on to choose the next video clip or that an arrow should be drawn that shows the path that an object will follow or that the video is part of a lesson and a viewer's answer to a question determines the next action. There will also be high-level facilities for searching and organizing videos, video editing, demonstrating behaviors, writing scripts in a more natural programming language, and testing and debugging the code. Children and their teachers will be able to create interesting interactive compositions using videos. The tools we create will be continuously tested with school children and adults to evaluate and refine the various features. The goal is to make it as easy to use the video material found in a digital library as it is to use textual material found in today's libraries.
High-Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management
§ Hsinchun Chen, Principal Investigator
§ Robin Sewell, Co-Principal Investigator, Artificial Intelligence Lab, Department of Management of Information Systems
The proposed research aims to develop an architecture and the associated techniques needed to automatically generate classification systems from large domain-specific textual collections and to unify them with manually created classification systems to assist in effective digital library retrieval and analysis. Both algorithmic developments and user evaluation in several sample domains will be conducted in this project. Scalable automatic clustering methods including Ward's clustering, multi-dimensional scaling, latent semantic indexing and self-organizing map will be developed and compared. Most of these algorithms, which are computationally intensive, will be optimized based on the scarcity of common keywords in textual document representations. Using parallel, high-performance platforms as a time machine for simulation, we plan to parallelize and benchmark the above clustering algorithms for large-scale collections (on the order of millions of documents) in several domains. Results of these automatic classification systems will be represented using several novel hierarchical display methods.
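Of the clustering methods named above, Ward's clustering is the simplest to sketch. The version below is a naive illustration on made-up term-count vectors, assuming the usual Ward criterion (merge the pair of clusters whose union least increases the within-cluster sum of squares). It is not the scalable, parallel implementation the project proposes, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def ward_cost(a: List[Vector], b: List[Vector]) -> float:
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def ward_cluster(points: List[Vector], k: int) -> List[List[Vector]]:
    """Merge the cheapest pair of clusters repeatedly until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best: Tuple[float, int, int] = min(
            (ward_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy term-count vectors for four documents (made-up data).
docs = [(2.0, 0.0), (3.0, 0.0), (0.0, 2.0), (0.0, 3.0)]
groups = ward_cluster(docs, 2)
```

Each pass recomputes every pairwise cost, so the sketch is cubic in the number of documents. That cost is one reason large collections call for the optimization and parallelization the abstract describes.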
The testbed of research will include three application domains that consist of both large-scale collections and existing classification systems:
A Distributed Information Filtering System for Digital Libraries
§Mathew J. Palakal, Principal Investigator
§ Rajeev R. Raje and Snehasis Mukhopadhyay, Co-Principal Investigators, Department of Computer and Information Science
§ Javed Mostafa, Co-Principal Investigator, School of Library and Information Science
The popularity and growth of the Internet and associated networking technologies are allowing a rapidly increasing number of users, representing diverse segments of society, to access an enormous amount of geographically dispersed information available in different electronic forms and media. With the successful completion of prominent efforts, such as the Digital Libraries Initiative, this volume of information will grow at a phenomenal rate. Without effective automated support systems to access and filter such information, an average user runs the risk of being overwhelmed by the sheer volume of irrelevant and possibly unwanted information. Unlike traditional information systems, digital libraries are inherently dynamic and distributed in nature. Providing personalized, efficient, adaptive and intelligent access to this plethora of information, without creating an "information overload" for users, is a major challenge right now and will become increasingly urgent as we head into the next millennium.
The proposed research is aimed at designing and developing a distributed intelligent information distribution and filtering system that provides personalized information services to the user while minimizing direct user involvement. The system will weed out unwanted (irrelevant) incoming information and traverse the network to retrieve relevant information of interest to the user. The filtering system will be realized using a collaborative framework of a multitude of information agents and will involve integration of advanced concepts and techniques from the domains of artificial intelligence, information retrieval, and distributed object computing.
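To make the filtering idea concrete, the following is a minimal sketch of profile-based filtering: each incoming document is compared with a user's interest profile, and documents scoring below a threshold are dropped. It is an illustration under stated assumptions, not the project's design. The term-count vectors, the cosine similarity measure, the threshold value and all function names are assumptions, and the sketch does not model the agent collaboration, adaptation or network traversal described in the abstract.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_documents(profile_text, documents, threshold=0.2):
    """Keep documents similar enough to the profile, most relevant first.

    The threshold of 0.2 is an arbitrary illustrative value.
    """
    profile = term_vector(profile_text)
    scored = [(cosine(profile, term_vector(d)), d) for d in documents]
    scored.sort(key=lambda pair: -pair[0])
    return [d for score, d in scored if score >= threshold]


incoming = [
    "digital library metadata standards",
    "weekend football scores",
    "retrieval evaluation for digital collections",
]
for doc in filter_documents("digital library retrieval", incoming):
    print(doc)
```

In this toy run the football item shares no terms with the profile and is filtered out, while the two library-related items are kept and ranked by similarity.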
Fall 1998 Award Actions
Automatic Reference Librarians for the World Wide Web
§Oren Etzioni, Principal Investigator
§ Dan Weld, Co-Principal Investigator, Department of Computer Science
By all accounts, the Web is humanity's largest and fastest growing repository of digital information. Many collections of information are Internet-accessible, and most will provide a searchable Web interface. While some collections have a broad array of materials, trends show an explosion in the number of specialized collections with narrow but very deep content. Thus a principal challenge facing users will be the selection of Web information sources capable of answering their queries. In a physical library, users rely on a reference librarian to point them to the correct resource, but while human librarians are becoming increasingly sophisticated in their use of the Web, they are only part of the solution. We need more powerful automatic reference tools to help people efficiently retrieve high-quality information from the Web.
Typically, reference librarians are not specialists in the topic of inquiry (e.g., computational fluid dynamics), but they are expert at identifying relevant resources (e.g., The International Journal of Fluid Dynamics) and at appropriate strategies for obtaining the necessary information. The central objective of this proposal is to create software agents that possess reference intelligence – a limited understanding of complex technical topics, but a very sophisticated understanding of how and where to find high-quality information on the World Wide Web.
Tracking Footprints through a Medical Information Space: Computer Scientist-Physician Collaborative Study of Document Selection by Expert Problem Solvers
§Paul Gorman, Principal Investigator, Biomedical Information Communication Center, Oregon Health Sciences University
§ David Maier, Lois Delcambre, Co-Principal Investigators, Department of Computer Science and Engineering, Oregon Graduate Institute of Science and Technology
The goal of this project is to help expert problem solvers find needed information in a large, complex information space. The focus is on one example of expert problem solving: the health care field. Sorting through such a heterogeneous collection of electronic and other media materials to find needed information, sometimes under time pressure, can be formidable. This project proposes to capture the trace of information used by experts – to monitor the paths taken and collection resources used by experts (in this case, physicians) in moving from observation to information gathering to solution of a given health care problem. By capturing the artifactual trace information associated with information seeking and selection, it is hypothesized that greater insight can be gained into behaviors of users and patterns of usage. This knowledge can then be fed back into the design and development of new information environments. The work will be conducted by a cross-disciplinary team comprising a medical doctor focusing on information seeking behaviors of physicians and a group of computer scientists focusing on extracting and using regularly structured information. The usefulness of the approaches will be tested in domains other than health care, in particular the aircraft design industry through the active support of the Boeing Corporation.
Image Filtering for Secure Distribution of Medical Information
§Gio Wiederhold, Principal Investigator, Department of Computer Science
An increasing amount of information being transmitted over the Internet is in image form. This trend includes medical images used in diagnosis and research and other materials for which it is desirable to avoid violations of security and privacy. While privacy and security control of textual materials has long been a focus of research activities, images present new and more challenging problems. Filtering of images in addition to text becomes more essential as modern computing and communications facilitate the use of information in image form.
This project proposes to provide image filtering capabilities to complement other means of checking the contents of documents. The domain of interest is electronic medical records, but the research products are expected to be generalizable to other domains of interest. The effort will focus on developing further wavelet-based algorithms for searching medical image databases and retrieving relevant information from multimedia medical databases; extracting textual information from images; advancing practices for the protection of privacy and implementing a security mediator; and exploring WWW interfaces for security mediators.
Fall 1998 Undergraduate Emphasis Awards
Using the National Engineering Education Delivery System as the Foundation for Building a Testbed Digital Library for Science, Mathematics, Engineering and Technology Education
§Alice Agogino, Principal Investigator, College of Engineering
Two key National Science Foundation reports, "Systemic Engineering Education Reform: An Action Agenda" and "Shaping the Future: New Expectations for Undergraduate Education in Science, Mathematics, Engineering and Technology," urge the formation of a national resource to provide access to quality courseware and to disseminate successful educational practices. Since the early 1990s, NEEDS – the National Engineering Education Delivery System – has provided these services for the engineering education community. Building on this base, this project will
Planning Grant for the Use of Digital Libraries in Undergraduate Learning in Science
§Kurt Maly, Principal Investigator
§ Mohammed Zubair, Stewart Shen, Steven Zeil, Co-Principal Investigators, Department of Computer Science
Instructional methods in academe are shifting from a teacher-centered paradigm to a student-centered paradigm. Advances in networking, digital libraries and digital media technology are making the World Wide Web an effective framework for supporting this type of active learning. This project will develop a set of prototype tools and processes and an environment to provide preliminary answers to a set of questions that underlie the design and implementation of a digital library for science, mathematics and engineering education. In particular, we will develop, run, collect data from and analyze one student-centered computer science course. This project builds on experience with the Networked Computer Science Technical Report Library (NCSTRL) and work at Old Dominion University to develop NCSTRL+.
Virtual Skeletons in Three Dimensions: The Digital Library as a Platform for Studying Anatomical Form and Function
§John Kappelman, Department of Anthropology
Recent developments in three-dimensional digitizing hardware and software make it possible, practical and economical to scan and archive complex-shaped objects, including a range of skeletal elements from a variety of large- and small-sized species, into a digital library for study and research. Making anatomical materials, including elements from species commonly used in education and rare or even endangered species, widely available has far-reaching implications for research and for education from grade school through graduate school.
This project will begin the creation of such a library, starting with chimpanzees and baboons and using both low- and high-resolution technologies. It will also design and implement a "discovery interface" that will provide an interactive framework for investigation that will benefit both beginning and advanced users. The project builds on work at the University of Texas, Austin, including the course Introduction to Physical Anthropology and Human Evolution and the CD-ROM Virtual Laboratories for Physical Anthropology.
Stephen M. Griffin is with the National Science Foundation. He can be reached by e-mail at email@example.com
This summary of DLI projects was first published in the July/August 1999 (Volume 5, Number 7/8; ISSN 1082-9873) issue of D-Lib Magazine. It is reprinted with permission.