The second Research Data Access and Preservation (RDAP) Summit, held in March 2011 in Denver, brought together a mix of attendees from the science, library and archival communities. The interaction revealed the need to join forces to better manage data generated by research projects. Scientists presented lessons they learned from creating data management strategies from scratch, using technologies and dealing with administrative politics, while librarians related methods and issues faced in managing digital data repositories, curating and preserving critical research data. Conversations were surprising and enlightening, as scientists were largely unaware of librarians’ role in data management, and the groups discovered differences in their terminology and connotations. Attendees appreciated the opportunity for interaction and agreed on the need to collaborate on identifying others working on data access and preservation, finding existing solutions to repurpose and combining efforts with other organizations.
digital object preservation
information resources management
scientific and technical information
Bulletin, June/July 2011
Research Data Access and Preservation 2 Summit
RDAP2: First Impressions
by Joseph A. Hourclé
For many years, I've followed both groups involved in scientific data access and preservation: the library and information science community as a member of ASIS&T and the hard science community as a member of the American Geophysical Union and an associate member of the American Astronomical Society. I pursued my master's degree in information management when I realized that problems in building federated science data systems are not technical issues but stem, rather, from a lack of agreement on metadata – both in what metadata is necessary for finding and differentiating data and in what terms to use to describe the data and their definitions.
I quickly realized that many issues in the hard sciences were fields of study in the information sciences. Early on, I tried to apply some library research such as FRBR (Functional Requirements for Bibliographic Records) directly to my work on the Virtual Solar Observatory, but realized that politics are a more significant hurdle than technology. Unfortunately, most of the partnering in the hard sciences has been with computer science and other information technology fields. Much of my experience and current work are still related to that field, but technology alone cannot solve the issues of establishing communities, reaching consensus on metadata standards and resolving conflicts in different information cultures and information politics.
When I've proposed working with librarians, the scientists have pushed back, citing past incidents as they remember them. Most anecdotes are similar – the librarians that scientists dealt with more than a decade ago didn't understand them; they made comments such as “Just give us your data, and we'll catalog it for you.” The library and archives communities have a wealth of experience in curation and preservation, but we need a combined effort with the specific scientific disciplines; librarians will never understand the data as well as the scientists who create them. Likewise, scientists who collect the data are rarely the best people to document and describe their data for broader audiences, as they have intrinsic biases and assumptions that are well known in their field but potentially unknown to cross-discipline scientists. For example, a term of jargon may have a much narrower definition in their field than the one others commonly give it.
RDAP2 tackled these issues. Although last year's RDAP Summit drew strong attendance from both the sciences and libraries, the presentations were weighted toward big data, which received a large share of the attention. Data from small science, which took a greater share of the stage this year, is more difficult to describe and at greater risk of being lost. Small science is also the kind of research where libraries and institutional repositories can make a great difference in the preservation of its records.
The Summit featured a diverse mix of presentations this year. People working on institutional repositories talked about the problems they encountered, the solutions they've developed and the groups that have formed to try to resolve these issues. Scientists shared lessons learned from decades of building and maintaining data management systems, along with the issues they encountered, as a warning to those just now getting their feet wet. Attendees heard talks on the technology used to implement these systems and even talks on information policy and administration at the organizational level. We had presentations about general cases that were widely applicable and specific case studies with finer details.
Feedback from the session I moderated on data publication repositories included comments from scientists who were amazed that a whole other community, one they had never interacted with, exists and deals with these kinds of problems. We may have had a little friction because some terms had different connotations across groups, but this kind of bridging of communities focused on science data, along with identifying the problematic terms, can help the library side better interact with the scientists trying to deposit data.
Let’s continue this cross-discipline conversation, not just through a yearly meeting, but through other channels as well. We need a place for discussing the issues related to data management, where we can
- identify other people working in the field,
- discover existing solutions that we might be able to repurpose and enrich for the community – rather than wasting time and effort on building from the ground up, and
- find organizations that we can collaborate with to get a better economy of scale on our efforts, learn from other groups' work or mistakes, or just remind us that we're not in this alone.
If you are interested, join the ASIS&T RDAP listserv to help start this effort towards year-round conversation and collaboration: http://mail.asis.org/mailman/listinfo/rdap.
I look forward to RDAP3, scheduled for late March 2012 in New Orleans. I hope to see an even larger cross-section of the science data management community – both other science disciplines and other library groups such as ACRL (Association of College and Research Libraries), as well as related groups such as the journal publication community – to help make this effort successful.
Joseph Hourclé is principal software engineer, Wyle Information Systems. He can be reached at oneiros<at>grace.nascom.nasa.gov