Research Data Access and Preservation Summit
An ASIS&T Summit
Phoenix, AZ  |  Hyatt Regency  |  April 9-10, 2010

In cooperation with the Coalition for Networked Information


Desired outcomes are:

  • Identification of exemplar data management systems
  • Identification of opportunities for federation across institutional repositories
  • Identification of interoperability mechanisms at technology, social, and legal levels
  • Promotion of collaborations between the identified parties
  • Roadmap for the research agenda needed to federate systems

 
Research Data Management for Access and Preservation 

Researchers in all fields generate and analyze enormous quantities of digital data. Across the sciences and humanities, managing, preserving, and sharing these data require substantial capital and human resources, as well as new kinds of information professionals who can integrate technology, content, and policy skills. This summit aims to bring together leaders in data centers, laboratories, and libraries in different organizational and disciplinary settings to share ideas and techniques for managing, preserving, and sharing large-scale research data repositories, with an eye toward achieving infrastructure-independent access and stewardship. The summit will engage three kinds of leaders: those from projects with experience in integrating high-performance technologies; those from large-scale collaboratories in science, social science, and the humanities; and those from institutions coping with the challenges of integrating different technologies and data collections. The summit will address three main questions:

  1. What data access and preservation capabilities are required within and across research groups?
     
  2. What technical solutions exist to meet these needs and how do they scale across domains?
     
  3. What are the social contexts under which research communities assemble to share data?

The summit will take place April 9-10, 2010, at the Hyatt Regency Hotel in Phoenix. In addition to two keynote presentations, there will be a panel on each of these questions, led by speakers with practical experience in dealing with the associated challenges. Participants will have ample opportunity to interact with speakers and with each other, and to present techniques and concepts via posters and semi-structured discussions. Demonstrations of integrated systems that address data management challenges will be held, with opportunities to compare approaches and ask about implementation details. Examples of large data management environments and integrated systems include SHAMAN/iRODS, DuraSpace, the Ocean Observatories Initiative, the iPlant Collaborative, the National Climatic Data Center, the Large Synoptic Survey Telescope, the University of Michigan digital library, MotifNetwork, and the Texas Digital Library.
 

Focus of Summit

The summit will provide an opportunity for institutions to identify mechanisms and technologies that will enable large-scale collaboratories. We recognize that multiple solutions are viable, and we seek to promote interoperability between them. The summit has two intended outcomes: 1) to serve as an information resource for communities building collaboratories, and 2) to facilitate the development of interoperability mechanisms between existing systems. Both outcomes are forms of consensus building on the best way forward, in both a social and a technological context.

Panels for the three main topics are:

  1. What data access and preservation capabilities are required within and across research groups?

    - Panel on promoting reuse and repurposing of data. How is repurposing being handled today? What approaches are working well? What level of description is appropriate for mapping from collection attributes to user-desired information? Can the multiple description standards converge through adoption of common description models? Is there a common identifier architecture?

    - Panel on data life cycle management. Can a single data management system support all phases of the data life cycle through mechanisms that support data repurposing? Can data repurposing be characterized as changes to management policies and procedures?
     
  2. What technical solutions exist to meet these needs and how do they scale across domains?

    - Panel on large-scale data management problems in corporate and government settings, including the challenges of inter-enterprise data sharing. Are there concerns common to both corporate and government settings?

    - Panel on DataNet integration efforts, to demonstrate the challenges involved in implementing viable data management infrastructure, including discussion of current and established work practices. What interoperability mechanisms are available for federating registries, ontologies, and business models? (We can invite participants from groups that submitted proposals, both successful and unsuccessful, and ask them for one-page summaries of their approach and of the research issues that need to be addressed.)
     
  3. What are the social issues that drive the formation of shared collections? 

    - Panel on assessment criteria. Can a social consensus drive the development of the governance and management policies that the shared collection should maintain? Who is responsible for forming that consensus (discipline-oriented solutions, such as the Ocean Observatories Initiative, or university-oriented solutions, such as institutional repositories)? Can the social consensus be expressed through assessment criteria that validate desired properties of the collection?
    - Possible Participants: ISO MOIMS-rac standards group, NARA,

    - Panel or session on the legal issues related to national and/or international laws and the implications for writing data management policies. Can management policies be defined that enforce regulatory requirements (Sarbanes-Oxley, HIPAA, IRB) and international laws?

Relevant sources:

The recent Nature special on data sharing highlights themes that overlap with those of the summit: http://www.nature.com/news/specials/datasharing/index.html

The NSF “sustainable digital preservation and access symposium” will be a source of information on economic models following its final meeting, scheduled for April 1, 2010, in New York.


This summit is chaired by Dr. Reagan Moore. The Advisory Committee includes: 
William Anderson, Christine Borgman, Hsinchun Chen, Sayeed Choudhury, Michael Lesk, Gary Marchionini, William Michener, Art Pasquinelli, Sudha Ram, Stu Weibel