With research in demand for innumerable applications to society’s challenges, the Research Data Alliance (RDA) was formed to create and implement a unified data infrastructure. Established in March 2013, the alliance is directed toward establishing a framework to serve social and technical research needs at the local and global levels. The RDA brings together a community of stakeholders to guide the organization as it evolves and align participants’ diverse perspectives, striving for consensus, harmonization and an open community. Its structure features organizational and technical advisory boards, interest groups, shorter term working groups and an elected council to steer the alliance. With 700 members representing the academic, public and private sectors across 44 countries, the RDA seeks support from researchers, data managers, funding agencies and governments to promote the strategic and effective use of research data.

information infrastructure
research data sets
collaboration
information associations

Bulletin, August/September 2013


The Research Data Alliance: Implementing the Technology, Practice and Connections of a Data Infrastructure

by Mark A. Parsons and Francine Berman

All of society’s grand challenges, whether addressing rapid climate change, curing cancer and other diseases, providing food and water for more than seven billion people or understanding the origins of the universe or the mind, require diverse and sometimes very large data to be shared and integrated across cultures, scales and technologies. This task requires a new form and new conception of infrastructure. The Research Data Alliance (RDA) is creating and implementing this new data infrastructure. It is building the connections that make data work across social and technical barriers.

RDA launched in March 2013 as an international alliance of researchers, data scientists and organizations to build these connections and the infrastructure to accelerate data-driven innovation. RDA facilitates research data sharing, use, reuse, discoverability and standards harmonization through the development and adoption of technologies, policy, practice, standards and other deliverables. Our vision is researchers around the world sharing and using research data without barriers.

The Challenge of Creating a Data Infrastructure
Infrastructure is hard to define. We don’t usually recognize it until it’s gone. When it’s there and functioning, we tend to take it for granted – we flip the switch and the lights come on. It is only when it stops working that the full complexity of the infrastructure is revealed. As such, it can be very difficult, if not impossible, to define new emerging infrastructure.

We may be able to plan out a vision of the physical infrastructure, but things never play out according to plan. Inevitable tensions and complex dynamics emerge. For example, it was one thing to design an orderly interstate highway system with even-numbered roads running east-west, odd-numbered roads running north-south and three-digit-numbered roads as connectors and bypasses. It was quite another thing to deal with how interstates altered U.S. culture by promoting the growth of suburbs and fast-food chains while undermining other, often more efficient, transportation schemes.

Star and Ruhleder take a holistic and realistic view of infrastructure [1]. They argue convincingly that infrastructure is better viewed as a body of complex social, technical and sociotechnical relationships. They describe eight attributes that characterize infrastructure and ask not “what is infrastructure” but rather “when is infrastructure.” “Infrastructure is something that emerges for people in practice, connected to activities and structures” (p. 112). Edwards et al take this view a step further in their report, Understanding Infrastructure: Dynamics, Tensions, and Design [2]. They show how we can arrange these eight attributes on technical/social and global/local axes (Figure 1). The point is that we should not view the creation of infrastructure as a purely technical (or purely social) problem, but rather that we should decide whether we need social or technical, global or local solutions (or both) to infrastructural problems. It is in this dynamic space, dancing between the social and technical and between the global and local, that RDA works and tries to make an impact.

Figure 1. Illustration of the attributes of infrastructure as distributions along technical/social and global/local axes, showing the dynamic, iterative space where RDA operates. Based on a figure from F. Millerand in Edwards et al [2] derived from Star and Ruhleder [1].

Edwards et al and the general field of infrastructure studies (for example, Bowker et al [3]) show that there are multiple things happening at once when we are designing a data infrastructure. We simultaneously need to enable distributed collaborative work, engineer changes in the organization of research, such as rewarding data sharing, and enable interdisciplinary collaboration and data sharing. This work is largely social and organizational, but system designers often do not recognize these dimensions of their work practice. A central conclusion of Edwards et al is that the necessary data infrastructure will not be built from the center with a single design. Instead, it will be built from the ground up and in modular units. 

Edwards et al further argue that infrastructures become “ubiquitous, accessible, reliable and transparent” as they mature (p. i). They go through a staged evolution characterized initially by deliberate design of targeted technical services. Then technology transfer and adaptations lead to variations on the original design and the development of competing systems. Finally, a consolidation process occurs, in which systems link into networks and networks link into internetworks. This last phase is the crucial, make-or-break phase of infrastructure development. Arguably, the data infrastructure is beginning to enter this phase. It is also at this level where infrastructural tensions and conflicts are often most pronounced. Researchers can have both very proprietary and highly variable attitudes toward data. How and why research data are collected can be very personal, while the dataset itself can also serve as a public good. Data may be physical samples or output from a model. They may be numeric or textual, quantitative or qualitative, an intermediate outcome or a final product. Aligning and realigning these many perspectives is a major challenge and a primary function of RDA.

The RDA Solution
This theoretical examination of how infrastructure is created is illuminating, but it only takes us so far. How does it translate into actual implementation? That is the challenge of RDA.

We take a community-based approach that seeks to create the modular units of infrastructure that interconnect over time, and we do it in an environment guided by common principles that encourage harmonization. We also recognize that the final consolidation phase of infrastructure creation is typically characterized by gateways, brokers or intermediaries that allow dissimilar systems to interconnect. These gateways are not only technologies, but are often a technical solution combined with one or more social choices like the adoption or adaptation of a standard [2]. We, therefore, focus on adoption and implementation of the tools, code, best practices, standards and so forth that are created.

RDA Organization
The overall RDA organizational structure is outlined in Figure 2. The goal is to create the sort of adaptive, responsive environment that helps us avoid the negative path dependence that new infrastructures often encounter and address the critical unresolved problems that impede overall progress.

Figure 2. Currently proposed RDA organizational structure.

RDA membership is open to anyone who subscribes to the RDA Guiding Principles:

  • Openness – Membership is open. RDA community meetings and processes are open, and the deliverables of RDA working groups will be publicly disseminated.
  • Consensus – RDA advances by achieving consensus among its membership. RDA processes and procedures include appropriate mechanisms to resolve conflicts.
  • Balance – RDA seeks to promote balanced representation of its membership and stakeholder communities.
  • Harmonization – RDA works to achieve harmonization across data standards, policies, technologies, infrastructure and communities.
  • Community-driven – RDA is a public, community-driven body comprising volunteer members and organizations.
  • Non-profit – RDA does not promote, endorse or sell commercial products, technologies or services.

Data, research and technology organizations may also join RDA as organizational members with voting privileges on an Organizational Advisory Board or as more informal affiliates.

RDA members form short-term, highly focused working groups that make up the heart of RDA. Working groups conduct 12- to 18-month efforts that implement specific tools, code, best practices, standards and so forth at multiple institutions. Furthermore, to encourage harmonization and balanced choices, working groups produce detailed case statements that undergo extensive community and technical reviews before the groups begin their work. The short timeframe demands a focused, modular approach.

It takes time and community conversation to define the specific modular work conducted by working groups, especially given their short duration. Therefore, RDA also recognizes interest groups that have a broader scope and longer life. Interest groups work to define common issues and interests that ultimately lead to the creation of more focused working groups. Creating an interest group is a simple process of preparing a short charter. 

A 12-member Technical Advisory Board, elected by the membership, is responsible for the overall technical direction of RDA. The board also reviews working group case statements and ensures that the proposed approaches are appropriate, technically viable and not just the pet project of a few people.

An elected council of nine senior, well-respected leaders – or statespersons – maintains the overall vision for RDA. The council reviews working group case statements and public comments on the case statements. It provides an objective overview that gauges overall community consensus and ensures that each proposed working group adheres to RDA principles and has a solid adoption plan demonstrating that the group is making appropriate social as well as technical choices. The council also reviews interest group charters and ensures that they are within the scope of RDA and do not conflict with other efforts.

The globally distributed Secretariat provides coordination and administrative leadership. They help coordinate the operations of the council, boards and working and interest groups, provide overall communications and generally act as the face of RDA.

The RDA Colloquium (RDAC), consisting of representatives from governmental funding agencies, acts as an international steering committee that has provided initial funding support and organizational support for RDA.

A final, critical component is the RDA plenary. Plenary meetings are held every six months and allow the community to come together to conduct business and make progress on its plans and deliverables. The plenary is a forum to show the work of RDA, to receive feedback from the broader research and policy communities and to continue to define and build relationships with initiatives and organizations that share the RDA vision.

Current Status and Plans
In this initial start-up phase, the organization and efforts of the RDA have been sponsored by the Research Data Alliance Colloquium (RDAC). RDAC currently consists of government agencies from the United States, the European Commission and Australia. It is likely to expand beyond those countries. RDAC appointed an initial organizing group and is appointing the initial council (currently six members and expanding to nine), but future council members will be elected by the membership as the terms of current members end. The organizing group has spun off several volunteer task forces to organize the rest of the RDA structure in collaboration with the Secretariat and council. The council has appointed an interim Technical Advisory Board of six people. The remaining six members of the board will be elected by the membership later this year. Thereafter, one-third of the board will be replaced in annual elections. A task force, collaborating with RDAC and council, is working to define organizational membership and the corresponding Organizational Advisory Board. A proposal should be out for community comment later this summer.

Despite the nascent structure of RDA, it is growing quickly. Roughly 700 members from 44 countries span all sectors, with 61% of the participants from academia, 11% from the public sector, 21% from the private sector and 8% undetermined. Several working groups have been recognized, and more than a dozen interest groups have formed. The number of working and interest groups continues to grow rapidly. As is their nature, working groups are focused and may even be a bit esoteric. Current working groups are addressing specific details like agreeing on core foundational terminology, registering different types of data and identifiers and defining common computer-actionable rules. Current interest groups are, of course, broader in scope; they address diverse topics including community engagement, legal issues around data sharing, repository certification, agricultural data interoperability and more. A current and complete list can be found at http://rd-alliance.org.

RDA officially launched in March 2013 at the first plenary in Gothenburg, Sweden. Some 240 participants from 31 countries attended. The second plenary will be in Washington, D.C., September 16-18, 2013, and attendance may top 400. The third plenary is likely to be in Europe again. A bidding process is being established for nations and organizations to host further plenaries.

Conclusion
The Research Data Alliance has emerged at a critical time when society is facing many complex problems requiring new and creative uses of diverse data. RDA has garnered the attention of senior researchers, data practitioners, funding agencies and government ministers not only for its timely emergence, but also for its pragmatic approach rooted in theory but focused on getting things done. It is an exciting time to be engaged in information science and technology. We encourage you to join RDA and get involved with its working groups and other structures.

Resources Mentioned in the Article
[1] Star, S.L., & Ruhleder, K. (1996). Steps toward an ecology of infrastructure: Design and access for large information spaces. Information Systems Research 7(1), 111-134.

[2] Edwards, P.N., Jackson, S.J., Bowker, G.C., & Knobel, C.P. (January 2007). Understanding infrastructure: Dynamics, tensions, and design. Arlington, VA: National Science Foundation. Retrieved June 20, 2013, from http://hdl.handle.net/2027.42/49353

[3] Bowker, G.C., Baker, K., Millerand, F., & Ribes, D. (2010). Toward information infrastructure studies: Ways of knowing in a networked environment. In J. Hunsinger, L. Klastrup, & M. Allen (Eds.), International Handbook of Internet Research (pp. 97-117). Dordrecht: Springer Science+Business Media. http://dx.doi.org/10.1007/978-1-4020-9789-8_5
 


Mark Parsons is managing director of the U.S. component of the Research Data Alliance and the Rensselaer Center for Digital Society.

Fran Berman is the Hamilton Distinguished Professor of Computer Science at Rensselaer Polytechnic Institute.