Best Practice for Data Management

Principles of best practice for data and information management recommended by the International Council for Science (ICSU)

Active attention to the principles outlined in this guide will improve science by making the data and information that scientists use more readily and reliably available, in both the short term and the long term. Each point of best practice is expressed as an active verb (e.g. appoint, use, exploit) to focus attention on the action that needs to be taken.

1. Policy

  • Document the reason(s) for the data policy, and the policy itself, early on, and make these documents available.
  • Articulate the desired outcomes of the data policy.
  • Identify and be explicit about the benefit/cost ratio of professional data management.
  • Ensure that guidelines for participation are easily accessible by encouraging open access to data policies, practices and experiences.

Examples

  • ICSU World Data System data policy, available at http://www.icsu-wds.org/organization/data-policy
  • International Polar Year data policy, available at http://classic.ipy.org/Subcommittees/final_ipy_data_policy.pdf
  • OECD Principles and Guidelines for Access to Research Data from Public Funding, 2007, available at http://www.oecd.org/dataoecd/9/61/38500813.pdf
  • Panton Principles for open data in science, see http://pantonprinciples.org/
  • Creative Commons licences, available at http://creativecommons.org/choose/

2. Governance

  • Ensure that data management is an integral and funded part of project planning, approval and performance measurement.
  • Appoint expert advisory groups where necessary and charge them with defined tasks.
  • Exploit major international science conferences and events as occasions to hold meetings, and use these meetings to encourage interaction between scientists and data and information professionals.
  • Acknowledge the different skills and roles required in professional data and information management.
  • Ensure open, online access to all minutes of meetings and decisions taken.

Examples

  • The core agreement for the Worldwide Protein Data Bank, 2003, available at http://www.wwpdb.org/wwpdb_charter.html
  • The Intergovernmental Panel on Climate Change structure and working groups, see http://www.ipcc.ch/working_groups/working_groups.htm

3. Planning and organisation

  • Consider the advantages and disadvantages of distributed versus centralised data repository models in the light of user needs.
  • Use service-based data access methods.
  • Exploit what already exists for data management.
  • Ensure that data infrastructure is complete, ready and available in time for scientists to use it in research projects.
  • Incorporate user feedback into all aspects of the data management lifecycle.

Example

  • GenBank, the annotated collection of all publicly available DNA sequences, see http://www.ncbi.nlm.nih.gov/genbank/GenbankOverview.html

4. Standards and tools

  • Use international standards (e.g. ISO, OGC, XML, GML) where possible; where that is not possible, base domain-specific standards on international standards.
  • Provide tools to support the implementation of the standards used, including documentation on how to use the project data.

Examples

  • Dublin Core Metadata Initiative, available at http://dublincore.org/documents/dces/
  • ISO 19115 for geographical information and services, available at http://www.iso.org/iso/catalogue_detail.htm?csnumber=26020
  • Open Geospatial Consortium standards and specifications, see http://www.opengeospatial.org/standards
  • International Virtual Observatory Alliance, documents and standards, available at http://www.ivoa.net/Documents/
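
As an illustration of what adopting one of these standards can involve, the sketch below builds a minimal metadata record from the Dublin Core Metadata Element Set (version 1.1), whose 15 elements are all optional and repeatable. It is a sketch under stated assumptions, not a reference implementation: the wrapper element `metadata`, the function name `dublin_core_record` and all of the dataset values are hypothetical, and a real repository would normally embed such a record in whatever container format its exchange or harvesting protocol requires.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise elements with the "dc:" prefix

# The fifteen elements defined by the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Build a minimal XML record from a dict of Dublin Core fields.

    Each value may be a single string or a list of strings, because every
    Dublin Core element may be repeated. The <metadata> wrapper element is
    a hypothetical container chosen for this example only.
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example dataset; none of these values describe a real archive.
record = dublin_core_record({
    "title": "Sea-ice thickness measurements, example survey",
    "creator": ["Example Research Group"],
    "date": "2008-03-01",
    "type": "Dataset",
    "format": "text/csv",
    "rights": "Creative Commons Attribution 4.0",
})
print(record)
```

Rejecting unknown field names is a deliberate choice: it keeps locally invented fields out of a record that claims to be Dublin Core, so any domain-specific extensions would need to go under a separate, clearly identified namespace.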

5. Data management and stewardship

  • Minimise uncertainty at all phases of the data lifecycle, including for example working with manufacturers to avoid device dependency for data and information.
  • Embrace science-programme and project-level data management planning.
  • Ensure that documented plans for long-term stewardship of data exist.
  • Implement a plan for a formal process of data and information selection and appraisal.
  • Produce a plan for data stewardship at the outset of a project or programme, not as the last item in the plan.

Examples

  • International Polar Year Data and Information Service, see http://ipydis.org/index.html
  • Research Information Network, stewardship of digital research data – principles and guidelines, 2008, http://www.rin.ac.uk/our-work/data-management-and-curation/stewardship-digital-research-data-principles-and-guidelines

6. Data access

  • Minimise the burden on the providers of data.
  • Provide a single portal for user discovery from distributed sources of information.
  • Implement open access policies where appropriate.
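
The recommendation to provide a single portal over distributed sources can be sketched as a simple federated search: the portal sends the same query to every catalogue, merges the hits, keeps one copy of any dataset that several catalogues hold under the same identifier, and carries on if one source is unavailable. Everything here is hypothetical: the catalogues are in-memory stand-ins for remote search services, and the names `Record`, `make_catalogue` and `portal_search` are illustrative, not part of any real portal's interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    identifier: str  # persistent identifier shared across catalogues
    title: str
    source: str      # name of the catalogue that returned the record


# A catalogue is modelled as a function from a query string to matching records.
Catalogue = Callable[[str], List[Record]]


def make_catalogue(name: str, titles: Dict[str, str]) -> Catalogue:
    """Build a hypothetical in-memory catalogue doing substring title search."""
    def search(query: str) -> List[Record]:
        q = query.lower()
        return [Record(pid, title, name)
                for pid, title in titles.items() if q in title.lower()]
    return search


def portal_search(query: str,
                  catalogues: Dict[str, Catalogue]) -> Tuple[List[Record], List[str]]:
    """Query every catalogue, merge the hits and drop duplicate identifiers.

    A source that raises an error is recorded in the failure list instead of
    aborting the search, so users still see results from the other sources.
    """
    results: Dict[str, Record] = {}
    failed: List[str] = []
    for name, search in catalogues.items():
        try:
            hits = search(query)
        except Exception:  # broad on purpose: any failing source is skipped
            failed.append(name)
            continue
        for rec in hits:
            results.setdefault(rec.identifier, rec)  # keep the first copy seen
    return sorted(results.values(), key=lambda r: r.identifier), failed


def offline_catalogue(query: str) -> List[Record]:
    # Simulates a source that cannot be reached.
    raise ConnectionError("catalogue unavailable")


catalogues = {
    "ocean": make_catalogue("ocean", {"id:1": "Ocean temperature profiles",
                                      "id:2": "Sea-ice extent"}),
    "polar": make_catalogue("polar", {"id:2": "Sea-ice extent",
                                      "id:3": "Polar ocean salinity"}),
    "offline": offline_catalogue,
}

hits, failed = portal_search("sea", catalogues)
print([(r.identifier, r.source) for r in hits], failed)
# prints [('id:2', 'ocean')] ['offline']
```

Reporting failed sources alongside the results, rather than hiding them, lets a portal tell users that a search may be incomplete.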

Examples

  • GEO portal, see http://www.geoportal.org/web/guest/geo_home
  • Ocean Data Portal, see http://www.oceandataportal.org/
  • OECD Guidelines and Principles for Access to Research Data from Public Funding

Extracts from all 13 OECD Guidelines and Principles:

A. Openness
Openness means access on equal terms for the international research community at the lowest possible cost, preferably at no more than the marginal cost of dissemination. Open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based.

B. Flexibility
Flexibility requires taking into account the rapid and often unpredictable changes in information technologies, the characteristics of each research field and the diversity of research systems, legal systems and cultures of each member country.

C. Transparency
Information on research data and data-producing organisations, documentation on the data and specifications of conditions attached to the use of these data should be internationally available in a transparent way, ideally through the Internet.

D. Legal conformity
Data access arrangements should respect the legal rights and legitimate interests of all stakeholders in the public research enterprise.

E. Protection of intellectual property
Data access arrangements should consider the applicability of copyright or of other intellectual property laws that may be relevant to publicly funded research databases.

F. Formal responsibility
Access arrangements should promote explicit, formal institutional practices, such as the development of rules and regulations, regarding the responsibilities of the various parties involved in data-related activities. These practices should pertain to authorship, producer credits, ownership, dissemination, usage restrictions, financial arrangements, ethical rules, licensing terms, liability, and sustainable archiving.

G. Professionalism
Institutional arrangements for the management of research data should be based on the relevant professional standards and values embodied in the codes of conduct of the scientific communities involved.

H. Interoperability
Although science is becoming a highly globalised endeavour, incompatibility of technical and procedural standards can be the most serious barrier to multiple uses of data sets. Access arrangements should pay due attention to the relevant international data documentation standards. Member countries and research institutions should co-operate with international organisations charged with developing new standards.

I. Quality
The value and utility of research data depends, to a large extent, on the quality of the data itself. Data managers and data collection organisations should pay particular attention to ensuring compliance with explicit quality standards.

J. Security
With regard to guaranteeing the integrity of a data set, every effort should be made to ensure the completeness of data and absence of errors. With regard to security, the data, along with relevant meta-data and descriptions, should be protected against intentional or unintentional loss, destruction, modification and unauthorised access in conformity with explicit security protocols.
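
Principle J does not prescribe a mechanism, but one widely used way to detect unintentional modification or loss is to record a checksum for each file when it is archived and to recompute the checksums later. The sketch below assumes a directory of data files and uses SHA-256 from Python's standard `hashlib` module; the function names are hypothetical, and the manifest is kept in memory for brevity, whereas an archive would store it alongside the data and re-check it on a schedule.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return manifest entries whose file is missing or whose contents changed."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "obs.csv").write_text("station,temp\nA,1.5\n")
        manifest = build_manifest(root)
        print(verify_manifest(root, manifest))  # prints []
        (root / "obs.csv").write_text("station,temp\nA,9.9\n")
        print(verify_manifest(root, manifest))  # prints ['obs.csv']
```

Note that `verify_manifest` only reports files listed in the manifest that are missing or changed; files added after the manifest was built are not flagged, and checksums alone do nothing to prevent unauthorised access.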

K. Efficiency
One of the central goals of promoting data access and sharing is to improve the overall efficiency of publicly funded scientific research to avoid the expensive and unnecessary duplication of data collection efforts. Consideration should be given to descriptions of good practice, data selection and appraisal, cost-benefit analysis of archives and incentives to professional data management.

L. Accountability
The performance of data access arrangements should be subject to periodic evaluation by user groups, responsible institutions and research funding agencies.

M. Sustainability
Due consideration should be given to the sustainability of access to publicly funded research data as a key element of the research infrastructure. Research funding agencies and research institutions should consider the long-term preservation of data at the outset of each new project, and in particular, determine the most appropriate archival facilities for the data.