Research data management services are coming under increasing pressure

More than ever, the proactive management of research data has come to be seen as an essential aspect of research, to:

  • support, validate and provide transparency on new research;
  • encourage and facilitate shareable data.

However, research data management (RDM) services at the institutional level are coming under increasing pressure. The volume of research data and the requirement to make it shareable have grown rapidly in recent years, but RDM budgets have not kept pace. This raises questions about how higher education institutions (HEIs) will deliver sustainable RDM services in future.

Clear economic evidence is required to support the case for sustained investment in RDM. To date, however, little such evidence has been produced on the costs and benefits of RDM within HEIs.

Relatively little has been done to quantitatively measure the benefits of RDM

Existing research into the benefits tends to overlook RDM in higher education institutions and instead focuses on: specific data centres or funding institutions; similar but distinct sectors that share some attributes of RDM but ultimately cannot be linked to RDM or RDM practice; or only part of the RDM process.

What evidence there is indicates a return on investment (RoI) of between 2 and 12 for three UK data centres, and that the provision of open data is worth (or has the potential to generate value of) between 0.4% and 4.1% of GDP per annum. Other analyses of making specific institutions’ data more open indicate gains in the form of productivity improvements and time savings.

The available evidence on the use of cost models in HEIs is patchy and varied

Calculating the costs of RDM provision is a better explored area, with many organisations developing their own bespoke cost models for data curation, data archival or data storage. But there is no clear framework for monitoring and estimating costs, and the methods of measuring these costs are often not standardised across the higher education sector. Although several cost tools are used in the RDM domain, very few of the organisations using them publish or share their estimates of the costs of RDM.

A logic map for RDM was developed

We define RDM as any activity that falls within the following services within HEIs: data management planning; data catalogues; active data management services; institutional data storage and repositories; guidance, training and advocacy; and development of RDM policy and strategy. Using an outcome-based framework, we present a logic mapping that categorises RDM into five main elements: inputs; activities; outputs; intermediate outcomes; and impacts.

This logic mapping identifies and organises the costs and benefits of RDM:

  • costs are accrued over the inputs and activities elements;
  • benefits flow from the intended results generated under outputs, outcomes and impacts.
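As an illustration only, the logic mapping can be held as a simple data structure that tags each element as the cost side or the benefit side. This is a minimal Python sketch; the `Element` enum, the `side` helper and the example entries are hypothetical, with some entries taken from the services named above and others invented for illustration.

```python
from enum import Enum


class Element(Enum):
    """The five elements of the RDM logic mapping, in causal order."""
    INPUTS = "inputs"
    ACTIVITIES = "activities"
    OUTPUTS = "outputs"
    INTERMEDIATE_OUTCOMES = "intermediate outcomes"
    IMPACTS = "impacts"


# Costs accrue over inputs and activities; benefits flow from the remaining elements.
COST_ELEMENTS = {Element.INPUTS, Element.ACTIVITIES}


def side(element: Element) -> str:
    """Return 'cost' or 'benefit' for a logic-mapping element."""
    return "cost" if element in COST_ELEMENTS else "benefit"


# Hypothetical example entries, for illustration only.
logic_map = {
    Element.INPUTS: ["staff time", "storage hardware"],
    Element.ACTIVITIES: ["data management planning", "guidance, training and advocacy"],
    Element.OUTPUTS: ["data management plans", "catalogued datasets"],
    Element.INTERMEDIATE_OUTCOMES: ["time saved locating data"],
    Element.IMPACTS: ["wider economic value of data reuse"],
}

if __name__ == "__main__":
    for element, items in logic_map.items():
        print(f"{element.value:<22} {side(element):<8} {'; '.join(items)}")
```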

This can then be used as the basis for an idealised cost-benefit analysis, in which costs and benefits are measured in monetary terms and the net benefit is derived as total benefits minus total costs.
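The idealised calculation can be sketched in a few lines of Python. This is a minimal sketch assuming that costs and benefits have already been valued in money for each year of a fixed time horizon; the `net_benefit` function and its optional `discount_rate` parameter are hypothetical, and discounting is a standard cost-benefit convention added here for illustration rather than something this analysis specifies.

```python
def net_benefit(benefits: list[float], costs: list[float], discount_rate: float = 0.0) -> float:
    """Net benefit = total (discounted) benefits minus total (discounted) costs.

    benefits[t] and costs[t] are monetary values in year t, with t = 0 the first year.
    discount_rate is an assumption; the default of 0.0 gives the plain undiscounted difference.
    """
    if len(benefits) != len(costs):
        raise ValueError("benefits and costs must cover the same time horizon")
    total = 0.0
    for t, (b, c) in enumerate(zip(benefits, costs)):
        # Discount each year's net flow back to year 0.
        total += (b - c) / (1 + discount_rate) ** t
    return total


if __name__ == "__main__":
    print(net_benefit([0, 50, 80], [100, 20, 20]))
```

With no discounting, `net_benefit([0, 50, 80], [100, 20, 20])` gives -10.0: total benefits of 130 fall just short of total costs of 140.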

Challenges to measuring the costs and benefits of RDM

In reality, any approach to measuring the costs and benefits of RDM faces the following practical and conceptual challenges:

  • Distinguishing RDM costs and benefits from wider research;
  • Developing a counterfactual against which the results of proactive RDM can be compared;
  • Attributing impacts to RDM.
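The second and third challenges can be made concrete with a simple adjustment: subtract the counterfactual outcome from the observed outcome, then credit RDM with only an assumed share of the difference. This Python sketch is illustrative only; the function name, its parameters and the difference-times-share rule are assumptions rather than a method set out in this analysis, and in practice both the counterfactual and the attribution share would themselves have to be estimated.

```python
def attributable_benefit(
    outcome_with_rdm: float,
    outcome_counterfactual: float,
    attribution_share: float,
) -> float:
    """Benefit credited to RDM after netting off a counterfactual and applying an attribution share.

    outcome_with_rdm: observed value of an outcome (e.g. researcher hours saved) with proactive RDM.
    outcome_counterfactual: estimated value of the same outcome without proactive RDM.
    attribution_share: assumed fraction (0 to 1) of the difference caused by RDM rather than
        by wider research activity.
    """
    if not 0.0 <= attribution_share <= 1.0:
        raise ValueError("attribution_share must be between 0 and 1")
    return (outcome_with_rdm - outcome_counterfactual) * attribution_share


if __name__ == "__main__":
    print(attributable_benefit(120.0, 100.0, 0.5))
```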

Key requirements of a costings framework

Based on the evidence reviewed, we recommend that any costings framework should: 

  • Include the costs of live data;
  • Ensure costs for the past, current and future management of data are covered;
  • Consider data management costs over a fixed time horizon;
  • Be clear on what level the costings represent e.g. project level, institutional level;
  • Allow costings that are bespoke or tailored for specific aims to be mapped to a common framework;
  • Differentiate between fixed, variable, semi-variable and step costs to aid sensitivity and scaling calculations.
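Distinguishing cost behaviours matters because each responds differently when data volumes change. The following is a minimal Python sketch, assuming volume is measured in terabytes and that every cost line follows exactly one behaviour; the `CostLine` class, the `total_cost` helper and all figures are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CostLine:
    """One cost line, classified by how it behaves as data volume (TB) changes."""
    name: str
    kind: str                # "fixed", "variable", "semi-variable" or "step"
    fixed: float = 0.0       # fixed element (fixed and semi-variable costs)
    rate: float = 0.0        # cost per TB (variable and semi-variable costs)
    step_size: float = 1.0   # TB covered by each step (step costs)
    step_cost: float = 0.0   # cost of each step (step costs)

    def cost(self, volume_tb: float) -> float:
        if self.kind == "fixed":
            return self.fixed
        if self.kind == "variable":
            return self.rate * volume_tb
        if self.kind == "semi-variable":
            return self.fixed + self.rate * volume_tb
        if self.kind == "step":
            # Assumes at least one step is always needed, even at zero volume.
            return self.step_cost * max(1, math.ceil(volume_tb / self.step_size))
        raise ValueError(f"unknown cost kind: {self.kind}")


def total_cost(lines: list[CostLine], volume_tb: float) -> float:
    """Total cost of all cost lines at a given data volume."""
    return sum(line.cost(volume_tb) for line in lines)


# Hypothetical cost lines, for illustration only.
example_lines = [
    CostLine("RDM policy review", "fixed", fixed=5000.0),
    CostLine("backup media", "variable", rate=20.0),
    CostLine("repository service", "semi-variable", fixed=2000.0, rate=10.0),
    CostLine("storage nodes", "step", step_size=100.0, step_cost=8000.0),
]

if __name__ == "__main__":
    # A simple sensitivity check: total cost at several assumed volumes.
    for volume in (50, 100, 150):
        print(volume, total_cost(example_lines, volume))
```

Between 100 and 150 TB the step cost doubles while the fixed cost does not move, which is the kind of behaviour a sensitivity or scaling calculation needs to capture.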

Two approaches to measuring costs

To measure the costs of RDM provision, we propose two approaches, one top-down and one bottom-up, both incorporating these key requirements.

The top-down approach identifies broad RDM services (e.g. data management planning, data cataloguing and registry services) and the labour and capital costs associated with each service. The bottom-up approach adopts an activity-based costing framework, which breaks the broad services down into more detailed activities (e.g. ingest, access, acquisition, disposal) and identifies the labour and capital costs associated with each activity. In both cases, assumptions about the expected volume of data are used to scale the costs up to a total at the institutional level.
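The difference between the two approaches can be sketched in Python under stated assumptions: top-down costs are taken as observed annual service costs at the current data volume and scaled linearly to the expected volume, while bottom-up costs are expressed per terabyte for each activity and multiplied by the expected volume. The `CostItem` class, the `top_down` and `bottom_up` functions and all figures are hypothetical; in practice, fixed and step costs would not scale linearly.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    """Labour and capital costs of one service (top-down) or one activity (bottom-up)."""
    name: str
    labour: float
    capital: float

    @property
    def total(self) -> float:
        return self.labour + self.capital


def top_down(services: list[CostItem], observed_tb: float, expected_tb: float) -> float:
    """Sum service costs observed at the current volume, then scale linearly (an assumption)."""
    return sum(s.total for s in services) * expected_tb / observed_tb


def bottom_up(unit_costs_per_tb: list[CostItem], expected_tb: float) -> float:
    """Sum the per-TB cost of each detailed activity, then multiply by the expected volume."""
    return sum(a.total for a in unit_costs_per_tb) * expected_tb


# Hypothetical figures, for illustration only.
services = [
    CostItem("data management planning", labour=30000.0, capital=0.0),
    CostItem("data cataloguing and registry", labour=20000.0, capital=10000.0),
]
activities = [
    CostItem("ingest", labour=100.0, capital=20.0),
    CostItem("access", labour=50.0, capital=30.0),
    CostItem("acquisition", labour=40.0, capital=0.0),
    CostItem("disposal", labour=10.0, capital=0.0),
]

if __name__ == "__main__":
    print("top-down:", top_down(services, observed_tb=200, expected_tb=300))
    print("bottom-up:", bottom_up(activities, expected_tb=300))
```

The bottom-up version needs a unit cost for every activity, which illustrates why it is more demanding with regard to data than the top-down version.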

While the more accurate costings derived from the bottom-up method make it the ideal approach, we consider it less feasible to implement at present because it is more resource-intensive and more demanding with regard to data requirements. We expect that cost data for the top-down approach can be drawn mostly from cost data that institutions already collect. We therefore consider the top-down approach more feasible for HEIs to implement now, with institutions moving closer to a bottom-up approach over time.

Key considerations for benefits analyses

From the analysis of the tools and techniques used to measure benefits, we recommend that any future analysis of benefits should:

  • Ensure that the framework of RDM captures all benefits of proactive RDM, including management of live data; 
  • Recognise that different tools and techniques are needed to estimate different benefits;
  • Prioritise direct and measurable economic benefits; 
  • Disaggregate the benefits from different outcomes to obtain precise valuations and enable easier cross comparisons;
  • Undertake econometric evaluation to derive the scale of wider impacts (starting by linking intermediate outcomes with likely wider outcomes, given the uncertainty about the extent to which wider benefits can be attributed to RDM).
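Disaggregating benefits by outcome, while keeping benefits that can be valued in money apart from those that cannot, can be sketched with a small record type. This Python sketch is illustrative only; the `Benefit` class, the `summarise` function, the outcomes, metrics and values are all hypothetical, and `None` marks a benefit that cannot (yet) be measured in financial terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Benefit:
    outcome: str                     # intermediate outcome the benefit flows from
    metric: str                      # what is measured, e.g. staff hours saved
    monetary_value: Optional[float]  # None if not measurable in financial terms


def summarise(benefits: list[Benefit]) -> tuple[dict[str, float], list[str]]:
    """Return total monetised value per outcome, plus outcomes with non-monetised benefits."""
    monetised: dict[str, float] = {}
    non_monetised: list[str] = []
    for b in benefits:
        if b.monetary_value is None:
            if b.outcome not in non_monetised:
                non_monetised.append(b.outcome)
        else:
            monetised[b.outcome] = monetised.get(b.outcome, 0.0) + b.monetary_value
    return monetised, non_monetised


# Hypothetical benefits, for illustration only.
example = [
    Benefit("time saved locating data", "researcher hours saved x hourly cost", 12000.0),
    Benefit("time saved locating data", "support-staff hours saved x hourly cost", 3000.0),
    Benefit("research transparency", "survey of data reusers", None),
]

if __name__ == "__main__":
    print(summarise(example))
```

Keeping per-outcome totals, rather than a single aggregate, is what allows valuations for different outcomes to be compared across studies.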

A general approach to measuring the benefits

For measuring the benefits, therefore, we suggest treating the intermediate outcomes of RDM as the direct benefits of RDM. Within these benefits we distinguish between those that can be measured in financial terms and those that cannot.
The techniques and types of metric suggested to measure the value of each benefit vary. We propose a list of possible metrics, which we envisage being refined over time as attempts are made to measure the benefits of RDM.
The wider, long-term, national-level impacts are treated separately. The most likely and feasible approach to measuring these would be through case studies and interviews.
Econometric techniques are suggested to link the different elements of benefits together, but only if the specification is robust.
As with the costs framework, however, a crucial question concerns the feasibility of implementing such a framework given existing monitoring structures within HEIs.