Research Data Shared Service Pilot
Short overview and FAQ
Questions
What is it?
Why is Jisc doing this?
What are the gaps in the current provision?
What is the vision and what are the goals?
Who are the stakeholders?
Who are the pilot institutions and why were those institutions chosen as pilots? What are the particular benefits of being a pilot?
Is what is being proposed a single service or multiple services?
How does the service fit into the other offerings in the Research @ Risk portfolio?
Will the Service be Jisc branded out of a single Jisc data centre?
I really need this service—is there scope for non-pilots to be involved?
What is the current status of the pilot and when will the pilot be finished?
What is included in the initial service offering (and what is NOT included / out of scope)?
What about interoperability? Will the service be based on / use commonly accepted standards (for interfaces, APIs, metadata, etc.)?
What is the meta-data schema for this service?
What is the “Minimum Viable Product”?
Will ‘Out of Scope’ components be incorporated into the service at some point in the future?
How automated will the service be (metadata extraction for instance)?
When will the Service be rolled out for the wider (Jisc member/subscriber) community?
Is there a possibility that the pilot service won’t progress into a full service offering? Who takes the go / no go decision regarding taking the pilot into service and when? What happens to data currently in place in the pilot system?
Will people and companies who are not Jisc members/subscribers be able to use the service?
Who is providing services to the pilot? How are those services divided up?
What additional procurement and support is required over and above the 8 lots?
My company isn’t on the list of providers. Can I provide a service?
Who has oversight of the overall project?
What is the ‘University of Jisc’?
What will the service cost and who pays for it?
How do we know that this service will be cost effective?
Is it all or nothing? I have some parts of a solution in place already so do I have to have all the components of service?
Are there choices for individual components within the service?
I want to use a component that isn’t part of the pilot service and isn’t on the planned development pathway. Can I do that?
Where can I find out more about the current status of the project?
What is it?
Strictly speaking, right now, the Research Data Shared Service (RDSS) Pilot is a development project designed to provide a scalable, modular Research Data Management (RDM) service for the HE/Research sector. It’s an ambitious and wide ranging project designed to provide a research data management service to a wide range of users/institutions from green-field sites with no research data management provision to mature sites with a need for one or two services to round out their existing management portfolio. When the pilot has concluded it is intended that it will become a full service offered by Jisc. The progression to service depends on a number of factors, including the technological success of the pilot and the business case for providing an RDSS.
The diagram below (Figure 1: Idealised RDSS infrastructure) shows how an idealised infrastructure might look for a research-oriented institution. The pilot service will address some aspects of this infrastructure: some core elements and some elements required or desired by the pilot institutions.
Why is Jisc doing this?
The RDSS pilot is part of the Research @ Risk portfolio of Jisc initiatives. These were identified by Jisc members as being of particular concern.
The key drivers for RDSS are:
- Better research
- Lack of choice
- Market gaps
- Funder mandates
- Efficiencies of scale
- Interoperability
At present there is no “solution” that is both easily available and meets the particular requirements universities have for Research Data Management.
More effective Research Data Management must happen to comply with funder mandates, to ensure data is not lost, and to realise a whole range of positive benefits.
A shared service (in particular one provided by Jisc) offers a number of benefits:
- Cost savings and efficiencies
- Common approaches and practice
- Research system standardisation and interoperability
- Address gaps in the market
- Others…
Spin off benefits:
- Linking cross-discipline research and researchers
- Linking cross-institution research and researchers
- Macro and micro reporting, particularly financial reporting
What are the gaps in the current provision?
Although many individual elements exist in isolation there is a lack of:
- true integration from data creation through to long term preservation
- interoperability across systems
- consistent UX across systems
Preservation is the BIG gap—many institutions are only now starting to address this need, in particular the question of what to keep (and what not to keep) and how long to keep things for.
While there are preservation solutions, there is a gap in curating for preservation: tools that support file format identification, metadata, the creation of archival information packages, data integrity checking and even emulation.
What is the vision and what are the goals?
The Vision
Researchers shouldn’t need to think (too much!) about Research Data Management
"Visible data, invisible infrastructure"
We need to provide researchers with intuitive, easy to use systems that allow them to publish, archive and preserve their research outputs.
We need to provide interoperable systems to allow researchers and institutions to fulfil and go beyond policy requirements and adhere to best practice throughout the RDM lifecycle.
In short
- Visible data, invisible infrastructure
- Efficiency
- Cost saving
- Policy Compliance
- Improved research integrity
Goals
The main goals for the system are:
- RDM Policy compliance
- Increased sector efficiencies: procurement, data re-use, interoperability opportunities
- Improving the integrity of research
- Addressing Market Gaps: Integrated RDM system, Preservation Gap, Usability
- Accelerating Research Data Management in institutions
- Supporting institutions to meet Open Access/REF requirements
The main goals for the pilot are:
- Proof of concept through the provision of a series of interoperable components that integrate with the systems of the pilot institutions
- Development of a robust business case that will inform the decision to proceed to a deployed service
Who are the stakeholders?
For the shared service as a whole, essentially anyone active in funding or managing research data:
- Funders/government
- HEI Management
- Research Data Managers
- Researchers
- Service/infrastructure providers
For the pilot service the pilot institutions and framework suppliers should also be considered as a particular subset of the stakeholders.
Who are the pilot institutions and why were those institutions chosen as pilots? What are the particular benefits of being a pilot?
The Pilots have been selected to provide a balanced portfolio and cover a range of use cases—from green field, to mature provision requiring only one or two individual components—and a range of sizes—from small and specialist, through research intensive, to large collegiate institutions. They are:
- Cardiff University
- CREST - Consortium for Research Excellence, Support and Training (Harper Adams, St Mary’s - Twickenham, UCA & Winchester)
- Imperial College of Science, Technology and Medicine
- Middlesex University
- Plymouth University
- Royal College of Music
- St George's Hospital Medical School
- University of Cambridge
- University of Lancaster
- University of Lincoln
- University of St Andrews
- University of Surrey
- University of York
In addition to the overall benefit of ultimately having an RDM system in place, pilots also benefit from:
- Institutionally branded RDM and research object repository solution
- Support with policy compliance (EPSRC, Open Access, OpenAIRE etc.) and best practice
- Institutional support to refine RDM requirements and implementation
- Focus on intuitive user experience and ease of use for researchers
- Focus on interoperability between institutional and external research systems
Is what is being proposed a single service or multiple services?
The RDSS could be delivered as a single integrated Software as a Service product (and such a solution is still available for those who want this type of provision). However, many institutions have already invested in components that provide them with partial solutions. So the RDSS will provide components that fill the gaps and integrate with these partial solutions, along with the interoperability tools needed to allow seamless and controlled data flow across system boundaries.
It’s also worth noting that the underlying systems do not consist of single supplier modules. For instance, the systems underpinning the preservation module could be provided by one of a number of suppliers. The institutions availing themselves of the RDSS will have control over this choice in their instances.
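As a rough illustration of this modular model, the sketch below checks an institution's choice of supplier for each module against the platform suppliers listed under Lots 1, 5 and 7 in this FAQ. The module names, the `AVAILABLE` mapping and the `validate_choices` function are hypothetical and are not part of any RDSS interface.

```python
# Platform suppliers per module, taken from Lots 1, 5 and 7 in this FAQ.
# The module keys are illustrative names, not RDSS identifiers.
AVAILABLE = {
    "repository": {"discoverygarden", "Figshare", "Haplo", "Sero"},
    "preservation": {"Arkivum", "Preservica"},
    "reporting": {"Connexica", "Sero"},
}


def validate_choices(choices: dict) -> dict:
    """Check that each chosen module exists and its supplier is offered for it."""
    for module, supplier in choices.items():
        if module not in AVAILABLE:
            raise ValueError(f"unknown module: {module}")
        if supplier not in AVAILABLE[module]:
            raise ValueError(f"{supplier} is not offered for {module}")
    return choices


# An institution that already runs its own repository and only needs
# a preservation platform selects a single module.
partial = validate_choices({"preservation": "Arkivum"})

# A green-field institution might select every module.
full = validate_choices(
    {"repository": "Figshare", "preservation": "Preservica", "reporting": "Connexica"}
)
```

An institution can pass a subset of modules, which mirrors the "as much or as little as you need" approach described elsewhere in this FAQ.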
How does the service fit into the other offerings in the Research @ Risk portfolio?
The project is making links with the other projects in the Research @ Risk portfolio, many of which are feeding outputs into the pilot.
Of particular interest are:
- Business case and costing for research data management
- Research data spring – projects including preservation developments, data papers etc.
- Research data metadata
- Research data metrics for usage
- UK research data discovery
- Funder policy guidance
Will the Service be Jisc branded out of a single Jisc data centre?
It is intended that the end users will see the service as an institutionally branded offering, even if it runs on Jisc infrastructure. Early iterations will be cloud based (Amazon Web Services–AWS) using the existing Jisc framework agreements. However, eventually it should be possible to run some or all of the service on alternative infrastructures.
I really need this service—is there scope for non-pilots to be involved?
At present we’re in the alpha development phase, in particular developing a "University of Jisc" test-bed. When there are beta products we will be opening up the testing and use of some of these products to the wider community. Information will be published on this website as and when it becomes available.
What is the current status of the pilot and when will the pilot be finished?
Status
- The pilots have been identified and appointed
- The initial requirements gathering is complete
- Data Asset Frameworks (DAFs) have been completed/updated and analysed
- The Metadata approach has been identified and a draft data model developed - see https://github.com/jiscresearch/sharedService
- Baseline costing is underway
- Suppliers have been appointed
- The initial Technical Architecture and Delivery Proposals report has been published - see https://goo.gl/tZISrz
- This feeds into the tech requirements of the Platform Statement of Requirements
- This feeds into the specifying of integrations between platforms (both those on the Jisc framework and those that are not) that need to be developed
- The output will be a detailed tech architecture.
- The ‘University of Jisc’ development platform has been commissioned
- The gathering of detailed requirements with all of the pilots has been undertaken and the Minimum Viable Product (MVP) for the service alpha finalised
What is included in the initial service offering (and what is NOT included / out of scope)?
The scope for the pilot is that of an institutional data repository taking deposits of data from the point of publication, to the preservation, access and storage of data with interoperability to systems outside.
Jisc has services and agreements in other related areas (e.g. active data storage, supporting DMPs, identifiers through ORCID, discovery through the UK research data discovery service, usage statistics, and national data services). Connections to these will be integrated into the pilot system.
Through the pilot we also hope to test and develop integrations and joins with Jisc shared data centres and Jisc open access services.
It should be noted that at present, even though links from Current Research Information Systems (CRIS) are in scope, providing a CRIS itself is currently out of scope.
What about interoperability? Will the service be based on / use commonly accepted standards (for interfaces, APIs, metadata, etc.)?
Interoperability with systems can provide opportunities for efficiencies and ease of use for researchers. In many ways the integration with other existing systems is the key USP for many potential stakeholders. The system will use appropriate standards, preferably open standards wherever required. Only where absolutely necessary will new standards be developed (in consultation with the relevant communities).
The diagram above shows the 3 key underpinning platforms—repository, preservation and reporting—and the key connections to other systems and services. In the first stage we will be concentrating upon interoperability with the systems already in place in the pilot institutions.
What is the meta-data schema for this service?
The Metadata approach has been identified and a draft data model developed which can be seen at https://github.com/jiscresearch/sharedService
What is the “Minimum Viable Product” (MVP)?
Pilot Minimum Viable Product needs
(based upon information gathered during visits to the pilot institutions)
Initial views of what the MVP should cover included:
- “Easy to use and cost effective archiving, ingest, preservation, repository, reporting and discovery supported that can handle sensitive data”
- “Robust data storage that has growth ability for active and archive data”
- “Standard metadata profile - international for interoperability”
- “Integration with all main CRIS systems and PURE”
- “Meets REF and funder deposit requirements (supports deposit of REF data output types)”
In short ……EVERYTHING
Alpha MVP
- At Alpha the ingest of research data into a repository will be supported, where it will be discoverable and available for download. This supports institutional compliance in the short term.
- Data will be passed into a preservation system, which will enact preservation policies on the data and pass it to long term storage. This makes the institution policy compliant in the long term.
- Integration work across internal service components and the most common institutional systems, CRIS systems and institutional repositories.
- External integrations will include identifier systems, scholarly communications and metrics systems.
- System wide effort on User Experience and Technical Underpinning.
- Further definition of reporting and analytics is underway.
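The Alpha data flow described above (repository ingest, then preservation, then long term storage) can be pictured as a simple chain of stages. The following is only a conceptual sketch: the `Deposit` class and the stage functions are hypothetical names used for illustration and do not describe how the supplier platforms are actually connected.

```python
from dataclasses import dataclass, field


@dataclass
class Deposit:
    """A hypothetical research data deposit moving through the service."""
    title: str
    files: list
    discoverable: bool = False
    history: list = field(default_factory=list)


def ingest_into_repository(deposit: Deposit) -> Deposit:
    # Alpha: once ingested, the data is discoverable and available for download.
    deposit.discoverable = True
    deposit.history.append("repository")
    return deposit


def apply_preservation_policies(deposit: Deposit) -> Deposit:
    # The preservation system enacts preservation policies on the data.
    deposit.history.append("preservation")
    return deposit


def move_to_long_term_storage(deposit: Deposit) -> Deposit:
    # After preservation processing the data is passed to long term storage.
    deposit.history.append("long-term storage")
    return deposit


ALPHA_STAGES = [
    ingest_into_repository,
    apply_preservation_policies,
    move_to_long_term_storage,
]


def run_alpha_flow(deposit: Deposit) -> Deposit:
    """Pass a deposit through each Alpha stage in order."""
    for stage in ALPHA_STAGES:
        deposit = stage(deposit)
    return deposit


result = run_alpha_flow(Deposit(title="Survey results", files=["survey.csv"]))
```

Keeping the stages in a list reflects the modular intent: a stage could in principle be swapped for a different supplier's platform without changing the overall flow.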
Beta MVP
- Adds functionality for large and sensitive data management and improving preservation of research datasets.
- The reporting application will be implemented, tested and refined.
- Additional institutional integrations will include finance, HR and grants data sources, for those institutions without CRIS systems, as well as linking active data storage, such as file sync and share.
- UX improvements and measures to ensure scalability, robustness and security will be undertaken across the system
Will ‘Out of Scope’ components be incorporated into the service at some point in the future?
Potentially.
In the short term we are focusing on:
- Systems required by the Pilot institutions
- Widely used systems
- Easy to integrate systems
Obviously there will be specialist systems that fall outside these categories. However, we may still consider them for inclusion at a future date. Decisions will be made on the basis of demand and difficulty.
How automated will the service be (metadata extraction for instance)?
As automated as we can make it. A number of tools will be employed to identify file formats, extract previously entered metadata, etc. The automated transfer of information such as metadata through the system is one of the goals of the shared service.
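To give a feel for what this kind of automation involves, here is a minimal sketch that derives basic technical metadata for a file and carries forward any metadata already entered at deposit. It uses only the Python standard library, and `mimetypes` guesses the format from the file extension alone; it is a simple stand-in for dedicated format identification tools, not the tooling the service will use. The function name and record fields are assumptions made for illustration.

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Optional


def describe_file(path: Path, deposited: Optional[dict] = None) -> dict:
    """Build a metadata record without asking the researcher to re-enter anything."""
    # Guess the format from the file extension (a simplification: real
    # identification tools inspect the file contents as well).
    mime, _encoding = mimetypes.guess_type(path.name)

    # Start from metadata already entered at deposit so it travels with the file.
    record = dict(deposited or {})
    record.update(
        {
            "filename": path.name,
            "format": mime or "application/octet-stream",
            "size_bytes": str(path.stat().st_size),
            # A checksum lets later stages confirm the file has not changed.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        }
    )
    return record
```

For example, `describe_file(Path("notes.txt"), {"title": "Field notes"})` would return the title supplied at deposit alongside the derived filename, format, size and checksum.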
When will the Service be rolled out for the wider (Jisc member/subscriber) community?
The pilot is scheduled to run for 2 years until April 2018. A full service should be available to the community by then. However, some components may be released before that date, possibly as beta versions. It is also quite possible that development work could continue past April 2018 to allow for the incorporation of new functionality.
Is there a possibility that the pilot service won’t progress into a full service offering? Who takes the go / no go decision regarding taking the pilot into service and when? What happens to data currently in place in the pilot system?
It's possible. Ultimately the service needs to be underpinned by a robust business case. If the case can’t be made then the pilot won’t progress. The decision will be taken by the management teams within Jisc.
The possibility that the service won’t progress has been considered from the outset. Should this turn out to be the case then pilot institutions will be assisted in the transfer of their data to alternative systems (if required). There is also the possibility that other stakeholders may wish to continue development work independently of Jisc. As most of the development work should be open source, this should be eminently possible.
Will people and companies who are not Jisc members/subscribers be able to use the service?
We are considering a number of potential income streams which may help fund the service. Once the needs of members have been satisfied it is likely that the service will be made available to non-Jisc members to provide just such an income stream.
Who is providing services to the pilot? How are those services divided up?
There are:
- 8 Lots
- 13 Suppliers
- 5 Consultants
- 7 Platforms
Lots
(figures in brackets show the number of suppliers in that lot)
Lot 1 - Research Data Repositories (4)
Lot 2 - Repository Interfaces (6)
Lot 3 - Research Data Exchange Interface (3)
Lot 4 - Research Information and Administration Systems Integrations (1)
Lot 5 - Research Data Preservation Platforms (2)
Lot 6 - Research Data Preservation tools development (2)
Lot 7 - Research Data Reporting (2)
Lot 8 - User Experience enhancements (4)
Lots 1, 5 and 7 are the platform lots, which are existing products that can be installed straight after contracting. The other lots are development lots to provide interoperability, usability and suitability for RDM.
Suppliers
- Arkivum
- Connexica
- discoverygarden
- Figshare
- Haplo
- Ken Chad
- magneticNorth
- Ocasta
- Preservica
- Sero
- Symplectic
- University of Edinburgh
- University of London Computer Centre
Platforms
Lot 1 - Research Data Repositories (4)
- Discoverygarden – Islandora (open source, based on Fedora)
- Figshare (proprietary, hosted)
- Haplo (based on RIM system used at Westminster, open source)
- Sero - Hydra (open source, based on Fedora)
Lot 5 - Research Data Preservation Platforms (2)
- Arkivum – Archivematica (open source preservation platform)
- Preservica – (proprietary, hosted or licensed)
Lot 7 - Research Data Reporting (2)
- Connexica – CXAIR (proprietary, hosted)
- Sero – Edges (Also used by Jisc Monitor)
What additional procurement and support is required over and above the 8 lots?
Existing Jisc Agreements | Description of Research Data Shared Service Components |
---|---|
Cloud Storage | To provide cloud storage for the service |
Infinity (and/or Northern) Data Centre | To provide local storage and hosting of services |
Data Archiving Framework | To provide long term archival storage of research data |
Data Audit (Research Consulting) | To provide the consultation phase for stakeholders in the project, not focused on the final technology solution, for example an audit of datasets, legal and compliance framework, financial and strategic commitment. |
Technical Architect | To provide expert technical advice to the project on the technical architecture of the service, assessment of institutional technical capability and to assist in gathering detailed requirements from institutions and researchers |
Metadata and Interoperability (CLAX) | To examine metadata specifications and provide advice on identifier systems and interoperability |
Project Management (LM) | To provide project management support, coordinate contract negotiations, facilitate collaboration between suppliers and HEIs, and monitor overall service development. This function will also gather evidence to feed into the business model for the next stage |
Preservation Audit (TBC) | To provide the needs and priorities for preservation tools development |
My company isn’t on the list of providers. Can I provide a service?
Potentially.
At present the providers for the main lots have gone through a selection process to enable them to be framework suppliers. However, if there is a need for a service or product that framework suppliers can’t provide then there may be an opportunity for non-framework suppliers to be approached.
Who has oversight of the overall project?
The core team has day-to-day oversight of the development process.
The Research @ Risk Oversight Group has oversight of the Research @ Risk projects, one of which is the RDSS. The RDSS Expert Advisory Group provides specialist advice relating to the project and the direction it is taking. Both of these groups have external (to Jisc) members representing interests in the HE/Research arena.
The Jisc board has overall responsibility.
The Research Data Network (not shown above) provides a conduit for the wider community to feed into the Pilot and to follow developments.
What is the ‘University of Jisc’?
The University of Jisc (UoJ) is a test environment (for most products) hosted on Jisc servers that will allow:
- Installation of platform products
- The testing of products with dummy (but real) data
- Developers to start integration work and bespoke development prior to institutional alpha deployment
A Jisc technical team—including devops and a technical integration developer—has been put in place to oversee the operation of the platform.
The environment will include installations of:
- Platform products
- Other commonly used research systems:
- EPrints
- DSpace
- Authentication
- Arkivum Virtual Machines
- Others
We will also be using it to investigate CRIS installations and other systems.
It will also function as an environment where suppliers and Jisc can test their products against common systems, use cases and integrations.
We’re currently procuring the systems/services that will be incorporated into the UoJ and, in some cases, alpha development has begun.
What will the service cost and who pays for it?
As yet we don’t know the cost per instance. One of the reasons for undertaking the pilot is to find out. Similarly, we will be investigating funding and business models to feed into the business case when the pilot is considered for transition to a service.
How do we know that this service will be cost effective?
We don’t… …yet.
Currently we’re:
- Establishing a base cost (with the pilots) for existing provision
- Incorporating costing/reporting tools within the RDSS
- Gathering costs and related data as we go along
It should be, even if only for the economies of scale and procurement that such a service can provide.
Is it all or nothing? I have some parts of a solution in place already so do I have to have all the components of service?
No. You can use as much or as little as you need.
Are there choices for individual components within the service?
Yes. Most modules/lots have multiple suppliers underpinning them. Institutions procuring the service will have the opportunity to choose which supplier underpins their version of the service.
I want to use a component that isn’t part of the pilot service and isn’t on the planned development pathway. Can I do that?
The RDSS is designed to be modular, interoperable and standards compliant. As long as the desired component is also interoperable and standards compliant with an appropriate Application Programming Interface (API) then it should be possible to incorporate it. It is almost inevitable that there will be some costs associated with incorporating a component that’s outside the initial offering. It’s not been decided how this additional development work would be funded.
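One way to picture "interoperable with an appropriate API" is as a shared interface that any component must satisfy before it can be plugged in. The sketch below uses a Python `Protocol` for this. The `RepositoryComponent` interface and its `deposit` and `retrieve` methods are purely hypothetical, since the RDSS interfaces are not specified in this FAQ.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RepositoryComponent(Protocol):
    """Hypothetical minimum interface a repository component would expose."""

    def deposit(self, metadata: dict, payload: bytes) -> str: ...

    def retrieve(self, identifier: str) -> bytes: ...


class InMemoryRepository:
    """A toy third-party component that happens to satisfy the interface."""

    def __init__(self) -> None:
        self._items = {}

    def deposit(self, metadata: dict, payload: bytes) -> str:
        identifier = f"item-{len(self._items) + 1}"
        self._items[identifier] = (metadata, payload)
        return identifier

    def retrieve(self, identifier: str) -> bytes:
        return self._items[identifier][1]


def can_plug_in(component: object) -> bool:
    # isinstance on a runtime-checkable Protocol only checks that the
    # methods exist, not that they behave correctly.
    return isinstance(component, RepositoryComponent)


repo = InMemoryRepository()
item_id = repo.deposit({"title": "Example dataset"}, b"raw data")
```

A structural check like this is only a first gate. In practice, the integration and testing costs mentioned above would still apply.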
Where can I find out more about the current status of the project?
Blog
https://researchdata.jiscinvolve.org/wp/
Research Data Network
This website - http://researchdata.network
The research data network is a people network for anyone interested in Research Data Management. Although it was born from the interest surrounding the shared service, that’s just the starting point. The network holds regular events – to date there have been three (in Cardiff, Cambridge and St Andrews) with more planned.
Research @ Risk
https://www.jisc.ac.uk/rd/projects/research-at-risk
Shared Service Web page
https://www.jisc.ac.uk/rd/projects/research-data-shared-service