DMMCore: Data and Model Management Core for ERASysAPP & Europe
Principal Investigator / Supervisor
Professor Carole Goble
Professor Jacob Snoep
The University of Manchester
In the course of the ERA-Net ERASysAPP and the European research infrastructure project ISBE, three national funding agencies have agreed to support a Data and Model Management Core Project - DMMCore - for the European life science research community. This joint action aims to establish an internationally sustained Data and Model Management service for the European systems biology community, seeded by research projects funded by ERASysAPP. In previous funding programmes (SysMO and SystemsX.ch), dedicated support projects (SysMO-DB and SyBIT) were set up to create the necessary tools and resources, and to interact with the research projects to ensure proper data and model management. We are building on the experience and expertise of those projects to:
1) Develop a standards-based Data and Model Management Platform by combining the openBIS and SEEK platforms (openSEEK) with a Core Tool Pool of community software and archives.
2) Establish a sustainable European infrastructure using the platform, including a community Facility called EuroSEEK to host projects that do not want to operate their own local openSEEK platform. We will establish a Data & Model Management Network of community actions of experts, users and developers; sustainability mechanisms for the EuroSEEK Facility that federate national activities in countries in addition to the DMMCore countries; and a SysBioDMM Foundation to manage future membership funding of the resources and the community networks. We will actively engage in standards development and work with standards and policy setting organisations.
3) Provide support and operations for the ERASysAPP projects seeding the European Infrastructure, including: support for projects to set up their local data management solution based on openSEEK and project-specific data analysis pipelines; a Support Team for technical curation, policy making, advice and training; and a PALs focus group of project representatives to work with the delivery team.
This project, in partnership with colleagues at HITS, Heidelberg and SyBIT, Switzerland, aims to establish an internationally sustained Data and Model Management service for the European Systems Biology community. Many projects in Systems Biology have been funded worldwide. However, it is often not clear how their results are made available for further research. Some result data sets or models are stored in public resources, but not all. Other repositories are ripe for greater exploitation worldwide. Funding agencies now expect new projects to publish their results using open access and to provide a data management and data sustainability plan to ensure that their investment has the desired impact. However, researchers often have neither the means nor the knowledge necessary to publish and maintain their results according to the necessary standards so that they remain available beyond the end of their project. The material that publishers request is often insufficient to reproduce results reliably. Models and standard operating procedures are often not available or are not described in enough detail to be reproducible. This introduces inefficiencies: the limited reusability and impact of project results on future projects leads to the reinvention of methods and tools and slows the pace of scientific progress. Systems Biology projects often have large and heterogeneous data sets, and always have mathematical models based on experimental data. Such projects usually involve researchers from various disciplines collaborating across many locations, where direct communication is often difficult, not just because of distance but also because of different semantics among disciplines. A shared software interface with transparent data and links to models is essential to enable interdisciplinary research and interaction among research groups in Systems Biology.
The iterative cycle between experiments and models that is typical of Systems Biology projects mandates a good versioning system for experiment and model versions, as well as investigation-study organisational structures, so that the project workflow is clear and reproducible. Furthermore, strict adherence to community standards and standard operating procedures is important for linking data and models created by different project members. A management platform needs to support experimental workflows and model and data linkages, with good tools for data and model annotation and adherence to standard formats of representation. In order to have all the information for a reproducible scientific process, data for model construction and model validation need to be clearly separated, and all data, models and Standard Operating Procedures must be made available for download. Dedicated software platforms and tools are necessary to lift scientific research in Systems Biology up to industrial standards of reproducibility and quality control. In this project we will establish a sustainable European data and model e-infrastructure for the European systems biology community with a long-term business model. We will found a Europe-wide Data & Model Management Network to support and promote data and model management and capacity building, using the ERASysAPP ERA-Net as a seed. We will disseminate knowledge and standards, coordinating with related e-infrastructures such as ELIXIR and ISBE, and interacting with community-related policy and standards setting. We will develop the necessary toolset and set up a data and model management platform for Systems Biology projects by combining and further developing the well-proven software platforms SEEK and openBIS with a pool of public tools and resources, working closely with both the user and the developer communities.
Finally, we will establish a sustainable training programme on the use and development of the platform and on data/model management practices for Systems Biology.
Systems approaches to biology have the potential to transform biology and medicine and provide economic benefits through, for example, industrial biotechnology and applications that detect drug project failure earlier and speed up therapeutic development: it is estimated that drug discovery costs could be reduced by $390m (£225m) and development time to market shortened by three years (Strategy for UK Life Sciences, BIS). The UK alone has more than 50 research groups and has invested £300m in awards, with an expectation of greater investment. This project proposes to raise the capacity and capability of Systems Biology asset management by establishing a management platform for data, models and SOPs, enabling access to and management of results, radically improving the availability of the results and raising the quality of metadata practices. Thus the project has the potential to produce impact across Systems Biology and its applications, and across a broad range of the life sciences, through better management, quality control, and adherence to standards, leading to greater reproducibility and reusability of models and data. Our work plan makes a direct contribution towards achieving the widest possible economic and societal impact of Systems Biology, with dedicated programmes of community engagement and outreach. Specific impacts include:
- An off-the-shelf management platform (openSEEK) for European researchers, exploiting prior developments. We have the potential to be the management system for SysBio projects worldwide. The platform has already made an impact internationally, with instances in Europe, Russia and South America.
- A EuroSEEK hosted resource, built using openSEEK, to retain, find, track, and share European research results, thereby ensuring a greater impact of Europe's research programmes internationally and on business, and facilitating data and knowledge exchange between researchers.
- Raised accessibility and profile of European software and data investments through their integration into the pan-European resource and training programme.
- An established Data and Model Management Facility Network of European Centres using: openSEEK as the flagship platform; EuroSEEK as a directory of researchers and research; and centre members as skilled data stewards. The network will bootstrap a European Research Infrastructure.
- Adoption of the platform, resources and practices by developing a Knowledge Network of early and mid-career researchers, User and Developer Forums and a Systems Biology Developers Foundry.
- Raised capacity through a training programme with bootcamps, workshops, etc., and through partnerships with organisations like Software Carpentry and GOBLET.
- Partnerships with scholarly communication stakeholders (publishers and open access repositories) to improve data and model publishing, incentivise researchers and yield better management practices.
- Active participation and leadership in domain-specific standards bodies (e.g. COMBINE) and selected communities (e.g. Research Data Alliance, BioSharing), and adaptation of our platforms and training to emerging standards and selected Research Infrastructures (e.g. ISBE, ELIXIR, EUDAT).
The greatest challenge to impact is sustaining the DMMCore outcomes and activities for the long term. We will work with funding agencies and the community to develop funding models, infrastructure support policies and contribution mechanisms. We will work with the UK's Software Sustainability Institute on sustainability strategies for the software and software training, and establish a not-for-profit Foundation as a focus for funding, running the EuroSEEK facility and open source software development.
Research Committee A (Animal disease, health and welfare)
Systems Biology, Technology and Methods Development
X – Research Priority information not available
Data and Model Management Core (DMMCore) 
X – not Funded via a specific Funding Scheme