Earlham Institute launches first CyVerse UK hub for 'big data' analysis
We have established the first dedicated UK high-performance computing (HPC) cluster for the international data platform ‘CyVerse’, providing free, open-source genome analysis for ‘big data’ research.
Genomics is increasingly a big data science, as now-commonplace high-throughput technologies make data generation faster and cheaper. This opens the way to potentially exciting breakthroughs, as researchers can unearth previously hidden patterns and make new discoveries of biological significance.
However, the scientific community struggles to take full advantage of the data generated because of a lack of computing resources, appropriate support and technical skills. In addition, bioinformatics tools developed during research projects to test and validate biological hypotheses often remain prototypes that only people with computational expertise can use.
To do modern science amid a plethora of tools and datasets, researchers therefore need to store and access datasets, models and analysis tools efficiently, ideally hosted in several locations around the world to support international projects. This is where CyVerse can help.
CyVerse UK is an international collaboration between hardware and middleware engineers at EI, support staff in the Norwich Research Park Computing Infrastructure for Science (NRP CiS) team, the University of Arizona, the Texas Advanced Computing Center and Cold Spring Harbor Laboratory. It provides free, large-scale computing facilities and data storage designed for life scientists.
Erik van den Bergh, Lead Engineer of the CyVerse UK team, said: “Establishing the first CyVerse node outside the US creates a vital hub in the UK for data analysis and management. CyVerse UK can provide free HPC facilities for all UK scientists, as well as allowing integration of UK apps and pipelines into the wider international CyVerse ecosystem.
“CyVerse provides an intuitive web interface, the Discovery Environment (DE), where scientists can upload data and run analyses. Although this resource is hosted in the US, the DE can automatically run tools hosted on the CyVerse UK platform, giving geographical advantages in data access speed, analysis time and data placement policy.”
CyVerse UK currently hosts two open-source apps and a new virtual machine environment. Gwasser (Ben Ward, Clark Group) is a statistics pipeline that performs genome-wide association studies (GWAS) of single phenotypes. Mikado (Luca Venturini, Swarbreck Group) is a lightweight Python pipeline that identifies the optimal set of transcripts from multiple transcript assemblies. Both apps were used in the analysis behind the recent publication of the allohexaploid wheat genome, a crop of paramount importance in tackling the societal challenge of global food security.
The Polymarker pipeline, which designs efficient SNP genotyping assays in wheat, will also soon be available to scientists, together with a modified ‘Tuxedo suite’ app developed by the University of Liverpool that runs a series of RNA-seq analysis pipelines. CyVerse UK’s robust virtualisation platform will also provide back-end data services and web hosting for the COPO and Grassroots Genomics projects.
The CyVerse UK node hardware and software environment has been set up and deployed by the core CyVerse UK team (Erik van den Bergh and Alice Minotto) in the Davey Group, Tim Stitt (Scientific Computing), and NBI Scientific Computing. The CyVerse UK project is a BBSRC-funded collaboration between EI, the University of Warwick, the University of Nottingham and the University of Liverpool.
About Earlham Institute
The Earlham Institute (EI) is a world-leading research institute focusing on the development of genomics and computational biology. EI is based within the Norwich Research Park and is one of eight institutes that receive strategic funding from the Biotechnology and Biological Sciences Research Council (BBSRC), £6.45M in 2015/2016, as well as support from other research funders. EI operates a National Capability to promote the application of genomics and bioinformatics to advance bioscience research and innovation.
EI offers a state-of-the-art DNA sequencing facility, unique in its operation of multiple complementary technologies for data generation. The Institute is a UK hub for innovative bioinformatics through research, analysis and interpretation of multiple, complex data sets. It hosts one of the largest computing hardware facilities dedicated to life science research in Europe. It is also actively involved in developing novel platforms that give multiple academic and industrial users access to computational tools and processing capacity, and in promoting applications of computational bioscience. Additionally, the Institute offers a training programme of courses and workshops, and an outreach programme targeting key stakeholders and wider public audiences through dialogue and science communication activities.