Understanding protein multi- and trans-localisation at the full proteome level
Principal Investigator / Supervisor
Professor Kathryn Lilley
Dr Laurent Gatto
University of Cambridge
Localisation of proteins inside cells is of paramount importance for studying their function, refining our comprehension of sub-cellular processes and organisation, and understanding the effect of perturbations at the sub-cellular level. Various dedicated experimental designs based on biochemical separation and quantitative mass-spectrometry have been described and refined over the years, in particular the recently published hyperLOPIT technique. Using this groundbreaking technology, we identified the localisation of over 7000 proteins with unprecedented spatial resolution, uncovering the organisation of organelles, sub-organellar compartments, protein complexes, functional networks, and the steady-state dynamics of proteins, including unexpected sub-cellular locations. Using such data, we will develop the next generation of statistical learning tools for spatial proteomics to reliably identify proteins residing simultaneously in multiple sub-cellular niches (multi-localisation) and proteins changing localisation upon perturbation (trans-localisation). To do this, we will rely on mixture modelling of the quantitative protein profiles: (1) deconvolution of these mixed profiles will enable us to reconstitute the individual localisations of the proteins, and (2) comparison of (possibly mixed) profiles among multiple conditions will enable us to identify changes of localisation. We will support such analysis at the peptide level to identify multi- and trans-localisation events for isoforms. Finally, we will support our core statistical learning infrastructure with dedicated, interactive visualisation applications to enable direct and easy exploration of the complex spatial localisation patterns identified.
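As a rough illustration of the two ideas above (deconvolving a mixed profile and comparing conditions), the following Python sketch estimates how much of a protein's quantitative profile is explained by each of two reference organelle profiles. It is not the method proposed here: it replaces full mixture modelling with a simple two-component least-squares fit, and the organelle names, profile values and function names are all hypothetical, chosen only for the example.

```python
# Illustrative sketch only: a two-component least-squares deconvolution,
# standing in for the mixture modelling described in the text.
# All profiles, names and values are hypothetical.

def mixing_weight(mixed, ref_a, ref_b):
    """Estimate w in mixed ~ w * ref_a + (1 - w) * ref_b, clipped to [0, 1]."""
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical")
    num = sum((m - b) * d for m, b, d in zip(mixed, ref_b, diff))
    return min(1.0, max(0.0, num / denom))


def blend(w, ref_a, ref_b):
    """Build a synthetic mixed profile with a known proportion w of ref_a."""
    return [w * a + (1 - w) * b for a, b in zip(ref_a, ref_b)]


# Hypothetical normalised abundance profiles across four gradient fractions.
mito = [0.7, 0.2, 0.1, 0.0]
cytosol = [0.0, 0.1, 0.2, 0.7]

# Synthetic protein: mostly mitochondrial in control, mostly cytosolic after
# a perturbation.
control = blend(0.75, mito, cytosol)
treated = blend(0.25, mito, cytosol)

# (1) Deconvolution: recover the mitochondrial proportion in each condition.
w_control = mixing_weight(control, mito, cytosol)
w_treated = mixing_weight(treated, mito, cytosol)

# (2) Comparison across conditions: a large change in the estimated
# proportion would flag a candidate trans-localisation event.
shift = w_treated - w_control
print(round(w_control, 2), round(w_treated, 2), round(shift, 2))
```

A real analysis would need more than two components, a noise model, and a measure of uncertainty on the estimated proportions; the sketch only shows the shape of the computation.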
In biology, localisation is function. Cells display a complex sub-cellular structure, in which each niche is characterised by specific biochemical conditions and fulfils dedicated functions. A protein must be localised to its intended sub-cellular niche to meet its interaction partners and be functionally active. Hence, being able to systematically measure the locations of proteins, and in particular of the full proteome, a field termed spatial proteomics, is of major interest in cell biology. An accurate view of the sub-cellular landscape must also account for the fact that proteins are known to occupy more than one sub-cellular location and to traffic between such niches. The former phenomenon is termed multi-localisation; the latter, whether initiated by normal biological triggers, pathological cellular states, or external stimuli such as changes in cell nutrients or the effect of a drug, is called trans-localisation. Finally, the mis-localisation of proteins has been associated with cellular dysfunction and diseases such as cancer. The most information-rich datasets for proteome-wide spatial proteomics are generated using high-accuracy mass-spectrometry, a technique that identifies and quantifies the proteome content of complex biological samples. These high-quality datasets have been mined using a variety of robust supervised machine learning methods, which have been shown to yield valuable protein-organelle predictions. In particular, the applicants recently published hyperLOPIT, a technological advance that delivers exquisite spatial resolution.
Using this groundbreaking technology on mouse embryonic stem cells, they identified the localisation of 7000 proteins with unprecedented spatial resolution, uncovering the organisation of organelles, sub-organellar compartments, protein complexes, functional networks, and the steady-state dynamics of proteins, including unexpected sub-cellular locations. In this proposal, we aim to complement contemporary spatial proteomics data with state-of-the-art statistical routines to reliably identify multi- and trans-localisation events at the full proteome level. These new tools, which will complement our existing open-source suite of spatial proteomics software, will enable the proteomics and cell biology communities to mine spatial proteomics data to new depths, identifying subtle yet biologically important patterns, such as proteins with mixed localisation and proteins that change localisation upon perturbation, in a robust and statistically sound way. We will also develop dedicated visualisation platforms to highlight the outputs of our analysis pipelines and enable interactive exploration of the multidimensional spatial data. We will apply these tools ourselves to a wide range of spatial proteomics datasets from various biological systems of interest. To guarantee broad exposure of our work, the datasets we analyse and the spatial patterns we infer will be disseminated through community databases, in particular the SpatialMap.org online resource.
Research Aim
The prime and novel aim of this project is to develop computational methods and software tools to reliably identify protein dynamics at the sub-cellular level. This research proposal will enable us to tackle more fine-grained and biologically relevant spatial patterns, in particular changes in cellular state upon perturbation of the cell.
Who will benefit from this research?
The main beneficiaries of this project are members of the proteomics community. There is a long-overdue requirement to develop appropriate methodologies to analyse sub-cellular protein dynamics, as exemplified by letters of support not only from very productive collaborators of the applicants, but also from several of the top organelle proteomics laboratories in the world. Moreover, the cell biology community, both academic and within the pharmaceutical sector, will also benefit, as this proposal underpins the interface of modern 'omics technologies and more classical cell biological methodologies. Computational biologists will also benefit from the freely available, open-source, open-development statistical analysis methods, normalisation strategies and machine learning methods that will form part of this novel pipeline. Our work is targeted at experimentalists who will use our tools to analyse their data, as well as computational scientists who want to re-use or adapt our methods and software infrastructure for new topics.
How will they benefit from this research?
The software suite that will be a direct output of the proposal will benefit the proteomics and cell biology communities by delivering a novel framework that gives users the means to analyse changes in protein sub-cellular dynamics, bringing spatial proteomics data analysis to a whole new level. The statistical methods will be made available for the statistical programming environment R and the Bioconductor project and will interoperate with existing complementary software. Our novel methods will no doubt be applicable in other 'omics areas of research, owing to the inherently cross-disciplinary nature of the computer science, statistics and machine learning that underpin many areas of computational biology. The project will contribute knowledge and scientific advancement through the dissemination of data and the improvement of analyses of complex multivariate data, facilitating the interpretation and understanding of relevant biological processes: (i) fully characterised organelle proteomics datasets will be deposited in publicly accessible databases, (ii) analysis methodologies will be documented and distributed with software releases to facilitate application of our methods to new datasets and use cases, (iii) data will also be made available through the existing R/Bioconductor data packages, and (iv) data will be available through the online SpatialMap.org resource, where users can interactively view and share them. The research staff will benefit from the multi-disciplinary research environment and extend their national and international research networks through ongoing collaborations.
What will be done to ensure that they have the opportunity to benefit from this research?
The algorithms and tools developed in this proposal will be implemented in the R statistical programming environment and deposited in the Bioconductor suite of bioinformatics software. The algorithms will be implemented as independent modules that will be contributed to, and compatible with, the current pRoloc analysis framework, to form a freely available open-source toolbox for the analysis of spatial proteomics data. It is envisaged that both computational and biological outputs will be written up as manuscripts and submitted to high-impact journals with a large general readership. KSL, LG, WH and OK are invited to give numerous talks at top proteomics and computational conferences worldwide, and will endeavour to publicise the work described here at such events.
Research Committee A (Animal disease, health and welfare)
Technology and Methods Development
X – Research Priority information not available
Tools and Resources Development Fund (TRDF) [2006-2015]
X – not Funded via a specific Funding Scheme