Award details

Using reference-assisted chromosome assemblies to study chromosome structures and evolution in vertebrates

Principal Investigator / Supervisor Dr Denis Larkin
Co-Investigators /
Institution Royal Veterinary College
Department Comparative Biomedical Sciences CBS
Funding type Research
Value (£) 173,638
Type Research Grant
Start date 05/09/2013
End date 04/10/2015
Duration 25 months


A novel in silico approach will be applied to predict the order of scaffolds in the chromosomes of species sequenced with NGS techniques. This will include the alignment of scaffolds to existing whole-genome assemblies ("reference genomes"), algorithmic prediction of the most probable organization of the common ancestor of the two genomes, and ordering of ancestral blocks in the newly sequenced genome based on the lineage-specific rearrangements found within its scaffolds. The chromosomal reconstructions will be verified using telomeric and centromeric sequences reconstructed from the NGS data of the newly sequenced genome. Suboptimal solutions will be explored for chromosomes that show structural issues after verification. Corrected reference-assisted chromosome assembly (RACA) genomes from multiple species will be used to search for support for fragile breakage models of chromosome evolution, as well as to detect minimum blocks of genes in vertebrates that cannot be disrupted in evolution. We will also explore the mechanisms of chromosomal rearrangements by analyzing the evolutionary breakpoint regions for enrichment of lineage-specific features, such as retrotransposable elements, genes, SNPs and CNVs. For the whole-genome set we will analyze sequence features that mark out genomes of different clades (orders, families) as distinct from the genomes of other groups. This will be especially important for gene networks and other genomic features that contribute to the agricultural importance of some species and clades.
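
The block-ordering step described above can be illustrated with a minimal Python sketch. It is not the RACA implementation: it assumes that each candidate adjacency between two block ends has already been given a support score, and it simply accepts adjacencies greedily from the highest score down. The function name `chain_blocks`, the "head"/"tail" labels, and the scores in the example are hypothetical and chosen only for illustration.

```python
def chain_blocks(adjacencies):
    """Greedily join block ends into linear chains.

    adjacencies: list of (score, end_a, end_b), where an end is a
    (block_id, side) tuple and side is "head" or "tail".
    Returns the accepted adjacencies in the order they were chosen.
    """
    parent = {}  # union-find over block ids, used to detect circles

    def find(block):
        parent.setdefault(block, block)
        while parent[block] != block:
            parent[block] = parent[parent[block]]  # path halving
            block = parent[block]
        return block

    used_ends = set()
    accepted = []
    # Visit candidate adjacencies from the best supported to the worst.
    for score, end_a, end_b in sorted(adjacencies, key=lambda a: -a[0]):
        if end_a in used_ends or end_b in used_ends:
            continue  # each block end can be joined only once
        root_a, root_b = find(end_a[0]), find(end_b[0])
        if root_a == root_b:
            continue  # joining would close a chain into a circle
        parent[root_a] = root_b
        used_ends.update((end_a, end_b))
        accepted.append((end_a, end_b))
    return accepted


# Hypothetical scores for three blocks A, B and C.
example = [
    (0.9, ("A", "tail"), ("B", "head")),
    (0.8, ("B", "tail"), ("C", "head")),
    (0.7, ("A", "tail"), ("C", "head")),  # conflicts with the first adjacency
    (0.6, ("C", "tail"), ("A", "head")),  # would make the chain circular
]
print(chain_blocks(example))
```

In this toy example the two best-supported adjacencies are kept, giving the chain A, B, C, and the two conflicting candidates are rejected. A greedy choice is only one possible strategy; the suboptimal solutions mentioned above would correspond to revisiting rejected adjacencies.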


Genomes contain genes that encode proteins that build organisms. In the course of evolution genomes change, and these changes affect genes by changing the time when proteins are formed or even by leading to the formation of new genes or the death of old genes. These events together form one of the sources of variation used by natural selection to form new species or to adapt species to the environment. Complete sequencing of a genome refers to the identification of the sequence of nucleotides along chromosomes. To understand what mechanisms drive changes in chromosome structures in different species, and how this affects the formation of new species or the adaptation of existing species to a changing environment, we will reconstruct complete chromosome structures of newly sequenced species using a novel algorithm called "reference-assisted chromosome assembly" or RACA. This algorithm compares sequenced parts of one organism's genome to an existing complete chromosome assembly of another and reconstructs chromosomes of their putative common ancestor. Then it uses parts of the newly sequenced genome and searches for the differences between the ancestral organization of chromosomes and the organization proposed by the parts of chromosomes (sequence scaffolds) that were generated for the organism. At the final step it organizes parts of ancestral chromosomes according to the order proposed by the sequence scaffolds. In the proposed research we will develop several algorithms to verify these reconstructions by looking at specific features of chromosomes such as "telomeres" (chromosome ends) and "centromeres" (structures important for cell division). These structures contain specific sequence features that can be reconstructed from the sequence data produced during sequencing projects. By detecting the positions of these features in the reconstructed chromosomes we will be able to check how close the structure of the reconstructed chromosomes is to that of the real chromosomes in the species of interest.
If there are issues, we will adapt the RACA algorithm to improve the assembly. In the next step we will use RACA-generated chromosomes to investigate the mechanisms driving chromosomal changes at the DNA level. We will check whether the distribution of chromosome parts that are not rearranged in any of the genomes included in our analysis can be explained by random breakage of chromosomes in evolution, or whether there is selection against chromosomal rearrangements in some parts of a vertebrate genome. If we analyze a large set of species we might be able to find "building blocks" of mammalian, amniote, or vertebrate genomes that cannot be rearranged without a lethal effect for the organism. Evolutionary breakpoint regions are regions where chromosomes were broken and then rejoined in different combinations or orientations in evolution. We will use multiple RACA genomes to investigate which features of the genomes are driving these events. An important question to answer is "which genes are more likely to be affected by these evolutionary events?" Previously we demonstrated that the evolutionary breakpoint regions are enriched for genes that are associated with lineage-specific features. In this project we will perform bioinformatics analysis of these intervals in an attempt to classify the lineage-specific changes that happened in the ancestral genomes of some lineages and led to the formation of specific traits favoured by natural selection, e.g., formation of the rumen in ruminant species. Our hypothesis is that the changes in the ancestral genomes of livestock species will be connected to those features of the species that made them an attractive source of protein for humans. Therefore, detection of these ancestral changes is an important step for improving the genetics of these species, as it will identify the best genes and other targets for future artificial selection and breed improvement.
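
One common way to ask whether evolutionary breakpoint regions are enriched for a genomic feature is a permutation test, sketched below in Python. This is an illustrative assumption, not the analysis pipeline of the project: it treats a single chromosome of known length, places intervals of the same sizes as the breakpoint regions at random positions, and counts how often the random intervals contain at least as many features as the real ones. The function names, coordinates, and feature positions are all hypothetical.

```python
import random


def count_hits(intervals, features):
    """Count feature positions that fall inside at least one interval."""
    return sum(
        any(start <= pos < end for start, end in intervals) for pos in features
    )


def enrichment_p_value(ebrs, features, chrom_length, n_perm=1000, seed=0):
    """Permutation test for feature enrichment in breakpoint regions.

    ebrs: list of (start, end) half-open intervals on one chromosome.
    features: list of feature positions (e.g. midpoints of repeats).
    Returns (observed hit count, one-sided empirical p-value).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = count_hits(ebrs, features)
    at_least = 0
    for _ in range(n_perm):
        # Re-place every region at a uniformly random position,
        # keeping its length, as expected under random breakage.
        shuffled = []
        for start, end in ebrs:
            length = end - start
            new_start = rng.randrange(0, chrom_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if count_hits(shuffled, features) >= observed:
            at_least += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical data: two 1 kb breakpoint regions on a 1 Mb chromosome,
# with all 20 features clustered inside them.
ebrs = [(100_000, 101_000), (500_000, 501_000)]
features = [100_000 + 50 * i for i in range(10)]
features += [500_000 + 50 * i for i in range(10)]
observed, p_value = enrichment_p_value(ebrs, features, 1_000_000)
print(observed, p_value)
```

A real analysis would also need to handle multiple chromosomes, assembly gaps, and differences in feature density between regions, which this sketch ignores.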

Impact Summary

We will detect chromosomal structures in mammalian species and will compare their genome organization to the genome organization of other species, including human. The outcome of our programme has the potential to influence the UK and world economy, health, and services. Impact on health and biomedicine: a) The outcome of our research will be used to identify animal models for human genetic disorders by selecting species with disease phenotypes similar to human phenotypes and with the same genome organization as in humans in homologous genome intervals. An important advantage of our work is that we will produce ordered sequence maps for all genomes, an absolute requirement for a good animal model. Therefore, the outcome of our research will permit the selection of better animal models for testing human medicines than the traditionally used mice. b) Another part of our studies potentially connected to quality of life and health is the study of the mechanisms of evolutionary chromosomal rearrangements. We will investigate what makes some regions of chromosomes in meiotic cells fragile in evolution. Our results in this area could influence studies of human cancers that are accompanied by rearrangements of chromosomes in somatic cells. Our previous studies suggest that a correlation exists between the regions of meiotic instability in evolution and mitotic instability in cancer cells. This project will extend knowledge of the mechanisms of instability, and our results could be used for predicting chromosomal regions that could be rearranged in human cancers. Impact on economy and services: Our research could have an impact on the economy in the UK and other countries through the use of our findings on lineage-specific genome changes in livestock species to develop better strategies for improving the efficiency of agriculture and decreasing its negative effect on global climate.
For example, lineage-specific changes found in all related species (e.g., ruminants) are likely to control the traits that made these species attractive for domestication. Therefore, the unique features of different genomes will form a dataset that can be explored for genes or other targets for improvement of economically or environmentally important traits in livestock species, such as meat quality in cattle and sheep or greenhouse gas emissions from all ruminants. Our data should reduce the need for expensive genome-wide association studies that use hundreds of animals without a guarantee that the gene of interest will be detected. Instead, it will be possible to generate an explicit list of all lineage-specific changes that are ready to be explored for markers associated with the particular trait of interest. Eventually, the cost of QTL hunting can be significantly decreased and its efficiency increased. We are already exploring the possibilities of using comparative genomics studies to understand the genetics of lineage-specific systems, such as the rumen, in order to decrease greenhouse gas emissions from livestock species. Delivering highly skilled people: We will provide the postdoctoral research associate appointed to this project with the opportunity to learn cutting-edge methods of bioinformatics and laboratory work related to genome sequencing, assembly, and genome analysis, thereby improving his/her professional skills and chances of a successful career in academia or industry in areas related to bioscience. Distribution of knowledge: To make our results widely known we will publicise them to the widest possible audience through publications in scientific journals, conference presentations, and Internet websites. The list of journals in which we expect the results of this programme to be published is presented in the academic beneficiaries section.
We will work closely with local businesses to distribute the knowledge generated by this work among businesses working on animal production and breeding.
Committee Research Committee C (Genes, development and STEM approaches to biology)
Research Topics Structural Biology
Research Priority Systems Approach to Biological research
Research Initiative X - not in an Initiative
Funding Scheme X – not Funded via a specific Funding Scheme