Bioinformatics

The analysis of palaeogenomic data presents several challenges compared to data from modern samples. Palaeogenomic data is metagenomic, including sequences from the animal of interest plus all contaminating organisms which have colonised the bone postmortem. In addition, ancient DNA molecules are highly fragmented and acquire nucleotide misincorporations as they degrade over time.

At the Pleistocene Genomics Lab, an important part of our work is the development of new computational methods for the analysis of palaeogenomic data. For example, we have designed software for simulating ancient DNA data, which allows the performance of mapping algorithms to be assessed (see Taron et al. 2018). We have also developed a novel test for the directionality of gene flow using palaeogenomic data (see Barlow et al. 2018). You can find out more about all our software here.


Consensify error reduction

In a recent project, we developed a method for reducing error rates in palaeogenomic datasets. Palaeogenome datasets tend to be low coverage, and a standard practice across the field of ancient DNA has been to convert raw sequencing data into “pseudohaploid” sequences by sampling a single nucleotide from the read stack at each position of the reference genome. This removes any bias resulting from differential coverage among datasets, but a consequence is that pseudohaploid sequences will generally have much higher error rates than sequences produced by standard genotype or consensus calling methods. We developed Consensify (see Barlow et al. 2018), a new method for generating pseudohaploid sequences with reduced error rates. We find that using Consensify can improve the outcome of a wide variety of downstream analyses, including phylogenetic analysis, population clustering analysis and admixture testing. If you are interested in trying out Consensify, please visit our software page.
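
To make the idea of pseudohaploidisation concrete, here is a minimal Python sketch of the standard single-base sampling described above. It is an illustration only and is not the Consensify algorithm. The function names and the toy pileup (a list of read stacks, one per reference position) are assumptions made for this example, as is the choice to write "N" at positions with no coverage.

```python
import random


def pseudohaploid_call(read_stack, rng):
    """Return one base sampled at random from the read stack at a position.

    Positions with no coverage are reported as "N" (an assumption of this
    sketch, not a rule taken from any particular pipeline).
    """
    if not read_stack:
        return "N"
    return rng.choice(read_stack)


def pseudohaploid_sequence(pileup, seed=0):
    """Build a pseudohaploid sequence from a pileup.

    `pileup` is a hypothetical structure: one list of observed bases per
    reference position. A fixed seed keeps the sampling reproducible.
    """
    rng = random.Random(seed)
    return "".join(pseudohaploid_call(stack, rng) for stack in pileup)


# Toy example: four reference positions with differing coverage.
toy_pileup = [
    ["A", "A", "A"],       # all reads agree
    [],                    # no coverage
    ["C", "T", "C", "C"],  # one read carries a likely error (T)
    ["G"],                 # single read
]
toy_sequence = pseudohaploid_sequence(toy_pileup)
print(toy_sequence)
```

At the third position, the erroneous T is sampled with probability 1/4, so an error present in a single read can pass directly into the final sequence. This is why sequences built this way tend to carry more errors than consensus calls.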

Effect of Consensify on phylogenetic analysis of genomic datasets of cave, brown and polar bears. For uncorrected data, the palaeogenomic datasets have artefactually extended terminal branches as a result of abundant errors. By using Consensify error reduction, this effect is almost completely removed. See Barlow et al. 2018.
Effect of Consensify on admixture tests using D statistics. We took unmodified data from three brown bears for which the D statistic does not differ significantly from zero. We then modified one dataset in silico to mimic the properties of ancient DNA using our program TAPAS. This resulted in false positives (D values significantly different from zero) for two methods of calculating D statistics (abbababa 1 & 2), but for Consensify the values remained non-significant. See Barlow et al. 2018.
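
For readers unfamiliar with the D statistic, the following Python sketch shows the textbook calculation from site pattern counts, D = (nABBA − nBABA) / (nABBA + nBABA), for three ingroup sequences and an outgroup. It is a simplified illustration, not the implementation used in the analyses above: the function name, the aligned toy sequences and the handling of "N" are assumptions, and significance testing (typically done with a block jackknife) is not shown.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Count ABBA and BABA sites and return (n_abba, n_baba, D).

    The outgroup base defines the ancestral allele "A" at each site.
    ABBA: P2 and P3 share a derived allele, P1 is ancestral.
    BABA: P1 and P3 share a derived allele, P2 is ancestral.
    Sites containing "N" in any sequence are skipped.
    """
    n_abba = 0
    n_baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if "N" in (a, b, c, o):
            continue
        if a == o and b == c and b != o:
            n_abba += 1
        elif b == o and a == c and a != o:
            n_baba += 1
    total = n_abba + n_baba
    if total == 0:
        return n_abba, n_baba, float("nan")
    return n_abba, n_baba, (n_abba - n_baba) / total


# Toy aligned pseudohaploid sequences (hypothetical data).
p1 = "ATANAA"
p2 = "TAGAAC"
p3 = "TTGAAC"
out = "AAAAAA"

n_abba, n_baba, d = d_statistic(p1, p2, p3, out)
print(n_abba, n_baba, d)  # 3 1 0.5
```

Sequencing errors in one ingroup sequence can create spurious ABBA or BABA sites, which is one way elevated error rates can shift D away from zero.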
