The analysis of palaeogenomic data presents several challenges compared to data from modern samples. Palaeogenomic data is metagenomic, comprising sequences from the animal of interest as well as from all contaminating organisms that have colonised the bone postmortem. In addition, ancient DNA molecules are highly fragmented and acquire nucleotide misincorporations as they degrade over time.
At the Pleistocene Genomics Lab, an important part of our work is the development of new computational methods for the analysis of palaeogenomic data. For example, we have designed software for simulating ancient DNA data which allows the performance of mapping algorithms to be assessed (see Taron et al. 2018). We also developed a novel test for gene flow directionality for palaeogenomic data (see Barlow et al. 2018). You can find out more about all our software here.
Consensify error reduction
In a recent project, we developed a method for reducing error rates in palaeogenomic datasets. Palaeogenome datasets tend to be low coverage, and a standard practice across the field of ancient DNA has been to convert raw sequencing data into “pseudohaploid” sequences by sampling a single nucleotide from the read stack at each position of the reference genome. This removes any bias resulting from differential coverage among datasets, but a consequence is that pseudohaploid sequences will generally have much higher rates of error compared to standard genotype or consensus calling methods. We developed Consensify (see Barlow et al. 2018), a new method for generating pseudohaploid sequences with reduced error rates. We find that using Consensify can improve the outcome of a wide variety of downstream analyses, including phylogenetic analysis, population clustering analysis and admixture testing. If you are interested in trying out Consensify, please visit our software page.
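To illustrate the standard pseudohaploidisation step described above, here is a minimal sketch in Python. It is not Consensify itself, only the conventional single-read sampling approach that Consensify improves upon; the function name and data layout are illustrative assumptions, not part of any published tool.

```python
import random

def pseudohaploidize(read_stacks, seed=42):
    """Build a pseudohaploid sequence by sampling one base per position.

    read_stacks: list of lists, where read_stacks[i] holds the bases
    observed in the read stack at position i of the reference genome.
    Positions with no coverage yield 'N'.
    """
    rng = random.Random(seed)
    sequence = []
    for stack in read_stacks:
        if stack:
            # Sample a single nucleotide at random from the read stack,
            # regardless of how many reads cover this position. Any
            # sequencing or damage-induced error present in the chosen
            # read is carried directly into the output sequence.
            sequence.append(rng.choice(stack))
        else:
            sequence.append("N")  # no coverage at this position
    return "".join(sequence)

# Example: three positions with coverage 3, 1 and 0
stacks = [["A", "A", "G"], ["T"], []]
print(pseudohaploidize(stacks))
```

Because each call keeps exactly one read per position, coverage differences between datasets no longer matter, which is the appeal of the approach; the cost, as noted above, is that a single erroneous read can determine the called base.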