Genomic Data Reveal a Complex Making of Humans

new

Admin
Administrator

Posts: 72,751

Genomic Data Reveal a Complex Making of Humans Jul 13, 2021 22:34:53 GMT

Quote

Post by Admin on Jul 13, 2021 22:34:53 GMT

Results
We analyzed six sediment samples from different layers of areas A (pre-LGM) and B (pre and post-LGM) of Satsurblia cave (Figure1A) (Pinhasi et al., 2014), and performed sequencing to screen them for mammalian DNA (Table S1). One of the samples, SAT16 LS29 (SAT29), from layer BIII (Figure 1B), which is radiocarbon dated to 25.4-24.5 ka calBP (Pinhasi et al., 2014), contained substantial amounts of DNA from humans as well as from other mammals, and was therefore sequenced to greater depth.

Figure 1:
An overview of the location of Satsurblia cave with contextual information. A) Map of the Caucasus region showing relevant sites that have yielded ancient DNA from humans (blue dots), animals (red dot) or both (purple dot). Only sites with remains from the Mesolithic or older are shown. B) Layer B of Satsurblia cave with the SAT29 sampling location shown.

Metagenomic screening of SAT29 performed with Centrifuge (Kim et al., 2016) showed the presence of four mammalian genera among the eukaryotic reads: Ovis (28%), Homo (9%), Canis (5.5%) and Bos (2.1%) (Figure S1A). After competitive mapping to the sheep, human, dog and domestic cattle reference genomes (Method Details), we assigned a total of 4,956,676 reads as follows: Canis (2,378,237 reads, 48.0% of assigned reads), Bos (1,811,555 reads, 36.5%), Homo (661,765 reads, 13.5%) and Ovis (105,119 reads, 2.1%). Reads mapping to all four species have fragment lengths and deamination patterns typical of ancient DNA (Figure S1B-S1C).

Using the 661,765 human reads, we genotyped 11,116 pseudo-haploid positions from the 1240K dataset (Haak et al., 2015). To explore the human ancestry of SAT29 within the context of pre and post-LGM diversity we performed a Principal Components Analysis (PCA) on 2,335 modern Eurasian genomes (Table S2), and projected 78 ancient individuals onto the resulting components (Figure 2A, Table S2). Previous paleogenomic studies revealed two different ancient human lineages from the Caucasus that were distinct from the rest Pleistocene and early-Holocene diversity. A late Upper Palaeolithic (13.3 kya) genome from Satsurblia cave and a Mesolithic (9.7 kya) genome from the cave of Kotias Klde revealed what has been termed Caucasus Hunter Gatherer (CHG) ancestry, a distinct ancient clade that split from western hunter-gatherers ∼45 kya shortly after the expansion of anatomically modern humans into Western Eurasia (Jones et al., 2015). A second, older, pre-LGM lineage, represented by genome-wide data from two individuals dated to ∼26 kya from Dzudzuana cave, southern Caucasus, is associated with the peoples of the Near East and North Africa, and accounts for at least 46% of their ancestries (Lazaridis et al., 2018). We find that the SAT29 sample clusters with the Dzudzuana2 individual and not with the late Upper Paleolithic and Mesolithic genomes from the region, or with any of the other pre-LGM Eurasian genomes. Unsupervised ADMIXTURE clustering (Alexander and Lange, 2011) further supports a high similarity between SAT29 and Dzudzuana2 (Figure 2B, Figure S2).

Admin
Administrator

Posts: 72,751

Genomic Data Reveal a Complex Making of Humans Jul 14, 2021 5:37:27 GMT

Quote

Post by Admin on Jul 14, 2021 5:37:27 GMT

Figure 2:
SAT29 human genomics. A) A principal component analysis built with 2,335 modern Eurasian samples into which 78 ancient samples were projected. SAT29 appears closest to Dzudzuana2. B) In ADMIXTURE clustering with 78 ancient genomes, SAT29 displays a profile similar to that of Dzudzuana2. C) Maximum Parsimony tree built with 67 ancient and 168 modern Eurasian mitochondrial genomes. SAT29 falls close to the Upper Paleolithic samples from Bacho Kiro cave in Bulgaria.

We used outgroup f3-statistics to quantify the amount of shared genetic drift between SAT29 and other ancient genomes (Patterson et al., 2012). SAT29 shares more drift with Villabruna (Italy, 12140±70 bp) (Fu et al., 2016) and Dzudzuana2 than with other ancient individuals (Figure S3A), including the post-LGM individuals from the Caucasus (Satsurblia and Kotias). Among present-day Eurasian populations, SAT29 shows higher genetic affinity to Northern and Western Europeans rather than Central and Southern Asians (Figure S3B).

Through targeted capture (Maricic et al., 2010) we recovered 4,953 human mitochondrial DNA (mtDNA) reads, providing 15-fold coverage of the mtDNA. The consensus sequence has 16 derived positions (Table S3) compared to the rCRS sequence and belongs to haplogroup N like individual Dzudzuana3 from Dzudzuana cave (Figure 2C). We built a maximum parsimony (MP) tree with 68 ancient and 167 modern human mt genomes (Table S4, Figure 2C). The SAT29 sequence is positioned on a branch together with BK-CC7-355 (42450 ± 510 ka) and BK-BB7-240 (41850 ± 480 ka) from the Bacho-Kiro site in Bulgaria, the most ancient west-Eurasian mitochondrial sequences (Hublin et al., 2020) as well as Dzudzuana3 (Lazaridis et al., 2018).

We estimate 1% Neanderthal ancestry in the SAT29 sample, although with large uncertainty due to the low amount of data (95% confidence intervals: 0-6.6%). This point estimate is similar to that of Dzuzuana2 and likely lower than that of Palaeolithic Europeans due to dilution from Basal Eurasian ancestry (Lazaridis et al., 2018)

Our results from the human genomic data from the SAT29 sample are thus consistent with the study by Lazaridis et al (2018), revealing a a previously undocumented pre-LGM human ancestry from the Caucasus that contributed to various succeeding Eurasian populations. The low coverage of the SAT29 genome, however, did not allow us to detect any differences in ancestry patterns between Dzuzuana2 and SAT29.

We analyzed the 2,378,237 Canis reads using a set of variants among 722 modern wolves, dogs and other canid species (Plassais et al., 2019). We randomly sampled one SAT29 read at each position, resulting in a genotype call at 439,426 transversion variants. Using ADMIXTURE clustering (Figure 3A) and f4-statistics, we found that the SAT29 sample clearly shares genetic drift with wolves and dogs, to the exclusion of coyotes, golden jackals and other canids (Z > 30 for all species, Method Detail). It does not, however, display stronger affinity to wolves over dogs, or vice versa (Figure S4).

Admin
Administrator

Posts: 72,751

Genomic Data Reveal a Complex Making of Humans Jul 14, 2021 21:25:40 GMT

Quote

Post by Admin on Jul 14, 2021 21:25:40 GMT

Figure 3:
SAT29 wolf genomics. A) In ADMIXTURE clustering with wolves, dogs and other canid species, the SAT29 sample clusters with Eurasian wolves. B) Population history model relating the SAT29 sample to modern wolves, dogs and Pleistocene Siberian wolves. Inferred admixture proportions from the dog reference genome (REF), to account for reference bias, are shown in blue. The trifurcation point indicates that it cannot confidently be determined on which side of this point the SAT29 sample falls. C) Bayesian tree of 68 modern and 39 ancient wolf mitochondrial genomes.

To further investigate the relationship of the SAT29 canid to wolves and dogs, we incorporated two Pleistocene wolf genomes from Siberia (35-33 ka), which have ancestries that are basal to modern wolves and dogs (Sinding et al., 2020; Skoglund et al., 2015). We tested all possible topologies without admixture relating a coyote, SAT29, a modern wolf, a modern dog and the two Pleistocene Siberian wolves, while explicitly accounting for reference bias in the ancient genomes (Method Details). Three of the 100 graphs provide good fits, and feature the Siberian Pleistocene wolves on a branch basal to modern populations. The graphs fit equally well and differ only in that SAT29 is placed either basal to the Siberian Pleistocene branch, on this branch, or downstream of this branch (Figure 3B). Previous studies have found that present-day wolf population structure has mostly formed after the LGM (Freedman et al., 2014; Skoglund et al., 2015; Fan et al., 2016; Loog et al., 2020; Ramos-Madrigal et al., 2020). Our results are consistent with this scenario, as the SAT29 canid harboured an ancestry that diverged from the ancestors of modern wolves and dogs before these diversified. While Late Pleistocene wolves in the Caucasus were not closely related to those in Siberia, they thus had a similarly basal ancestry that has either gone extinct or been overwritten by later population processes.

Through targeted capture (Maricic et al., 2010) we recovered 9,884 Canis mtDNA reads, providing 13-fold coverage of the mtDNA. The SAT29 canid consensus sequence falls on a branch together with two ancient wolves from the Aghitu-3 cave in Armenia (Figure 3C), dated to 31-30 ka, on a seemingly extinct West Eurasian branch of the wolf mtDNA phylogeny (Loog et al., 2020).

We inferred the tip date of both the SAT29 human and wolf consensus mtDNA with various priors and datasets (Method Details). The human SAT29 mtDNA consensus has a mean age of 38,815 - 48,243 years bp (95% CI: 19,772 - 74,475 years bp) (Table S5) while the wolf mtDNA consensus mean age is 22,991 - 31,416 years bp (95% CI: 11,712 - 49,884 years bp) (Table S5). The direct date of the BIII layer of Satsurblia cave, from which SAT29 was taken, is 25.5.-24.2 ka cal. bp, and thus falls within the confidence intervals of the mtDNA tip dates. While the mean estimate for the human sequence is older than for the wolf, the 95% confidence intervals overlap across all combinations of datasets and priors used except one.

We compiled a number of Bovinae genomes (Verdugo et al., 2019; Wecek et al., 2016; Wu et al., 2018), identified 1.4 million heterozygous transversion sites in a Gaur genome, and assigned genotypes to all individuals at these sites by randomly sampling one read per position. This resulted in a genotype call at 27,724 transversion variants for the SAT29 bovid sample. Using ADMIXTURE clustering (Figure 4A) and f4-statistics, we find that the SAT29 bovid shares genetic drift with bisons (Bison sp.), to the exclusion of aurochs, domestic cattle and Asian bovid species (Z > 20 for all species, Method Details). This provides important authentication of the soil metagenomic approach, since the identified species is not the cattle (Bos taurus) reference genome. To explore the ancestry of the SAT29 bovine reads, we incorporated genomes from early 20th century European bison (Bison bonasus; wisents) from Poland and the Caucasus. SAT29 is closer to these, as well as to modern European bison, than to a North American bison (|Z| > 6), implying that the divergence between European and American populations predates the age of SAT29. The Caucasian and the Polish populations have been classified as separate subspecies of the European bison, but the SAT29 sample is not detectably closer to one over the other. Furthermore, these two recent populations share genetic drift to the exclusion of the SAT29 sample (|Z| = 3.4 and 4.5 for the two relevant f4-statistics).

Admin
Administrator

Posts: 72,751

Genomic Data Reveal a Complex Making of Humans Jul 15, 2021 5:53:31 GMT

Quote

Post by Admin on Jul 15, 2021 5:53:31 GMT

Figure 4:
SAT29 bison genomics. A) In ADMIXTURE clustering with cattle, aurochs, bisons, gayals and bantengs, the SAT29 sample clusters with bisons. B) A population history model relating the SAT29 sample to present-day American and historical (early 20th century) European bisons from Poland and the Caucasus.

We tested all possible topologies to summarize the relationships between these bison genomes. The best-fitting topology has SAT29 as basal to the historical genomes from Poland and the Caucasus, and the American bison as basal to all of these (Figure 4). This model explains all f4-statistics except the aforementioned signal of some excess affinity between the American and Polish genomes. We do not attempt to discriminate between more complex models arising if we allow additional admixture events.Overall, the results tentatively suggest that the history of bisons in western Eurasia might share some features with wolf history, in that late Pleistocene ancestries appear basal to present-day populations, suggesting that population structure has been substantially reshaped since the LGM.

We additionally recovered 72,100 reads aligning to Ovis aries and explored these reads with a dataset of 103 Ovis and Capra genomes from 10 species. Tests of the form f4(Oreamnos,SAT29;C,D) indicated that the SAT29 sample is more similar to Ovis than to Capra, but it does not cluster with any of present-day Ovis species (Figure S5). We were thus unable to elaborate on the ancestry of this Ovis DNA.

Finally, we explored whether the genomic data retrieved from SAT29 could reveal more about the identity of the individuals from which it derived. First, we examined the human and canid mtDNA to ascertain if multiple individuals contributed DNA. An absence of genetic variation among the human mitochondrial reads is consistent with these deriving from a single individual (Method Details), or from multiple individuals carrying the same mitochondrial haplotype. In contrast, we find evidence of variation among the wolf mitochondrial reads, suggesting they derive from multiple individuals (Method Details). Second, we examined the sex of the human, canid and bovid DNA. When comparing the amount of reads mapping to the X-chromosome relative to the autosomes, we find that the human data is consistent with deriving from a female individual. In contrast, the wolf and bison X-chromosome read fractions are intermediate between those expected for male and female karyotypes, suggesting it may derive from individuals of both sexes, again indicating multiple source individuals. Lastly we compared the number of human mitochondrial and nuclear reads in order to assess from which tissue the human DNA came. The loge(mt/nuc) coverage of the human genome has a value of 4.85, which according to Furtwängler et al. (Furtwängler et al., 2018) is a value typically associated with samples from petrous bones.

Admin
Administrator

Posts: 72,751

Genomic Data Reveal a Complex Making of Humans Jul 15, 2021 20:19:58 GMT

Quote

Post by Admin on Jul 15, 2021 20:19:58 GMT

Discussion
The recovery of genome-wide data from multiple animal species from a single Pleistocene soil sample opens a new avenue compared to what has previously been learned from sediment DNA (Ardelean et al., 2020; Slon et al., 2017; Willerslev et al., 2003; Zhang et al., 2020). Using this data we were able to make inferences about the genetic ancestry and history of three past mammalian populations that lived in the Southern Caucasus ∼25 ka. The damage characteristics of the recovered DNA, the mitochondrial tip dates and the fact that all three genomes represent ancestries that no longer exist among those species demonstrate that the DNA indeed is of Pleistocene origin, and not a product of modern contamination or movement between different soil layers.

The data presented in this study demonstrates that sediment DNA can be used not only to test the presence or absence of species, but also to obtain genome-wide data informative about the ancestry of the individuals from which the DNA derives. While the amount of DNA retrieved here is still much lower than what is possible from some high quality skeletal elements, it provides information comparable to low-coverage ancient genome sequences. The complementary analyses of multiple mammalian species shows the power of this approach to reconstruct numerous population histories. Our results thus expand the possibilities offered by sediment ancient DNA and demonstrate that it can sometimes serve as an alternative source of genome-wide information when skeletal material is not available, potentially even for organisms that do not leave such remains.

Human DNA screening
We first explored the six libraries for the presence of human DNA. We obtained an
average of 14,893,925 reads in each sequencing library. These reads were processed
with the methodology described in Collin et al. (Collin et al., 2020). This initial screening
showed that five samples exhibited only residual presence of human DNA. The sixth
sample, SAT29, had 0.03% of the total reads sequenced map to the human genome.
Therefore, this sample was selected for further sequencing. Sequencing and
classification results of these samples are presented in Table S1.

Bioinformatic processing of sample SAT29
After sequencing the SAT29 library to saturation and merging the sequenced reads
with the ones from the screening phase, we obtained a total of 561,263,536 reads.
These were clipped using Cutadapt 2.7 (Martin, 2011), removing the sequencing
adapters and the reads with poly-A tails (reads with more than four As) (Martin, 2011).
We also removed reads with qualities below 30 of bases in at least 75% of the read
bases with the FASTX-toolkit 0.0.1 (Hannon, 2010). Clipped reads were processed with
SGA (Simpson and Durbin, 2010) and redundant reads were removed disabling the
kmer check. Finally, two bases per end were trimmed and reads shorter than 30bp
were discarded with the FASTX-toolkit (Hannon, 2010). Once collapsed and filtered, we
ended with 226,880,778 reads that were used for further analyses. We used Centrifuge
1.0.3 (Kim et al., 2016) with default parameters to classify the sequenced reads into
taxa using the whole non-redundant nucleotide database from NCBI indexed following
the Centrifuge manual and plotted using Pavian (Breitwieser and Salzberg, 2020). The
classification showed the presence of four mammalian taxa with more than 2% of the
Eukaryotic classified reads, which we investigated in further analyses: Ovis aries, Bos
taurus, Homo sapiens and Canis lupus. To separate the sequencing reads of the four
major mammalian taxa we built a multi-fasta reference file with the genomes of: H.
sapiens (GRCh37 Assembly GCA_000001405.1), O. aries (Oar_v3.1, assembly,
GCA_000298735.1), B. taurus (ARS-UCD1.2 assembly, GCA_002263795.2) and C.
lupus (CanFam3.1 assembly, GCF_000002285.3) following a similar strategy described
in Feuerborn et al (Feuerborn et al., 2020). The filtered reads were aligned with bwa aln
(Li and Durbin, 2009) disabling seeding, and with a gap open penalty of two. Only
reads with mapping qualities above 30 were kept using Samtools 1.10 (Li et al., 2009).
Duplicated sequences were removed with picard 2.21.4 ([CSL STYLE ERROR:
reference with no printed form.]).. In total 4,956,676 reads were assigned to these
species (661,765 H. sapiens, 2,378,237 C. lupus, 1,811,555 B. taurus and 72,100 O. aries)
The characteristics and quality of the mapped reads was assessed with qualimap 2.2.1
(Okonechnikov et al., 2016). We determined the length distribution with fastqc
(Andrews, 2010) and assessed the level of damage with mapdamage 2.0.9 (Jónsson et
al., 2013). Although two bases per end were clipped the deamination values are
notably high: 3’ deamination of 23% in Bos, 28% in Canis, 20% in Homo and 19% in
Ovis and 5’ deamination of 25% in Bos, 30% in Canis, 21% in Homo and 20% in Ovis.
The distribution of these values along the sequence is presented in Figure S3.

Last Edit: Jul 15, 2021 20:24:53 GMT by Admin