Genetic History of Tibetan Highlanders

Admin
Administrator

Posts: 81,837

Genetic History of Tibetan Highlanders Jan 5, 2022 1:33:49 GMT

Post by Admin on Jan 5, 2022 1:33:49 GMT

Figure 4
Multiple Denisovan Ancestries in Papuans

We further verified that the observed bimodal mismatch distribution in our high-confidence Denisovan blocks is not due to misclassification of Neanderthal blocks. The polymorphic sites of both peaks predominantly show the Denisovan-specific topology, with the Neanderthal-specific topology observed only at low levels (STAR Methods S10g). Further, if one of the peaks were caused by Neanderthal introgression misclassified as Denisovan, that peak should be seen in West Eurasians, who have substantial Neanderthal but no Denisovan admixture. However, West Eurasians have neither of the two Denisovan mismatch peaks. To additionally check whether some portion of the Neanderthal introgression signal could have been missed by only using the Altai Neanderthal reference, we repeated several analyses using CP and the Vindija Neanderthal (Prüfer et al., 2017). This approach yields highly consistent results with the original analysis (Figure S3; STAR Methods S10c). Finally, we identified Neanderthal-specific blocks in Papuans using the same methodology as for the high-confidence Denisovan blocks. These do not show a bimodal mismatch distribution to the Altai Neanderthal (Figure S3; STAR Methods S10c), suggesting that the history of Denisovan introgression in Papua differed markedly from modern human interactions with Neanderthals.

Figure S3
Mismatch Distributions Using Different Block Sets, Related to Figure 3 and STAR Methods

Deep Divergence between Denisovan Populations
Next, we sought to retrieve dates of divergence between D1, D2, and the Altai Denisovan genome through coalescent modeling (Tables S5A and S5B; STAR Methods S10i). After extending an archaic demographic model (Malaspinas et al., 2016) to encompass two deeply divergent Denisovan-related components, our best fitting model indicates that D1 and D2 split from the Altai Denisovan approximately 283 kya (9,750 generations, 95% confidence interval [CI] 261–297 kya) and 363 kya (12,500 generations, 95% CI 334–377 kya), respectively (Figure 4B). While clearly branching off the Denisovan line, D2 diverged so close to the Neanderthal-Denisovan split that it is perhaps better considered as a third sister group (STAR Methods S10i). For context, even the youngest of these divergence times is similar to the evolutionary age of anatomically modern humans (earliest known fossils, with varied morphologies, date to 198 kya (McDougall et al., 2005) and 315 kya (Hublin et al., 2017)). Our model implies substantial reproductive separation of multiple Denisovan-like populations over a period of hundreds of thousands of years.
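
The generation counts and dates quoted above can be cross-checked with a short conversion. This is only a sketch: the generation time of 29 years is an assumption inferred from the reported pairs of numbers (for example 9,750 generations and 283 kya), not a value stated in this excerpt, and the function name is illustrative.

```python
# Assumed generation time in years, inferred from the reported figures
# (not stated explicitly in this excerpt).
YEARS_PER_GENERATION = 29


def generations_to_kya(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a time in generations to thousands of years ago (kya)."""
    return generations * years_per_generation / 1000


# Split times reported in the text, in generations.
for label, gens in [("D1 split", 9750), ("D2 split", 12500)]:
    print(f"{label}: {gens} generations is roughly {generations_to_kya(gens)} kya")
```

Under that assumption the two split times come out close to the reported 283 kya and 363 kya.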

Post by Admin on Jan 5, 2022 2:22:51 GMT

The Two Denisovan Lineages Introgressed at Different Times
The distribution of block lengths retains a signal of introgression time, with longer blocks expected from more recent introgression events. In general, block length is expected to decay over time approximately as an exponential distribution (Gravel, 2012). We confirmed the accuracy of introgression dating by exponential fitting of the block length distribution through extensive simulation, incorporating different introgression times over the time period of interest (0–2,000 generations), and considering the impact of using only long blocks rather than the entire distribution of block lengths, substantial block length estimation errors, and the consequences if introgression occurred as an extended process rather than a single pulse (Figure S4; STAR Methods S10h). We observed a slight tendency to infer overly recent dates under some of these conditions, but never by more than 10%–15%. Filtering to longer block lengths and fitting an exponential with a larger location parameter help to reduce even these biases in date estimates.
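
The idea of dating introgression from block lengths can be sketched in a few lines. This is a simplified illustration, not the procedure in STAR Methods S10h: it assumes block lengths are exponentially distributed with rate r × t, where r is a recombination rate per bp per generation (1e-8 is an assumed, illustrative value) and t is the number of generations since a single introgression pulse. The `min_length` argument plays the role of the location parameter mentioned above, and all function names are hypothetical.

```python
import random

RECOMB_RATE = 1e-8  # assumed recombination rate per bp per generation (illustrative)


def simulate_block_lengths(t_generations, n_blocks, rng):
    """Draw block lengths (bp) from an exponential with rate r * t."""
    rate = RECOMB_RATE * t_generations
    return [rng.expovariate(rate) for _ in range(n_blocks)]


def estimate_generations(lengths, min_length=0.0):
    """Maximum-likelihood estimate of t using only blocks longer than min_length.

    The exponential is memoryless, so the excess length above the cutoff
    (the location parameter) follows the same exponential distribution.
    """
    excess = [x - min_length for x in lengths if x > min_length]
    rate_hat = len(excess) / sum(excess)
    return rate_hat / RECOMB_RATE


rng = random.Random(42)
lengths = simulate_block_lengths(t_generations=1000, n_blocks=20000, rng=rng)
print(estimate_generations(lengths))                      # using all blocks
print(estimate_generations(lengths, min_length=100_000))  # using long blocks only
```

In this idealized setting both estimates should land near the simulated 1,000 generations. The biases discussed in the text arise from block length errors and extended introgression, which this sketch does not model.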

Figure S4
Using Simulations to Assess the Accuracy of Introgression Time Inference Based on Exponential Fitting of Block Length Distributions, and Example Fits, Related to Figures 4 and 5 and STAR Methods

While the median block lengths of D1 and D2 are similar in Papuans (238 and 236 kb), their distributions are significantly different (Kolmogorov-Smirnov statistic = 0.15, p = 2.2 × 10^−6). Exponential fitting of D1 and D2 haplotype lengths yields introgression dates of 29.8 kya (95% CI 14.4–50.4) and 45.7 kya (95% CI 31.9–60.7), respectively, which are younger than, though overlapping with, previously suggested estimates for Denisovan introgression (Figure 4B; STAR Methods S10h) (Malaspinas et al., 2016). The maximum likelihood introgression date for D2 introgression is about 50% more ancient than the date for D1. Based on simulations, and given the greater statistical challenge of identifying shorter introgression blocks, we consider these dates to be probable lower bounds on introgression times, but with true dates no more than 15% more ancient.
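
Two samples can share a median and still differ in distribution, which is what the Kolmogorov-Smirnov statistic measures. The sketch below computes the two-sample statistic from scratch on toy numbers (not the study's block lengths). The function name is illustrative, and a statistics library would normally be used to obtain the p-value as well.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions, so the maximum gap is
    reached at one of the observed data points.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy example: both samples have median 3, but their spreads differ.
spread_out = [1, 2, 3, 4, 5]
concentrated = [3, 3, 3, 3, 3]
print(ks_statistic(spread_out, concentrated))
```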

Post by Admin on Jan 5, 2022 3:56:48 GMT

Geographical Patterns of Denisovan Admixture in Papua
D1 and D2 introgression times that overlap the timescale of modern human arrival and their variable dispersal across Papua raise the possibility that Denisovan introgression occurred after local populations of modern humans had differentiated. We find geographic structure associated with the D1 variation between mainland New Guinea and the Baining, a population on the offshore island of New Britain. We observe slightly less high-confidence Denisovan introgression in the Baining than in mainland Papuans (31.5 Mb versus 33.1 Mb per haploid genome, Welch’s t test T = −3.4, p = 0.001), despite extremely similar population histories (Hudjashov et al., 2017), including similar levels of Asian ancestry (Figure 1A). However, there is less D1 sequence in the Baining than in mainland Papuans (1.33 Mb versus 1.82 Mb per haploid genome, Welch’s t test T = −3.9, p < 0.01), although both carry similar levels of D2 sequence (1.28 Mb versus 1.37 Mb, T = −0.8, p = 0.41) (Figures 5A and 5B; STAR Methods S10h).
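
The comparisons above use Welch's t test, which does not assume equal variances in the two groups. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows. The per-genome values are made-up toy numbers, not the study's data, and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean.
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical Mb of archaic sequence per haploid genome in two groups.
group_a = [1.2, 1.3, 1.4, 1.5]
group_b = [1.6, 1.8, 1.9, 2.0, 2.1]
t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

With the first group listed first, a negative statistic means the first group carries less sequence, which matches the sign convention of the T values quoted above for the Baining versus mainland Papuans.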

Figure 5
Geographic Patterns of D1 and D2 Ancestry

To determine whether this difference in D1 sequence could be due to random drift in the two populations or to different Denisovan introgression histories, we extended the simulation model (Malaspinas et al., 2016) to incorporate population structure representing both New Guinea mainlanders and Baining, in addition to the two introgressing Denisovan populations (D1 and D2) (Figure S5; STAR Methods S10j). To test a conservative model offering maximum opportunity for isolation and drift, we did not include any migration between Papuans and Baining after their population split. Archaeological evidence suggests that New Britain was settled by at least 35 kya (Pavlides and Gosden, 1994), and from the genomic data, SMC++ (Terhorst et al., 2017) infers a genetic split time between mainland Papuans and Baining of 15.7 kya (Figure 5C). We therefore implemented three alternative demographic models: using the SMC++ genetic split times and population sizes (M1); using the SMC++ split time and more conservative (smaller) population sizes, thus generating more drift (M2); and a model with a more conservative (older) genetic split time of 23.2 kya (800 generations), also generating more drift (M3) (Figure S6; STAR Methods S10j). As expected, the observed difference in rates of D2 introgression between Baining and mainland Papuans is within the distributions predicted by the simulations. However, in all three cases, the observed ratio of D1 in mainland Papuans to Baining lies outside simulated values (Figure 5D).

Figure S5
Simulation Model Schematic and Mismatch Results, Related to Figure 4 and STAR Methods

Figure S6
Using Heterozygosity and Fst to Inform Models of Mainland Papuan/Baining Drift, Related to Figure 5 and STAR Methods

Together, these coalescent simulations suggest that the reduced frequency of D1 blocks among the Baining is unlikely to result from shared D1 introgression into a common ancestral Papuan population, followed by drift as each population subsequently diverged into the modern Baining and mainland groups. Instead, the difference in D1 levels more likely reflects different amounts of introgression from Denisovan populations into mainland New Guinea and the islands to the northeast, which occurred after the separation of the two Papuan populations (Figure 5E). The overall genetic similarity and relatively recent divergence of these Papuan groups (Figures 1 and 5C; STAR Methods S10h, S10j) have implications for the past distribution of D1 Denisovan populations and the process of archaic introgression.
First, our data suggest that the D1 Denisovans, in contrast to D2, contributed additional DNA to the mainland New Guinea population after the mainland and Baining populations diverged from their common Papuan ancestor (Figure 5E). This, together with the nearly complete absence of D1 in continental Asia, is most consistent with the scenario that D1 Denisovans were present in New Guinea or East ISEA (e.g., Wallacea). In turn, this would imply that at least some Denisovan populations had the ability to cross large bodies of water, such as the one represented by the Wallace Line. This idea does not seem implausible given archaeological evidence of archaic hominin dispersals—notably, the discovery of stone tools in the Philippines dating to 700 kya (Ingicco et al., 2018) and the related finding of H. floresiensis on the island of Flores (Brown et al., 2004), both across substantial water boundaries that persisted throughout the Pleistocene. Such geographical barriers would limit gene flow and might help to explain the extent of divergence between the D1 Denisovan population and other Denisovan groups.
Second, the late date for the D1 introgression and geographic structure in modern populations suggests that Denisovans survived until 30 kya, and perhaps as recently as 14.5 kya. This is longer than Neanderthals, who died out around 40 kya (Higham et al., 2014), or H. floresiensis, which recent dating suggests did not persist on Flores beyond 50–60 kya (Sutikna et al., 2016). The implication is that Denisovans living in ISEA may have been among the last of all the archaic hominins to survive. This provides an argument to screen for Denisovan remains possibly misclassified as other hominins in existing archaeological collections and encourages more archaeological research in the poorly accessible and hence incredibly understudied New Guinea region.
Third, the combined evidence of geographic structure and a recent D1 introgression date suggests that Denisovan introgression did not occur immediately following the first modern human settlement in the region (45–50 kya) (O’Connell et al., 2018). This implies that introgression with archaic hominins may not be an inevitable and immediate result of joint occupation of the same territory.

Post by Admin on Jan 5, 2022 19:19:52 GMT

Limited Evidence of Further Introgression Complexity in East ISEA and Papua
Given the recent presence of Homo floresiensis in our study area (Brown et al., 2004, Sutikna et al., 2016), and the possibility that late Homo erectus was contemporary with the earliest anatomically modern humans in ISEA (Yokoyama et al., 2008), we investigated whether there might be any hints of archaic hominin ancestry, other than Denisovan or Neanderthal, in the dataset. We attempted to detect such signals by analyzing S∗ windows that exhibit minimal overlap with Denisovan or Neanderthal blocks as identified by CP and HMM (residual S∗, STAR Methods S12).
We first note a pronounced excess in total S∗ signal in our Papuan samples (97.2 Mb) compared to East Asians (50.9 Mb), South Asians (48.3 Mb), and West Eurasians (40.8 Mb). After confirming that this excess was primarily driven by introgressing Denisovan ancestry, we estimate that any additional introgression from outside the Human-Neanderthal-Denisovan clade was limited with an upper bound of about 1% (STAR Methods S12a). Next, by profiling residual S∗ among different continental groups, we detect a slight excess of unique variation that is not shared with other humans, the Altai Denisovan or the Altai Neanderthal in East ISEA and Papua (Figure S7; STAR Methods S12b). The signal is not strong, and the difference in total residual S∗ between different global populations is small, suggesting at most little introgression from outside the Human-Neanderthal-Denisovan lineage in these two populations. This could hint at a more complex introgression history involving unknown archaic hominins in ISEA and Papua, such as H. erectus, as has been recently suggested for other Asian populations (Mondal et al., 2016). For instance, the Altai Denisovan is also thought to have some H. erectus ancestry (Lipson and Reich, 2017, Mallick et al., 2016, McColl et al., 2018, Prüfer et al., 2017, Skoglund et al., 2016), although it is not yet clear whether this is also true for introgressing Denisovan populations. Equally, however, these genomic signals could arise without further introgression events, notably through balancing selection or incomplete lineage sorting, and so warrant careful further study.

Figure S7
Regional Distribution of S∗ and Residual S∗, and Mutation Motif Characteristics of Residual S∗ Windows, Related to Figure 7 and STAR Methods

Finally, our dataset includes Rampasasa, a village on Flores that is close to the cave site where the H. floresiensis bones were found (Sutikna et al., 2016), and whose inhabitants were the subject of a recent genetic study (Tucci et al., 2018). The proportions of Neanderthal and Denisovan introgression and the amount of residual S∗ in this village are comparable to those in neighboring populations (Figure 7; STAR Methods S13), suggesting the absence of unusual archaic admixture in Rampasasa villagers relative to other people in East ISEA.

Post by Admin on Jan 5, 2022 19:50:09 GMT

Figure 7
Correlations of Papuan Ancestry with Archaic and S∗ Components

Conclusions
The discovery and characterization of archaic hominins has typically begun with the analysis of fossil remains (Meyer et al., 2012, Prüfer et al., 2014, Prüfer et al., 2017, Slon et al., 2018). However, as Denisovan admixture has its center of gravity in ISEA and Papua where DNA rarely survives more than a few thousand years in the humid tropical environment (Lipson et al., 2018, McColl et al., 2018), studying the genetic record from modern humans remains the sole way to shed light on the substructure and phylogeography of archaic hominins in this important but understudied region.
Here, we use a statistical approach on new genomes from ISEA and Papua to identify two new Denisovan groups (D1 and D2) and describe the relationships between these archaic hominins long before they first interacted with anatomically modern humans. Both groups branched off early from the Altai Denisovan clade at 283 and 363 kya and were reproductively isolated from the individuals at Denisova cave in Siberia and from each other. Yet both groups bred with modern humans, contributing around 4% of the genomes of Papuans, including over 400 gene variants enriched for traits involving immunity and diet. Some of this introgression is restricted to modern New Guinea and its surrounding islands and may have occurred as late as the very end of the Pleistocene, making the admixing D1 Denisovan population among the last surviving archaic hominins in the world.
The genetic diversity within the Denisovan clade is consistent with their deep divergence and separation into at least three geographically disparate branches, with one contributing an introgression signal in Oceania and to a lesser extent across Asia (D2), another apparently restricted to New Guinea and nearby islands (D1), and a third in East Asia and Siberia (D0). This suggests that Denisovans were capable of crossing major geographical barriers, including the persistent sea lanes that separated Asia from Wallacea and New Guinea. They therefore spanned an incredible diversity of environments, from temperate continental steppes to tropical equatorial islands. The emerging picture suggests that far from moving into sparsely inhabited country, modern humans experienced repeated and persistent interactions as they expanded out of Africa into this highly structured archaic landscape across Eurasia. This genetic contact yielded a rich legacy, including hundreds of gene variants that continue to contribute to the adaptive success of anatomically modern humans today.

STAR★Methods

Method Details
A schematic overview of the analytical pipeline presented here is shown in Figure S1A (STAR Methods S1–5) and Figure S1B (STAR Methods S6–13). Datasets used are shaded in green; analyses and inferences in yellow; and key steps are outlined in bold.
S1 - Sequencing and SNP calling
Sequencing libraries were prepared using TruSeq DNA PCR-Free and TruSeq Nano DNA HT kits depending on DNA quantity. 150 bp paired-end sequencing was performed on the Illumina HiSeq X sequencer.
Individuals were sequenced to expected mean depth of 30x, with an achieved median depth of raw reads across samples of 43x.
These newly generated whole genome sequences were combined with the following published genomes (raw reads):
a) 292 genomes from the Simons Genome Diversity Project (SGDP) (Mallick et al., 2016)
b) 25 Papuan genomes from the Malaspinas et al., 2016 study
SNP calling was performed on the combined dataset, with published genomes analyzed from raw reads exactly as for the new sequence data.
Trimmomatic v. 0.38 (Bolger et al., 2014) was used to cut adapters and low-quality sequences from the reads. After trimming, the vast majority of reads were longer than 145 bp; those below 60 bp were excluded. We aligned the reads to the ‘decoy’ version of the GRCh37 human reference sequence (hs37d5) using BWA MEM (Li, 2013). We removed duplicate reads with picard-tools v. 2.12.0 (http://broadinstitute.github.io/picard) and performed local realignment around indels with GATK v. 3.5 (Poplin et al., 2017).
After alignment, and keeping only properly paired reads that mapped to the same chromosome, the sequencing depth across the samples used in downstream analyses was as follows: min = 18x, Q1 = 35x, median = 38x, Q3 = 43x, max = 48x. Only three samples had median coverage rates below 30x: CBL34, RAM005 and RAM067.
Base calling was undertaken with GATK v. 3.5 following GATK best practices. Per-sample gVCF files were generated using GATK HaplotypeCaller (using only reads with mapping quality ≥ 20). Single sample gVCFs were combined into multisample files using CombineGVCFs, and joint genotyping was performed using GATK GenotypeGVCFs, outputting all sites to a multisample VCF. Exactly the same base calling steps were applied to new and published samples, and the joint genotyping included all samples in this study.
Using BCFtools v. 1.4 (Li, 2011), the following filters were applied to each genotype call: base depth (DP) ≥ 8x and ≤ 400x, and genotype quality (GQ) ≥ 30. We then kept only biallelic SNPs and invariable reference sites. For the majority of our analyses, we kept only sites that had high quality variant calls in at least 99% of samples. (Specifically, this call rate filter was applied in all analyses in STAR Methods S5–S9 and S11, and in all analyses in S10 apart from two robustness checks that assessed phasing and archaic haplotype topologies. We did not apply the call rate filter in the motif-counting analysis in STAR Methods S12.) Applying this 99% call-rate filter yielded a total of 36,462,963 SNPs in the combined dataset. We removed sites within segmental duplications, repeats and low complexity regions. These masks were downloaded from the UCSC and Broad Institute genome resources:
hgdownload.soe.ucsc.edu/goldenPath/hg19/database/genomicSuperDups.txt.gz
software.broadinstitute.org/software/genomestrip/node_ReferenceMetadata.html
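
The genotype-level and site-level filters described above can be expressed as a small predicate pair. The study applied them with BCFtools. The Python below only illustrates the logic, and the data layout (a list of (DP, GQ) tuples with None for a no-call) and the function names are assumptions made for the example.

```python
MIN_DP, MAX_DP, MIN_GQ = 8, 400, 30  # per-genotype thresholds quoted in the text
MIN_CALL_RATE = 0.99                 # site-level call-rate threshold


def genotype_passes(dp, gq):
    """True if a single genotype call meets the depth and quality filters."""
    return MIN_DP <= dp <= MAX_DP and gq >= MIN_GQ


def site_passes(calls):
    """True if at least 99% of samples have a passing genotype at this site.

    `calls` holds one (DP, GQ) tuple per sample; None marks a no-call.
    """
    passing = sum(1 for c in calls if c is not None and genotype_passes(*c))
    return passing / len(calls) >= MIN_CALL_RATE


# Hypothetical site in a 471-sample dataset: 467 good calls and 4 no-calls.
example_site = [(30, 99)] * 467 + [None] * 4
print(site_passes(example_site))
```

With 471 samples, a site needs at least 467 passing genotypes to clear the 99% threshold, so the example site is retained and a site with only 466 passing calls would be dropped.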
In the filtered and masked VCF files, we examined several statistics across the samples: the percentages of no-calls and singletons; the average depth; transition/transversion ratio; the number of variants; and heterozygosity. One highly heterozygous sample from the SGDP (LP6005441-DNA_A09, Naxi-2) was excluded based on these metrics, as well as on the basis that the original authors determined that this sample had been contaminated (Mallick et al., 2016).
S2 - Kinship and outlier analysis
We performed sample kinship analysis using KING v. 2.1 (Manichaikul et al., 2010). Of the 161 new genomes, 6 were excluded due to the presence of a first-degree relative in the dataset, leaving a total of 155 genomes for downstream analysis. This relatedness is a consequence of the village-scale sampling strategy employed in this study. In addition, 7 sample pairs display second-degree relatedness (BNA05 / BNA12-F, BNA21-F / BNA26-F, CBL018 / CBL019, RAM045-F / RAM067, RAM022 / RAM039-F, NIAS08 / NIAS12, NIAS01 / NIAS10). These samples were kept for further analyses, and the final dataset comprised 471 genomes: 155 newly generated complete genomes, 25 genomes from the Malaspinas et al., 2016 study and 291 genomes from SGDP.
Principal component analysis (PCA) was used to detect sample outliers and characterize regional diversity. First, we applied LD pruning of our SNP set using PLINK v. 1.9 (Chang et al., 2015). Pruning was performed in 1 kb sliding windows with a step size of 100 bp, and SNPs with R² > 0.1 were removed. Next, PCA was performed in EIGENSOFT v. 7.2.0 (Patterson et al., 2006, Price et al., 2006) without the outlier removal step. The results of a PCA without African samples (N = 429) are shown in Main Text Figure 1B.
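
The LD pruning step can be illustrated with a toy greedy pruner. This is a simplified sketch, not PLINK's algorithm: it measures LD as the squared Pearson correlation between genotype dosages (0/1/2) and drops a SNP if it exceeds the threshold with any already-kept SNP within the window. The function names and input layout are hypothetical.

```python
from statistics import mean


def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # monomorphic site: correlation undefined, treated as unlinked
    return cov * cov / (var1 * var2)


def prune(snps, window_bp=1000, threshold=0.1):
    """Greedy pruning: keep a SNP unless it is in LD with a nearby kept SNP.

    `snps` is a list of (position, genotypes) tuples sorted by position.
    """
    kept = []
    for pos, geno in snps:
        linked = any(
            pos - kept_pos <= window_bp and r_squared(geno, kept_geno) > threshold
            for kept_pos, kept_geno in kept
        )
        if not linked:
            kept.append((pos, geno))
    return kept


# Toy data: the second SNP duplicates the first and sits inside its window.
toy = [
    (100, [0, 1, 2, 0, 1, 2]),
    (300, [0, 1, 2, 0, 1, 2]),
    (5000, [0, 1, 2, 0, 1, 2]),
]
print([pos for pos, _ in prune(toy)])
```

The third SNP is kept even though its genotypes match the first, because it lies outside the 1 kb window.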

Last Edit: Jan 5, 2022 19:50:51 GMT by Admin