Genetic History of Tibetan Highlanders

Post by Admin on Jan 6, 2022 4:43:42 GMT

f. D1/D2 signal is not caused by selection
Second, we considered the possibility that the two peaks represent differential selection. Under this explanation, we might expect D1 blocks to be under strong purifying selection, reducing variation and mismatch, with D2 blocks evolving under neutrality.
Purifying selection is expected to be focused on genic regions. We therefore assessed whether D1 blocks have more overlap with genes than D2 blocks. As when assessing recombination rate differences, we compressed the sets of blocks in the two components to their respective overlapping subsets using the bedtools function 'merge', leaving 86 D1 regions and 68 D2 regions. Of these, 79 D1 regions and 60 D2 regions overlapped genes (Ensembl 91, GRCh37), a non-significant difference (χ² test, p = 0.63).
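The gene-overlap comparison above can be sketched as a 2 × 2 contingency test. This is a minimal illustration rather than the authors' script: the counts come from the text, while the use of scipy and of Yates' continuity correction are assumptions (the text does not state which variant of the χ² test was applied). With the correction enabled, the p-value comes out close to the reported 0.63.

```python
from scipy.stats import chi2_contingency

# Counts taken from the text: merged regions and how many overlap a gene.
d1_total, d1_genic = 86, 79
d2_total, d2_genic = 68, 60

table = [
    [d1_genic, d1_total - d1_genic],  # D1: genic, non-genic
    [d2_genic, d2_total - d2_genic],  # D2: genic, non-genic
]

# correction=True applies Yates' continuity correction (an assumption here).
chi2, p_value, dof, expected = chi2_contingency(table, correction=True)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.2f}")
```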
As an additional test for differing levels of purifying selection, we asked whether D1 and D2 genomic regions differed in their B values. B values are measures of background selection over the genome based on observed diversity in an alignment of human, chimp, gorilla, orangutan and macaque (McVicker et al., 2009). After converting original B values to GRCh37/hg19 genome coordinates, we calculated the average B value over each D1 and D2 region. The distributions of average B values in D1 and D2 regions were not significantly different (standardized 2-sample Anderson-Darling test statistic −0.87, asymptotic p = 0.91). The total average and standard deviation for all D1 and D2 regions were 0.48 ± 0.24 and 0.50 ± 0.23, respectively, and hence statistically overlapping.
As D1 blocks have lower mismatch estimates compared to D2, they could have been under strong purifying selection since the time of introgression and might show a more pronounced skew toward rare blocks. We therefore performed a Mann-Whitney U test to assess whether the frequency distributions of blocks in D1 and D2 differ in our Papuan samples. There was a statistically significant difference in the frequency distributions (U = 3304, two-tailed p = 0.031), but summary statistics describing the distributions were similar (mean D1: 0.048, D2: 0.049; median D1: 0.028, D2: 0.021; standard deviation D1: 0.062, D2: 0.069). Importantly, the proportion of rare blocks with frequency < 5% was in fact lower in D1 (70%) than D2 (76%), which is consistent with neutrality.
Based on the lack of a clear genic/non-genic division between D1 and D2 blocks, their similar B values, and no pronounced frequency skew toward rarer D1 blocks, we do not interpret the mismatch difference between D1 and D2 as being driven by selection.
g. The topology of D1/D2 blocks
The network of interacting hominin populations in the Middle Pleistocene is becoming increasingly complex. One phenomenon that is included in some models (Lipson and Reich, 2017, Mallick et al., 2016, McColl et al., 2018, Prüfer et al., 2014, Skoglund et al., 2016), but not others (such as the main model in the Malaspinas et al., 2016 study) is a usually small Homo erectus component in the Altai Denisovan. Our approach does not allow us to identify genomic regions derived from H. erectus that may have introgressed into modern humans only, but not into the Altai Denisovan (but see STAR Methods S12). However, if the genetic contact between H. erectus and Denisovans occurred before the divergence of the Altai Denisovan and the Denisovan population that introgressed into humans, our D1 and/or D2 blocks could include regions with H. erectus ancestry, which were introduced into modern humans by a Denisovan population that was already pre-admixed with H. erectus. If so, there would be two categories of blocks identified as introgressed in the modern human – those derived from the Denisovan clade and those with an H. erectus origin – which could have different mismatch distributions and create a bimodal signal. While this phenomenon is likely to be rare if the proportion of H. erectus in Altai Denisovan is small (2–14%), some models incorporate a surprisingly large contribution (e.g., 66% in Figure 3 of Skoglund et al., 2016).
We therefore attempted to assign D1 and D2 blocks to specific coalescent topologies by counting mutation sharing patterns and assessing consistency with the fifteen possible topologies implied by the four leaves of the tree (the block, Altai Denisovan, Altai Neanderthal, and human). For this analysis, we followed the phasing assessment (see STAR Methods S10e above) in using the Russian LP6005441-DNA_G10 to represent humans and a minimally masked dataset. We counted the number of mutations in the 16 possible sharing categories. For example, with notation 0 indicating the ancestral allele and 1 indicating the derived allele and in order [Human, Neanderthal, Denisovan, Block], a derived mutation that is unique to the block would contribute to mutation motif 0001, a derived mutation shared between the block and the Altai Denisovan would contribute to motif 0011, and a fixed derived mutation would contribute to mutation motif 1111. When counting, we downweighted the contribution of variants in unphased regions such that all possible mutation motifs represented were counted, in proportion to their probability given an equal chance of each variant being on each haplotype. The approach we take, studying the frequency of ancestral and derived variants in a set of samples of interest and assessing the consistency of these with different phylogenetic tree topologies, is allied to that taken in recent work investigating the different demographic models of modern human history (Wall, 2017).
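The motif counting described above can be illustrated with a short sketch. The encoding is hypothetical: each site is a tuple in the order [Human, Neanderthal, Denisovan, Block], with 0 for ancestral, 1 for derived, and None standing in for an unphased call where either allele could sit on the haplotype of interest. The function names are illustrative and not taken from the original analysis.

```python
from collections import Counter
from itertools import product

def motif_weights(human, neanderthal, denisovan, block):
    """Return {motif: weight} for one site.

    Each argument is 0 (ancestral), 1 (derived) or None (unphased; both
    alleles are treated as equally likely, which is an assumed encoding).
    """
    options = [(0, 1) if a is None else (a,)
               for a in (human, neanderthal, denisovan, block)]
    combos = list(product(*options))
    weight = 1.0 / len(combos)  # equal share for every compatible motif
    return {"".join(map(str, c)): weight for c in combos}

def count_motifs(sites):
    """Sum fractional motif weights over all sites."""
    counts = Counter()
    for site in sites:
        for motif, weight in motif_weights(*site).items():
            counts[motif] += weight
    return counts

sites = [
    (0, 0, 0, 1),     # derived allele private to the block: 0001
    (0, 0, 1, 1),     # shared by block and Altai Denisovan: 0011
    (1, 1, 1, 1),     # fixed derived: 1111
    (0, 0, 1, None),  # unphased block allele: half 0010, half 0011
]
counts = count_motifs(sites)
print(dict(counts))
```

Each fully phased site contributes a weight of one to a single motif, while an unphased site spreads its weight over the motifs it is compatible with, so the total weight always equals the number of sites.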
We note that relating topologies to demographic histories is complex; for example, given a sufficiently large ancestral population size, each coalescent topology would have an approximately equal probability even if all blocks are Denisovan-introgressed. Nevertheless, studying the difference in topology proportions in D1 and D2 to clarify the history of these block sets can be useful.
To summarize, the topology proportions in D1 and D2 blocks do not support the idea that D1/D2 mismatch differences are driven by H. erectus introgression into the Altai Denisovan, into D1 or D2 blocks independently, or both. The prevalence of the (H,(N,(D,X))) topology, and the topology differences between D1 and D2, are consistent with Denisovan-like introgression in Papuans originating from two populations on the Denisovan clade.
The analysis above is useful in teasing apart the proportions and qualities of coalescent topologies represented in D1 and D2, and can help to rule out specific causes of their mismatch distributions. However, especially given the complexity of block identification, we emphasize that the topologies show consistency with a demographic scenario of interest rather than discounting all other scenarios. The topology differences between D1 and D2 could be consistent with other scenarios, including introgression from a Neanderthal/Denisovan sister-clade or extremely complex bottlenecks on the Denisovan clade.
Having established the likely cause of D1 and D2 mismatches as complexity in archaic hominin introgression, we sought to further explore differences between the two block sets. We proceed by assessing evidence for different introgression dates based on block lengths, and by asking whether a model with introgression from two deeply divergent Denisovan-clade populations is supported by simulations.
h. D1/D2 Denisovan lineages introgressed at different dates
Given the likely demographic origin of the D1 and D2 haplotypes, the question of whether D1 and D2 have different introgression dates is of particular interest. We therefore sought to estimate introgression dates based on haplotype lengths, which are expected to follow an exponential distribution with its decay parameter depending on the introgression date and the amount of introgression (Equation 1 in Gravel, 2012). The accuracy of this approach depends on the power of our methods to detect haplotypes of different lengths. Two factors are relevant. First, shorter haplotypes are expected to be harder to detect as signals of introgression may be mistaken for noise. Second, haplotypes are expected to be broken up by phasing errors; while the incidence of switch point errors is low in our data (see STAR Methods S4), and while our use of archaic genomes in phasing is expected to substantially reduce errors in introgressed regions, some errors are probable. The haplotypes that we assign to D1 and D2 are extremely long (> 180 Kb), such that the signal of introgression is clear, and the power of our methods is expected to be consistently high. Nevertheless, we caution that date estimates derived from this method may be better considered as relative rather than absolute dates.
We sought to use simulations to profile the potential of date estimation through the distribution of block lengths using a set of long introgressing blocks, and to confirm that fitting dates using longer blocks only does not lead to substantial biases. We first note that deviations from the exponential distribution are known to occur under certain combinations of introgression parameters (especially smaller population sizes and larger admixture proportions; see Figure 3 in Liang and Nielsen, 2014). We therefore assessed the correspondence between simulations and the exponential expectation for our parameter range of interest.
We ran 200 replicates of a Wright-Fisher forward-time simulation of 5 Mb chunks, with recombination rate 1.27 × 10⁻⁸ (average rate from the HapMap combined genetic map) and the chromosome discretized into 1 Kb segments for computational simplicity, using an introgression proportion of 0.04, haploid Ne of 8334 (Australian Ne in Malaspinas et al., 2016), and introgression times from 50 to 2950 generations in steps of 50 generations. The results show that the exponential fit is close but not exact (Figures S4A and S4B), even when all individuals in the population are sampled. We fitted each simulation by maximum likelihood (using the scipy.stats.expon.fit function), using either all blocks, those with minimum length 50 Kb, or those with minimum length 180 Kb. We were able to retrieve accurate introgression dates (Figure S4C), although there is a tendency to underestimate the introgression date for more ancient introgression times. In the regime of interest (introgression times < 1800 generations), the deviation is at most 10%–15%. These fittings confirm that it is possible to fit dates using longer block lengths only.
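A minimal sketch of fitting a date from block lengths above a minimum length is given below. It assumes that block lengths are exponential with per-base-pair rate r(1 − m)(T − 1), a common reading of the single-pulse result in Gravel (2012); the function name, the test values and the idealized simulated input are all illustrative. Because the exponential distribution is memoryless, fixing the location at the length threshold lets scipy.stats.expon.fit estimate the mean excess length from long blocks only.

```python
import numpy as np
from scipy.stats import expon

def fit_introgression_time(lengths_bp, min_len_bp, recomb_rate, admix_prop):
    """Estimate generations since a single introgression pulse."""
    lengths = np.asarray(lengths_bp, dtype=float)
    kept = lengths[lengths >= min_len_bp]
    # Memorylessness: the excess over the threshold is still exponential,
    # so fix loc at the threshold and fit only the scale (mean excess).
    _, mean_excess_bp = expon.fit(kept, floc=min_len_bp)
    rate_per_bp = 1.0 / mean_excess_bp
    # Assumed relation: rate_per_bp = recomb_rate * (1 - admix_prop) * (T - 1)
    return rate_per_bp / (recomb_rate * (1.0 - admix_prop)) + 1.0

# Idealized test input: exactly exponential block lengths for a known date.
rng = np.random.default_rng(1)
r, m, true_T = 1.27e-8, 0.04, 1000
rate = r * (1 - m) * (true_T - 1)
simulated = rng.exponential(scale=1.0 / rate, size=200_000)

for threshold in (0, 50_000, 180_000):
    estimate = fit_introgression_time(simulated, threshold, r, m)
    print(threshold, round(estimate))
```

With perfectly exponential input the estimate is unbiased at every threshold; the downward bias discussed in the text arises from departures from the exponential model in the forward-time simulations, which this sketch does not reproduce.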
Additional challenges in inferring introgression dates arise from errors in block length estimation. We profiled these using the forward-time simulations and fittings as above, but now modified the introgressing block lengths to B_error = B + Laplace(μ, b) on sampling, where B is the error-free simulated block length, the Laplace location parameter is μ = 0, and the scale parameter is b = 20000. Choosing the Laplace distribution as our error model assumes equal probability of over- and under-estimating block length, with a constant probability of error per base pair. If B_error < 0, the block was discarded, capturing the difficulty in correctly identifying short blocks. The Laplace(0, 20000) distribution has 2.5% and 97.5% quantiles of −59915 and 59915 bp, respectively, capturing very substantial errors in block sizes. Under these models, we are still able to achieve accurate fitting of dates (Figures S4B and S4D), although bigger biases arise when including smaller blocks in the fitting. Again, there is a slight tendency to underestimate introgression dates by 10%–15% when using longer blocks.
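The error model above is simple to reproduce. The sketch below, with illustrative function and variable names, adds Laplace(0, 20000) noise to block lengths, discards blocks whose perturbed length is negative, and checks the quoted 2.5% and 97.5% quantiles using scipy.stats.laplace.

```python
import numpy as np
from scipy.stats import laplace

SCALE_BP = 20_000  # Laplace scale parameter b from the text; location mu = 0

def add_length_error(block_lengths_bp, rng, scale_bp=SCALE_BP):
    """Perturb block lengths and drop blocks whose noisy length is negative."""
    lengths = np.asarray(block_lengths_bp, dtype=float)
    noisy = lengths + rng.laplace(loc=0.0, scale=scale_bp, size=lengths.shape)
    return noisy[noisy >= 0]

# Quantiles of the error distribution itself.
q_low, q_high = laplace.ppf([0.025, 0.975], loc=0.0, scale=SCALE_BP)
print(round(float(q_low)), round(float(q_high)))

# Toy block lengths (illustrative values, not from the study).
rng = np.random.default_rng(0)
true_lengths = rng.exponential(scale=80_000, size=10_000)
noisy_lengths = add_length_error(true_lengths, rng)
print(len(true_lengths) - len(noisy_lengths), "blocks discarded")
```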
Introgression may well have occurred over many generations rather than as a single event, and we considered it probable that fitting using block lengths would emphasize the most recent introgression time. For example, if very weak introgression were (hypothetically) to occur up to the present and even a small number of very long introgressing blocks were sampled, this could lead to a very recent inferred introgression date, likely depending on the fitting procedure. We therefore repeated the forward-time simulations, this time simulating introgression over 520 generations (15080 years with a generation time of 29 years) at a rate 0.04/520 = 7.69 × 10⁻⁵ per generation. Note that the effective introgression rate is only marginally reduced compared to the single-event introgression model due to replacement of haplotypes that are already partially introgressed under this model, and the expected correction when there is relatively low total introgression (as in our case) is minimal. Measuring the simulated introgression date as the mid-point in the introgression process (e.g., weak introgression from 1090 to 1610 generations ago corresponds to a mid-point of 1350 generations), there is again a slight bias to infer more recent introgression dates (Figure S4E). For the introgression times of interest, this bias is again no more than 10%–15%.
Finally, we fitted the output of coalescent simulations generated in STAR Methods S10i, which builds on the Malaspinas et al., 2016 model with Denisovan introgression at 1353 generations ago. A slight bias toward inferring more recent introgression dates was observed, of approximately 5%. Adding errors to the sampled block lengths as above did not change the inference when using long blocks > 180 Kb.
The simulations above suggest that introgression date fitting based on block lengths is effective given our demographic parameters and our use of larger block lengths, even under a strong block length estimation error model or introgression occurring over many generations rather than as a single event. Nevertheless, in each case there is a slight bias toward recent dates, which is greatest when introgression is more ancient. The downward bias in date estimation is limited to 10%–15% for the times most likely corresponding to archaic introgression in humans, and is likely closer to 5%. Based on these simulations, we consider the dates we report to be probable lower bounds on introgression dates, with true dates up to 15% more ancient than our fittings suggest.
We proceeded to perform exponential fittings on the lengths of blocks in the D1 and D2 sets (Figures S4F and S4G) using the Python package statsmodels v.0.8.0 (Seabold and Perktold, 2010) and maximum likelihood fitting, assuming either a constant recombination rate or the combined HapMap genetic map (Frazer et al., 2007). We confirmed 95% CIs through a block-bootstrap procedure, whereby the genome was divided into 2 Mb chunks, consecutive chunks were combined if blocks spanned boundaries, and artificial samples were generated by sampling the same chunks over all individuals with probability proportional to chunk length (usually 2 Mb, but sometimes 4 Mb or more) until the total observed number of blocks corresponded to that expected from the data. When calculating the date of introgression, we assumed an introgression proportion of 2% for each of D1 and D2, such that half of a proposed total of 4% Denisovan introgression (Malaspinas et al., 2016) entered from each ancestry into Papuans, and a generation time of 29 years.
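The block-bootstrap procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the per-chunk data layout, the function name and the toy data are hypothetical, and the merging of chunks that share a spanning block is taken as already done. Chunks are drawn with probability proportional to their length until the resampled number of blocks reaches the observed number, and a percentile interval is then taken over the replicates.

```python
import numpy as np

def block_bootstrap(chunk_lengths, chunk_blocks, n_replicates, rng):
    """Resample genomic chunks; chunk_blocks[i] holds block lengths in chunk i."""
    chunk_lengths = np.asarray(chunk_lengths, dtype=float)
    probs = chunk_lengths / chunk_lengths.sum()
    target = sum(len(b) for b in chunk_blocks)  # observed number of blocks
    replicates = []
    for _ in range(n_replicates):
        sampled = []
        while len(sampled) < target:
            i = rng.choice(len(chunk_blocks), p=probs)
            sampled.extend(chunk_blocks[i])
        replicates.append(np.array(sampled))
    return replicates

# Toy data (illustrative only): mostly 2 Mb chunks, a few merged 4 Mb chunks,
# each holding a random number of blocks longer than 180 kb.
rng = np.random.default_rng(0)
n_chunks = 60
chunk_lengths = rng.choice([2e6, 4e6], size=n_chunks, p=[0.9, 0.1])
chunk_blocks = [180_000 + rng.exponential(130_000, size=rng.poisson(3))
                for _ in range(n_chunks)]

replicates = block_bootstrap(chunk_lengths, chunk_blocks, 500, rng)
means = np.array([rep.mean() for rep in replicates])
ci_low, ci_high = np.percentile(means, [2.5, 97.5])
print(f"95% CI for mean block length: {ci_low:.0f} to {ci_high:.0f} bp")
```

In practice the statistic computed on each replicate would be the fitted introgression date rather than the mean block length used here for brevity.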
The results of the block length fittings are consistent with relatively recent introgression times. Under a constant recombination rate, we estimate dates of D1 introgression as 17.9 kya (95%CI 8.7–29.4) and D2 as 32.9 kya (95%CI 22.9–44.2). While there is no Papuan recombination map, we also sought to incorporate local recombination rates into the fitting by scaling block lengths by the average combined HapMap genetic map recombination rate over all blocks in the D1 or D2 sets, respectively. This maintains the approximately exponential distribution of block lengths. Under this model, we retrieved date estimates of D1 introgression at 29.8 kya (95%CI 14.4–50.4) and D2 at 45.7 kya (95%CI 31.9–60.7). The weight of probability supports younger dates within this range (Main Text Figures 4B and 5E). On average, D2 introgression is older than D1 introgression, by factors of 1.84 and 1.53 for the two recombination models above, respectively.
To assess the robustness of this finding, we repeated the fitting after removing replicates of haplotypes observed in multiple individuals. In this way, we are seeking to observe recombination histories that are independent, and to limit the impact of haplotypes at higher frequency due to selection. Under a constant recombination rate and using unique haplotypes (Figures S4F and S4G), we estimate dates of D1 introgression as 20.2 kya (95%CI 10.4–33.5) and D2 as 29.5 kya (95%CI 23.2–36.1); under the HapMap scaled recombination rate, we estimate dates of D1 introgression as 33.7 kya (95%CI 16.8–57.7) and D2 as 44.3 kya (95%CI 34.4–55.4). The D2 average introgression date is now estimated as 1.46 or 1.31 times as ancient as D1, depending on the recombination model. The various block length fittings all suggest that D1 introgression was relatively more recent than D2 introgression.
Several other lines of evidence are consistent with D1 having a more recent introgression date. First, the variance in the frequencies of non-overlapping D1 blocks is less than that observed for D2 blocks (Fligner's test for equal variance, statistic = 7.1, p = 0.008). After a pulse of introgression, we expect the variance in haplotype frequencies to increase as haplotypes drift away from their initial frequency. Second, there is structure in the geographic distribution of D1 and D2 introgression (Main Text Figures 5A and 5B). The amount of identified sequence in the D1 block set (including blocks with mD < 0.1) is significantly lower in Baining samples as compared to mainland Papuans (1.33 Mb per phased haploid genome versus 1.82 Mb on Papua; Welch's t test T = –3.9, p = 4 × 10⁻⁴), while there is no evidence of a different amount of D2 (including blocks with mD > 0.23) sequence (1.28 Mb versus 1.37 Mb, T = –0.8, p = 0.41). To account for sampling differences between New Guinea (N = 52) and Baining (N = 16) and to estimate CIs around the observed mean, we resampled 16 samples from each of the two islands one million times (with replacement). While the average amounts of D2 per individual in both populations overlap (NG: 2.75, 95%CI 2.36–3.12; Baining 2.37, 95%CI 1.96–2.79), Baining has significantly fewer D1 chunks (NG: 3.64, 95%CI 3.10–4.19; Baining 2.60, 95%CI 2.19–2.99). To additionally assess whether the ratio of D1 between mainland Papuans and Baining is greater than the ratio of D2 between mainland Papuans and Baining, we resampled sets of mainland Papuan (N = 52) and Baining (N = 16) individuals with replacement one million times, recording whether the median pairwise D1 ratio was greater than the median pairwise D2 ratio. D1[Papua resample]/D1[Baining resample] was greater than D2[Papua resample]/D2[Baining resample] in 95.7% of resampling iterations.
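The resampling comparisons above can be sketched on synthetic data. Everything below is illustrative: the per-individual amounts are random draws loosely centred on the reported means, the function names are hypothetical, the number of iterations is reduced for speed, and resampling each individual's D1 and D2 values together is an assumption about how the ratio comparison was set up.

```python
import numpy as np
from scipy.stats import ttest_ind

def bootstrap_mean_ci(values, n_draw, n_iter, rng):
    """Percentile 95% CI of the mean of n_draw values drawn with replacement."""
    values = np.asarray(values, dtype=float)
    draws = rng.choice(values, size=(n_iter, n_draw), replace=True)
    return np.percentile(draws.mean(axis=1), [2.5, 97.5])

def median_pairwise_ratio(a, b):
    """Median of a_i / b_j over all pairs of individuals."""
    return np.median(np.divide.outer(a, b))

def frac_d1_ratio_exceeds_d2(d1_a, d2_a, d1_b, d2_b, n_iter, rng):
    """Fraction of resamples where the D1 ratio (a over b) exceeds the D2 ratio."""
    hits = 0
    for _ in range(n_iter):
        ia = rng.integers(0, len(d1_a), len(d1_a))  # resample individuals in a
        ib = rng.integers(0, len(d1_b), len(d1_b))  # resample individuals in b
        hits += (median_pairwise_ratio(d1_a[ia], d1_b[ib])
                 > median_pairwise_ratio(d2_a[ia], d2_b[ib]))
    return hits / n_iter

# Synthetic per-individual amounts in Mb (NOT the study data).
rng = np.random.default_rng(42)
papua_d1 = rng.gamma(shape=20, scale=1.82 / 20, size=52)
papua_d2 = rng.gamma(shape=20, scale=1.37 / 20, size=52)
baining_d1 = rng.gamma(shape=20, scale=1.33 / 20, size=16)
baining_d2 = rng.gamma(shape=20, scale=1.28 / 20, size=16)

t_stat, p_val = ttest_ind(baining_d1, papua_d1, equal_var=False)  # Welch's t test
ci = bootstrap_mean_ci(papua_d1, n_draw=16, n_iter=10_000, rng=rng)
frac = frac_d1_ratio_exceeds_d2(papua_d1, papua_d2, baining_d1, baining_d2,
                                n_iter=2_000, rng=rng)
print(f"Welch T = {t_stat:.2f}, p = {p_val:.3g}")
print(f"Papua D1 mean 95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"D1 ratio > D2 ratio in {100 * frac:.1f}% of resamples")
```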
Together, these geographical patterns raise the intriguing possibility that the D1 component is more typical of mainland Papua, and introgression may even have been ongoing in the time-frame of the split between Baining and Papuan populations. Similarly, there is a tendency for populations outside ISEA to show a Denisovan signal more consistent with D2 (Main Text Figure 3C), although the limited amount of Denisovan introgression means that the blocks contributing to the mismatches are shorter, and hence there is less resolution in the mismatch distributions. While it is possible that the reduced D1 signal in Baining samples is caused by weak Asian admixture (given the lack of evidence for the D1 signal in mainland Asia), this would additionally be expected to reduce the Baining D2 signal compared to mainland Papuans (not observed, see above) and generate an excess signal of Asian ancestry in the Baining as compared to mainland Papuans (not apparent in LOTER results, Main Text Figure 1A and Table S2; or in the PCA, Main Text Figure 1B, where the Baining cluster toward Australians rather than with the Asian-admixed Bougainville samples). We additionally note the detailed demographic analyses in Hudjashov et al., 2017, which strongly place the Baining as a recently separated Papuan population that does not harbor additional admixture signals from a wide range of other regional populations.

Post by Admin on Jan 6, 2022 21:13:28 GMT

i. Assessing the multiple-ancestry hypothesis
Three lines of evidence support the probability that D1 and D2 represent introgression from two different archaic populations, likely both on the Denisovan clade. First, our approach to identifying D1 and D2 blocks, and the coalescent topologies that they represent, are consistent with both sets of blocks showing clear affinity to the Altai Denisovan over the Altai Neanderthal genomes, and clear divergence from most modern humans. Second, there is spatial variation in the prevalence of D1 in populations with some Denisovan introgression (e.g., mismatches consistent with D1 are underrepresented in Baining versus mainland Papua, Main Text Figures 5A and 5B, and may also be rarer in East ISEA, West ISEA and mainland Asia compared to D2, Main Text Figure 3C, although resolution is limited, Figure S2). This supports the likelihood of D1 and D2 arising from different source populations rather than a single population of composite ancestry. Third, and supporting the same conclusion, there is some evidence that the introgression dates of D1 and D2 blocks are different (STAR Methods S10h) and that there is spatial heterogeneity in the frequency of D1 chunks within Near Oceania. We therefore sought to determine whether a model with two pulses of introgression from archaic populations on the Denisovan clade could generate the mismatch distribution observed in modern Papuans using coalescent modeling.
We used the msprime v.0.6.1 program (Kelleher et al., 2013) to simulate a modified version of the highest-likelihood demographic model inferred by the Malaspinas et al., 2016 study, using Australians as a proxy for our Papuans. We first translated the original fastsimcoal2 model (provided by the authors) into msprime. We then modified the model as shown in Figures S4A and S4B, allowing two pulses of introgression from populations on the Denisovan clade. We did not incorporate reported inbreeding in the Altai Neanderthal. Apart from this modification, the structure of the model remains unchanged. As in that model, sampling times of the Altai Denisovan and Altai Neanderthal are 2058 and 2612 generations respectively. Parameters of the model are also unchanged and are given in Table S5 and Figures S5A and S5B, except:
1. The single time of divergence between the Altai Denisovan and the introgressing Denisovan is replaced by two times, t1 and t2, indicating the divergence between the Altai Denisovan and branch D1, and the divergence between the Altai Denisovan/D1 common ancestor and D2.
2. The population size of all internal Denisovan-clade branches (Altai Denisovan/D1 common ancestor and Altai Denisovan/D1/D2 common ancestor) is set to NDeniAnc.
3. Instead of a single Denisovan introgression into Australians 1353 generations ago of 4%, there are two introgressions 1353 generations ago into Papuans, one from D1 and one from D2. These are in proportion p1 × 0.04 and (1.0 – p1) × 0.04.
To determine whether our modified model can return the two observed peaks, and to propose demographic parameters, we simulated genetic data using the modified model and studied the mismatch distribution. Specifically, we simulated 5 Mb of sequence data for a sample of 144 Papuan haplotypes, two Altai Denisovan haplotypes and two Altai Neanderthal haplotypes. As in the Malaspinas et al., 2016 study, the mutation rate was set to 1.4 × 10⁻⁸ per base pair per generation. The recombination rate was constant and set to 1 × 10⁻⁸ per base pair per generation. To mimic our own data, we masked all sites that were heterozygous in either the two Altai Denisovan haplotypes or the two Altai Neanderthal haplotypes. We extracted introgressed blocks from each Papuan haplotype in the simulation (using custom scripts and the detailed migration tables recorded by msprime) and recorded the mismatch between these blocks and the Altai Denisovan. The process was repeated 240 times for each parameter set, yielding a total of 1200 Mb of simulated data.
The simulated data output by msprime is 'perfect': there is no SNP calling process that might miss variation, and no missing data that would be masked by QC filters. As we want to avoid consequent biases in model inference, we sought to express the mismatch between introgressed blocks (simulated and real) and the Altai Denisovan genome as a proportion of the average mismatch to the Altai Denisovan observed in a population lacking introgression. We therefore converted the observed mismatch against Denisovan for each block in the real data into a fraction of the genome-wide average mismatch of our 75 West Eurasian samples. For the simulated data, we generated 1200 Mb of data for 150 West Eurasian chromosomes sampled with two chromosomes from the Altai Denisovan and Altai Neanderthal, using the unmodified model from the Malaspinas et al., 2016 study. As before, we masked archaic heterozygous sites, but this time calculated the mismatch over the entire dataset to obtain a population-average mismatch between simulated humans and Denisovan. When fitting the model, we expressed the mismatch distribution in introgressing Denisovan blocks found in simulated Papuans as a fraction of this simulated West Eurasian genome-wide average.
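The normalization described above can be illustrated with a small sketch. It is a simplification under stated assumptions: alleles are coded 0/1, mismatch is counted per retained site rather than per base pair, and the function name and toy arrays are hypothetical. Sites heterozygous in either archaic genome are masked, and the mismatch to the Altai Denisovan is divided by a baseline taken from a population lacking introgression.

```python
import numpy as np

def relative_mismatch(block_hap, denisovan, neanderthal, baseline_mismatch):
    """Mismatch of a haplotype to the Denisovan, as a fraction of a baseline.

    block_hap: 0/1 alleles, shape (n_sites,)
    denisovan, neanderthal: 0/1 alleles of the two archaic haplotypes,
        shape (n_sites, 2)
    baseline_mismatch: genome-wide average mismatch in a reference
        population without introgression (West Eurasians in the text)
    """
    block_hap = np.asarray(block_hap)
    denisovan = np.asarray(denisovan)
    neanderthal = np.asarray(neanderthal)
    # Keep only sites that are homozygous in both archaic genomes.
    keep = ((denisovan[:, 0] == denisovan[:, 1])
            & (neanderthal[:, 0] == neanderthal[:, 1]))
    if not keep.any():
        return float("nan")
    mismatch = np.mean(block_hap[keep] != denisovan[keep, 0])
    return mismatch / baseline_mismatch

# Toy example with six sites (illustrative values only).
block = [1, 0, 1, 1, 0, 0]
deni = [[1, 1], [0, 0], [0, 1], [0, 0], [0, 0], [1, 1]]
nean = [[0, 0], [0, 0], [0, 0], [0, 1], [0, 0], [0, 0]]
result = relative_mismatch(block, deni, nean, baseline_mismatch=0.5)
print(result)
```

In the toy example two sites are masked, one of the four remaining sites mismatches, and the raw mismatch of 0.25 is reported relative to the assumed baseline of 0.5.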
Our simulations support the probability that the D1 and D2 mismatch peaks reflect archaic ancestry in modern Papuans that derives from two populations on the Denisovan clade. Both of these populations were very distantly related to the Altai Denisovan, and more divergent than the East Asian- (and Siberian-, in our data; Main Text Figure 3C) specific Denisovan introgression (D0 in Main Text Figure 4B). Based on a mutation rate of 1.4 × 10⁻⁸ and a generation time of 29 years, simulations suggest that the population contributing D1 chunks split from the Altai Denisovan approximately 261–297 kya, while the population contributing D2 chunks split from the Altai Denisovan approximately 334–377 kya. These dates are population split times, measured in years before present. In order to fit the sharp peaks we observed in the data, a small population size of the ancestral Denisovan population (< 350) is required. The model we explore is extremely simple, and we do not consider our results as proving that an ancestral Denisovan population of this size necessarily persisted for > 100 ky; a low population size or population bottlenecks, however, are strongly implied.
Our results suggest that Denisovans were highly structured when modern humans encountered them, consisting of multiple, highly diverged populations that remained sufficiently separate for hundreds of thousands of years to show distinct signatures when their genes are identified in modern humans. This raises the intriguing possibility that biogeographical barriers, and possibly islands, played an important role in maintaining Denisovan population structure. While the absence of the D1 signal in East ISEA and elsewhere may reflect a lack of resolution (Figure S2), the different amounts of D1 in our mainland Papua and Baining samples (Main Text Figures 5A and 5B) hint at geographic variation, potentially indicating different introgression histories in these populations (Main Text Figure 5E). We therefore sought to modify our simulation protocol to directly assess the probability of drift and sampling error generating the difference in D1 observed between mainland Papuans and Baining.
j. Less D1 signal in New Britain than New Guinea
We further modified the Malaspinas et al., 2016 simulation model with two introgressing Denisovan populations to incorporate a population representing the Baining. We sought to construct this model such that it incorporates an amount of drift between the Baining and Papuan population on the high end of realistic values, in order to generate a very conservative estimate of the model distribution of ratios of D1 signal between the two groups. Our model follows the demographic analyses presented above, which indicate that the Baining have no excess Asian admixture compared to mainland Papuans (LOTER; Figure 1A; Table S2) and are closely related to Papuans (PCA; Figure 1B), and additional analyses in Hudjashov et al., 2017, which show that the Baining cluster with Papuans and have no additional admixture signals when analyzed together with an even broader set of regional populations. The structure of the model involves the Baining budding from the Papuan population at a time tB, after Denisovan introgression into their common ancestor. The haploid population size of the ancestral Papuan/Baining population before the split is 8834 (see Figure S5B). After budding, the Baining have a population size of NBaining and the mainland Papuans have a population size of NPapua ; these are either constant or functions of time (see discussion below). The two populations are modeled as entirely isolated after budding, to maximize drift and ensure that the model is conservative. Based on the extremely similar levels of Asian ancestry in Baining and Papuans (LOTER, Main Text Figure 1A; Table S2) and the similar placement of Baining and Papuans by PCA (Figure 1B), the migration rate between Baining and Asians was set to the same rate as between Papuans and Asians. The Baining and Papuan populations have the same introgression history in the simulations, such that both D1 and D2 introgress at 1353 generations ago into the Papuan/Baining ancestral population. 
The sample sizes of the Baining and Papuans were set to 32 and 104 haploid chromosomes, respectively, corresponding to the 16 and 52 individuals in our Baining and mainland Papuan samples.
To determine the split time between the mainland Papuan and Baining population, we used the SMC++ v1.9.3 split option, which analyses pairs of populations simultaneously to infer genetic divergence times jointly with population size histories (Terhorst et al., 2017). These split times are effective split times, based on a hard split model without migration after populations diverge. These estimates do not depend on phasing. We used unphased data with the 99% call-rate filter applied, which yielded a split time between mainland Papuan and Baining populations at 15680 years BP with split time diploid Ne = 4620 (Main Text Figure 5C). Using genomic data without the call-rate filter resulted in a very similar estimate (split time = 16280 years BP, Ne = 4940). The mutation rate was fixed to 1.45 × 10⁻⁸ (Narasimhan et al., 2017) and generation time to 29 years; chromosome 6, which contains the hypervariable HLA region, was excluded from the analysis.
We used these SMC++ results to implement Model 1, with tB = 540 generations ago and the population sizes for Baining and mainland Papuans after tB as inferred by SMC++ (incorporating a recent population bottleneck among the Baining and recent population growth for mainland Papuans). We simulated 40 Gb of data using the model as 8000 independent 5 Mb simulations using msprime, recording the total amount of D1 and D2 introgression observed in the mainland Papuan and Baining samples using the migration tables in msprime. We used this information to construct a simulated null distribution of the ratio of D1 (and D2) sequence in mainland Papuans relative to the Baining. We performed 5000 resampling iterations whereby we drew 5 Mb simulations from the set of 8000 simulations until the average total amount of introgressing D1 and D2 sequence in a simulated individual was equal to or just greater than the average amount observed in our Papuan and Baining samples (3.05 Mb). On average, each resampling iteration therefore used about eighteen 5 Mb simulations, totaling 90 Mb of simulated data. The observed median D1[Papua]/D1[Baining] pairwise ratio (1.36) falls at the 98.5th percentile of the simulated distribution (Main Text Figure 5D), indicating that the excess D1 found in mainland Papuans compared to Baining is highly unlikely to be explained by drift alone. In contrast, the observed median D2[Papua]/D2[Baining] pairwise ratio (1.06) falls at the 65.3rd percentile of the simulated D2 ratio distribution, indicating that drift alone is sufficient to explain the excess D2 in mainland Papuans.
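The resampling logic can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes each 5 Mb simulation has already been summarized as a tuple of per-individual averages in Mb, and it uses the ratio of summed D1 sequence rather than the median pairwise ratio between individuals. The function names and the toy input are hypothetical.

```python
import random


def resample_ratio(sims, target_mb, rng):
    """Draw simulations with replacement until the accumulated introgressed
    sequence per individual reaches target_mb, then return the
    Papuan/Baining D1 ratio (None if no Baining D1 sequence was drawn).

    sims: list of (papuan_d1_mb, baining_d1_mb, total_introgressed_mb).
    """
    if not any(total > 0 for _, _, total in sims):
        raise ValueError("at least one simulation must contain introgression")
    papuan = baining = drawn = 0.0
    while drawn < target_mb:
        p, b, t = rng.choice(sims)
        papuan += p
        baining += b
        drawn += t
    return papuan / baining if baining > 0 else None


def percentile_of(observed, null_values):
    """Percentage of null values less than or equal to the observed value."""
    return 100.0 * sum(v <= observed for v in null_values) / len(null_values)


# Toy input standing in for the 8000 simulation summaries.
rng = random.Random(1)
toy_sims = [(rng.uniform(0.0, 0.2), rng.uniform(0.01, 0.2), rng.uniform(0.05, 0.3))
            for _ in range(200)]
null = [resample_ratio(toy_sims, 3.05, rng) for _ in range(500)]
null = [r for r in null if r is not None]
print(percentile_of(1.36, null))
```

Stopping at the first draw that meets or exceeds the target mirrors the "equal to or just greater than" rule described above.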
Given that we use population sizes inferred based on two different methods (SMC++ for recent times and the joint site frequency spectrum for times before tB and non-Papuan/Baining populations), we sought to confirm that the model correctly captures the drift observed in mainland Papuans and the Baining, and the average divergence of these populations. The observed average heterozygosity of the Papuan and Baining samples was 6.43 × 10^−4 and 6.00 × 10^−4 respectively, and the weighted FST (Equation 10 in Weir and Cockerham, 1984) calculated using all sites that were variable between the samples was 0.0934. We used the simulation model described above to generate null distributions of these values by performing the same calculations on 5000 resampling iterations of four hundred 5 Mb simulations (totaling 2 Gb simulated data per iteration). The observed heterozygosity of the mainland Papuans was at the 0.2th percentile of the simulated distribution and the heterozygosity of the Baining was at the 98th percentile of the distribution, while the observed FST was well above the range generated by the simulations (Figure S6A). This suggests that the simulated Papuans have insufficient drift and that the two populations are insufficiently diverged, such that the SMC++ informed model may simulate Papuan and Baining samples that are more similar in their D1 ratios than would be expected based on observed drift and divergence.

Second, we asked how an older population split time between mainland Papuans and the Baining might impact results. There is archaeological evidence of early human occupation on New Britain from 35.5 kya (Pavlides and Gosden, 1994), although importantly genetic divergence is often expected to be more recent due to migration between two populations after separation, which can affect SMC++ split time estimates (Terhorst et al., 2017). Transportation of obsidian tools occurs within the Bismarck Archipelago during the Pleistocene and externally to mainland Papua by the mid-Holocene (Swadling and Hide, 2005). Evidence for post-settlement contact can also be found in the translocation of plant and animal species (Swadling and Hide, 2005), though these appear to have occurred considerably after initial occupation (O’Connor, 2010), suggesting limited contact over long periods. To assess the implications of an earlier split date, we re-implemented the model setting tB = 800 (23.3 ky). This is more ancient than the inferred split times between mainland Papuan populations (10–20 kya; Bergström et al., 2017) and is of a similar order to the Papuan/Aboriginal Australian split (11–27 kya; Mallick et al., 2016).
The results from these two additional models are shown in Main Text Figure 5D. As before, the observed D2 excess in mainland Papua is not outside the distribution expected due to drift. Similarly, in both cases the observed D1 excess in mainland Papua is unexpected given a model in which the difference arises from shared introgression followed by drift and sampling – for Model 1 the observed D1 ratio is in the 97.8th percentile of the simulated distribution and for Model 2 the D1 ratio is in the 95.4th percentile of the simulated distribution. Together, these simulations suggest that the reduced frequency of D1 blocks among the Baining is unlikely to result from drift, and instead is more likely to reflect a different Denisovan introgression history among Baining compared to mainland Papua.

Post by Admin on Jan 6, 2022 22:37:30 GMT

a. High frequency Denisovan blocks
We sought to assess evidence of adaptive introgression from the two Denisovan ancestries, in our high confidence set of Denisovan introgressing blocks, by calculating the frequency of Denisovan ancestry over the genome. While archaic blocks may drift to high frequency, they are more likely than low-frequency blocks to have been subject to natural selection – either adaptive introgression during the initial introgression process or subsequent selection on introgressed variation. We first retrieved all introgressing blocks > 20 Kb from our data, and filtered out the blocks having more mismatch with Denisovan compared to mismatch with Neanderthal. We then used the bedtools v.2.27.0 ‘multiinter’ command to obtain intersected Denisovan-introgressed regions and frequencies among all Papuan individuals. We assigned genes from the Ensembl 91 (GRCh37) database to each intersected block, and report the top 1% frequency regions and the frequency of all introgressed regions in Table S6A. We repeated this procedure for East ISEA individuals (Table S6B).
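The intersection step can be illustrated with a small sweep over block boundaries. This is a sketch in the spirit of the 'multiinter' output, not a reimplementation of bedtools: it assumes half-open (start, end) coordinates on a single chromosome, non-overlapping blocks within each individual, and frequency defined as the fraction of individuals carrying a segment. The function name is hypothetical.

```python
from collections import defaultdict


def block_frequencies(blocks_by_individual):
    """Return [(start, end, frequency)] for every segment carried by at least
    one individual, where frequency is the fraction of individuals covering it.

    blocks_by_individual: {name: [(start, end), ...]} with half-open intervals.
    """
    delta = defaultdict(int)
    for blocks in blocks_by_individual.values():
        for start, end in blocks:
            delta[start] += 1   # one more individual covered from here
            delta[end] -= 1     # one fewer individual covered from here
    n = len(blocks_by_individual)
    segments, depth, prev = [], 0, None
    for pos in sorted(delta):
        if depth > 0 and pos > prev:
            segments.append((prev, pos, depth / n))
        depth += delta[pos]
        prev = pos
    return segments


example = {"ind1": [(100, 300)], "ind2": [(200, 400)], "ind3": [], "ind4": []}
for segment in block_frequencies(example):
    print(segment)
```

The highest-frequency segments from such a table are what the top 1% cut-off would then be applied to.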
A genome-wide map of Denisovan introgression in the Papuan and East ISEA samples (Main Text Figure 6), based on Tables S6A and S6B, reveals several sharp peaks at known (e.g., WARS2, Racimo et al., 2017; TNFAIP3, Gittelman et al., 2016; and FAM178B, Sankararaman et al., 2016 and Ilardo et al., 2018) and unreported (e.g., WDFY2, the TMPO/IKBIP/APAF1 gene cluster) loci. The replication of several loci that have previously been proposed to be subject to adaptive introgression strongly supports our approach to detecting Denisovan introgression and potentially adaptively introgressed regions. Some of our higher frequency blocks overlap previously identified deserts of introgression (Vernot et al., 2016), but this is not unexpected given the large size of the proposed deserts and the small number of Papuan samples they were identified from (N = 35). Specifically, we see high frequency introgressing Denisovan blocks at the ROBO2 gene in Papuans (chr3:76572330-76634485, 39.6% frequency) overlapping a proposed 14 Mb desert (chr3:76500000-90500000). Lower frequency Denisovan blocks (maximum 16%) also occur in a proposed 10.9 Mb desert (chr8:54500000-65400000).
b. Overlap with modern selection signals
High frequency Denisovan introgressed blocks could arise due to two different selective processes – either directly and immediately on the introgressing haplotypes, leading to longer high frequency haplotypes, or on introgressed diversity some time after the introgression event. We can approximately predict that the former relate primarily to biological differences between humans and the archaic species, while the latter relate to interactions between humans and their environment (e.g., disease, diet, etc.). In the latter model, introgression provides a source of genetic variation that is non-random in the sense that it has already been subject to evolutionary forces in the archaic population. This genetic variation may provide novel opportunities for adaptive selection in human groups with archaic introgression, even many thousands of years after the introgression ended.
To detect signals of recent positive selection in genetic regions with high Denisovan introgression, we calculated nSL (Ferrer-Admetlla et al., 2014) on all SNPs with ancestral information for the Baining of New Britain, mainland Papuan population of New Guinea and East ISEA continental group. We divided the genome into non-overlapping 200 Kb windows and defined the nSL statistic score of a window as the proportion of SNPs with |nSL| > 2.0. We discarded windows with fewer than 10 SNPs. We then assessed overlap between top 5% nSL window scores and top 1% frequency introgressed Denisovan blocks (see above), comparing introgression signals in Papuans to nSL for the Baining and mainland New Guinea groups, and introgression signals in East ISEA to nSL for the East ISEA group.
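The window scoring described above can be sketched as below. The sketch assumes per-SNP nSL values have already been computed for one chromosome and are supplied as (position, nSL) pairs; the function names and the handling of ties at the top 5% boundary are our assumptions.

```python
from collections import defaultdict


def nsl_window_scores(snps, window=200_000, threshold=2.0, min_snps=10):
    """Return {window_start: proportion of SNPs with |nSL| > threshold},
    discarding windows with fewer than min_snps SNPs."""
    counts = defaultdict(lambda: [0, 0])  # window start -> [extreme, total]
    for pos, nsl in snps:
        start = (pos // window) * window
        counts[start][1] += 1
        if abs(nsl) > threshold:
            counts[start][0] += 1
    return {s: extreme / total
            for s, (extreme, total) in counts.items() if total >= min_snps}


def top_fraction(scores, fraction=0.05):
    """Return the window starts with the highest scores (at least one window)."""
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: ten SNPs in the first window, nine (too few) in the second.
toy = [(i * 1000, 3.0 if i < 5 else 0.0) for i in range(10)]
toy += [(200_000 + i, 0.0) for i in range(9)]
scores = nsl_window_scores(toy)
print(scores, top_fraction(scores))
```

Overlap with high-frequency introgressed blocks would then be assessed by intersecting the returned windows with the block coordinates.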
We found that only 3/34 Denisovan introgressed haplotypes that were high frequency in Papua were in nSL top 5% windows in the Baining group and these were not significant. For completeness, these genes were TNFAIP3 (nSL percentile 3.6%), WDFY2 (2.2%) and SUMF1 (4.5%).
In mainland Papuans, 2/34 high-frequency Denisovan introgressed haplotypes were top 5% nSL hits – GLT8D2 (1.1%) and ZNF280D (2.9%). In East ISEA, just 1/39 top 1% high-frequency Denisovan introgressed haplotypes was a top 5% nSL hit – TMEM131 (nSL percentile 1.7%).
Given the suggested role of WDFY2 in lipid metabolism adaptation, we assessed the mainland Papuan and Baining nSL top 1% gene lists for enrichment of fat metabolism pathways (method described below). We did not observe enrichment, but did note the presence of genes important in lipid metabolism and synthesis – most notably windows including FASN in Baining, New Guinea and East ISEA, and FADS1 and FADS2 in Baining only. We also note the presence of the important carbohydrate metabolism gene AGL in our top 1% nSL gene list in both the Baining and mainland New Guinea. Further work is required to determine the precise role of adaptation and the detailed evolutionary history of these genes in Oceanian populations.
c. Gene ontology enrichment
We tested whether specific sets of genes have significantly elevated frequencies of Denisovan ancestry using the Ontologies tab of the Enrichr web interface (Kuleshov et al., 2016). Specifically, we retrieved all genes identified as introgressed at high frequency (top 1%, using Tables S6A and S6B) and searched for enrichment in the Gene Ontology lists ‘GO Cellular Component 2018’, ‘GO Biological Process 2018’ and ‘GO Molecular Function 2018’, as well as the two phenotype lists ‘MGI Mammalian Phenotype 2017’ and ‘Human Phenotype Ontology’ and one tissue expression list ‘Jensen TISSUES’. We performed this analysis for both combined Papuan and East ISEA groups; results that survive multiple hypothesis test corrections and are not driven by clusters of co-located genes are reported in the Main Text, and include enrichment associated with expression in adipose and uterine tissue and fetus development. Full results, including categories that either i) had uncorrected p values < 0.005 (the corrected p values using Benjamini-Hochberg method are also reported) and/or ii) were driven by multiple co-located genes are given in Table S6C. Enrichment was observed in categories related to smooth muscle cell proliferation, immunity and adipogenesis in both Papuans and East ISEA, e.g., ‘negative regulation of smooth muscle proliferation’ involving TNFAIP3/PPARG genes (NG: p value = 0.001, corrected p value = 0.1; EISEA: p = 0.0005, corr-p = 0.049); ‘negative regulation of inflammatory response’ involving TNFAIP3/SAMSN1 genes in Papuans (p = 0.0005, corr-p = 0.09) and TNFAIP3/PPARG genes in East ISEA (p = 0.003, corr-p = 0.07); and ‘positive regulation of fat cell differentiation’ involving PPARG and WDFY2 genes (NG: p = 0.002, corr-p = 0.1; EISEA: p = 0.0009, corr-p = 0.05).
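For readers unfamiliar with the correction, the standard Benjamini-Hochberg adjustment can be written in a few lines. This is a generic sketch of the textbook procedure, not Enrichr's own code, and the input p values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because the adjustment scales with the number of categories tested, a small uncorrected p value can still yield a corrected value near 0.1 when many categories are searched.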
d. High frequency long and D1/D2 blocks
As longer high-frequency introgressing blocks are expected when a Denisovan haplotype rises to high frequency early in the introgression process, we repeated our introgressing block frequency analysis for blocks > 180 Kb that we were able to assign to one of the two Denisovan ancestries, D1 (Table S6D) and D2 (Table S6E). Analyzing blocks assigned to D1 introgression revealed three regions at high frequency (> 20% in Papua), containing FAM178B/FAHD2B/ANKRD36, ZNF280D and FBXL20/MED1/CDK12. Analyzing blocks assigned to D2 introgression revealed five regions at high frequency, containing ANKRD28, NFAT5/NQO1, COG7/GGA2/EARS2/UBFD1/NDUFAB1 and ARID4A/TOMM20L/TIMM9/KIAA0586. A gene-free region 15 Kb downstream of CENPW was also highly introgressed based on D2 blocks. We observed an extreme nSL signal (nSL percentile 0.2%) in the window containing CENPW in the Baining group, and note that the window containing ZNF280D (D1) also had a high nSL signal (see above).
As we are only able to assign D1 and D2 ancestry to large blocks > 180 Kb, we additionally explored a concept whereby confident D1 or D2 blocks might be used as local ‘flags’ for their respective ancestries. In this way, we can adopt an assumption – that short, < 180 Kb, introgressing Denisovan blocks overlapping a > 180 Kb D1 (or D2) chunk are also from the D1 (or D2) population – to leverage our high confidence Denisovan ancestry dataset. We performed a bootstrapping analysis whereby we repeatedly sampled two Papuan individuals from our dataset and, using their > 20 Kb high confidence introgressing Denisovan blocks, identified the overlap between them. We divided the genome into 40 Kb non-overlapping windows and, for each window, recorded whether the pair had overlapping introgression. We performed this resampling 100000 times, counting the number of observations in each 40 Kb genomic window. Then, for each > 180 Kb D1 and D2 block, we identified the most commonly observed 40 Kb window, and ranked D1 and D2 blocks according to this frequency (Tables S6D and S6E, column ‘BOOTSTRAP_RANK_20KB’). While the frequency of D1 and D2 blocks themselves and their ranks according to the above analysis are highly correlated, some rare low-frequency introgressed blocks assigned to D1 and D2 cover regions that are highly introgressed based on smaller blocks. This may reflect occasional misclassification (STAR Methods S10d), such as when a small number of D2 blocks are erroneously identified as D1 leading to an apparent low-frequency D1 introgression event, or adaptive introgression of primarily smaller Denisovan blocks.
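The pairwise bootstrap can be sketched as follows, under the assumptions that blocks are half-open (start, end) intervals on one chromosome and that a window is counted whenever the overlap between the pair's blocks touches it. The function names and toy input are hypothetical.

```python
import random
from collections import Counter


def shared_windows(blocks_a, blocks_b, window=40_000):
    """Return indices of windows touched by the overlap of two block sets."""
    shared = set()
    for s1, e1 in blocks_a:
        for s2, e2 in blocks_b:
            lo, hi = max(s1, s2), min(e1, e2)
            if lo < hi:  # the two blocks overlap
                shared.update(range(lo // window, (hi - 1) // window + 1))
    return shared


def pair_sharing_counts(blocks_by_individual, n_draws, rng, window=40_000):
    """Count, per window, how often a random pair shares introgression."""
    names = sorted(blocks_by_individual)
    counts = Counter()
    for _ in range(n_draws):
        a, b = rng.sample(names, 2)
        counts.update(shared_windows(blocks_by_individual[a],
                                     blocks_by_individual[b], window))
    return counts


toy = {"ind1": [(0, 100_000)], "ind2": [(50_000, 130_000)]}
print(pair_sharing_counts(toy, n_draws=10, rng=random.Random(0)))
```

Each large D1 or D2 block would then be ranked by the highest count among the windows it spans.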
e. High frequency residual S∗ windows
We additionally determined the frequency of uncommon residual S∗ windows (see STAR Methods S12, below) found in Papua (Table S6F) and East ISEA (Table S6G). Residual S∗ is a signal that would be expected given non-Neanderthal, non-Denisovan archaic introgression (e.g., it would be consistent with introgression from H. erectus), but could equally be caused by other processes including balancing selection or local properties of molecular evolution (e.g., an accelerated mutation rate in non-African populations). In our combined Papuan sample, top 1% residual S∗ frequency blocks include a cluster of genes around VN1R1 (Vomeronasal 1 Receptor 1), as well as PDE1C (Phosphodiesterase 1C), DPH6 (Diphthamine Biosynthesis 6) and PRKCH (Protein Kinase C Eta). In our East ISEA sample, top 1% residual S∗ frequency blocks include a cluster of genes around HLA-A (Major Histocompatibility Complex, Class I, A), and PDE1C and PRKCH. There is considerable correlation between the frequency of residual S∗ blocks in Papua and East ISEA, such that the HLA-A region is also found at high frequency in Papuan residual S∗ data and the VN1R1 region is at high frequency in East ISEA residual S∗. While these two genes are especially intriguing – VN1R1 may be involved in the species-specific pheromone system in other species (Rodriguez and Mombaerts, 2002) and sociosexual behavior in humans (Henningsson et al., 2017), and the hypervariable HLA-A gene plays a critical role in immunity – further study is required to determine the evolutionary history detected by the residual S∗ signal in each case. In particular, archaic introgression (Abi-Rached et al., 2011) and balancing selection masquerading as archaic introgression (Yasukochi and Ohashi, 2017) have both been proposed for the HLA region, and may likewise play a role in the signal around VN1R1.

Post by Admin on Jan 7, 2022 2:27:57 GMT

S12 - Residual S∗ signal
The S∗ method is designed to identify archaic introgression without requiring the introgression to be derived from a population with similarity to a known, sequenced archaic hominin. As such, studying this signal may reveal otherwise cryptic evidence of introgression from hominins outside the Neanderthal and Denisovan clades. Additionally, the signal that S∗ identifies – non-African variation in high linkage disequilibrium – would be expected to occur due to structure in the Out of Africa migration(s). Given the known presence of Homo floresiensis in our study area (Brown et al., 2004, Sutikna et al., 2016), the possibility that late Homo erectus was contemporary with the earliest anatomically modern humans in ISEA (Yokoyama et al., 2008), and that a proposed early Out of Africa model may be required to explain genetic diversity patterns in Papuans (Pagani et al., 2016), we sought to further profile the S∗ signal (Main Text Figure 7).
a. No more than 1% unexplained archaic introgression
The output of the S∗ analysis consists of non-overlapping 50 Kb windows reported for each genome (rather than each chromosome copy). Global patterns of S∗ (> 99% confidence, i.e., higher confidence signal, see STAR Methods S7c) show a sharp peak in Papuans (Table S3A; Figure S7A). Calculating pairwise sharing of these S∗ windows (Figure S7B) indicates that the signal is quite broadly shared, with Papuans again unusual in sharing a lot of signal between each other and with East ISEA. These patterns are consistent with known patterns of Denisovan introgression, but could also be caused by other demographic or introgression processes. We first sought to assess to what extent this signal might be driven by Denisovan introgression. We used bedtools to remove S∗ > 99% confidence windows that were inferred to be caused by Neanderthal introgression, based on > 5% coverage of the merged set of HMM and CP Neanderthal introgressed blocks over both chromosome copies. We refer to this trimmed dataset as S∗NoNean (Figure S7A, right pane). We observe that S∗NoNean retains its sharp peak in Papua, while causing a reduction in the overall introgression signal of 75%–80% in all populations when compared to S∗. Repeating this process but instead removing S∗ windows inferred to be caused by Denisovan introgression leads to a slight dip in S∗NoDeni signal in Papua (Figure S7A, right pane), consistent with Denisovan introgression explaining the majority of the excess S∗ Papuan introgression signal. While the overall introgression signal drops by 84% in Papua compared to S∗, there is still a fall of 54% in West Eurasia.
This analysis raises two interesting points. First, it is possible to detect the distinctive introgression signal in Papua using S∗. With only a Neanderthal genome available, we would further be able to classify the source of introgression as non-Neanderthal using S∗NoNean. Alternatively, with only a Denisovan genome available, we would be able to use S∗NoDeni to identify the primary driver of this signal as ‘Denisovan’ introgression, as opposed to early out-of-Africa (OOA) processes involving modern humans, or additional introgression from an unknown archaic source. This suggests that S∗ is also well-suited to discovering introgression from unknown hominins, by studying signal behavior when masking introgression from known hominins.
Second, studying the West Eurasian signal is particularly informative as West Eurasians carry minimal known Denisovan introgression. Two statistical patterns are important. Removing Denisovan introgression blocks from West Eurasian S∗ might be expected to cause a minimal reduction in introgression signal. Instead, there is a substantial 54% reduction in introgression signal relative to S∗ when studying S∗NoDeni, confirming that our CP and HMM Denisovan block sets contain considerable spillover from outside the Denisovan clade. This spillover is likely due to Neanderthal introgression (as explored in STAR Methods S9a and S9b), but as we do not study this ambiguous signal in depth, we cannot rule out introgression from Neanderthal/Denisovan sister clades. Conversely, removing Neanderthal introgression blocks from West Eurasian S∗ might be expected to remove virtually all the introgression signal. Instead, a considerable 27% of the introgression signal remains. This could be due to false positives in the S∗ signal; or limited power of other methods to detect Neanderthal introgression; or a result of hitherto unknown introgression processes detected by S∗ but not CP or the HMM. The overlap in the S∗ signal that is removed when trimming Neanderthal or Denisovan introgression confirms our observation in Table S3B – that a great deal of introgressing blocks are ambiguous, showing greater similarity to both the Neanderthal and Denisovan genomes than human variation, likely due to the more recent common ancestry of the archaic hominins.
Because the excess Papuan S∗ signal is so completely eliminated by filtering out Denisovan introgression (Figure S7A, right pane), a very simple calculation puts a tentative upper bound on the amount of introgression into humans from outside the human/Neanderthal/Denisovan clade. Papuans have 97.2 Mb S∗ signal compared to the 40.8 Mb observed in Europeans. Assuming the 56.4 Mb excess corresponds to 4% Denisovan introgression, we might expect S∗ to detect 28.2 Mb from 2% Neanderthal introgression – given that the power of S∗ to detect introgression from Neanderthals and Denisovans in humans is expected to be similar following their similar genetic distance from humans and introgression times. This leaves 12.6 Mb of S∗ signal unaccounted for in West Eurasians and Papuans – a combination of false positives, limited power of the CP and HMM, and, potentially, unknown introgression signals – suggesting a maximum of ∼1% introgression from outside the human/Neanderthal/Denisovan clade. As such introgression would be expected to be easier to detect, given a similar introgression date, than Neanderthal or Denisovan introgression due to greater divergence from humans, this is only intended as an approximate upper bound. The bound is on average additional introgression in West Eurasia and Papua, but the absence of obvious excess signal in other regions suggests it applies more broadly. Nevertheless, individual populations within continents or isolated groups not captured by our sampling may have greater amounts of highly divergent introgression.
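The arithmetic behind this bound is short enough to write out. The sketch below only restates the numbers given above; the variable and function names are ours, and the final step (dividing the unexplained signal by the Mb of S∗ signal expected per unit of introgression) is our reading of how the approximately 1% figure follows.

```python
def unexplained_introgression_bound(papuan_mb=97.2, european_mb=40.8,
                                    denisovan_fraction=0.04,
                                    neanderthal_fraction=0.02):
    """Return (excess_mb, expected_neanderthal_mb, unexplained_mb, bound)."""
    excess_mb = papuan_mb - european_mb                    # 56.4 Mb
    mb_per_unit = excess_mb / denisovan_fraction           # S* Mb per unit fraction
    expected_neanderthal_mb = mb_per_unit * neanderthal_fraction  # 28.2 Mb
    unexplained_mb = european_mb - expected_neanderthal_mb        # 12.6 Mb
    bound = unexplained_mb / mb_per_unit                   # fraction of the genome
    return excess_mb, expected_neanderthal_mb, unexplained_mb, bound


excess, neanderthal, unexplained, bound = unexplained_introgression_bound()
print(f"excess {excess:.1f} Mb, Neanderthal {neanderthal:.1f} Mb, "
      f"unexplained {unexplained:.1f} Mb, bound {bound:.1%}")
```

The bound comes out slightly below 1%, matching the approximate maximum quoted above.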
b. Profiling the residual S∗ signal
While the calculation above suggests that if introgression from outside the (Human, Denisovan, Neanderthal) clade occurred, it was limited, it remains interesting to attempt to identify possible regional peaks in S∗ that are consistent with such introgression. We therefore attempted to remove introgression signals from the (Neanderthal, Denisovan) clade by filtering out both Neanderthal and Denisovan introgressing blocks, as inferred by CP and the HMM. Starting with the S∗ > 99% confidence output, we now used bedtools to remove any S∗ windows with more than 5% cumulative overlap from the union of CP and HMM Neanderthal and Denisovan blocks (see Figure S7C schematic). We are interested in the remainder, which we call residual S∗ (RS∗).
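The filtering rule can be sketched without bedtools as below. This is an illustration of the 5% cumulative-overlap criterion only, assuming half-open (start, end) coordinates on one chromosome; the function names are hypothetical and the actual bedtools commands used are not given in the text.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching intervals into their union."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def residual_windows(windows, archaic_blocks, max_overlap=0.05):
    """Keep windows whose covered fraction does not exceed max_overlap."""
    union = merge_intervals(archaic_blocks)
    kept = []
    for w_start, w_end in windows:
        covered = sum(max(0, min(w_end, e) - max(w_start, s)) for s, e in union)
        if covered / (w_end - w_start) <= max_overlap:
            kept.append((w_start, w_end))
    return kept


windows = [(0, 50_000), (50_000, 100_000), (100_000, 150_000)]
blocks = [(0, 1_500), (1_000, 2_000), (60_000, 70_000)]  # CP and HMM blocks pooled
print(residual_windows(windows, blocks))
```

Taking the union first prevents a region called by both CP and the HMM from being counted twice toward the 5% threshold.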
On average, individuals had 172 residual S∗ windows. We observed that sharing of residual S∗ between continental groups is common. To quantitatively profile this pattern while taking sample size into account, we randomly downsampled each population to 20 individuals 1000 times and counted the number of residual S∗ windows that were observed in all continental groups (‘global’), 4 to 7 continental groups (‘widespread’), or 1–3 continental groups (‘uncommon’). The southeast Asian group was excluded due to its small sample size, such that the analysis incorporated Papua, East ISEA, West ISEA, South Asia, East Asia, Siberia, America and West Eurasia. Table S3G shows the average amount of residual S∗ sequence per individual in each category (also see Figure S7D).
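One downsampling iteration of this classification might look like the sketch below. It treats the three categories as mutually exclusive (a window seen in every group is 'global', not 'widespread'), which is our assumption, and it represents each individual's residual S∗ windows as a set of window identifiers. The function name and toy data are hypothetical.

```python
import random


def classify_windows(windows_by_group, n_sample, rng):
    """Downsample each group to n_sample individuals and label every observed
    window as 'global', 'widespread' or 'uncommon'.

    windows_by_group: {group: {individual: set_of_window_ids}}.
    """
    groups_seen = {}
    for individuals in windows_by_group.values():
        chosen = rng.sample(sorted(individuals), n_sample)
        group_windows = set().union(*(individuals[i] for i in chosen))
        for w in group_windows:
            groups_seen[w] = groups_seen.get(w, 0) + 1
    n_groups = len(windows_by_group)
    labels = {}
    for w, k in groups_seen.items():
        if k == n_groups:
            labels[w] = "global"
        elif k >= 4:
            labels[w] = "widespread"
        else:
            labels[w] = "uncommon"
    return labels


# Toy data: eight groups with one individual each.
toy = {f"group{g}": {"ind": {1} | ({2} if g < 5 else set()) | ({3} if g < 2 else set())}
       for g in range(8)}
print(classify_windows(toy, n_sample=1, rng=random.Random(0)))
```

Repeating the call many times and averaging the category totals would give per-category summaries comparable across groups of different sample size.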
Papua has the lowest signal of residual S∗ (151 blocks/individual covering 8.8 Mb, 10.4% of the original 1265 S∗ blocks/individual), while South Asia has the highest residual S∗ signal (185 blocks/individual covering 10.6 Mb, 23.1% of the original 710 S∗ blocks/individual). The differences between groups are small (Figure S7A, right pane), and approximately 15%–20% of residual S∗ windows are found globally (Figure S7D), and over half are widespread. This broad distribution may reflect limitations of our African sample in capturing African variation, demographic events such as pre-OOA genetic structure or shared drift during the OOA bottleneck, evolutionary forces such as purifying selection within Africa, or unusual genomically local patterns of molecular evolution that are not captured by the simulation model. Interestingly, Papua, West Eurasia and South Asia show the highest proportion of uncommon residual S∗ signal (Table S3G; Figure S7D). While this may partly reflect an Asian ancestry bias in our definitions of continental groups, the pattern is consistent with local demographic processes specifically impacting these populations.
A potential cause of excess uncommon residual S∗ is region-specific introgressive sequences that coalesce earlier than the (H,N,D) group of known hominins. This is consistent with the placement of Homo erectus on the hominin species tree (but also many other causes; see below). Such sequence could be caused by direct introgression from H. erectus; or introgression from Denisovans if the introgressing Denisovan population had, like the Altai Denisovan (Meyer et al., 2012), mixed with H. erectus. It could also be caused by incomplete lineage sorting within the Neanderthal or Denisovan populations that are known to have mixed with modern humans; by balancing selection; and by increases in local mutation rate in non-Africans. The topologies of interest are (X,(H,(D,N))), (X,(D,(H,N))) and (X,(N,(D,H))). While we cannot accurately calculate topologies (see STAR Methods S10g) on genomic windows, which cover both chromosome copies and will frequently be chimeras of different coalescent histories, we can make simple predictions about the frequency of certain mutation motifs given H. erectus introgression – following the [H,N,D,X] notation, a substantial increase in the frequency of 0001 and an increase in 1110 that is dependent on the split time of H. erectus and modern humans. To assess evidence for H. erectus introgression, we therefore retrieved all global and uncommon residual S∗ blocks in each continental group, and divided the sum of the 0001 and 1110 mutation motifs observed in these blocks by their total sequence length.
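The motif rate described above amounts to a count divided by a length. The sketch below assumes each variable site has already been coded as four allele states in [H,N,D,X] order, with 0 for ancestral and 1 for derived; how those states are called from the data is not specified here, and the function name is hypothetical.

```python
def motif_rate(sites, total_length_bp, motifs=("0001", "1110")):
    """Return the number of sites matching any motif, per base pair.

    sites: iterable of (h, n, d, x) allele states (0 ancestral, 1 derived).
    total_length_bp: summed length of the residual S* blocks examined.
    """
    wanted = set(motifs)
    count = sum("".join(str(a) for a in site) in wanted for site in sites)
    return count / total_length_bp


toy_sites = [(0, 0, 0, 1), (1, 1, 1, 0), (0, 1, 1, 0), (0, 0, 0, 1)]
print(motif_rate(toy_sites, total_length_bp=1000))
```

Normalizing by total block length keeps the rate comparable between groups that carry different amounts of residual S∗ sequence.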
While our initial calculations identified a clear excess in average 1110 and especially 0001 mutation motifs/bp in East ISEA and Papua, further investigation revealed that this was largely driven by high-frequency introgressed windows around the HLA-A gene. HLA regions have been discussed in the context of archaic introgression (Abi-Rached et al., 2011), and balancing selection masquerading as archaic introgression (Yasukochi and Ohashi, 2017). Given the possible role of balancing selection or locally accelerated evolution in the HLA region, we profiled the frequency of 0001 and 1110 motifs when excluding chromosome 6 from the analysis (Figure S7E). There is a tendency toward higher 0001 and 1110 in East ISEA and Papua, centered on the islands of Flores and Lembata. Uncommon residual S∗ windows in West Eurasia tend to have relatively high rates of the 1110 motif.
These mutation motif patterns suggest a slight excess of unique variation that is not shared with humans, the Altai Denisovan or the Altai Neanderthal in East ISEA and Papua. However, the signal is not strong, and the difference in total RS∗ between populations is small, suggesting at most little introgression from outside the Human/Neanderthal/Denisovan lineage in these populations. Still, the question remains as to the cause of this mutation motif pattern. Homo erectus introgression has been suggested among Andaman populations (Mondal et al., 2016), but debate is ongoing (Skoglund et al., 2018). The broader region is known to have harbored both H. floresiensis and H. erectus in a time frame potentially overlapping occupation by modern humans. However, Papua especially is the global center of gravity of Denisovan introgression among modern human populations. The Altai Denisovan is thought to have some H. erectus ancestry (Lipson and Reich, 2017, Mallick et al., 2016, McColl et al., 2018, Prüfer et al., 2014, Skoglund et al., 2016), though it is not yet clear whether this is also true for introgressing Denisovan populations. Alternatively, region-specific introgression from either Neanderthals or Denisovans could introduce haplotypes coalescing outside the (H,N,D) tree due to incomplete lineage sorting. Further analysis of other statistics, or the specific haplotypes driving the residual S∗ signal, coupled with complex simulations, would be required to fully clarify this question, and are beyond the scope of this work.

Post by Admin on Jan 7, 2022 4:23:41 GMT

S13 – Rampasasa is not an introgression outlier
Our dataset includes 19 samples from Rampasasa, Flores, a village that is home to some individuals of unusually short stature and close to the cave where Homo floresiensis (Brown et al., 2004) bones were found. The dataset also includes two other villages on Flores (Cibol and Bena) and samples representing many other islands in East ISEA. This offers the opportunity to test for anomalous signals of unusual archaic introgression into Rampasasa, as recently also assessed by Tucci et al., 2018, with the benefit of being able to include samples from nearby and regional populations. Compared to surrounding regions, we did not detect any unusual signs of Neanderthal or Denisovan introgression in the village – the total amount of the genome with evidence for Neanderthal introgression only (62.9 Mb) was similar to neighboring villages (e.g., Cibol 64.4 Mb) and at the lower end of the East ISEA range (62.9–67.2 Mb). Denisovan introgression is similarly low (23.6 Mb; Cibol 23.8 Mb; region 23.6–42.9 Mb). The levels of Denisovan and Neanderthal introgression are exactly as expected based on the proportion of Papuan ancestry in Rampasasa (Main Text Figure 7).
As with all other populations, the S∗ statistic detected a substantial archaic signal in Rampasasa that could not be assigned by CP or the HMM to either Denisovans or Neanderthals. We also studied residual S∗: S∗ windows that explicitly exclude Neanderthal or Denisovan introgression and so may be enriched for introgression signal contributed by genetically uncharacterized hominins, such as H. erectus introgression if it occurred. Although the residual S∗ signal in Rampasasa is high, there was no clear evidence of excess residual S∗ signal compared to regional or global populations (per individual, 20 Mb in Rampasasa; Cibol 19.4 Mb; region 18.7–20 Mb; also see Main Text Figure 7). An analysis of the composition of the residual S∗ signal (STAR Methods S12) indicates that Rampasasa, and East ISEA and Papua more broadly, are relatively more consistent with limited H. erectus introgression (Figure S7E), but we emphasize again that the signal is not conclusive and that various other explanations exist (see STAR Methods S12 above). Our findings are consistent with those of Tucci et al., 2018, in that while they cannot rule out additional archaic introgression into Rampasasa, they suggest that any such introgression must have been extremely limited. By including highly local populations in our analysis we are able to provide additional regional context, further emphasizing that Rampasasa is not an outlier compared to nearby villages and islands.