Post by Admin on Feb 2, 2020 5:54:18 GMT
We applied IBDmix to samples from the 1000 Genomes Project (Auton et al., 2015), collected from geographically diverse populations, and used the Altai Neanderthal reference genome (Prüfer et al., 2014) to identify introgressed Neanderthal sequence in these individuals. After filtering (STAR Methods), we identified 110.98 Gb of Neanderthal sequence among 2,504 modern individuals. When overlapping introgressed segments are merged, this equates to 1.29 Gb of unique Neanderthal sequence. Because IBDmix does not use a putatively unadmixed modern reference population, we were able to robustly identify regions of apparent Neanderthal sequence in African populations for the first time (Figure 2A). Surprisingly, we identified on average 17 Mb of Neanderthal sequence per individual in the African samples analyzed, and this value was similar across the five African subpopulations represented in the dataset (ranging from 16.4 Mb/individual in ESN to 18.0 Mb/individual in LWK; Figure 2A; Table S4). Furthermore, we observed a significant overlap of sequence identified in Africans with that in non-Africans (Figure 2B). Specifically, of the Neanderthal sequence identified in African samples, more than 94% was shared with non-Africans.
Figure 2. Neanderthal Introgressed Sequence Detected in 1000 Genomes Project Populations
We also recovered a substantial amount of Neanderthal sequence in non-African samples across populations. Notably, we found similar levels of Neanderthal ancestry in Europeans (51 Mb/individual), East Asians (55 Mb/individual), and South Asians (55 Mb/individual) (Figure 2A; Table S4). Surprisingly, we observed only a modest enrichment (8%) of Neanderthal ancestry in East Asian compared to European individuals. This contrasts with previous reports that have indicated ∼20% enrichment of Neanderthal ancestry in East Asians compared to Europeans (Sankararaman et al., 2014, Sankararaman et al., 2016, Vernot and Akey, 2014, Wall et al., 2013).
The observed level of East Asian enrichment was even smaller (∼3%) when we were less conservative in our filtering methods (Table S5). We compared the Neanderthal sequences in non-African individuals identified by IBDmix (merged regions) to those identified by previous methods, including S∗, diCal-admix, and CRF, for individuals shared in all these studies. Approximately 80% of the sequences overlapped between the IBDmix callset and the other callsets (Figure S2).
Figure S2. Comparing the Genomic Coverage of Neanderthal Sequence Detected by Different Methods, Related to STAR Methods
Given the unexpectedly large amounts of Neanderthal sequence identified in African individuals, we next performed analyses to understand their origins. To rule out systematic biases, we first called Denisovan sequence in African individuals using IBDmix (STAR Methods) and only identified 1.2 Mb/individual of Denisovan sequence in African samples (Table S6). This is similar to the amount of Denisovan sequence called in non-African individuals (∼1 Mb/individual) and considerably lower than the amount of Neanderthal sequence identified by IBDmix in African individuals. We also performed extensive simulations and found that the signal of Neanderthal ancestry in Africans was unlikely to be explained by false positives due to shared ancestry (Figure 2C). We next considered two demographic models that could plausibly generate signals of Neanderthal ancestry in Africans that are detectable by IBDmix. Specifically, we studied models where non-African individuals, who carry Neanderthal sequences inherited from hybridization, migrated back to Africa and models of human-to-Neanderthal gene flow due to an early pre-out-of-Africa (pre-OOA) dispersal of modern humans (Hubisz et al., 2019, Kuhlwilm et al., 2016). We found that IBDmix is sensitive to both back migrations and pre-OOA gene flow from modern humans to Neanderthals (Figure 2C).
We therefore explicitly tested whether putative Neanderthal sequences identified in Africans were more likely to be explained by back-migration from non-Africans into Africa or by pre-OOA human-to-Neanderthal gene flow. To differentiate these scenarios, we compared the empirical data to simulated data, analyzing a variety of sequence characteristics (Figure 3). Specifically, we simulated genotype data under a series of demographic models that included Neanderthal admixture into non-Africans, increasing levels of back-migration from Europeans into Africans, and gene flow from a pre-OOA human lineage into Neanderthals at varying time points. We then identified introgressed sequence for these models using IBDmix. We compared the empirical and simulated data across features including introgressed segment length, frequency of introgressed segments in the African population that are shared with non-Africans, and the ratio of East Asian Neanderthal ancestry to European Neanderthal ancestry before and after masking Neanderthal sequence shared between Africans and non-Africans.
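The step from 110.98 Gb of called sequence to 1.29 Gb of unique sequence mentioned earlier in this post comes from merging overlapping segments across individuals. A minimal sketch of that kind of merge is shown here; the tuple layout, the half-open coordinates, and the function names are assumptions made for illustration, not the IBDmix implementation.

```python
from typing import Iterable

# Hypothetical segment layout: (chromosome, start, end), half-open coordinates.
Segment = tuple[str, int, int]


def merge_segments(segments: Iterable[Segment]) -> list[Segment]:
    """Collapse overlapping or touching segments on the same chromosome."""
    merged: list[Segment] = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous segment instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_length(segments: Iterable[Segment]) -> int:
    """Sum of segment lengths in base pairs."""
    return sum(end - start for _, start, end in segments)


# Toy calls pooled across individuals (made-up coordinates).
calls = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 450),
    ("chr2", 100, 200),
]
unique = merge_segments(calls)
print(total_length(calls), total_length(unique))
```

In this toy input the pooled calls sum to 400 bp while the merged, unique sequence is 350 bp, which is the same kind of reduction as the Gb figures quoted in the text.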
Post by Admin on Feb 2, 2020 22:01:45 GMT
Figure 3. Neanderthal Segments Identified in Africans Are a Consequence of Back-Migration and Human-to-Neanderthal Gene Flow
In the empirical data, segments identified in Africans (YRI) that are shared with non-Africans (EAS and EUR) have a distribution of segment sizes more similar to that of non-African calls and also occur predominantly at high frequency (>10%) in the African population (Figure 3). As noted previously, there is only a small enrichment (<10%) for Neanderthal ancestry in East Asians compared to Europeans without masking sequence shared with Africans. When shared sequence is masked, however, this enrichment increases to ∼18% (Figure 3). These features are not replicated by models with either back-migration or human-to-Neanderthal gene flow alone. Specifically, while features like the distribution of segment lengths and the frequency of African segments in the African population are replicated in models with human-to-Neanderthal gene flow, only models with back-migration rates elevated in comparison to standard demographic estimates (5 × 10⁻⁵/generation) can replicate the enrichment of East Asian Neanderthal ancestry when masking shared African sequence. A model that combines both of these events, elevated back migration and human-to-Neanderthal gene flow, matches the empirical data best across all features. In summary, these data indicate that both pre-OOA human-to-Neanderthal gene flow and elevated historic back-migration contribute to the signal of Neanderthal ancestry detected in Africans.
Back-Migration from European Ancestors Introduced Neanderthal Sequence into African Populations
To further confirm the role of back-migration in introducing Neanderthal sequence into African populations, we examined the rate of overlap between called Neanderthal segments and non-African ancestry tracks in African samples.
We hypothesized that if the Neanderthal sequence in Africans was introduced by back-migration from ancestors of contemporary Europeans, then there should be enrichment for overlap of Neanderthal segments and European ancestry segments in African samples. To test this hypothesis, we compared data from chromosome 1 for all 504 African samples in our analysis. For each individual, we identified tracks of European and East Asian ancestry using RFMix (Maples et al., 2013) and measured the rate of overlap with identified Neanderthal segments in the same individual (Figure 4A). We averaged these rates of overlap to calculate empirical rates of overlap for European ancestry and East Asian ancestry separately (Figure 4B). We found the rate of overlap with European ancestry to be highly significant (permutation p < 0.0001), while the rate of overlap with East Asian ancestry was not (permutation p > 0.05) (Figure 4B). These data are consistent with the hypothesis that back-migration contributes to the signal of Neanderthal ancestry in Africans. Furthermore, the data indicate that this back-migration came after the split of Europeans and East Asians, from a population related to the European lineage.
Figure 4. Enrichment in Overlap of Neanderthal Segments and European Ancestry Segments in African Individuals
Previous methods that have relied on unadmixed modern reference populations, like S∗, have reported >20% enrichment of Neanderthal sequence in East Asians compared to Europeans (Figure 5A). However, results from IBDmix show only 8% enrichment of Neanderthal sequence in East Asians compared to Europeans (Figure 5A). This level of enrichment is robust to changes in the segment size cutoff (30 kb, 40 kb, 50 kb) used for IBDmix calling (Table S5).
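The overlap test between Neanderthal segments and ancestry tracks described above can be illustrated with a small permutation sketch for a single individual and chromosome. The post does not say how the permutations were constructed, so the null used here (placing each Neanderthal segment at a uniformly random position while keeping its length) is an assumption, as are all function names and the toy coordinates.

```python
import random

Interval = tuple[int, int]  # (start, end), half-open; hypothetical layout


def overlaps_any(segment: Interval, tracks: list[Interval]) -> bool:
    """True if the segment shares at least one base with any ancestry track."""
    start, end = segment
    return any(start < t_end and t_start < end for t_start, t_end in tracks)


def overlap_rate(segments: list[Interval], tracks: list[Interval]) -> float:
    """Fraction of Neanderthal segments that overlap an ancestry track."""
    if not segments:
        return 0.0
    return sum(overlaps_any(s, tracks) for s in segments) / len(segments)


def permutation_p(segments, tracks, chrom_len, n_perm=1000, seed=0):
    """One-sided p-value: how often random placement matches or beats the observed rate."""
    rng = random.Random(seed)
    observed = overlap_rate(segments, tracks)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in segments:
            length = end - start
            new_start = rng.randrange(0, chrom_len - length + 1)
            shuffled.append((new_start, new_start + length))
        if overlap_rate(shuffled, tracks) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: every Neanderthal segment falls inside one small ancestry track.
tracks = [(0, 1_000)]
segments = [(i * 100, i * 100 + 50) for i in range(10)]
p_value = permutation_p(segments, tracks, chrom_len=1_000_000)
print(overlap_rate(segments, tracks), p_value)
```

In the analysis described in the text, per-individual overlap rates were averaged across the 504 African samples before significance was assessed; this sketch covers only the per-individual piece.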
To better understand the discrepancy between IBDmix and previous inferences, we first removed Neanderthal sequence called by IBDmix in Europeans and East Asians that was shared with Africans (YRI) and replicated an 18% enrichment of Neanderthal ancestry in East Asians compared to Europeans (Figure 5A). This result shows that our observation of similar levels of Neanderthal ancestry in Europeans and East Asians is due to no longer masking Neanderthal sequence shared with Africans.
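The enrichment figures quoted above are simple ratios, and the masking step amounts to subtracting African-shared intervals before taking the ratio. The sketch below shows both under stated assumptions: the 55 and 51 Mb/individual values come from the text, while the interval coordinates, the `subtract` helper, and the single-chromosome layout are made up for illustration.

```python
Interval = tuple[int, int]  # (start, end), half-open; hypothetical layout


def subtract(segments: list[Interval], mask: list[Interval]) -> list[Interval]:
    """Parts of `segments` not covered by `mask` (both sorted and non-overlapping)."""
    result = []
    for start, end in segments:
        cursor = start
        for m_start, m_end in mask:
            if m_end <= cursor:
                continue  # mask interval lies entirely before the cursor
            if m_start >= end:
                break  # mask interval lies entirely after this segment
            if m_start > cursor:
                result.append((cursor, m_start))
            cursor = max(cursor, m_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def length(segments: list[Interval]) -> int:
    return sum(end - start for start, end in segments)


def enrichment(east_asian: float, european: float) -> float:
    """Relative excess of East Asian over European Neanderthal sequence."""
    return east_asian / european - 1.0


# Per-individual averages reported in the text (Mb): roughly 8% enrichment.
print(round(enrichment(55, 51) * 100, 1))

# Toy intervals showing how masking African-shared sequence can raise the ratio.
eur = [(0, 100), (200, 300)]
eas = [(0, 60), (200, 350)]
afr = [(50, 100)]
before = enrichment(length(eas), length(eur))
after = enrichment(length(subtract(eas, afr)), length(subtract(eur, afr)))
print(round(before, 3), round(after, 3))
```

Because more European than East Asian sequence is removed by the mask in the toy data, the apparent East Asian enrichment grows after masking, which is the direction of the effect reported in the text.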
Post by Admin on Feb 3, 2020 5:50:09 GMT
Figure 5. Disproportionate Sharing of Neanderthal Sequence Differentially Biases Estimates of Neanderthal Ancestry
In the IBDmix callset for Africans, Europeans, and East Asians, there is a large enrichment of Neanderthal sequence shared exclusively between Africans and Europeans compared with the sequence shared exclusively between Africans and East Asians (Figure 5B). As a proportion of the total amount of Neanderthal sequence for each population, 7.2% of European sequence is shared exclusively with Africans, which is substantially higher than the 2% of East Asian sequence shared exclusively with Africans (Figure 5B). The disproportionate level of sharing between Africans and Europeans is consistent even after down-sampling the recovered Neanderthal segments in Europeans to match the total coverage of Neanderthal sequence in East Asians (STAR Methods). This imbalance in the proportion of exclusively shared sequence between African and non-African populations directly contributes to the biased Neanderthal ancestry estimates in previous methods that use an African reference panel. We also examined how the reference panel size for S∗ affects Neanderthal ancestry estimates by bootstrap resampling the Yoruba samples in 1000 Genomes Project data (n = 108) and reanalyzing chromosome 1 for Europeans and East Asians (Figure 5C). We generated multiple reference panels based on different sample sizes and re-called Neanderthal sequence for European and East Asian individuals using the S∗-pipeline and the new reference panels. We compared the total S∗-sequence called for each sample to the average amount of S∗-sequence called for samples using a reference panel of 1 individual. Increasing the reference panel size showed a significant reduction (p < 2 × 10⁻¹⁶) in the amount of Neanderthal sequence called per individual.
In addition, when comparing the amounts of Neanderthal sequence identified in Europeans and East Asians, increasing the reference panel size decreased the amount detected for both populations, but there was a greater loss in Europeans than in East Asians. Using a reference sample larger than 10 led to an apparent 20% enrichment of Neanderthal ancestry in East Asians compared to Europeans, as previously reported. Simulations of European to African back-migration using rates consistent with standard demographic models also generate a significant enrichment of Neanderthal ancestry in East Asians compared to Europeans when the data are analyzed with S∗, so long as back-migration occurs after the split of European and East Asian lineages (p < 8 × 10⁻⁷; Figure S3). Collectively, these results show that Neanderthal ancestry estimates in East Asians and Europeans were biased due to unaccounted-for back-migrations from European ancestors into Africans.
Figure S3. Back-Migration Can Bias Amount of Recovered Neanderthal Sequence in S∗, But Not IBDmix, Related to STAR Methods
Admixture with Neanderthals may have provided a mechanism for modern humans to acquire novel adaptive variation. Previous analyses have reported population-specific high-frequency introgressed Neanderthal haplotypes, which may be instances of adaptive introgression (Dannemann et al., 2017, Gittelman et al., 2016, Racimo et al., 2015, Simonti et al., 2016) or the reintroduction of alleles lost in the modern human lineage (Rinker et al., 2019). We examined our IBDmix callset for similar findings. We leveraged population-level derived allele frequencies of variants that overlapped calls made by IBDmix and matched the Neanderthal allele, in order to detect Neanderthal haplotypes with unusually large differences in frequency between populations.
Specifically, for variants that intersected identified Neanderthal segments, we calculated the differences in the derived allele frequencies between Europeans and East Asians, Africans and Europeans, and Africans and East Asians. We then took an outlier approach to identify loci with allele frequency differences in the 99th percentile. We further filtered on loci where the derived allele matched the Neanderthal allele. Overall, we identified 38 non-African-specific high-frequency haplotypes and 13 African-specific high-frequency haplotypes (Table S7). We compared these identified high-frequency haplotypes with previously identified high-frequency haplotypes (Gittelman et al., 2016) and checked them for the presence of previously reported GWAS SNPs. Of the 38 non-African-specific high-frequency Neanderthal haplotypes we identified, 19 were previously reported by Gittelman et al. (2016), including well-known targets of adaptive introgression like WDR88, POU2F3, and TLR1/6/10 (Figures 6A and 6B). Intriguingly, we also identified 31 high-frequency haplotypes shared by Africans and Europeans, including TRIM55 (Figure 6C; Table S7). These haplotypes would have been undetected in previous methods that relied on unadmixed reference human panels. Furthermore, we were for the first time able to detect African-specific high-frequency Neanderthal haplotypes (Figure 6D; Table S7). The 13 African-specific high-frequency Neanderthal haplotypes we identified show enrichment for genes involved in immunological function (e.g., IL22RA1 and IFNLR1) and ultraviolet-radiation sensitivity (e.g., DDB1 and IL22RA1) (Keeney et al., 1993, Kim et al., 2017).
While some high frequency Neanderthal-like variants in Africans may derive from human-to-Neanderthal gene flow, only one of the high-frequency haplotypes shared by Africans and Europeans (chr3:89,587,868–90,134,709) overlaps a locus previously identified as introgressed from modern humans into the Altai Neanderthal (Kuhlwilm et al., 2016), and none of our detected African-specific high-frequency haplotypes do. These novel findings provide insight into the evolutionary history of these populations, the selective pressures they faced, and current variation in health and disease.
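The outlier approach described in this post (rank variants by the difference in derived allele frequency between two populations, keep the 99th percentile, then require that the derived allele matches the Neanderthal allele) can be sketched as follows. The `Variant` record, the use of absolute differences, and the nearest-rank percentile are assumptions made for illustration; the post does not specify these details.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for a variant inside a called Neanderthal segment."""
    pos: int
    daf_pop_a: float  # derived allele frequency in population A
    daf_pop_b: float  # derived allele frequency in population B
    derived_matches_neanderthal: bool


def percentile_cutoff(values: list[float], q: float) -> float:
    """Nearest-rank percentile of `values` for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def frequency_outliers(variants: list[Variant], q: float = 99.0) -> list[Variant]:
    """Variants at or above the q-th percentile of |DAF difference| that match Neanderthal."""
    diffs = [abs(v.daf_pop_a - v.daf_pop_b) for v in variants]
    cutoff = percentile_cutoff(diffs, q)
    return [
        v
        for v, d in zip(variants, diffs)
        if d >= cutoff and v.derived_matches_neanderthal
    ]


# Toy data: 100 variants with differences 0.00, 0.01, ..., 0.99.
toy = [Variant(pos=i, daf_pop_a=i / 100, daf_pop_b=0.0,
               derived_matches_neanderthal=(i != 99)) for i in range(100)]
hits = frequency_outliers(toy)
print([v.pos for v in hits])
```

In the toy data two variants clear the percentile cutoff, and the one whose derived allele does not match the Neanderthal allele is dropped by the second filter.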
Post by Admin on Feb 3, 2020 18:29:34 GMT
Figure 6. Population-Specific High-Frequency Introgressed Segments
Previous analyses have identified large (>10 Mb) autosomal regions of the genome that are significantly depleted of Neanderthal ancestry in all non-African populations (Sankararaman et al., 2014, Sankararaman et al., 2016, Vernot and Akey, 2014, Vernot et al., 2016). These large “deserts” of archaic introgressed sequence appear at frequencies greater than expected under neutral models. We analyzed our IBDmix call set to see if we could replicate previous findings or determine if deserts were a function of previous methodological biases. Following previously described methods to identify archaic deserts, we analyzed our IBDmix callset from both African and non-African samples (STAR Methods). We replicated 4 of the 6 previously reported deserts of Neanderthal sequence, including the deserts that contain FOXP2 (chr7) and ROBO1 and ROBO2 (chr3) (Table S8; Figure S4). Moreover, the four replicated deserts are the same regions previously shown to also be significantly depleted of Denisovan ancestry. Thus, depletions of archaic ancestry seem to be a general feature of the data and are not likely due to methodological issues in identifying introgressed sequence. It is noteworthy that including all African samples, a subset (YRI), or none does not dramatically change the distribution of the frequencies of large deserts. This is consistent with the observation that the African Neanderthal sequence is predominantly a subset of non-African segments.
Figure S4. Visualization of S∗ and IBDmix Identified Desert Regions and Their Overlap, Related to STAR Methods
Post by Admin on Feb 4, 2020 5:46:52 GMT
Discussion
We developed a novel approach to identify introgressed hominin sequence that persists in the genomes of modern humans, and we show that it performs well compared to existing methods. The main novelty of IBDmix is that, unlike previous methods, it does not use an unadmixed reference panel. As such, we were able to make unbiased inferences about signals of Neanderthal ancestry in African populations, which are a combination of genuine introgressed Neanderthal sequences and human sequences present in the Neanderthal genome. We also demonstrate that back-migrations to Africa confounded previous estimates of variation in Neanderthal ancestry among non-African populations. Furthermore, we confirmed and refined genomic regions significantly depleted of Neanderthal ancestry, as well as putative targets of adaptive introgression, including several loci that were previously not detectable when using an African reference population.
It is important to note, however, that IBDmix has several limitations. In particular, IBDmix requires an archaic reference genome and therefore is not suitable for discovering introgressed sequence from unknown or unsequenced hominin lineages. IBDmix also requires that populations be analyzed separately, and that a sufficiently large sample size be used, in order to robustly estimate population allele frequencies, assign LOD scores, and determine IBD (simulations suggest a minimum of ten individuals; Table S2). Additionally, recombination rate heterogeneity across the genome and between populations can influence IBDmix segment size cutoffs. Consequently, it will be difficult to apply IBDmix to individual genomes or ancient human samples, where the sample size is limited and estimates of allele frequencies and recombination rates are imprecise. As such, IBDmix complements existing approaches for identifying introgressed sequences in modern humans.
Applying IBDmix to geographically diverse populations revealed two unexpected observations. First, we discovered a stronger than expected signal of Neanderthal ancestry among African individuals. Specifically, among the 1000 Genomes African populations, we identified approximately 17 Mb of putative Neanderthal sequence per individual (Figure 2; Table S4), whereas previous inferences found considerably less than a megabase (ranging from 0.026 Mb in Esan to 0.5 Mb in Luhya) (Vernot et al., 2016). Accordingly, African individuals have approximately 33% as much detected sequence compared to non-African individuals. The higher signal of Neanderthal ancestry in African individuals is not entirely unexpected, as recent studies have suggested that assumptions about Neanderthal ancestry in Africans may have led to underestimates (Lorente-Galdos et al., 2019, Petr et al., 2019). Moreover, even early estimates of Neanderthal ancestry in non-Africans noted that there was likely some amount of Neanderthal sequence in Africans (Green et al., 2010, Sánchez-Quinto et al., 2012, Wang et al., 2013), albeit not at the magnitude we find. Furthermore, it is increasingly recognized that gene flow occurred among structured populations across the African continent (Scerri et al., 2018, Schlebusch et al., 2012, Skoglund et al., 2017), and Eurasian ancestry is found across Africa (Pickrell et al., 2014). Even early diverging groups like the Khoe-San have up to 30% ancestry from recent admixture with East Africans and Eurasians (Schlebusch et al., 2017). Therefore, it will not be surprising if Neanderthal ancestry, due to back-migrations, is present at varying levels across the African continent.
Our results also provide strong evidence that human sequence in the Neanderthal genome contributes to the signal of Neanderthal ancestry we detect in Africans. Previous studies have noted the genetic contribution of a pre-out-of-Africa gene-flow event from humans into Neanderthals (Hubisz et al., 2019, Kuhlwilm et al., 2016). The timing of this event, however, has been under debate, with estimates being revised from ∼100 ka (Kuhlwilm et al., 2016, Prüfer et al., 2017) to ∼150 ka (Kuhlwilm et al., 2016, Prüfer et al., 2017), and now perhaps as early as 250 ka (Hubisz et al., 2019). Our own data are most consistent with models of human-to-Neanderthal gene flow between 100 and 150 ka, as IBDmix does not detect any signal in simulations with earlier gene flow. However, our results do not preclude earlier instances of gene flow; they indicate only that IBDmix is not powered to detect them. Thus, it is tempting to speculate that perhaps there were multiple waves of pre-OOA dispersals and admixture between modern humans and Neanderthals, although additional data are needed to make more definitive inferences.
The second major insight afforded by IBDmix is that levels of Neanderthal ancestry among non-African populations are more uniform than previous estimates suggested. Specifically, as opposed to the 20% enrichment of Neanderthal sequence previously found in East Asians compared to Europeans (Kim and Lohmueller, 2015, Lazaridis et al., 2016, Meyer et al., 2012, Vernot and Akey, 2015), we only find an approximately 8% enrichment (Figure 5A; Table S4). We show that the reason for this discrepancy is that previous inferences using an African reference population underestimated the amount of Neanderthal sequence in Europeans. Due to historical back-migrations preferentially from ancestral European populations, Neanderthal sequence has been disproportionately under-called in present-day Europeans compared to East Asians. We believe the modest 8% enrichment of Neanderthal sequence found by IBDmix is most parsimoniously explained by a single wave of Neanderthal admixture occurring after the out-of-Africa dispersal. Variation in Neanderthal ancestry could be attributable to later dilution by unadmixed populations (Lazaridis et al., 2016). In particular, present-day European populations are thought to be a mixture of three ancestral groups, one of which had ancestry from a Basal Eurasian lineage that had little or no Neanderthal ancestry (Lazaridis et al., 2014). Previous studies found that dilution could not explain Neanderthal ancestry differences as large as 20% (Kim and Lohmueller, 2015, Vernot and Akey, 2015) but can readily account for the modest differences we now find. Note, however, that our data do not preclude the possibility of additional, population-specific admixture events with Neanderthals. Numerous instances of admixture events are known from ancient human samples, even though these individuals did not contribute genetically to contemporary human populations (Fu et al., 2015, Yang et al., 2017).
Nonetheless, the majority of Neanderthal ancestry can likely be explained by a single wave of admixture in the population ancestral to all non-Africans.
In summary, our data show that out-of-Africa and in-to-Africa dispersals must be accounted for when interpreting archaic hominin ancestry in contemporary human populations. It is notable that Neanderthal sequences have been identified in every contemporary modern human genome analyzed to date. Thus, the legacy of gene flow with Neanderthals likely exists in all modern humans, highlighting our shared history.
Published: January 30, 2020