Post by Admin on Mar 10, 2019 18:31:21 GMT
Two-Population Model. In this approach we follow a two-step strategy (18). First, we estimate demographic parameters of the null model using summary statistics that quantify recent African population structure. Using these model parameters, we then test the hypothesis of no admixture using a different summary statistic that is designed to detect low levels of genetic exchange between modern and archaic humans. The null model of recent African population structure without archaic admixture incorporates divergence, migration, and recent population growth (Fig. 1A). We calculate a composite likelihood of the summarized data on a grid of parameter values (18) (SI Materials and Methods). Parameter estimates, along with simulation-based 95% confidence intervals (CIs), are given in Table S1. Two patterns emerge from these analyses: estimates of the start of growth are very recent, and estimates of the population split time are relatively old. Although the recent growth estimates are consistent with results of previous studies (23, 24), the estimated divergence time (450 kya, Biaka–Mandenka comparison), which predates the origin of modern humans as dated from fossil data, was unexpected. There are several possible explanations for this observation. First, it is possible that the true divergence time is old and that anatomically modern humans (AMH) evolved within a geographically structured population. Alternatively, it is possible that the true divergence time is younger and that the old estimate arose either by chance or through bias caused by model misspecification (i.e., the true demographic model differs from Fig. 1A). Our simulations suggest that such a large divergence estimate could arise by chance roughly 2% of the time if the true demographic model (i.e., without admixture) were that of Fig. 1A (SI Materials and Methods and Table S2).
Fig. 1. Schematic of the (A) two-population model and the (B) three-population model.
Both demographic models test the fit of admixture with an archaic group (dotted lines) that split from the ancestors of modern humans at time T0 and contributed a fraction a (%) of alleles to the modern gene pool at time Ta. The dashed lines represent all possible locations where admixture could occur. Both models begin with a single population of size Na, followed by a population split at time T1, with population growth beginning at times g1 and g2 and a constant symmetric migration rate M. In B, an additional population split also occurs at time T2. This model also assumes that the ancestors of the San split first from those of the Mandenka and Biaka (22).

Three-Population Model. To complement the first approach, we also implemented an approximate-likelihood method to estimate admixture parameters under a three-population isolation and migration model (Fig. 1B). Because this is a new inferential strategy, we explain our approach in some detail. In an isolation and admixture model (17), we expect to find loci with both deep haplotype divergence (reflecting a long period of isolation for those haplotypes that trace to different subpopulations) and elevated levels of LD (reflecting a reduced time for diverged haplotypes to recombine). If levels of admixture are low, then one class of haplotypes is expected to be at low frequency (i.e., a small basal clade). In other words, low levels of recent admixture with an archaic human population are likely to produce data with a small subsample of sequences that are highly diverged over an extended region of the chromosome. With this in mind, we developed three summary statistics as follows. For each locus, we identify the two most diverged sequences and then define two groups, G1 and G2, by genetic similarity to these two designated sequences. From this we set our three statistics for approximate likelihood:
(i) D1, the fraction of polymorphisms that are shared between G1 and G2. D1 reflects the amount of recombination and thus is sensitive to the time of introgression, Ta.
(ii) D2, the ratio of the number of differences between the two distinguished sequences described above to the number of fixed sequence differences between human and chimpanzee. D2 reflects the relative time-depth of the genealogy and thus is sensitive to the archaic split time, T0.
(iii) D3, the size of the smaller of the groups G1 and G2. D3 reflects the relative size of the two most basal clades and thus is sensitive to the amount of admixture, a.
Our approximate-likelihood protocol estimates the distribution of the summary statistics D1, D2, and D3 by simulating a large number of ancestral recombination graphs (ARGs). An important part of this protocol is the choice of tolerances, or bin sizes, δ1, δ2, and δ3 for the respective summary statistics. In general, we chose tolerances to maximize power for a = 1% (SI Materials and Methods). We find that the data are significantly unlikely under the null model of no admixture (i.e., the likelihood ratio test yields a bootstrapped P value of 0.0493). We note that this result is conservative because it is based on estimates of recombination rate that are biased downward and on a tolerance that is less powerful in regions of high recombination (see below). Interestingly, we find evidence for two separate peaks in the maximum-likelihood surface: (i) an older peak with an archaic split time T0 ≈ 700 kya, a time of admixture Ta ≈ 35 kya, and an admixture proportion a ≈ 2%; and (ii) a more recent peak with T0 ≈ 375 kya, Ta ≈ 15 kya, and a ≈ 0.5% (Fig. 2). Although our method has little power to infer the exact admixture proportion, we can place 95% CIs on the times of divergence (125 kya < T0 < 1.5 Mya) and admixture (Ta < 70 kya) (SI Materials and Methods). Note that T0 for the more recent peak is consistent with the Biaka–Mandenka split time estimates from the two-population model.
Fig. 2.
Approximate likelihood profile based on 60 loci for time of introgression and archaic split time. A log-likelihood difference of 3.92 defines the 95% confidence region (using the χ2 approximation). Likelihood estimates at each locus are based on at least 10 ARGs for both ψold and ψrecent (the older and more recent likelihood peaks).

Likelihood Ratios of Individual Loci. The two inferential methods can also assess the evidence for archaic admixture at individual loci (SI Materials and Methods). Both methods identify the same locus on chromosome 4 (4qMB179) as a strong candidate for archaic admixture (P < 5 × 10^−4 for each method). Table 1 describes the three loci with the lowest P values in the three-population model. Of the six individuals in the minimum clades, four are Biaka (4qMB179, 18qMB60) and two are San (13qMB107). Although both inferential methods identified the 13qMB107 locus as a likely candidate, the result is much more significant for the three-population model (P < 0.001) than for the two-population model (P = 0.049) (Table 1). We note that the power of the two-population approach is reduced when evidence of introgression is limited to a short tract of DNA (as in the case of 13qMB107, where it is found only in the first subset; Discussion). For 18qMB60, the two-population method excludes singletons from the S* analysis; if they were included, the P value for 18qMB60 would be below 0.01 (Table 1).

Table 1. Three loci that favor an alternative model
Locus | Likelihood ratio | P value | T0 (Mya) | Ta (kya) | D1 | D2 | D3 | S* P value
13qMB107 | 44.38 | <0.001 | 1 | 20 | 0.1 | 0.264 | 2 | 0.049
4qMB179 | 39.85 | <0.001 | 1.5 | 20 | 0 | 0.366 | 3 | <0.001
18qMB60 | 12.74 | 0.022 | 0.75 | 20 | 0 | 0.192 | 1 | >0.05†
The likelihood ratio is defined as max_ψ L(ψ | data) / max_{ψ ∈ H0} L(ψ | data), and the P value is determined with a parametric bootstrap. These values, along with the parameter values in columns 4–8, refer to results from the three-population model, whereas the S* P values in the last column refer to results from the two-population model.
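The 3.92 cutoff in the Fig. 2 caption and the bootstrapped P values in Table 1 are two standard ways of calibrating a likelihood ratio. The Python sketch below is our own illustration, not the authors' code, and it uses only the standard library (scipy.stats.chi2.ppf would give the same quantiles). It assumes that the cutoff is half the 95% quantile of a χ2 distribution with three degrees of freedom, one each for T0, Ta, and a. The source does not state the degrees of freedom, but this choice gives ≈3.91, close to the quoted 3.92. The bootstrap helper uses the common (1 + exceedances) / (1 + B) convention, which may differ from the authors' choice.

```python
import math
from typing import Sequence


def regularized_lower_gamma(a: float, x: float, terms: int = 500) -> float:
    """P(a, x) = gamma(a, x) / Gamma(a), computed with the standard power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a  # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_cdf(x: float, df: int) -> float:
    """CDF of a chi-square distribution with df degrees of freedom."""
    return regularized_lower_gamma(df / 2.0, x / 2.0)


def chi2_quantile(p: float, df: int) -> float:
    """Invert chi2_cdf by bisection (the CDF is monotone increasing)."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def loglik_drop_threshold(level: float, df: int) -> float:
    """Drop in natural-log likelihood from the maximum that bounds the region."""
    return chi2_quantile(level, df) / 2.0


def bootstrap_p_value(observed_lr: float, null_lrs: Sequence[float]) -> float:
    """Share of null (parametric-bootstrap) replicates at least as extreme.

    The (1 + count) / (1 + B) convention keeps the estimate strictly positive.
    """
    exceed = sum(1 for lr in null_lrs if lr >= observed_lr)
    return (1 + exceed) / (1 + len(null_lrs))


if __name__ == "__main__":
    for df in (1, 2, 3):
        print(df, round(loglik_drop_threshold(0.95, df), 3))
```

With one, two, and three degrees of freedom the 95% thresholds come out at roughly 1.92, 3.00, and 3.91 log-likelihood units. Only the three-parameter case matches the value used for Fig. 2.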
To address the question of whether some loci favor one maximum in the likelihood surface over the other (i.e., ψrecent vs. ψold), we compute the likelihood ratio (SI Materials and Methods) for each locus (Fig. S2). Notably, the four most extreme likelihood ratios include the three loci that individually favor ψold (Table 1).
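To make the three summary statistics of the three-population model concrete, here is a hypothetical Python sketch computing D1, D2, and D3 for one locus. The source does not specify several details, so the following choices are our assumptions: ties are broken by taking the first most-diverged pair, haplotypes equidistant from both anchors go to G1, and a polymorphism counts as "shared" when it segregates within both G1 and G2. The approximate-likelihood binning with tolerances δ1–δ3 is not shown.

```python
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of sites at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def most_diverged_pair(haps: Sequence[str]) -> Tuple[int, int]:
    """Indices of the two most divergent haplotypes (first pair wins ties)."""
    best, pair = -1, (0, 1)
    for i in range(len(haps)):
        for j in range(i + 1, len(haps)):
            d = hamming(haps[i], haps[j])
            if d > best:
                best, pair = d, (i, j)
    return pair


def segregates(haps: Sequence[str], site: int) -> bool:
    """True if the site is polymorphic within this set of haplotypes."""
    return len({h[site] for h in haps}) > 1


def d_statistics(haps: Sequence[str], human_chimp_fixed_diffs: int) -> Tuple[float, float, int]:
    i, j = most_diverged_pair(haps)
    # Assign each haplotype to the closer of the two anchors (ties go to G1).
    g1: List[str] = [h for h in haps if hamming(h, haps[i]) <= hamming(h, haps[j])]
    g2: List[str] = [h for h in haps if hamming(h, haps[i]) > hamming(h, haps[j])]
    poly = [s for s in range(len(haps[0])) if segregates(haps, s)]
    # D1: polymorphisms segregating inside both groups (a recombination signal).
    shared = [s for s in poly if segregates(g1, s) and segregates(g2, s)]
    d1 = len(shared) / len(poly) if poly else 0.0
    # D2: anchor-pair divergence scaled by human-chimpanzee fixed differences.
    d2 = hamming(haps[i], haps[j]) / human_chimp_fixed_diffs
    # D3: size of the smaller group, i.e. of the putative basal clade.
    d3 = min(len(g1), len(g2))
    return d1, d2, d3


# Toy locus: six haplotypes over ten sites; the last two form a divergent clade.
TOY_HAPS = [
    "0000000000",
    "0000000001",
    "0000000010",
    "0000000100",
    "1111110000",
    "1111110001",
]

if __name__ == "__main__":
    print(d_statistics(TOY_HAPS, human_chimp_fixed_diffs=40))
```

On this toy locus the most diverged pair is haplotypes 2 and 5 (8 differences), giving D2 = 8/40 = 0.2 and D3 = 2. Only the last site segregates within both groups, so D1 = 1/9.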
Post by Admin on Mar 11, 2019 17:25:25 GMT
Analysis of the 4qMB179 Region. We now turn to a focused analysis of the 4qMB179 region, which shows deep haplotype divergence and no evidence of recombination between the major clades. In the ≈20-kb region that was initially surveyed (Fig. 3A), we identified 20 SNPs (and one insertion) that separate three Biaka haplotypes (B1–B3; Table S3) from all of the remaining African sequences. To determine the full length of the unusual pattern of SNPs, we gathered additional DNA sequence data from all individuals in our panel (Fig. 3A) and identified a 31.4-kb region with 37 completely linked sites where the Biaka haplotypes are 0.3% diverged from the other sequences in our sample. Using a simple model of isolation followed by recent mixing, we next developed likelihood-based methods for estimating a split time and admixture time for the locus (SI Materials and Methods). We estimated an initial split time of 1.25 Mya (95% CI, 0.7–2.1 Mya) and an admixture time of 37 kya (95% CI, 1–137 kya) (Fig. S3).
Fig. 3. (A) Schematic of the original (filled bars) and extended sequence data (open bars) for the 4qMB179 locus. The unusual Biaka haplotype extends for ≈31.4 kb between the vertical dotted lines. (B) Recombinational landscape as inferred from HapMap Phase I data.
Geographic Surveys. A survey of the insertion that is diagnostic for the divergent haplotype at 4qMB179 (i.e., at position 179,598,847 in Table S3) in 502 individuals from West, East, central, and southern Africa reveals that it reaches its highest average frequency (3.6%) in Pygmy groups from west-central Africa (Fig. 4). The variant is also found at low average frequencies (0.8%) in some non-Pygmy groups from West and East Africa. An A→G mutation that marks the divergent haplotype at 18qMB60 shows a similar distribution, also reaching its highest average frequency in the Pygmy groups, although it is found at slightly lower frequencies than the variant at 4qMB179 (1.6% vs. 3.6%, respectively).
This variant is also found in some non-Pygmy groups, at average frequencies similar to those of the 4qMB179 variant in West Africans (0.8%), East Africans (0.8%), and southern Africans (0.5% vs. 0.0%, respectively) (Fig. 4). Interestingly, the G→A variant marking the divergent haplotype at 13qMB107 shows a somewhat different geographic distribution. It reaches its highest average frequency in our sample of southern Africans (6.3%, and especially in the San at a frequency of 11.9%) rather than in central African Pygmies (average of 5.2%). However, its presence in our sample of central Africans is entirely limited to the Mbuti, where it has a frequency of 14.8%.
Fig. 4. Frequency of introgressive variants within three sequenced regions in an expanded sample of ≈500 sub-Saharan Africans (SI Materials and Methods). The filled bar represents the frequency of a variant marking the divergent haplotype at 4qMB179 (Left), 18qMB60 (Center), and 13qMB107 (Right) in each of 14 population samples. Each horizontal line on the bar charts represents a frequency of 5%.
Discussion
Our inference methods reject the hypothesis that the ancestral population that gave rise to AMH in Africa was genetically isolated, and they point to several candidate regions that may have introgressed from an archaic source(s). For example, we identified a ≈31.4-kb region within the 4qMB179 locus with highly diverged haplotypes, one of which is found at low frequency in several Pygmy groups in central Africa. We hypothesize that the unusual haplotype descends from an archaic DNA segment that entered the AMH population via admixture. The observed haplotype structure is highly unusual (P < 5 × 10^−5), even when we account for recent population structure or uncertainty in the underlying recombination rate (Table S4). It is noteworthy that the two ends of the archaic haplotype correspond to recombinational hotspots in the 4qMB179 region (Fig. 3B), suggesting that an initially much longer block of archaic DNA was whittled down by frequent recombination in the hotspots. Both inferential methods also identified the 13qMB107 locus as a likely introgression candidate; however, only ≈7 kb of the surveyed region contains SNPs that are in high LD, all of which are found at the 5′ end of the sequenced region in two San individuals. To determine whether the unusual pattern of SNPs at 13qMB107 extends beyond our sequenced region, we examined public full genome sequence data (25). We identified a San individual (!Gubi) who carried one copy of the unusual 13qMB107 haplotype and noted a run of heterozygous sites that extended an additional ≈7 kb to the 5′ side of our sequenced region. As with 4qMB179, the two ends of the unusual haplotype correspond to recombinational hotspots, and analysis of 13qMB107 yields an estimated divergence time of ≈1 Mya and a recent introgression time (≈20 kya) (Table 1). The geographic distribution of the introgressive variant at 18qMB60, a third candidate identified in the three-population model (Table 1), is very similar to that of 4qMB179, albeit at consistently lower frequencies (Fig. 4). On the other hand, the distribution of the introgressive variant at 13qMB107 is distinguished from that of the other two candidate loci by its presence in the San and the southern African Xhosa, as well as in Mbuti from the Democratic Republic of Congo. Interestingly, the Mbuti represent the only population in our survey that carries the introgressive variant at all three candidate loci, even though no Mbuti were represented in our initial sequencing survey. Given that the Mbuti population is known to be relatively isolated from other Pygmy and neighboring non-Pygmy populations (26), this suggests that central Africa may have been the homeland of a now-extinct archaic form that hybridized with modern humans.
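The extent of the 4qMB179 and 13qMB107 haplotypes was established by additional sequencing and inspection of public genomes, not by an algorithm stated here. As a purely hypothetical illustration of what a run of "completely linked sites" means, the Python sketch below takes phased 0/1 haplotypes and a set of carrier haplotypes (for example, B1–B3). It grows a run outward from a seed site until it meets a site that is polymorphic in both carriers and non-carriers. Treating such a site as evidence that recombination has broken the pattern is our assumption, as are the function names and toy data. The sketch does not model the unphased heterozygous-run evidence used for !Gubi.

```python
from typing import List, Sequence, Set, Tuple


def classify_site(column: Sequence[int], carriers: Set[int]) -> str:
    """Compare one site's alleles (0/1 per haplotype) with the carrier partition."""
    inside = {allele for k, allele in enumerate(column) if k in carriers}
    outside = {allele for k, allele in enumerate(column) if k not in carriers}
    if len(inside) == 1 and len(outside) == 1 and inside != outside:
        return "linked"      # alleles split exactly along carriers / non-carriers
    if len(inside) > 1 and len(outside) > 1:
        return "discordant"  # polymorphic in both groups: breaks the pattern
    return "compatible"      # private to one group (or monomorphic)


def linked_run(positions: Sequence[int], genotypes: Sequence[Sequence[int]],
               carriers: Set[int], seed: int) -> Tuple[int, int, int]:
    """Grow outward from a linked seed site until a discordant site is met.

    Returns (first linked position, last linked position, number of linked sites).
    """
    if classify_site(genotypes[seed], carriers) != "linked":
        raise ValueError("seed site must be perfectly linked to the carriers")
    left = seed
    while left > 0 and classify_site(genotypes[left - 1], carriers) != "discordant":
        left -= 1
    right = seed
    while (right < len(positions) - 1
           and classify_site(genotypes[right + 1], carriers) != "discordant"):
        right += 1
    linked: List[int] = [s for s in range(left, right + 1)
                         if classify_site(genotypes[s], carriers) == "linked"]
    return positions[linked[0]], positions[linked[-1]], len(linked)


# Toy data: seven sites (rows) by six haplotypes (columns); haplotypes 4 and 5
# carry the divergent haplotype.
POSITIONS = [100, 900, 1500, 4000, 9000, 15000, 20000]
GENOTYPES = [
    [1, 0, 1, 0, 0, 1],  # discordant
    [0, 0, 0, 0, 1, 1],  # linked
    [0, 1, 0, 0, 0, 0],  # private to a non-carrier
    [1, 1, 1, 1, 0, 0],  # linked (opposite polarity)
    [0, 0, 0, 0, 1, 1],  # linked
    [0, 0, 1, 1, 0, 1],  # discordant
    [0, 0, 0, 0, 1, 1],  # linked, but beyond the discordant boundary
]

if __name__ == "__main__":
    print(linked_run(POSITIONS, GENOTYPES, carriers={4, 5}, seed=3))
```

Seeded at the site at position 4000, the run stops at the discordant sites on either side. It reports three linked sites spanning positions 900 to 9000, and the linked site at 20000 is excluded.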
We have relied on an indirect approach to detect ancient admixture in African populations because there are no African ancient DNA sequences to make direct comparisons with our candidate loci. As proof of principle that an indirect approach can be useful, we reexamined the RRM2P4 pseudogene on the X chromosome. Previous studies using a similar approximate-likelihood methodology posited that a divergent allele at the pseudogene introgressed from an archaic taxon in Asia (27, 28). We compared human and Neandertal RRM2P4 sequences and found that the three derived sites that define the non-African basal lineage are shared with Neandertal (Fig. S4). Thus, we verified that this unusual human sequence, which is characterized by a deep haplotype divergence and a small basal clade, is indeed shared with an archaic form. Further genome-level (i.e., multilocus) analysis will also shed light on the process of archaic admixture, which is likely to be more complicated than we have modeled. For instance, the multimodal likelihood surface in Fig. 2 suggests that gene flow among strongly subdivided populations may have characterized multiple stages of human evolution in Africa. Our results are consistent with earlier inferences supporting the role of archaic admixture in sub-Saharan Africa based on analyses of coding regions (19) and the Xp21.1 noncoding region (16). Although our estimates of isolation and admixture dates are tentative, the results point to relatively recent genetic exchange with an unknown archaic hominin that diverged from the ancestors of modern humans in the Lower-Middle Pleistocene and remained isolated for several hundred thousand years. Despite a fragmentary African fossil record, there are plenty of candidates for the source(s) of this introgression.
Beginning ≈700 kya, fossil evidence from many parts of Africa indicates that Homo erectus was giving way to populations with larger brains, a change that was accompanied by several structural adjustments to the skull and postcranial skeleton (14). By ≈200 kya, individuals with more modern skeletal morphology begin to appear in the African record (8, 14). Despite these signs of anatomical and behavioral innovation, hominins with a combination of archaic and modern features persist in the fossil record across sub-Saharan Africa and the Middle East until after ≈35 kya (12, 14). Although there is currently a major debate about what this piecemeal, or mosaic-like, appearance of modern traits means for taxonomic classification (12, 29), the evidence presented here and elsewhere suggests that long-separated hominin groups exchanged genes with forms that either were in the process of evolving fully modern features or were already fully modern in appearance. The emerging geographic pattern of unusual variants discovered here suggests that one such introgression event may have taken place in central Africa (where there is a very poor fossil record). Interestingly, recent studies attest to the existence of Late Stone Age human remains with archaic features in Nigeria (Iwo Eleru) and the Democratic Republic of Congo (Ishango) (30–32). The observation that populations from many parts of the world, including Africa, show evidence of introgression of archaic variants (6, 16, 19) suggests that genetic exchange between morphologically divergent forms may be a common feature of human evolution. If so, hybridization may have played a key role in the de novo origin of some of our uniquely human traits (33).
PNAS, September 13, 2011, 108 (37): 15123–15128
Post by Admin on Jan 31, 2020 20:27:22 GMT
Neanderthal DNA sequences may be more common in modern Africans than previously thought, and different non-African populations have levels of Neanderthal ancestry surprisingly similar to each other, according to a study published January 30 in the journal Cell. Researchers arrived at these findings by developing a new statistical method, called IBDmix, to identify Neanderthal sequences in the genomes of modern humans. The results also suggest that African genomes contain Neanderthal sequences in part due to back-migration of ancestors of present-day Europeans. "Our study is significant because it provides important new insights into human history and patterns of Neanderthal ancestry in globally diverse populations," says senior study author Joshua Akey of Princeton University. "Our results refine catalogs of genomic regions where Neanderthal sequence was deleterious and advantageous and demonstrate that remnants of Neanderthal genomes survive in every modern human population studied to date." Past studies have suggested that East Asians have approximately 20% more Neanderthal ancestry than Europeans. But the new findings suggest that these estimates may have been biased due to methodological limitations. Previously developed approaches, such as S*, use a modern reference panel, usually an African population assumed to lack Neanderthal ancestry. But if the reference panel unexpectedly contains Neanderthal sequences, then the method will underestimate Neanderthal ancestry in modern humans. To address this problem, Akey and his colleagues developed IBDmix, a new category of method for detecting archaic ancestry. Instead of using a modern reference panel, the approach calculates the probability that an individual's genotype is shared identical by descent (IBD) with an archaic reference genome. Compared with S*, IBDmix is less biased, has higher statistical power for detecting shared archaic sequences, and yields fewer false positives.
The researchers applied IBDmix to 2,504 modern individuals from the 1000 Genomes Project, which represents geographically diverse populations, and used the Altai Neanderthal reference to identify Neanderthal sequence in these individuals. They robustly identified regions of Neanderthal ancestry in Africans for the first time, identifying on average 17 megabases (Mb) of Neanderthal sequence per individual in the African samples analyzed (which corresponds to approximately 0.3% of the genome), compared with less than one megabase reported in previous studies. More than 94% of the Neanderthal sequence identified in African samples was shared with non-Africans. The researchers also observed levels of Neanderthal ancestry in Europeans (51 Mb/individual), East Asians (55 Mb/individual), and South Asians (55 Mb/individual) that were surprisingly similar to each other. Strikingly, East Asians had only 8% more Neanderthal ancestry compared to Europeans, in contrast to previous reports of 20%. "This suggests that most of the Neanderthal ancestry that individuals have today can be traced back to a common hybridization event involving the population ancestral to all non-Africans, occurring shortly after the Out-of-Africa dispersal," Akey says. To explore potential explanations for the unexpectedly high Neanderthal ancestry in Africans, the researchers then compared the actual data to simulated genotype data derived from different demographic models. This analysis took into account various sequence characteristics, such as the length of the shared archaic segments, the frequency of these segments in Africans, and the amount of sequence shared exclusively between African and non-African populations. They found that Africans exclusively share 7.2% of Neanderthal sequence with Europeans, compared with only 2% with East Asians. 
Simulations showed that low levels of back-migration persisting over the past 20,000 years can replicate features of the data and could therefore be a possible explanation for the observed levels of ancestry among different modern populations. The results suggest that previously developed methods using an African reference population are biased toward underestimating Neanderthal ancestry to a greater extent in Europeans compared to East Asians. "Collectively, these results show that Neanderthal ancestry estimates in East Asians and Europeans were biased due to unaccounted-for back-migrations from European ancestors into Africa," Akey says. But gene flow went in both directions. The data also suggest that there was a dispersal of modern humans out of Africa approximately 200,000 years ago, and this group hybridized with Neanderthals, introducing modern human DNA into the genomes of Neanderthals. According to the authors, both out-of-Africa and into-Africa dispersals must be accounted for when interpreting global patterns of genomic variation.
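The "only 8% more" figure can be checked directly from the per-individual totals quoted in this article. The short Python snippet below only restates those numbers; because 51 Mb and 55 Mb are themselves rounded, the exact ratio in the underlying data may differ slightly.

```python
# Per-individual Neanderthal sequence reported in the article (Mb per individual).
europeans_mb = 51
east_asians_mb = 55

excess = (east_asians_mb - europeans_mb) / europeans_mb
print(f"East Asian excess over Europeans: {excess:.1%}")  # 7.8%

# For comparison: a 20% excess over the European total would be about 61 Mb.
print(f"{europeans_mb * 1.20:.1f} Mb")  # 61.2 Mb
```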
Post by Admin on Feb 1, 2020 7:10:38 GMT
Summary
Admixture has played a prominent role in shaping patterns of human genomic variation, including gene flow with now-extinct hominins like Neanderthals and Denisovans. Here, we describe a novel probabilistic method, called IBDmix, that identifies introgressed hominin sequences and, unlike existing approaches, does not use a modern reference population. We applied IBDmix to 2,504 individuals from geographically diverse populations to identify and analyze Neanderthal sequences segregating in modern humans. Strikingly, we find that African individuals carry a stronger signal of Neanderthal ancestry than previously thought. We show that this can be explained by genuine Neanderthal ancestry due to migrations back to Africa, predominantly from ancestral Europeans, and by gene flow into Neanderthals from an early dispersing group of humans out of Africa. Our results refine our understanding of Neanderthal ancestry in African and non-African populations and demonstrate that remnants of Neanderthal genomes survive in every modern human population studied to date.
Introduction
Studies of ancient DNA are transforming our understanding of human evolutionary history and, in particular, how admixture has shaped past and present patterns of human genomic variation (Nielsen et al., 2017, Pääbo, 2014, Vattathil and Akey, 2015, Vernot and Pääbo, 2018). Of particular interest has been the discovery that admixture with archaic hominins occurred multiple times throughout human history (Green et al., 2010, Meyer et al., 2012, Prüfer et al., 2014, Reich et al., 2010).
In particular, approximately 2% of all non-African ancestry is derived from Neanderthals (Green et al., 2010, Meyer et al., 2012, Prüfer et al., 2014, Sankararaman et al., 2016, Vernot et al., 2016, Wall et al., 2013), with Oceanic populations having an additional 2%–4% of ancestry attributable to gene flow with Denisovans (Browning et al., 2018, Mallick et al., 2016, Sankararaman et al., 2016, Vernot et al., 2016). The ability to identify introgressed hominin sequence in the genomes of modern humans enables inferences about the functional, evolutionary, and phenotypic significance of archaic admixture. For example, the genomic distribution of surviving Neanderthal and Denisovan lineages has been influenced by purifying selection (Harris and Nielsen, 2016, Juric et al., 2016), which has purged introgressed sequence that was deleterious in modern humans. Indeed, some exceptionally large regions depleted of archaic ancestry (also referred to as “archaic deserts”) have been identified and may be due to selection (Sankararaman et al., 2014, Sankararaman et al., 2016, Vernot and Akey, 2014, Vernot et al., 2016). There is also strong evidence that some Neanderthal and Denisovan sequences were beneficial (Dannemann et al., 2016, Huerta-Sánchez et al., 2014, Mendez et al., 2012a, Mendez et al., 2012b, Racimo et al., 2017, Racimo et al., 2015) and were rapidly driven to high frequency in modern human populations by a process known as adaptive introgression (Dannemann et al., 2017, Gittelman et al., 2016, McCoy et al., 2017, Simonti et al., 2016). In general, however, the functional impacts of introgressed sequences, how they have been shaped by selection, and how they have influenced modern human health and disease are only beginning to be explored. 
Moreover, a consistent observation in all studies of archaic hominin admixture is that East Asian populations have approximately 20% more Neanderthal ancestry compared to Europeans (Nielsen et al., 2017, Sankararaman et al., 2014, Sankararaman et al., 2016, Vernot and Akey, 2014, Vernot et al., 2016, Wall et al., 2013). Numerous models have been invoked to explain this difference, including the interaction of demography and selection (Kim and Lohmueller, 2015, Lazaridis et al., 2016, Sankararaman et al., 2014), dilution by non-admixed populations (Lazaridis et al., 2016, Meyer et al., 2012), or additional population-specific admixture events (Kim and Lohmueller, 2015, Vernot and Akey, 2015, Villanea and Schraiber, 2019). Accurately determining variation in Neanderthal ancestry among non-African populations has important implications for refining our understanding of admixture between modern human ancestors and Neanderthals. Despite the methodological progress that has been made to identify introgressed hominin sequence, opportunities for further development of statistical tools abound and may result in novel insights. For example, a recent extension of the S∗ framework revealed two waves of Denisovan admixture in East Asian populations that were not previously detectable (Browning et al., 2018). To this end, we describe a novel method for detecting Neanderthal ancestry in modern humans that does not require an unadmixed reference human panel, which we refer to as IBDmix. We apply IBDmix to genotype data from a large set of modern human individuals from Eurasia, America, and Africa. We make novel discoveries regarding Neanderthal ancestry in Africans and re-examine the relative levels of Neanderthal ancestry in Eurasian populations. We also replicate, extend, and discover new instances of adaptive introgression that may offer insight into human evolution and phenotypic variation in modern humans. Results
Post by Admin on Feb 1, 2020 18:25:43 GMT
Results
Evaluating the Power and Robustness of IBDmix
Methods that identify introgressed Neanderthal lineages in modern humans must differentiate between sequences shared with Neanderthals because of ancient hybridization and those shared because of common ancestry. Previous approaches, such as S∗ (Plagnol and Wall, 2006, Vernot and Akey, 2014), CRF (Sankararaman et al., 2014), diCal-admix (Steinrücken et al., 2018), and HMM (Skov et al., 2018), use an “unadmixed” modern reference panel, commonly an African population such as Yoruba (YRI). This panel controls for false positives due to shared ancestry by “masking” putative archaic sequence present in both the reference panel and the target sample. If the reference panel carries introgressed Neanderthal sequence, Neanderthal sequence in the target sample will be missed (Figure 1A). Our new method, IBDmix, which is based on identity by descent (IBD), does not use a modern reference panel (Figure 1A). IBDmix calculates the probabilities that a variant site in a modern individual is and is not shared IBD with a reference archaic genome, while accounting for genotyping errors in the reference archaic and modern human sequences (STAR Methods; Table S1). The ratio of these probabilities is used to construct a single-site LOD score, where higher values indicate a greater likelihood that a modern individual’s genotype is shared IBD with the reference archaic genome. IBDmix uses a dynamic programming algorithm to sum single-site LOD scores and maximize this sum in order to identify introgressed segments (STAR Methods). The false-positive rate for IBDmix is controlled by the LOD score threshold and by the length of introgressed segments considered. Unlike existing methods that require phased sequence data, IBDmix works on unphased genotype data. This makes it more computationally tractable, because it avoids time-consuming preprocessing and inaccuracies caused by phasing errors.
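The two steps described above, a single-site LOD score followed by a scan that sums those scores into candidate segments, can be sketched in Python. This is a simplified stand-in, not IBDmix itself. The probability model, the crude `error` mixing term, the function names, and the running-sum-with-reset scan are our assumptions; the actual emission model, error handling, and dynamic program are given in the paper's STAR Methods. The defaults of a LOD threshold of 4 and a 50-kb minimum length follow the settings reported for this study.

```python
import math
from typing import List, Sequence, Tuple


def site_lod(modern_gt: int, archaic_gt: int, alt_freq: float, error: float = 0.01) -> float:
    """log10 P(genotype | IBD) / P(genotype | not IBD) at one biallelic site.

    Genotypes count alternate alleles (0, 1, 2). Simplified model (assumption):
    not IBD -> both modern alleles drawn from the population frequency (HWE);
    IBD -> one allele copied from a random archaic allele, the other from the
    population. `error` mixes in the non-IBD model as a crude error term.
    """
    p = min(max(alt_freq, 1e-6), 1 - 1e-6)  # sample allele frequency, kept off 0/1
    q = archaic_gt / 2.0  # chance the copied archaic allele is alternate
    not_ibd = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    ibd = [(1 - q) * (1 - p), q * (1 - p) + (1 - q) * p, q * p]
    num = (1 - error) * ibd[modern_gt] + error * not_ibd[modern_gt]
    return math.log10(num / not_ibd[modern_gt])


def call_segments(positions: Sequence[int], lods: Sequence[float],
                  lod_threshold: float = 4.0,
                  min_length: int = 50_000) -> List[Tuple[int, int, float]]:
    """Running-sum scan: extend while the cumulative LOD stays positive, reset
    otherwise, and report the best-scoring stretch of each run if it passes
    both the LOD threshold and the minimum physical length."""
    segments: List[Tuple[int, int, float]] = []

    def emit(start: int, best_end: int, best: float) -> None:
        if (best_end >= start and best >= lod_threshold
                and positions[best_end] - positions[start] >= min_length):
            segments.append((positions[start], positions[best_end], best))

    running, start, best, best_end = 0.0, 0, 0.0, -1
    for k, lod in enumerate(lods):
        if running + lod <= 0.0:
            emit(start, best_end, best)  # close the current run, if any
            running, start, best, best_end = 0.0, k + 1, 0.0, -1
            continue
        running += lod
        if running > best:
            best, best_end = running, k
    emit(start, best_end, best)
    return segments


if __name__ == "__main__":
    positions = [i * 10_000 for i in range(12)]
    lods = [-1, 2, 1.5, 1, 0.5, 1, -0.5, 1, -3, -4, 0.5, 0.5]
    print(call_segments(positions, lods))
```

In the toy scan, sites 1–7 accumulate a LOD of 6.5 over 60 kb and are reported as one segment. The short positive run at the end falls below the threshold and is dropped. Because `site_lod` needs `alt_freq` from the modern sample, a method of this kind depends on having enough individuals to estimate allele frequencies.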
It should be noted, however, that accurate estimates of allele frequency are required to calculate the probability of IBD, so IBDmix cannot be used on individual genomes or on small samples. In practice, we found that a minimum of ten individuals is sufficient for robust inferences (STAR Methods; Table S2).
Figure 1. Evaluation of IBDmix Performance and Comparison to Previous Methods
We evaluated IBDmix’s performance and operating characteristics using simulated data generated from a previously inferred realistic demographic model and compared it to results using S∗ (STAR Methods; Figure S1). As expected, IBDmix’s false-positive rate decreases and power increases as the introgressed segment size increases (Figure 1B). Compared to S∗, IBDmix has a lower false-positive rate and higher power for all introgressed segment sizes >30 kb (Figure 1B). Specifically, for introgressed segment sizes >30 kb, the power of IBDmix is >60% with an FDR ≤10% (Figures 1B and S1B). Note that the power and FDR of IBDmix in non-African populations are not influenced by gene flow from non-Africans into Africans, whereas such gene flow has a large effect on S∗ (Figures 1B and 1C). The power to detect introgressed sequence in non-African populations is particularly low for S∗ when this sequence is also found in the reference population (Africans), whereas IBDmix maintains power (Figure 1C). This observation implies that biases may arise in methods that use a modern human reference panel, as the power to detect introgressed sequence will be a function of its presence in the reference panel.
Figure S1. Simulated Model and Performance Evaluation for IBDmix, Related to Figure 1 and STAR Methods
We also tested the impact of genetic variation (mutation rate) and mis-specification of recombination rates on IBDmix using simulated data. The performance of IBDmix improved overall with higher mutation rates (Figure S1C). As expected, we observed a noticeable improvement for shorter segments (FPR, FDR, and power; Figure S1C). In testing the effect of recombination rate on IBDmix performance, we used data generated from a model with no Neanderthal introgression. We evaluated the FPR of IBDmix under models with a recombination rate equal to the genome-wide average (1 cM/Mb) and under models with 1/10th that rate (0.1 cM/Mb). For larger segments (≥40 kb), we observed marginally higher false-positive rates with the reduced recombination rate (Table S3). Previous studies have identified the introgressing Neanderthal population as a sister clade of the sequenced Altai Neanderthal (Malaspinas et al., 2016, Prüfer et al., 2017). We therefore tested how IBDmix would perform when the reference archaic genome is distantly related to the introgressing archaic population. We simulated models with two Neanderthal lineages, an introgressing lineage and a sampled (non-introgressing) reference lineage, and varied the split time between these two populations (STAR Methods). We observed a small decrease in power and FPR when using the non-introgressing Neanderthal as the reference genome, but overall performance measures remained consistent (Figure S1D). In summary, IBDmix has higher power and lower FDR than S∗ and is robust to reference population biases. In the following, unless otherwise noted, we used a LOD score threshold of 4 and a minimum segment size of 50 kb, which provide a reasonable tradeoff between power and false-positive rate (Figure S1B).
IBDmix Reveals Substantial Amounts of Neanderthal Signal in Africans and Nearly Uniform Levels in Non-African Populations