Post by Admin on Feb 18, 2020 19:48:32 GMT
Two-Population Model.
In this approach we follow a two-step strategy (18). First we estimate demographic parameters of the null model using summary statistics that quantify recent African population structure. Using these model parameters, we then test the hypothesis of no admixture using a different summary statistic that is designed to detect low levels of genetic exchange between modern and archaic humans. The null model of recent African population structure without archaic admixture incorporates divergence, migration, and recent population growth (Fig. 1A). We calculate a composite likelihood of the summarized data on a grid of parameter values (18) (SI Materials and Methods). Parameter estimates, along with simulation-based 95% confidence intervals (CIs), are given in Table S1. Two patterns emerge from these analyses: estimates of the start of growth are very recent, and estimates of the population split time are relatively old. Although the recent growth estimates are consistent with results of previous studies (23, 24), the estimate of a divergence time that predates the origin of modern humans based on fossil data (450 kya, Biaka–Mandenka comparison) was unexpected. There are several possible explanations for this observation. First, it is possible that the true divergence time is old and that AMH evolved within the context of a geographically structured population. Alternatively, it is possible that the true divergence time is younger and that the old estimate arose either by chance or by bias caused by model misspecification (i.e., the true demographic model is different from Fig. 1A). Our simulations suggest that the large divergence estimate might happen by chance roughly 2% of the time, if the true demographic model (i.e., without admixture) were as in Fig. 1A (SI Materials and Methods and Table S2).
Fig. 1.
Schematic of the (A) two-population model and the (B) three-population model. Both demographic models test the fit of admixture with an archaic group (dotted lines) who split from the ancestors of modern humans at time T0 and a (%) of alleles introgressed into the modern gene pool at time Ta. The dashed lines represent all possible locations where admixture could occur. Both models begin with a single population of size Na, followed by a population split at time T1, with population growth beginning at times g1 and g2, and a constant symmetric migration rate M. For B, an additional population split at time T2 also occurs. This model also assumes that the ancestors of the San split first from those of the Mandenka and Biaka (22).
We then tested for archaic admixture using the estimated model parameters of the null model and a summary of LD (S*) that was specifically designed to be sensitive to archaic admixture (18, 19). The evidence for archaic admixture is extremely strong in the Biaka and the San (P < 10−4) but not in the Mandenka (P > 0.05). Quantile–quantile plots for the distribution of P values across loci are shown in Fig. S1.
Three-Population Model.
To complement the first approach, we also implemented an approximate-likelihood method to estimate admixture parameters under a three-population isolation and migration model (Fig. 1B). Because this is a new inferential strategy, we explain our approach in some detail. In an isolation and admixture model (17) we expect to find loci with both deep haplotype divergence (reflecting a long period of isolation for those haplotypes that trace to different subpopulations) and elevated levels of LD (reflecting a reduced time for diverged haplotypes to recombine). If levels of admixture are low, then one class of haplotypes is expected to be at low frequency (i.e., a small basal clade). In other words, low levels of recent admixture with an archaic human population are likely to produce data with a small subsample of sequences that are highly diverged over an extended region of the chromosome. With this in mind, we developed our three summary statistics as follows. For each locus, we identify the two most diverged sequences and then define two groups, G1 and G2, by genetic similarity to the two designated sequences. From this we set our three statistics for approximate likelihood: (i) D1, the fraction of polymorphisms that are shared between G1 and G2. D1 reflects the amount of recombination and thus is sensitive to the time of introgression, Ta. (ii) D2, the ratio of the number of differences between the two distinguished sequences described above and the number of fixed sequence differences between human and chimpanzee. D2 reflects the relative time-depth of the genealogy and thus is sensitive to the archaic split time, T0. (iii) D3, the size of the smaller of the groups, G1 and G2. D3 reflects the relative size of the two most basal clades and thus is sensitive to the amount of admixture, a.
Our approximate-likelihood protocol estimates the distribution of the summary statistics D1, D2, and D3 on the basis of the simulation of a large number of ancestral recombination graphs (ARGs). An important part of this protocol is the choice of tolerances or bin sizes δ1, δ2, and δ3 for their respective summary statistics. In general, we chose tolerances to maximize power for a = 1% (SI Materials and Methods).
We find that the data are significantly unlikely under the null model of no admixture (i.e., the likelihood ratio test yields a bootstrapped P value of 0.0493). We note that this result is conservative because it is based on estimates of recombination rate that are biased downward and a tolerance that is less powerful in regions of high recombination (see below). Interestingly, we find evidence for two separate peaks in the maximum-likelihood surface: (i) an older peak with an archaic split time, T0 ≈ 700 kya, a time of admixture, Ta ≈ 35 kya, and an admixture proportion, a ≈ 2%; and (ii) a more recent peak with T0 ≈ 375 kya, Ta ≈ 15 kya, and a ≈ 0.5% (Fig. 2). Although our method has little power to infer the exact admixture proportion, we can place 95% CIs on the times of divergence (125 kya < T0 < 1.5 Mya) and admixture (Ta < 70 kya) (SI Materials and Methods). Note that T0 for the more recent peak is consistent with the Biaka–Mandenka split time estimates from the two-population model.
In this approach we follow a two-step strategy (18). First we estimate demographic parameters of the null model using summary statistics that quantify recent African population structure. Using these model parameters, we then test the hypothesis of no admixture using a different summary statistic that is designed to detect low levels of genetic exchange between modern and archaic humans. The null model of recent African population structure without archaic admixture incorporates divergence, migration, and recent population growth (Fig. 1A). We calculate a composite likelihood of the summarized data on a grid of parameter values (18) (SI Materials and Methods). Parameter estimates, along with simulation-based 95% confidence intervals (CIs), are given in Table S1. Two patterns emerge from these analyses: estimates of the start of growth are very recent, and estimates of the population split time are relatively old. Although the recent growth estimates are consistent with results of previous studies (23, 24), the estimate of a divergence time that predates the origin of modern humans based on fossil data (450 kya, Biaka–Mandenka comparison) was unexpected. There are several possible explanations for this observation. First, it is possible that the true divergence time is old and that AMH evolved within the context of a geographically structured population. Alternatively, it is possible that the true divergence time is younger and that the old estimate arose either by chance or by bias caused by model misspecification (i.e., the true demographic model is different from Fig. 1A). Our simulations suggest that the large divergence estimate might happen by chance roughly 2% of the time, if the true demographic model (i.e., without admixture) were as in Fig. 1A (SI Materials and Methods and Table S2).
Fig. 1.
Schematic of the (A) two-population model and the (B) three-population model. Both demographic models test the fit of admixture with an archaic group (dotted lines) who split from the ancestors of modern humans at time T0 and a (%) of alleles introgressed into the modern gene pool at time Ta. The dashed lines represent all possible locations where admixture could occur. Both models begin with a single population of size Na, followed by a population split at time T1, with population growth beginning at times g1 and g2, and a constant symmetric migration rate M. For B, an additional population split at time T2 also occurs. This model also assumes that the ancestors of the San split first from those of the Mandenka and Biaka (22).
We then tested for archaic admixture using the estimated model parameters of the null model and a summary of LD (S*) that was specifically designed to be sensitive to archaic admixture (18, 19). The evidence for archaic admixture is extremely strong in the Biaka and the San (P < 10−4) but not in the Mandenka (P > 0.05). Quantile–quantile plots for the distribution of P values across loci are shown in Fig. S1.
Three-Population Model.
To complement the first approach, we also implemented an approximate-likelihood method to estimate admixture parameters under a three-population isolation and migration model (Fig. 1B). Because this is a new inferential strategy, we explain our approach in some detail. In an isolation and admixture model (17) we expect to find loci with both deep haplotype divergence (reflecting a long period of isolation for those haplotypes that trace to different subpopulations) and elevated levels of LD (reflecting a reduced time for diverged haplotypes to recombine). If levels of admixture are low, then one class of haplotypes is expected to be at low frequency (i.e., a small basal clade). In other words, low levels of recent admixture with an archaic human population are likely to produce data with a small subsample of sequences that are highly diverged over an extended region of the chromosome. With this in mind, we developed our three summary statistics as follows. For each locus, we identify the two most diverged sequences and then define two groups, G1 and G2, by genetic similarity to the two designated sequences. From this we set our three statistics for approximate likelihood: (i) D1, the fraction of polymorphisms that are shared between G1 and G2. D1 reflects the amount of recombination and thus is sensitive to the time of introgression, Ta. (ii) D2, the ratio of the number of differences between the two distinguished sequences described above and the number of fixed sequence differences between human and chimpanzee. D2 reflects the relative time-depth of the genealogy and thus is sensitive to the archaic split time, T0. (iii) D3, the size of the smaller of the groups, G1 and G2. D3 reflects the relative size of the two most basal clades and thus is sensitive to the amount of admixture, a.
Our approximate-likelihood protocol estimates the distribution of the summary statistics D1, D2, and D3 on the basis of the simulation of a large number of ancestral recombination graphs (ARGs). An important part of this protocol is the choice of tolerances or bin sizes δ1, δ2, and δ3 for their respective summary statistics. In general, we chose tolerances to maximize power for a = 1% (SI Materials and Methods).
We find that the data are significantly unlikely under the null model of no admixture (i.e., the likelihood ratio test yields a bootstrapped P value of 0.0493). We note that this result is conservative because it is based on estimates of recombination rate that are biased downward and a tolerance that is less powerful in regions of high recombination (see below). Interestingly, we find evidence for two separate peaks in the maximum-likelihood surface: (i) an older peak with an archaic split time, T0 ≈ 700 kya, a time of admixture, Ta ≈ 35 kya, and an admixture proportion, a ≈ 2%; and (ii) a more recent peak with T0 ≈ 375 kya, Ta ≈ 15 kya, and a ≈ 0.5% (Fig. 2). Although our method has little power to infer the exact admixture proportion, we can place 95% CIs on the times of divergence (125 kya < T0 < 1.5 Mya) and admixture (Ta < 70 kya) (SI Materials and Methods). Note that T0 for the more recent peak is consistent with the Biaka–Mandenka split time estimates from the two-population model.