Post by Admin on Jan 5, 2021 5:15:20 GMT
Deep introgression results
Having demonstrated reasonable power and accuracy in a simulation setting, we turned to an analysis of real modern and archaic human genomes. Our goals for this study were to identify and characterize introgressed regions from previously proposed migration events, as well as to look for evidence for new migration events, perhaps not detectable by other methods. Our data set consisted of two Africans from the SGDP [24], two Neanderthals [2, 9], the Denisovan [4], and a chimpanzee outgroup. For inference, we again assumed the demography illustrated in Fig 3, considering the old migration events only. We focus here on the model with tmig = 250kya and tdiv = 1Mya, because this model seemed to result in high power in all our simulation scenarios, and because our results suggest that it may be the most realistic (as discussed below). The results using other models are consistent with those presented here (see S1 Text).
Overall, we find that Hum→Nea regions are called most frequently, at a rate of ∼3% in both the Altai and Vindija Neanderthal (Fig 6; see also S3 Fig). This number is almost certainly an underestimate, given that the true positive rate for this model was estimated at 30–55%. By contrast, only ∼0.37% of regions are classified as Hum→Den. As no previous study has found evidence for Hum→Den migration, this migration band serves as a control, supporting our false positive rate estimate of 0.41% from simulations.
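To make the "underestimate" argument concrete, here is a rough back-of-the-envelope correction. It is not the authors' procedure: it assumes the called fraction is a simple mixture of true positives and false positives, and that the simulation-based rates (treated here as per-base rates) carry over to the real data. The function name and the mixture model are illustrative assumptions.

```python
def corrected_fraction(observed, tpr, fpr):
    """Solve observed = tpr * f + fpr * (1 - f) for the true fraction f.

    Assumption (not from the paper): calls are a simple mixture of
    true positives and false positives with fixed rates.
    """
    if tpr <= fpr:
        raise ValueError("true positive rate must exceed false positive rate")
    return (observed - fpr) / (tpr - fpr)


observed = 0.03   # ~3% of the Neanderthal genome called Hum->Nea
fpr = 0.0041      # false positive rate estimated from simulations

# The reported true positive rate spans 30-55%, so bracket the estimate.
low = corrected_fraction(observed, tpr=0.55, fpr=fpr)
high = corrected_fraction(observed, tpr=0.30, fpr=fpr)
print(f"implied true fraction: {low:.1%} to {high:.1%}")
```

Under these assumptions the implied true fraction is a few percentage points above the 3% that is called, which is the direction the text describes.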
Fig 6. Genome-wide coverage of predicted ancient introgression.
Each bar shows total average coverage for a haploid genome, with darker shading (at bottom) representing homozygous calls. Solid bars are for autosomes, and striped bars for chromosome X. Predictions were based on a posterior probability cutoff of 0.5.
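One way to read "average coverage for a haploid genome" is sketched below. The interpretation is an assumption: a heterozygous call is taken to cover one of the individual's two haplotypes and a homozygous call both. The `(start, end, zygosity)` tuple format is hypothetical, with half-open coordinates, and the calls are assumed to have already passed the 0.5 posterior cutoff.

```python
def haploid_coverage(calls, genome_length):
    """Return (total, homozygous_part) average coverage per haploid genome.

    calls: iterable of (start, end, zygosity) with half-open coordinates
    and zygosity in {"het", "hom"} (hypothetical format).
    """
    het_bp = sum(end - start for start, end, zyg in calls if zyg == "het")
    hom_bp = sum(end - start for start, end, zyg in calls if zyg == "hom")
    # Two haplotypes per individual: het covers one, hom covers both.
    total = (het_bp + 2 * hom_bp) / (2 * genome_length)
    hom_part = (2 * hom_bp) / (2 * genome_length)
    return total, hom_part


# Toy data, not real calls.
calls = [(0, 100, "het"), (200, 300, "hom")]
total, hom_part = haploid_coverage(calls, genome_length=1000)
print(total, hom_part)
```

In the figure's terms, `hom_part` would correspond to the darker shading at the bottom of a bar and `total` to the full bar height.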
As noted, there is a well-known depletion on the X chromosome of archaic introgression into humans. By contrast, we observe high coverage of Hum→Nea introgression on the X chromosome for both the Altai and Vindija samples. Indeed, the coverage is somewhat higher on the X chromosome than on the autosomes. However, this difference is likely due in large part to increased power on the X; simulations suggest that power will be ∼20% higher for this event when effective population sizes are multiplied by 0.75 (S4 Fig). Nevertheless, we observe considerable variation in detected introgression across the chromosomes, and several autosomal chromosomes have higher predicted coverage than the X, including chromosomes 1, 6, 21, and 22 (S3 Fig).
Although the Vindija sample is 70ky younger than the Altai sample [9], it shows no depletion of human ancestry on the autosomes, suggesting that negative selection did not cause a significant loss of human introgressed regions during that interval. However, some individual chromosomes do show decreases in coverage from Altai to Vindija, with the largest drop on the X chromosome (S3 Fig).
Other migration events are detected at lower levels. We identify 1% of the Denisovan genome as introgressed from a super-archaic hominin—roughly double the estimated false positive rate (0.49%) for this event. Our apparent weak power for these events (another group has estimated ∼6% introgression [9]) suggests that the super-archaic divergence may have been somewhat recent (perhaps closer to 1Mya than 1.5Mya). Still, this analysis resulted in 27Mb of sequence that may represent a partial genome sequence from a previously unsequenced archaic hominin. In addition, ARGweaver-D predicted that a small fraction of the Neanderthal genomes is introgressed from a super-archaic hominin (0.75% for Altai and 0.70% for Vindija), an event that has not been previously hypothesized. However, these fractions only slightly exceed the estimated false positive rate (0.65%), so these results are likely dominated by spurious predictions.
The Sup→Den events (and perhaps Sup→Nea events) raise the possibility that super-archaic-derived sequences could have been passed, in turn, to modern humans through subsequent Den→Hum (or Nea→Hum) migration events. To explore this possibility, we intersected the predicted regions with introgression predictions in modern humans across the full SGDP data set (details in S1 Text). We found that most Sup→Den and Sup→Nea regions have higher-than-expected divergence to the Denisovans and Neanderthals (respectively) across all humans, and not just the two African humans analyzed by ARGweaver-D. In addition, 15% of the Sup→Den regions overlap with sequence introgressed into Asian and Oceanian individuals from Denisovans, and many of these regions also contain a high number of variants consistent with super-archaic introgression. We also observe that 35% of the Sup→Nea regions are introgressed in at least one modern-day non-African human. Notably, one region of hg19 (chr6:8450001-8563749) appears to be Neanderthal-introgressed and also overlaps a Sup→Nea region. A complete list of Sup→Den and Sup→Nea regions that overlap human introgressed regions, and the genes that fall in these regions, is available in S1 and S2 Tables.
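The intersection step can be illustrated with a small interval-overlap sketch. This is not the pipeline described in S1 Text. The helper names are hypothetical, coordinates are treated as 1-based inclusive to match the `chr6:8450001-8563749` notation, and the target interval and the second query region below are made up for the example.

```python
def parse_region(text):
    """Parse 'chr6:8450001-8563749' into (chrom, start, end), 1-based inclusive."""
    chrom, span = text.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def overlaps(a, b):
    """True if two (chrom, start, end) regions share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def fraction_overlapping(query, targets):
    """Fraction of query regions that overlap any target region."""
    if not query:
        return 0.0
    hits = sum(any(overlaps(q, t) for t in targets) for q in query)
    return hits / len(query)


# The first query region is the one named in the text; the rest is invented.
sup_nea = [parse_region("chr6:8450001-8563749"), ("chr1", 1000, 2000)]
human_introgressed = [("chr6", 8500000, 8600000)]  # hypothetical interval
frac = fraction_overlapping(sup_nea, human_introgressed)
print(frac)
```

The quadratic scan is fine for a few thousand regions; for genome-scale call sets a sorted sweep or an interval tree would be the usual choice.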
We sought to obtain an improved estimate of the timing of migration for the Hum→Nea event using the predicted introgressed regions. Initially, we attempted to gain information about timing from the segment lengths. However, we found that there is strong ascertainment bias towards finding longer regions, so that the length distributions are highly overlapping for different migration times (S1 Text). Instead, we turned to the frequency spectrum of introgressed regions, which provides a more robust signal. The older the migration, the more likely that an introgressed region has drifted to high frequency and is shared across the sampled individuals. For the Hum→Nea event, we found that 37% of our regions are inferred as “doubly homozygous” (that is, introgressed across all four Neanderthal lineages). This fraction is close to what we observe in regions predicted from our simulations with migration at 250kya (38%), whereas simulations with migration at 150kya and 350kya had substantially different doubly-homozygous rates of 10% and 55%, respectively. To obtain a more precise estimate, we performed additional simulations with values of tmig = 200, 225, 275, and 300kya, and compared the frequency spectrum of introgressed regions after ascertainment using ARGweaver-D. Overall, we find that the migration time cannot be pinpointed precisely by this method, but it can be fairly confidently bounded at 200kya < tmig < 300kya (S5 Fig). The same approach suggests that tmig > 225kya for the Sup→Den event (S6 Fig).
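The logic of matching the observed sharing pattern to simulations can be sketched as a nearest-match lookup. This is a simplification: the study compares the full frequency spectrum, whereas the sketch uses only the doubly-homozygous fraction, and it includes only the three simulated values quoted in the text (the rates for 200, 225, 275, and 300kya are not given here, so they are left out).

```python
# Doubly-homozygous fraction in simulated calls, keyed by tmig in kya
# (values quoted in the text above).
simulated = {150: 0.10, 250: 0.38, 350: 0.55}
observed = 0.37  # fraction observed in the real Hum->Nea calls


def closest_migration_time(observed, simulated):
    """Return the simulated tmig whose fraction is nearest the observed one."""
    return min(simulated, key=lambda t: abs(simulated[t] - observed))


best = closest_migration_time(observed, simulated)
print(f"closest simulated tmig: {best} kya")
```

A nearest match gives only a point estimate. Bounding the time, as done in the text, requires the finer grid of simulations and some measure of sampling variability in the simulated fractions.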