Post by Admin on Aug 7, 2020 6:58:57 GMT
Today's humans carry the genes of an ancient, unknown ancestor, left there by hominin species intermingling perhaps a million years ago. The ancestor may have been Homo erectus, but no one knows for sure — the genome of that extinct species of human has never been sequenced, said Adam Siepel, a computational biologist at Cold Spring Harbor Laboratory and one of the authors of a new paper examining the relationships of ancient human ancestors. The new research, published today (Aug. 6) in the journal PLOS Genetics, also finds that ancient humans mated with Neanderthals between 200,000 and 300,000 years ago, well before the more recent, and better-known mixing of the two species occurred, after Homo sapiens migrated in large numbers out of Africa and into Europe 50,000 years ago. Thanks to this ancient mixing event, Neanderthals actually owe between 3% and 7% of their genomes to ancient Homo sapiens, the researchers reported. "Our best conjecture is that an early group of anatomically modern humans left Africa then encountered and interbred with Neandertals, perhaps in the Middle East," Siepel told Live Science. "This lineage [of humans] would then have been lost — either gone extinct, or absorbed by the Neandertals, or migrated back to Africa." Ancient mixers The new research illustrates the complexity of humanity's deep history. Evidence has long been accumulating that humans and Neanderthals mated while their populations overlapped in Europe, before Neanderthals went extinct around 30,000 years ago. In 2010, researchers reported that between 1% and 4% of modern human genes in people in Asia, Europe and Oceania came from Neanderthal ancestors. When you add up all the snippets of Neanderthal DNA present in all modern humans today, some 20% of the Neanderthal genome may be preserved, according to 2014 research. 
As scientists have been able to sequence more fragile fragments of DNA from fossils of ancient human ancestors, they've discovered a complex web of interbreeding stretching back millennia. Some Pacific Islanders, for example, carry pieces of the DNA of a mysterious ancient species of humans known as Denisovans. The researchers of the new study used a computational method of comparing the genomes of two Neanderthals, a Denisovan and two modern African individuals. (Africans were chosen because modern people in Africa don't carry Neanderthal genes from the well-known human-Neanderthal interbreeding that occurred in Europe starting 50,000 years ago.) This method allowed the researchers to capture recombination events, in which segments of chromosomes — which are made up of DNA — from one individual get incorporated into the chromosomes of another. "We are trying to build a complete model for the evolutionary history of every segment of the genome, jointly across all of the analyzed individuals," Siepel said. "The ancestral recombination graph, as it is known, includes a tree that captures the relationships among all individuals at every position along the genome, and the recombination events that cause those trees to change from one position to the next." One advantage of the method, Siepel said, is that it allows researchers to find recombination events inside of recombination events. For example, if a bit of ancient hominin DNA from an unknown ancestor were incorporated in the Neanderthal genome, and then a later mating event between Neanderthals and humans inserted that mystery DNA into the human genome, the method allows for the identification of this "nested" DNA. The analysis turned up evidence of this sort of nested insertion of DNA. 
The finding that Homo sapiens seem to have mated with Neanderthals between 200,000 and 300,000 years ago meshes with previous evidence of some sort of mixing event between the two species prior to humans moving en masse to Europe, Siepel said. The researchers also found that 1% of the Denisovan genome hails from the genes of an unknown ancestor, from an interbreeding event that must have happened, roughly, a million years ago. This mystery ancestor could have been Homo erectus, Siepel said, because Homo erectus likely did overlap in Eurasia with the ancestors of Denisovans and Neanderthals. However, these fragments are tiny and there are no Homo erectus sequences to compare them to, so this is speculative. In both cases, these interbreeding events were passed along again to modern humans: 15% of the interbreeding sequences found in Denisovans are present in people living today, the researchers found. The new results are another piece of evidence that ancient and modern human lineages mixed relatively frequently, Siepel said. "A picture is emerging of a series of distinct but related populations moving around the globe and frequently interacting with one another, with occasional interbreeding events that produced hybrid offspring," Siepel said. "These hybrid offspring might in some cases have suffered from reduced fitness — this is an area of controversy — but apparently many of them were healthy enough to survive and reproduce, leaving a patchwork of archaic and modern human DNA in Neanderthals, Denisovans and modern humans."
Post by Admin on Aug 7, 2020 20:14:36 GMT
Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph
Melissa J. Hubisz, Amy L. Williams, Adam Siepel
Published: August 6, 2020
doi.org/10.1371/journal.pgen.1008895
Abstract
The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present a major extension of the ARGweaver algorithm, called ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topologies and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200-300kya. We find no convincing evidence that negative selection acted against these regions. Finally, we predict that 1% of the Denisovan genome was introgressed from an unsequenced, but highly diverged, archaic hominin ancestor. 
About 15% of these “super-archaic” regions—comprising at least about 4Mb—were, in turn, introgressed into modern humans and continue to exist in the genomes of people alive today. Author summary We present ARGweaver-D, an extension of the ARGweaver algorithm which can be applied under a user-defined demographic model including population splits and migration events. Given genome sequence data from a collection of individuals across multiple closely related populations or subspecies, ARGweaver-D can infer trees describing the genetic relationships among these individuals at every location along the genome, conditional on the demographic model. Like ARGweaver, ARGweaver-D is a Bayesian method, sampling trees from the posterior distribution in order to account for uncertainty. Using simulations, we show that ARGweaver-D can successfully identify regions introgressed from Neanderthals and Denisovans into modern humans. It is also well-powered to detect introgressed regions stemming from older gene-flow events. We apply ARGweaver-D to the genomes of two Neanderthals, a Denisovan, and two African humans. We identify 3% of the Neanderthal genome which is likely derived from gene flow from ancient humans. We also identify about 1% of the Denisovan genome that may be traced to an unsequenced archaic hominin; 15% of these regions were subsequently passed to modern humans. We find no convincing evidence that selection acted against any of these introgressed regions. Introduction It is now well-established that gene flow occurred among various ancient hominin groups over the past several hundred thousand years. The most well-studied example of archaic gene flow is the interbreeding that occurred when humans migrated out of Africa and came into contact with Neanderthals in Eurasia roughly 50,000 years ago [1, 2]. 
This event left a genetic legacy in modern humans that persists today; indeed, 1–3% of the DNA of living humans descended from non-African populations, such as Europeans or East Asians, can be traced to Neanderthals [3]. We also now know that an extinct sister group to the Neanderthals, the Denisovans, intermixed with early modern humans in Asia, leaving behind genomic fragments that comprise 2–4% of the DNA of modern Oceanian humans [4–6]. Many other admixture events have been proposed, creating a complex web of ancient hominin interactions across time and space. These events include gene flow between Neanderthals and Denisovans (Nea↔Den) [2, 7]; between Neanderthals and ancient humans who left Africa over 100 thousand years ago (Hum→Nea) [8]; between an unknown diverged or “super-archaic” hominin (possibly Homo erectus) and Denisovans (Sup→Den) [2, 9]; and between other unknown archaic hominins and various human populations in Africa (Sup→Afr) [10–12]. (In the above notation, used throughout this paper, the arrowheads indicate the inferred direction of gene flow, up to the limits of the data and inference method.) As the network of interactions grows more complex, it becomes more difficult to test for gene flow or identify introgressed regions using standard methods [13]. In one prominent example, a positive value has been observed for a “D statistic” based on Neanderthals, Denisovans, African modern humans, and the chimpanzee reference genome [2], indicating an excess of allele sharing between Neanderthals and African humans, as compared with Denisovans and Africans. However, this observation potentially could be explained by gene flow between Neanderthals and Africans, boosting their allele sharing, or from super-archaic hominins into Denisovans, reducing Denisovan/African allele sharing. 
Notably, the D statistic is highest at sites where the derived allele is fixed or at high frequency in Africans, implying that many of the excess shared alleles are quite old, and supporting the scenario of super-archaic introgression into Denisovans [2]. At the same time, however, many genomic windows with low Neanderthal-Africa divergence nevertheless have high Neanderthal-Denisovan divergence, which is best explained by Hum→Nea gene flow [8]. In this case, each hypothesis has support from multiple studies [8, 9, 14], suggesting that both the Hum→Nea and Sup→Den events likely occurred. But more generally, it can be difficult to resolve conflicting evidence of this kind using summary statistics alone. Furthermore, even when there is strong evidence for the existence of gene flow, it remains challenging to identify particular introgressed genomic regions. This problem is considerably more difficult for the Sup→Den and Hum→Nea events than for the Nea→Hum or Den→Hum events, both because they are hypothesized to have occurred much longer ago [2, 8], causing the introgressed haplotypes to be more broken up by recombination, and because no sequence is available for the super-archaic hominin. The small numbers of sequenced Neanderthal and Denisovan genomes are a further limitation. Current approaches for predicting introgressed regions, including the conditional random field (CRF) [3, 6] and the S* statistic [15, 16] (as well as the variant Sprime [17]), are not ideal for detecting these ancient events, having been optimized for the easier problem of identifying more recent introgression into humans. Furthermore, these methods only use a few summary statistics. When the genomic signal is more subtle, it may be necessary to incorporate all the data using a model-based method. 
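For readers unfamiliar with the D statistic discussed above, here is a minimal sketch of the usual ABBA-BABA count, assuming one aligned haploid sequence per group. The function name `d_statistic`, the argument order (Denisovan, Neanderthal, African, chimpanzee) and the toy sequences are illustrative assumptions, not taken from the paper, and real analyses work from allele frequencies and add significance testing. With this ordering, a positive value means Neanderthals share more derived alleles with Africans than Denisovans do.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Return D = (ABBA - BABA) / (ABBA + BABA) for four aligned haploid sequences.

    Hypothetical ordering for this sketch: p1 = Denisovan, p2 = Neanderthal,
    p3 = African human, outgroup = chimpanzee (taken as the ancestral allele).
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        alleles = {a, b, c, o}
        # Keep only biallelic sites with no missing data.
        if len(alleles) != 2 or not alleles <= set("ACGT"):
            continue
        if a == o and b == c:
            # p2 and p3 share the derived allele (ABBA pattern).
            abba += 1
        elif b == o and a == c:
            # p1 and p3 share the derived allele (BABA pattern).
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Toy alignment (made up): two ABBA sites, one BABA site, three uninformative sites.
denisovan = "AAGAGN"
neanderthal = "GGAAGA"
african = "GGGAAA"
chimp = "AAAAAA"
d_value = d_statistic(denisovan, neanderthal, african, chimp)
```

As the text notes, the same positive value can arise from Hum→Nea gene flow or from Sup→Den gene flow, which is why a single summary statistic cannot separate the two scenarios.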
In this paper, we describe a powerful and highly general new method, called ARGweaver-D, that samples ancestral recombination graphs (ARGs) [18–20] conditional on a generic demographic model, including population divergence times, size changes, and migration events. After introducing ARGweaver-D, we present simulation studies showing it can successfully detect Nea→Hum introgression, even when using a limited number of genomes, and that it also has power for older migration events, including Hum→Nea, Sup→Den, and Sup→Afr events. Finally, we apply this method to modern-day Africans and ancient hominins, and characterize both new and previously reported cases of introgression between humans and archaic hominins. Results ARGweaver-D can sample ARGs conditional on an arbitrary demographic model ARGweaver-D is a major extension of ARGweaver [21] that can infer ARGs conditional on a user-defined population model. This model can consist of an arbitrary number of present-day populations that share ancestry in the past, coalescing to a single panmictic population by the most ancestral discrete time point. Population sizes can be specified separately for each time interval in each population. Migration events between populations can also be added; they are assumed to occur instantaneously, with the time and probability defined by the user. Typically, a suitable demographic model for use with ARGweaver-D can be obtained from the literature or by applying a method such as ∂a∂i [22] or G-PhoCS [14] in a preprocessing step. As previously described [21], ARGweaver is a Markov chain Monte Carlo (MCMC) sampler, in which each iteration consists of removing a branch from every local tree in the ARG (“unthreading”), followed by the “threading” step, which resamples the coalescence points for the removed branches. 
This threading step is the core algorithmic operation in ARGweaver, and is accomplished using a hidden Markov model (HMM), in which the set of states at each site represents all possible coalescence points in the local tree. In ARGweaver (which assumes a single panmictic population), each of these states is defined by a branch and time. However, in ARGweaver-D, each state has a third property, which we call the “population path.” The population path represents the set of populations assigned to the new branch throughout its time span. The modified threading algorithm is illustrated and further described in Fig 1, and additional details are provided in S1 Text. ARGweaver-D is built into the ARGweaver source code, which is available at: github.com/CshlSiepelLab/argweaver. Fig 1. Illustration of the “threading” operation under a model with two populations and a single migration band. Horizontal dashed lines indicate time points for coalescence and recombination. User-specified migration and population divergence times are rounded to the nearest “half time-point”. Migration occurs instantaneously with a user-specified prior rate pM (1% in this work). Here, one haploid lineage has been removed and is being rethreaded (dotted black line), while the other three (solid black lines) are held fixed. Dots on top of each lineage indicate potential coalescence points with the new branch, with black indicating a population path with no migration, and blue indicating a migrant population path. Recombination events (red Xs) occur immediately before positions b2, b3, and b4, with the dotted red line indicating recoalescence of the broken branch. Notice that the newly threaded lineage enters an introgressed state at position b2 and leaves it at b4. After running ARGweaver-D, it is straightforward to identify predicted introgressed regions; they are encoded in each sampled ARG as lineages that follow a migration band. 
By examining the set of ARGs produced by the MCMC sampler, ARGweaver-D can compute posterior probabilities of introgression across the genome. As will be seen below, this computation can be done in a variety of ways—for example, as overall probabilities of migration anywhere in the tree, or probabilities of a specific sampled genome having an ancestral lineage that passes through a particular migration band. In addition, for a diploid individual, probabilities of heterozygous or homozygous introgression can be separately computed. Throughout this paper, we use a threshold of p ≥ 0.5 to define predicted introgressed regions, and compute total rates of called introgression for a diploid individual as an average across each haploid lineage.
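The posterior computation and the p ≥ 0.5 calling rule described above can be sketched as follows. This is an illustration only: it assumes each MCMC sample has already been reduced to a per-position 0/1 indicator of whether a given haploid lineage follows a migration band, which is not ARGweaver-D's actual output format, and all function names are hypothetical.

```python
def posterior_introgression(samples):
    """Per-position fraction of sampled ARGs in which the lineage is a migrant."""
    n = len(samples)
    length = len(samples[0])
    return [sum(sample[i] for sample in samples) / n for i in range(length)]


def call_regions(posterior, threshold=0.5):
    """Return half-open (start, end) runs where the posterior is >= threshold."""
    regions = []
    start = None
    for i, p in enumerate(posterior):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(posterior)))
    return regions


def diploid_coverage(regions_hap1, regions_hap2, length):
    """Average called coverage across the two haploid lineages of one individual."""
    covered = sum(e - s for s, e in regions_hap1) + sum(e - s for s, e in regions_hap2)
    return covered / (2 * length)


# Toy indicators (made up): rows are MCMC samples, columns are positions.
hap1_samples = [[0, 1, 1, 0, 0, 1], [0, 1, 0, 0, 0, 1], [0, 1, 1, 0, 1, 1], [0, 0, 1, 0, 0, 1]]
hap2_samples = [[0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0]]
hap1_regions = call_regions(posterior_introgression(hap1_samples))
hap2_regions = call_regions(posterior_introgression(hap2_samples))
coverage = diploid_coverage(hap1_regions, hap2_regions, 6)
```

Averaging over sampled ARGs rather than taking a single best ARG is what lets the calls reflect uncertainty in the inferred genealogy.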
Post by Admin on Aug 7, 2020 22:27:28 GMT
ARGweaver-D can accurately identify archaic introgression in modern humans We first performed a set of simulations to assess the power and accuracy of ARGweaver-D in identifying Neanderthal introgression into modern humans. These simulations realistically mimic human and archaic demography, as well as variation in mutation and recombination rates (see Methods). We compared the performance with the CRF algorithm [3]; Fig 2 summarizes the results. Overall, ARGweaver-D has improved performance over the CRF, with improvements being subtle for long segments but becoming more pronounced for shorter segments. This gain in power occurs despite the fact that the CRF used a much larger panel of African samples than was used by ARGweaver-D. (CRF used 43 African individuals, but ARGweaver-D used only 2 to save computational cost; both methods used 2 diploid Neanderthals.) Fig 2. Performance on Nea→Hum simulations. A: Receiver operating characteristic (ROC) curves showing basewise performance of ARGweaver-D (red) and the CRF (blue) on simulated data. The two methods predicted introgression in the same simulated European individuals, but the CRF made use of the full reference panel (43 diploid Africans), whereas ARGweaver-D used only two diploid Africans. Different line patterns correspond to different maximum segment lengths. B: Length distributions of real and predicted introgressed regions for data in panel A. Next, we predicted introgressed regions in two non-African human samples from the Simons Genome Diversity Panel (SGDP), a European (Basque) and a Papuan. The ARGweaver-D model used is illustrated in Fig 3, but only the “Recent migration” bands were included. 
We compared to calls from the CRF method, although it is important to note that the two methods were run with different data sets: ARGweaver-D again used many fewer African individuals (2) than CRF (43), but in this case ARGweaver-D used both the Altai and Vindija Neanderthal, whereas the CRF results were obtained with only the Altai. Because the Vindija Neanderthal is a better proxy for the introgressing Neanderthal, ARGweaver-D likely has better power to detect Neanderthal introgression in this comparison. The results are summarized in Fig 4. Overall, the two methods identify many overlapping regions, but each method also produces a substantial fraction not called by the other method (between 15% and 40%). Both methods show a strong depletion of introgression on the X chromosome, especially in the Basque individual. Fig 3. Population model assumed for inference using ARGweaver-D. Population sizes (constant per branch) are shown in parentheses. The model is invariant to the population sizes of the single-lineage chimpanzee and super-archaic hominin branches. Migration events are shown by arrows between populations; solid arrows are used for previously proposed events and dashed arrows for new events. All parameters except tmig and tdiv are held constant at the specified values. Fig 4. Average nucleotide coverage of predicted introgressed regions in four modern human individuals. Colors indicate ARGweaver-D predictions and stripes indicate CRF predictions; colors and stripes together indicate regions called by both methods. The CRF calls were only produced for non-African individuals, so for Mandenka and San, only ARGweaver-D results are shown. Genome sequences were from the Simons Genome Diversity Panel (SGDP). Fig 4 highlights that more Neanderthal than Denisovan sequence is detected in the Papuan, even though Papuans are expected to have a higher level of introgression from Denisovans than from Neanderthals [23]. 
This observation can be explained by lower power to detect Denisovan introgression, due to the different levels of divergence between introgressing archaic individuals compared to sequenced archaic individuals; previous literature has shown that the sequenced Denisovan is highly diverged from the introgressing Denisovan [2], and that the Vindija Neanderthal is more closely related to the introgressing Neanderthal than the Altai Neanderthal [9]. In fact, this information is embedded in the ARGs and is reflected in the coalescence times between humans and archaic individuals in regions where these humans are introgressed. For example, the average coalescence time for introgressed lineages between Vindija and Papuan is 262kya; for Altai and Papuan it is 326kya, and for Denisovan and Papuan it is 396kya. For the Basque individual, we also see a smaller average coalescence time with the Vindija (236kya) than the Altai (292kya). Notably, ARGweaver-D calls nearly 0.5% introgression from the Neanderthal into each of the African individuals. These calls are likely explained by a combination of false positives and back-migration into Africa from Europe. However, another possibility is that some regions introgressed into Neanderthals from ancient humans [8] may be assigned the wrong direction by ARGweaver-D. With few samples, it can be difficult to determine the direction of migration between two sister populations. Indeed, when we simulate migration in both directions, but perform inference in ARGweaver-D using only a Nea→Hum migration band, we find that ∼8% of Hum→Nea bases are identified as Nea→Hum (see S1 Text). This difficulty in resolving directionality is our primary motivation for excluding non-African samples in our later analysis of older migration events (see next section).
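The comparison between the ARGweaver-D and CRF call sets in this post (overlapping regions plus a fraction called by only one method) comes down to interval arithmetic. Below is a minimal sketch, assuming each call set is a list of half-open (start, end) intervals on one chromosome with no overlaps within a set. The function names and the toy intervals are hypothetical and do not come from the paper.

```python
def total_bases(regions):
    """Total length of a set of non-overlapping half-open (start, end) intervals."""
    return sum(end - start for start, end in regions)


def shared_bases(calls_a, calls_b):
    """Number of bases covered by both call sets (two-pointer sweep)."""
    a = sorted(calls_a)
    b = sorted(calls_b)
    i = j = 0
    shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def fraction_not_in_other(calls_a, calls_b):
    """Fraction of bases in calls_a that calls_b does not cover."""
    total = total_bases(calls_a)
    return (total - shared_bases(calls_a, calls_b)) / total


# Toy call sets (made up) standing in for two methods' predictions.
method_a_calls = [(0, 100), (200, 300)]
method_b_calls = [(50, 150), (200, 250), (400, 450)]
a_only = fraction_not_in_other(method_a_calls, method_b_calls)
b_only = fraction_not_in_other(method_b_calls, method_a_calls)
```

Reporting both directions matters here because the two methods were run on different data sets, so neither call set is a strict subset of the other.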
Post by Admin on Aug 8, 2020 6:26:00 GMT
ARGweaver-D can detect older introgression events We next carried out a series of simulations to assess ARGweaver-D’s power to detect more ancient introgression events. For this purpose, we simulated the modern human samples using a model of African human population history, and as such did not include the migration from Neanderthals or Denisovans into non-African humans. These simulations included three migration events: one from modern humans into Neanderthals (Hum→Nea), one from a “super-archaic” unsampled hominin into Denisovans (Sup→Den), and one from the super-archaic hominin into Africans (Sup→Afr). (Note that although both the Sup→Afr and Sup→Den events are simulated from the same super-archaic population, they are meant to represent introgression from any unsampled, diverged hominin population, not necessarily the same one.) These simulations included many realistic features: ancient sampling dates for the archaic hominins, variation in mutation and recombination rates, randomized phase, and levels of missing data modeled after the SGDP and ancient genomes that we use for analysis (see Methods). Each set of simulations contained all three types of migration, and ARGweaver-D was applied with multiple migration bands, with the goal of detecting all migration events in a single run. We analyzed these data sets with ARGweaver-D using the model depicted in Fig 3, but including only the “old migration” bands. As we do not have good prior estimates for the migration time (tmig) or super-archaic divergence time (tdiv), we tried four values of tmig (50kya, 150kya, 250kya, 350kya) and two values of tdiv (1Mya, 1.5Mya). We generated data sets under all 8 combinations of tmig and tdiv, and then analyzed each data set with ARGweaver-D under all 8 models, in order to assess the effects of model misspecification on the inference. 
We find that the power to detect super-archaic introgression is clearly higher when the divergence is higher (tdiv = 1.5Mya), but, as expected, the choice of tdiv does not affect the power to detect Hum→Nea introgression (Fig 5). In addition, for all events, we find that power decreases as the true migration time increases (from top to bottom in Fig 5). Fig 5. Simulation results. Each panel represents a set of simulations generated with a different value of tmig (rows) and tdiv (columns). Within each panel, each bar gives the basewise true positive rate for a particular migration event, using a posterior probability threshold of 0.5. The color of each bar represents the value of tmig assumed for the inference model (orange = 50kya, red = 150kya, purple = 250kya, blue = 350kya). Shaded bars represent an assumption of tdiv = 1.5Mya for inference, whereas solid bars represent tdiv = 1.0Mya. Because the archaic hominin fossils are older than 50kya, results for tmig = 50kya (top) are only applicable for introgression into humans. Looking at each group of bars in Fig 5 shows the results on the same simulated data, using different parameters in the ARGweaver-D model. The blue and purple bars tend to be higher, showing that power is often better when an older migration time is used in the model, even when the true migration time is recent. Similarly, power is often better for detecting super-archaic introgression when tdiv is set to 1Mya (solid), rather than 1.5Mya (striped), in the ARGweaver-D model. Overall, ARGweaver-D has reasonably good power to detect super-archaic introgression when the divergence time is old, but power is more limited as the divergence time decreases. The power to detect Sup→Afr is always lower than the power to detect Sup→Den, as the African population size is much larger, making introgression more difficult to distinguish from incomplete lineage sorting. 
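The basewise true positive rate plotted in Fig 5, and the matching false positive rate, can be sketched as below. This assumes a per-base posterior probability and a per-base truth label from the simulation; the function name `basewise_rates` and the toy values are hypothetical, not the authors' evaluation code.

```python
def basewise_rates(posterior, truth, threshold=0.5):
    """Return (true positive rate, false positive rate) over individual bases.

    posterior: per-base posterior probability of introgression.
    truth: per-base 0/1 label, 1 where the simulated base is truly introgressed.
    """
    tp = fp = fn = tn = 0
    for p, is_introgressed in zip(posterior, truth):
        called = p >= threshold
        if called and is_introgressed:
            tp += 1
        elif called:
            fp += 1
        elif is_introgressed:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr


# Toy example (made-up values): three truly introgressed bases, three not.
toy_posterior = [0.9, 0.6, 0.4, 0.1, 0.7, 0.2]
toy_truth = [1, 1, 1, 0, 0, 0]
tpr, fpr = basewise_rates(toy_posterior, toy_truth)
```

Sweeping `threshold` from 0 to 1 and plotting the resulting (fpr, tpr) pairs gives a ROC curve of the kind shown in Fig 2A.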
For the Hum→Nea event, we have around 50% power if the migration time is 150kya, and around 30% power when it is 250kya. False positive rates were < 1% at a posterior probability threshold of 0.5 (S1 Fig). Two additional migration bands in the ARGweaver-D model served as controls for false positive predictions: one from the super-archaic population to Neanderthal (Sup→Nea), and another from humans into Denisovans (Hum→Den). Events in these bands were also called at < 1% for all models. In addition, the rate of mis-classification of migration type was very low for all event types (S2 Fig). In particular, the model can easily distinguish between Hum→Nea and Sup→Den events, even though both produce similar D statistics [8, 9]. Notably, the simulated data sets were generated with a human recombination map, but the ARGweaver-D model assumed a simple constant recombination rate (S1 Text). We observe somewhat better performance when ARGweaver-D uses the true recombination map, but it is unrealistic to assume the true map is known for the archaic hominins. In addition, we find that the performance of the method does not improve as more African samples are added, so we focus here on an analysis with two African samples only (four haploid genomes). In a separate simulation study, we find that the method is reasonably robust to errors in the assumed population size, although the false positive rate does increase if the population size of the population receiving migration is underestimated by more than 20%, with FP rates approaching 5% for Hum→Nea when the assumed Neanderthal population size is 25% of the true value. Further details of these simulation studies are provided in S1 Text. Deep introgression results Having demonstrated reasonable power and accuracy in a simulation setting, we turned to an analysis of real modern and archaic human genomes. 
Our goals for this study were to identify and characterize introgressed regions from previously proposed migration events, as well as to look for evidence for new migration events, perhaps not detectable by other methods. Our data set consisted of two Africans from the SGDP [24], two Neanderthals [2, 9], the Denisovan [4], and a chimpanzee outgroup. For inference, we again assumed the demography illustrated in Fig 3, considering the old migration events only. We focus here on the model with tmig = 250kya and tdiv = 1Mya, because this model seemed to result in high power in all our simulation scenarios, and because our results suggest that it may be the most realistic (as discussed below). The results using other models are consistent with those presented here (see S1 Text). Overall, we find that Hum→Nea regions are called most frequently, at a rate of ∼3% in both the Altai and Vindija Neanderthal (Fig 6; see also S3 Fig). This number is almost certainly an underestimate, given that the true positive rate for this model was estimated at 30–55%. By contrast, only ∼0.37% of regions are classified as Hum→Den. As no previous study has found evidence for Hum→Den migration, this migration band serves as a control, supporting our false positive rate estimate of 0.41% from simulations. Fig 6. Genome-wide coverage of predicted ancient introgression. Each bar shows total average coverage for a haploid genome, with darker shading (at bottom) representing homozygous calls. Solid bars are for autosomes, and striped bars for chromosome X. Predictions were based on a posterior probability cutoff of 0.5. As noted, there is a well-known depletion on the X chromosome of archaic introgression into humans. By contrast, we observe high coverage of Hum→Nea introgression on the X chromosome for both the Altai and Vindija samples. Indeed, the coverage is somewhat higher on the X chromosome than the autosomes. 
However, this difference is likely due in large part to increased power on the X; simulations suggest that power will be ∼20% higher for this event when effective population sizes are multiplied by 0.75 (S4 Fig). Nevertheless, we observe considerable variation in detected introgression across the chromosomes, and several autosomal chromosomes have higher predicted coverage than the X, including chromosomes 1, 6, 21, and 22 (S3 Fig). Although the Vindija sample is younger by 70kya than the Altai sample [9], it shows no depletion of human ancestry on the autosomes, suggesting that negative selection did not cause a significant loss of human introgressed regions during that interval. However, some individual chromosomes do show decreases in coverage from Altai to Vindija, with the largest drop on the X chromosome (S3 Fig). Other migration events are detected at lower levels. We identify 1% of the Denisovan genome as introgressed from a super-archaic hominin—roughly double the estimated false positive rate (0.49%) for this event. Our apparent weak power for these events (another group has estimated ∼6% introgression [9]) suggests that the super-archaic divergence may have been somewhat recent (perhaps closer to 1Mya than 1.5Mya). Still, this analysis resulted in 27Mb of sequence that may represent a partial genome sequence from a previously unsequenced archaic hominin. In addition, ARGweaver-D predicted that a small fraction of the Neanderthal genomes is introgressed from a super-archaic hominin (0.75% for Altai and 0.70% for Vindija), an event that has not been previously hypothesized. However, these fractions only slightly exceed the estimated false positive rate (0.65%), so these results are likely dominated by spurious predictions. The Sup→Den events (and perhaps Sup→Nea events) raise the possibility that super-archaic-derived sequences could have been passed, in turn, to modern humans through subsequent Den→Hum (or Nea→Hum) migration events. 
To explore this possibility, we intersected the predicted regions with introgression predictions in modern humans across the full SGDP data set (details in S1 Text). We found that most Sup→Den and Sup→Nea regions have higher-than-expected divergence to the Denisovans and Neanderthals (respectively) across all humans, and not just the two African humans analyzed by ARGweaver-D. In addition, 15% of the Sup→Den regions overlap with sequence introgressed into Asian and Oceanian individuals from Denisovans, and many of these regions also contain a high number of variants consistent with super-archaic introgression. We also observe that 35% of the Sup→Nea regions are introgressed in at least one modern-day non-African human. Notably, one region of hg19 (chr6:8450001-8563749) appears to be Neanderthal-introgressed and also overlaps a Sup→Nea region. A complete list of Sup→Den and Sup→Nea regions that overlap human introgressed regions, and the genes that fall in these regions, is available in S1 and S2 Tables.

We sought to obtain an improved estimate of the timing of the Hum→Nea migration event using the predicted introgressed regions. Initially, we attempted to gain information about timing from the segment lengths. However, we found that there is strong ascertainment bias towards finding longer regions, so that the length distributions are highly overlapping for different migration times (S1 Text). Instead, we turned to the frequency spectrum of introgressed regions, which provides a more robust signal. The older the migration, the more likely that an introgressed region has drifted to high frequency and is shared across the sampled individuals. For the Hum→Nea event, we found that 37% of our regions are inferred as “doubly homozygous” (that is, introgressed across all four Neanderthal lineages).
This fraction is close to what we observe in regions predicted from our simulations with migration at 250kya (38%), whereas simulations with migration at 150kya and 350kya had substantially different doubly-homozygous rates of 10% and 55%, respectively. To obtain a more precise estimate, we performed additional simulations with values of tmig = 200, 225, 275, and 300kya, and compared the frequency spectrum of introgressed regions after ascertainment using ARGweaver-D. Overall, we find that the migration time cannot be pinpointed precisely by this method, but it can be fairly confidently bounded at 200kya < tmig < 300kya (S5 Fig). The same approach suggests that tmig > 225kya for the Sup→Den event (S6 Fig).
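The comparison above can be illustrated with a small Python sketch. It uses only the doubly-homozygous fractions quoted in the text (10%, 38% and 55% for simulated migration at 150, 250 and 350kya, against 37% observed). The function names and the linear interpolation step are assumptions made for illustration; the paper itself compares full frequency spectra from additional ARGweaver-D simulations, which this does not reproduce.

```python
# Doubly-homozygous fractions quoted in the text:
# simulated migration time (kya) -> fraction of regions called doubly homozygous.
SIMULATED = {150: 0.10, 250: 0.38, 350: 0.55}
OBSERVED = 0.37  # fraction observed for the Hum->Nea regions


def closest_migration_time(simulated, observed):
    """Return the simulated migration time whose fraction is nearest the observed one."""
    return min(simulated, key=lambda t: abs(simulated[t] - observed))


def interpolate_migration_time(simulated, observed):
    """Linearly interpolate between neighbouring simulated times (illustrative assumption).

    Returns None if the observed fraction lies outside the simulated range.
    """
    points = sorted(simulated.items())
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if min(f0, f1) <= observed <= max(f0, f1):
            if f1 == f0:
                return float(t0)
            return t0 + (observed - f0) * (t1 - t0) / (f1 - f0)
    return None


best = closest_migration_time(SIMULATED, OBSERVED)
estimate = interpolate_migration_time(SIMULATED, OBSERVED)
print(f"closest simulated time: {best} kya")
print(f"interpolated time: {estimate:.1f} kya")
```

With these three points, the nearest simulated time is 250kya and the interpolated value falls inside the 200kya to 300kya bound reported in the text. The interpolation gives no sense of uncertainty, which is why the bound rather than a point estimate is the safer summary.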
Post by Admin on Aug 8, 2020 19:24:58 GMT
Data release and browser tracks. Our predictions and posterior probabilities can be viewed as a track hub on the UCSC Genome Browser [25], using the URL: compgen.cshl.edu/ARGweaver/introgressionHub/hub.txt. The raw results can be found in the sub-directory: compgen.cshl.edu/ARGweaver/introgressionHub/files. Fig 7 shows a large region of chromosome X as viewed on the browser, with a set of tracks showing called regions, and another showing posterior probabilities. Fig 8 shows a zoomed-in region with a Sup→Den prediction, and S7 Fig shows an example Hum→Nea region. When zoomed in, there is a track showing the patterns of variation in all the individuals used for analysis, with haplotype phasing sampled from ARGweaver-D.

Fig 7. Introgression results for a ∼40Mb region of chromosome X displayed in the UCSC Genome Browser. The tracks at top show predicted introgressed regions, with green indicating introgression from humans, and gray indicating introgression from a super-archaic hominin. Darker colors are used for homozygous introgression. The tracks below indicate the posterior probabilities for each type of introgression into each individual.

Fig 8. A region with predicted heterozygous Sup→Den introgression. A ∼170kb region on human chromosome 1 is shown in the UCSC Genome Browser. The first two tracks show the predicted regions and posterior probabilities as in Fig 7, except only the supToDen probabilities are shown. The next track shows the variants observed in this region that are used in the ARGweaver-D analysis, with alternating colors indicating variant alleles. Notice that, outside of the introgressed region, the Denisovan is generally homozygous and shares variants with Africans and Neanderthals; but within the region, the Denisova_2 haplotype has many singleton variants, whereas Denisova_1 continues to share many variants with Neanderthals and Africans.
When chimpanzee alignments are available, the non-chimp allele is colored; otherwise the minor allele is colored. No color indicates that a haplotype has the chimpanzee or major allele, or missing data. The final phasing sampled by the ARGweaver-D algorithm was used.

Functional analysis of introgressed regions. Because some of our observations suggested an absence of selection against the Hum→Nea regions, we searched for other signals that might hint at possible functional consequences of this introgression event. We first looked at four 10Mb introgression “deserts,” where the rate of both Nea→Hum and Den→Hum introgression is < 1/1000 [6]. We observed fairly high coverage of Hum→Nea introgression in these deserts (Table 1), suggesting that the bias against introgression is unidirectional. For two of the deserts, the Hum→Nea coverage is quite high, especially in the Altai Neanderthal. Notably, the third region overlaps the FOXP2 gene, which contains two human-chimp substitutions that have been implicated in human speech [26, 27], although the Hum→Nea introgressed region is upstream of these substitutions (S8 Fig).

We further examined a broader collection of deserts of Nea→Hum ancestry for evidence of depletion for introgression in the opposite direction. Based on the CRF regions, we identified 30 regions of at least 10Mb that meet the criteria for “deserts”. However, by several measures (including coverage of Hum→Nea, number of elements, and change in coverage between the Altai and Vindija Neanderthals), these deserts are not significantly different from a set of randomly chosen genomic regions matched for size (S9 Fig).

Finally, we checked for enrichments or depletions of various functional elements in our introgressed segments, relative to the expectation under a random distribution across the genome.
The interpretation of these numbers is difficult, as local genomic factors (such as effective population size, mutation and recombination rates) significantly affect ARGweaver-D’s power to detect introgressed regions. Nevertheless, we find that the enrichment of functional regions (such as CDSs, promoters, and UTRs) tends to be higher in the Altai than in the Vindija Neanderthal, which is the opposite of the pattern expected under negative selection (since the Vindija Neanderthal’s fossil is much more recent). Further enrichment results are detailed in S1 Text.
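As a rough illustration of what an enrichment "relative to the expectation under a random distribution across the genome" means, the Python sketch below computes a fold enrichment from interval overlaps. Everything in it is an assumption for illustration: the function names, the half-open (start, end) coordinate convention, and the toy intervals are invented, and the code ignores the power differences that the text warns about. It is not the procedure described in S1 Text.

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Number of bases covered by a set of intervals, counting overlaps once."""
    return sum(end - start for start, end in merge(intervals))


def overlap_length(a, b):
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fold_enrichment(introgressed, elements, genome_length):
    """Observed fraction of introgressed bases in elements, divided by the
    fraction expected if introgressed bases were placed uniformly at random."""
    observed = overlap_length(introgressed, elements) / total_length(introgressed)
    expected = total_length(elements) / genome_length
    return observed / expected


# Toy example on a made-up 1000 bp "genome" (not real data).
introgressed = [(100, 200), (500, 600)]
elements = [(150, 250), (700, 750)]
print(fold_enrichment(introgressed, elements, genome_length=1000))
```

A value above 1 indicates enrichment and a value below 1 indicates depletion. Comparing such values between the Altai and Vindija calls would mirror the contrast described above, subject to the caveat that detection power varies along the genome.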