Post by Admin on Aug 4, 2020 19:18:35 GMT
RESULTS

To illustrate the evolutionary scenarios we investigate here, consider the simple model of recent hominin demography presented in Figure 1A. Many alleles segregating in ancestral hominins were lost in the AMH lineage after the divergence of the ancestors of AMHs and Neanderthals. Some were lost in all AMHs, while others were lost only in Eurasian populations, e.g., during the OOA bottleneck. These lost alleles thus had the potential to be reintroduced into Eurasian populations via archaic admixture. Within these populations, reintroduced alleles would initially only be present on introgressed Neanderthal haplotypes, and over time many would retain high LD with Neanderthal-derived alleles in modern Eurasians. In the following analysis, we will refer to alleles that were present in the most recent common ancestor of AMHs and Neanderthals as “ancestral hominin alleles.”

Figure 1. Schematic of the reintroduction of lost ancestral alleles by Neanderthal introgression. (A) Illustration of the evolutionary trajectory and resulting genomic signature of an allele A (blue) that was: (1) segregating in the ancestors of anatomically modern humans (AMHs) and Neanderthals, (2) lost to the ancestors of Eurasians in the human out of Africa (OOA) bottleneck or other subsequent migrations, and (3) reintroduced to Eurasians through Neanderthal admixture. Consequently, reintroduced alleles (RAs) are expected to be in high linkage disequilibrium with some Neanderthal-derived alleles (NDAs; orange) on introgressed haplotypes (gray) in modern Eurasians. (B) Schematics of the different evolutionary histories of interest in this paper. Alleles lost in Eurasians (or all AMHs) and reintroduced by Neanderthal introgression are referred to as RAs. Alleles that appeared in the Neanderthal lineage, were not present in the ancestors of humans and Neanderthals, and only exist on introgressed haplotypes in modern humans are referred to as Neanderthal-derived alleles (NDAs).
We will refer to ancestral hominin alleles that are only observed in Eurasians on introgressed Neanderthal haplotypes as reintroduced alleles (RAs), and introgressed alleles that first appeared on the Neanderthal lineage as Neanderthal-derived alleles (NDAs) (Figure 1B). Here we evaluate the presence and function of RAs in modern Eurasians and contrast them with NDAs.

Hundreds of thousands of RAs exist in modern Eurasian populations

To identify candidate RAs in the genomes of modern Eurasians, we sought variants in 1000 Genomes Phase 3 Eurasian populations that are present only on introgressed haplotypes (Figure S3, Methods). We began with sets of previously identified SNPs that tag introgressed haplotypes in Eurasians. These SNPs were identified by S* and comparisons between Neanderthal genomes and the genomes of European (EUR), East Asian (EAS), and South Asian (SAS) populations (12). For each population, we identified candidate RAs by collecting variants that are in perfect LD (r² = 1) with a Neanderthal tag SNP, but that are not tag SNPs themselves. We then evaluated each of these candidate RAs with regard to its ancestral status and presence in modern sub-Saharan Africans. Candidate alleles that matched the high-confidence ancestral allele call from 1000 Genomes or that were present at a frequency of >1% in sub-Saharan African populations without substantial Neanderthal ancestry were deemed RAs. Overall, ∼73% of Neanderthal tag SNPs are in perfect LD with at least one classifiable RA. Forward-time evolutionary simulations suggest that false positives due to recombination artefacts or convergent mutations, even at highly mutable CpG dinucleotides, are rare (Figure S1, S2; Table S1).
Some Neanderthal haplotypes are present in some sub-Saharan African populations due to backflow post-admixture from Eurasians; however, given that the resulting fraction of Neanderthal ancestry is estimated to be very low (0.18%) (6, 29), our classification criteria prevent them from leading to many false positives (Methods). Finally, our approach is likely conservative, because many true RAs are not expected to retain perfect LD with any NDA.

Figure S1. Demographic model used for evolutionary simulations. The demographic model used to simulate human–Neanderthal admixture and quantify the reintroduction of lost alleles. The model and effective population sizes (Ne) were based on previous simulations of Neanderthal admixture. We considered models in which mutations incurred a fitness cost (mildly purifying selection) or no fitness cost (strict neutrality). Two different admixture fractions (f = 0.02 and f = 0.04) and three mutation rates were used in the simulations (Methods).

Figure S2. Simulations indicate that false positives in RA identification due to independent convergent mutations are rare. For each simulated population, we identified all NDAs that occurred in positions with ancestral hominin variation that was lost in the Eurasian OOA. (A) Boxplots summarize the frequencies of these potentially confounding NDAs among all sites that would be called as RAs at the time of admixture (cf. Figure 1). The incidence of these confounding mutations is slightly higher under a purely neutral model (left) than under a model where new mutations can be deleterious (right). (B) Comparison of the effect of elevated mutation rates on the incidence of potentially confounding variants. Under a neutral model, the false positive rate scales with the mutation rate. The highest rate (μ = 7e-7) provides an estimate for CpG sites and results in a 3% false positive rate. Each boxplot represents 100 simulated populations.
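The LD-based identification of candidate RAs described in this post can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names (`r_squared`, `is_reintroduced`) and the 0/1 haplotype encoding are assumptions made for the example, and the real analysis (Figure S3, Methods) involves additional steps not shown here.

```python
from fractions import Fraction
from typing import Optional, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> Fraction:
    """Exact r^2 between two biallelic sites on the same phased haplotypes.

    a and b are 0/1 indicators (1 = haplotype carries the allele of interest).
    Integer counts are used so that perfect LD can be tested with `== 1`.
    Note that r^2 ignores sign: perfectly anti-correlated alleles also give 1.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n_a, n_b = sum(a), sum(b)
    n_ab = sum(x * y for x, y in zip(a, b))
    denominator = n_a * (n - n_a) * n_b * (n - n_b)
    if denominator == 0:
        return Fraction(0)  # a monomorphic site carries no LD information
    return Fraction((n * n_ab - n_a * n_b) ** 2, denominator)


def is_reintroduced(allele: str, ancestral_call: Optional[str],
                    afr_frequency: float, afr_threshold: float = 0.01) -> bool:
    """Apply the two RA criteria stated in the text to a candidate allele.

    The candidate is assumed to already be in perfect LD with a tag SNP.
    ancestral_call is None when no high-confidence ancestral call exists.
    """
    matches_ancestral = ancestral_call is not None and allele == ancestral_call
    present_in_africa = afr_frequency > afr_threshold
    return matches_ancestral or present_in_africa


# Toy example with made-up haplotypes: the candidate tracks the tag SNP exactly.
tag_snp = [1, 0, 0, 1, 0, 0]
candidate = [1, 0, 0, 1, 0, 0]
in_perfect_ld = r_squared(tag_snp, candidate) == 1
```

Using exact integer arithmetic avoids floating-point rounding when testing for r² = 1, which matters because the criterion is an equality rather than a threshold.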
Post by Admin on Aug 5, 2020 7:47:46 GMT
Altogether, we identified 209,176 RAs (Figure 2B, Figure S3). The South Asian and East Asian populations each have more RAs (139,270 and 125,257, respectively) than the European populations (90,121). These numbers likely reflect the differences in the number of Neanderthal tag SNPs found in each population (Figure S3, Figure S4), the greater levels of Neanderthal ancestry previously observed in East Asians (30, 31), and the differences in the demographic history of these populations (32).

Figure S3. Introgressed allele class assignment decision tree and allele count summary. Decision tree by which 1000 Genomes variants in perfect LD with Neanderthal tag SNPs were classified as RAs and NDAs. The counts of variants reaching each of the numbered steps (1-5) are summarized in the lower table.

Figure S4. Introgressed allele sharing across three Eurasian populations. Venn diagram showing the fractions of each introgressed variant class that are shared between populations.

Figure 2. Neanderthal introgression reintroduced thousands of lost ancestral alleles to Eurasian populations. The number of RAs and NDAs in each Eurasian 1000 Genomes population (EAS = East Asian; EUR = European ancestry; SAS = South Asian) identified by our pipeline (Figure S3; Methods). Overall, Neanderthal admixture is responsible for the presence of over 200,000 ancestral alleles lost in the human OOA bottleneck or later migrations into the ancestors of Eurasian populations.

For the majority of RAs, the reintroduced allele is still segregating in African populations; however, a substantial fraction of RAs (EAS: 22%, EUR: 30%, and SAS: 28%) are present in modern human populations exclusively on haplotypes of Neanderthal ancestry (i.e., these alleles are no longer present in African populations). This suggests that the derived allele likely became fixed at these positions in AMH populations before the reintroduction of the ancestral allele via Neanderthal admixture.
RAs whose corresponding allele is still present in Africans segregate at significantly higher frequencies in Eurasians than RAs no longer observed in Africans (Figure S6). This suggests heterogeneity in the selective pressures on RAs.

Figure S5. Simulations indicate that reintroduction of alleles lost in the OOA bottleneck by Neanderthal introgression was common. The ratios of RAs to NDAs over 100 simulated Eurasian populations. The simulations predict approximately one RA for every two NDAs, and these estimates are robust to changes in the simulated Neanderthal admixture fraction.

Misclassification of non-RAs as RAs due to independent, convergent mutations is extremely rare (Figure S2) and the overall false discovery rate for LD-based RA identification is below ∼1% (Table S2). While these forward-time simulations only approximate the demographic histories of these populations, the observed RA-to-NDA ratios are qualitatively consistent with the simulations (Figure 2).
Post by Admin on Aug 5, 2020 20:54:27 GMT
Figure S6. Comparison of allele frequencies across three Eurasian populations stratified by presence/absence of allele in modern sub-Saharan African populations. Reintroduced Ancestral Alleles (RAAs) that are also present in modern African (AFR) populations segregate at higher allele frequencies (AF) in all Eurasian populations than RAAs for which the allele is absent in AFR. Intra-population median differences in AF are displayed along with P-values (Mann-Whitney U test). Outliers are not shown. Circles indicate mean AF.

Next, we examined the distribution of RAs across introgressed haplotypes. RAs are pervasive; 84.4% (EAS), 81.8% (EUR), and 81.7% (SAS) of introgressed haplotypes contain RAs. The average number of RAs per introgressed haplotype is ∼17 (Figure S7A). Of the haplotypes containing RAs, 21.3% (EAS), 11.8% (EUR), and 15.2% (SAS) contain more RAs than NDAs (Figure S7B). RAs also have greater variability in their distribution across haplotypes, and appear more clustered within haplotypes than NDAs (Figure S7C). Thus, RAs are present on most introgressed haplotypes and, in some cases, constitute the majority of introgressed variants in these regions.

Figure S7. Reintroduced alleles cluster within introgressed Neanderthal haplotypes. (A) Scatter plot of the numbers of RAs and NDAs contained on all introgressed haplotypes in EUR. The correlation between the NDA and RA content is moderate (Pearson’s r² = 0.46), with 18% of the haplotypes containing no RAs and 10% having more RAs than NDAs. (B) Scatter plot of the number of introgressed variants on each haplotype vs. haplotype length. The NDA content of a haplotype is proportional to its length (r² = 0.85), but the number of RAs in each haplotype is less strongly correlated with length (r² = 0.56). (C) Heatmap of the fraction of NDAs and RAs in density percentiles (high to low, left to right) averaged over all introgressed Eurasian haplotypes.
This information is summarized in a cumulative distribution function (CDF) above the heatmaps. A higher fraction of all RAs are found in the densest percentiles; this reflects the fact that RAs are often present in denser clusters than are NDAs.

RA-containing introgressed haplotypes are associated with anthropometric human traits and disease risk

To update knowledge of human phenotypes influenced by Neanderthal introgression, we intersected all RAs and NDAs from each of the three Eurasian populations with significant associations (P < 10^-8; Methods) reported in the GWAS Catalog as of January 24, 2019 (33). Sixty-eight percent of NDAs with at least one significant GWAS association are in perfect LD with at least one RA (File S2). Consequently, over 70% of the phenotype associations with NDAs have an equally strong association with at least one RA. Thus, while previous studies have used GWAS to link variants on introgressed haplotypes with phenotypes (5, 6, 9), many associations could be mediated by RAs. The high LD between RAs and NDAs prevents the identification of the RAs, the associated NDAs, or other variants as causal. However, in Europeans, we found that nearly as many RAs (n = 1049) as NDAs (n = 1349) are significantly associated with at least one trait. Overall, Eurasian RAs tagged 2197 unique, significant associations while NDAs tagged 2547 (File S2). Many of the phenotypes tagged by RAs are morphometric (e.g., cranial base width, BMI, and height), and several others relate to more general aspects of outward appearance (e.g., chin dimples, male-pattern baldness, and skin pigmentation). Introgressed RAs are also associated with many pathologies, including cancers (breast, esophageal, lung, prostate), Alzheimer’s disease, and neurological conditions like neuroticism and bipolar disorder (File S2). Several of the RAs that are no longer present within sub-Saharan African populations also have associations with traits.
These RAs are particularly interesting, because they likely represent loci at which derived alleles became fixed in modern human populations after the split from ancestors of Neanderthals. For example, an RA (rs11564258) near MUC19, a gel-forming mucin expressed in epithelial tissues with a potential role in interaction with microbial communities, is strongly associated with both Crohn’s disease and inflammatory bowel disease (34, 35). This locus has been identified in scans for potential adaptive introgression (18). We also find associations with facial morphology, body mass index, sleep phenotypes, and metabolite levels in smokers (36–40). While none of these findings suggest that RAs are more likely than NDAs to be “causal”, they significantly expand the number of candidate variants in introgressed regions.

Introgressed haplotypes containing eQTL have higher RA fraction than non-eQTL introgressed haplotypes

We next evaluated the prevalence of RAs among eQTL in 48 tissues profiled in v7 of the Genotype-Tissue Expression (GTEx) project (41). Introgressed eQTL are found in all GTEx tissues, with 18% of EUR RAs (16,318) and 16% of EUR NDAs (31,822) being eQTLs in at least one tissue. While each RA is associated with at least one NDA, the number of RAs and NDAs on an introgressed haplotype is not perfectly correlated (Pearson r² = 0.46; Figure S7A). The 1585 introgressed haplotypes containing at least one introgressed eQTL have a significantly higher fraction of RAs than the 4237 haplotypes having no eQTL (median of 0.24 vs. 0.20, P = 3×10^-13, Mann-Whitney U test; Figure 3A). This result also holds when stratifying introgressed haplotypes with eQTL by their tissue of activity (Figure S14).
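The haplotype-level comparison above (RA fraction on eQTL-containing vs. eQTL-free introgressed haplotypes, tested with a Mann-Whitney U test) could be reproduced along the following lines. This is a hedged sketch using only the Python standard library: the function names are hypothetical, the P-value uses the large-sample normal approximation (the text does not say whether an exact or asymptotic test was used), and the haplotype counts are made up for illustration.

```python
import math
from statistics import median


def ra_fraction(n_ra: int, n_nda: int) -> float:
    """Fraction of introgressed alleles on one haplotype that are RAs."""
    return n_ra / (n_ra + n_nda)


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for x, approximate two-sided P-value). Tied values
    receive average ranks and the variance is tie-corrected; no continuity
    correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        group_size = j - i
        tie_term += group_size ** 3 - group_size
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Made-up (RA count, NDA count) pairs per haplotype, for illustration only.
with_eqtl = [ra_fraction(r, d) for r, d in [(6, 14), (5, 15), (9, 21), (4, 12)]]
without_eqtl = [ra_fraction(r, d) for r, d in [(2, 18), (3, 17), (4, 16), (1, 19)]]
medians = (median(with_eqtl), median(without_eqtl))
u_statistic, p_value = mann_whitney_u(with_eqtl, without_eqtl)
```

With thousands of haplotypes per group, as in the comparison reported in the text, the normal approximation is generally considered adequate; for very small samples an exact test would be preferable.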
Post by Admin on Aug 6, 2020 6:53:36 GMT
Figure 3. Reintroduced alleles are common among introgressed eQTLs. (A) The fraction of RAs among introgressed alleles on Neanderthal haplotypes in Europeans (EUR) that either contain or lack GTEx eQTL. Introgressed haplotypes with eQTL have significantly higher RA fraction (median of 0.24 vs. 0.20, P = 3.00e-13, Mann-Whitney U test). (B) The RA:NDA ratio among eQTLs in each of 48 GTEx v7 tissues. Bubbles are scaled by the number of RA eQTLs in each tissue. Compared to the genome-wide average (RA:NDA ratio = 0.47; indicated by vertical black line), 13 tissues show more than the expected number of RA eQTL and 5 tissues show fewer than the expected number (P < 0.01, hypergeometric test after Bonferroni correction).

Among introgressed variants that are also eQTL, the ratios of RAs to NDAs varied across tissues, with 13 tissues having a higher RA:NDA ratio than expected from the RA:NDA ratio in the genome as a whole (Figure 3B). Seven of the 13 tissues enriched for RA eQTLs are brain tissues, with RA:NDA ratios of 0.53–0.83 compared to the overall observed ratio of 0.47 (P < 0.01, hypergeometric test with Bonferroni correction). Introgressed haplotypes have been previously shown to modulate gene regulation, especially in the brain (24, 42), and the higher-than-expected presence of RAs in more than half of these tissues could suggest shared regulatory architectures. RA eQTLs also appear more abundant in the pituitary gland, pancreas, adrenal gland, testes, and tibial nerve. RA eQTL are less abundant than expected in the introgressed eQTL from mucosal tissues and salivary gland. In summary, introgressed haplotypes containing eQTL contain a higher fraction of RAs, and this set of RA eQTLs is not evenly distributed among tissues.

Some RAs have conserved gene regulatory associations in European and African populations

Many introgressed haplotypes influence gene regulation, and the majority of them contain RAs (42, 43).
Given the high LD between RAs and NDAs, it is challenging to determine from genetic association data alone whether a particular RA or NDA is functional. Indeed, we find that RAs and NDAs are similarly likely to overlap known regulatory motifs (Figure S8). Thus, to search for RAs likely to have regulatory functions independent of associated NDAs, we analyzed cross-population eQTL data from lymphoblastoid cell lines (LCLs) from European (EUR) and sub-Saharan African Yoruba (YRI) individuals (44). We sought eQTL alleles that are RAs in Europeans and are present in Yoruba in non-Neanderthal introgressed regions (Figure 4A; Methods). Thus, if an allele that was reintroduced into Eurasians has similar effects on gene expression in both populations, it suggests that the RA (rather than associated NDAs) influences expression, and that introgression reintroduced ancestral regulatory function.

Figure S8. RAs and NDAs have similar amounts of overlap with annotated regulatory elements. Comparison of the fraction of NDAs and RAs in each of the RegulomeDB functional classes in order of evidence of regulatory activity.

Figure 4. Reintroduced alleles restore regulatory functions lost in Eurasians. (A) Conceptual model of restored regulatory function resulting from Neanderthal admixture. Here, allele A is a cis-acting regulatory variant that is exclusively found on introgressed haplotypes (gray) in modern Europeans (EUR). Allele A is also present in sub-Saharan Yoruba individuals (YRI) lacking Neanderthal ancestry. It displays similar cis-regulatory activity in both populations. This pattern suggests that allele A is an RA in Europeans and that it influences gene regulation independent of the associated NDAs. (B) Two examples of genes (SDSL and HDHD5) with consistent expression differences (measured in RPKM) associated with RA eQTLs in EUR and the corresponding allele in YRI LCLs.
The RAs are present only on introgressed haplotypes in EUR, and the NDAs associated with the RAs are not present in YRI. This suggests that these RAs restore lost gene regulatory functions in Europeans. (C) Schematic of the HDHD5 locus highlighting the locations of one NDA (orange) and four RA eQTLs (blue) in the introgressed haplotype and the different combinations of these alleles present in EUR, YRI, and Neanderthals. (D) Luciferase activity driven by constructs carrying different combinations of alleles present in the HDHD5 locus. We assayed four constructs containing: 1) no introgressed alleles, 2) only the NDA, 3) only the RAs, and 4) all introgressed variants. Results are summarized over three replicates. As expected from the eQTL data, constructs lacking RAs drive significantly stronger expression (∼2x baseline) than constructs containing RAs (∼1x baseline; two-tailed t-test, P < 0.01 (**) and P < 0.001 (***)). The regulatory effect of the RAs is independent of the presence of the NDA found in introgressed EUR haplotypes. (E) Regulatory activity in a massively parallel reporter assay (MPRA) for the four HDHD5 RA eQTLs reveals that rs71312076 has significant (P < 0.007) regulatory effects when placed in the non-introgressed European background sequence.

In the LCL eQTL data, 2,564 RAs were significant eQTL in EUR, and only 180 were significant eQTL in YRI. This difference is largely due to the much lower power in YRI (sample size of 89 vs. 379) which, in combination with having cross-population data from only one cellular context, makes it challenging to estimate the full extent to which RAs contribute regulatory function. Nevertheless, of the 180 YRI eQTL corresponding to EUR RAs, 42 displayed significant eQTL effects in both populations. These RA eQTLs influence the expression of nine genes (Table S3).
The expression differences observed for the RAs in EUR have the same direction of effect and similar magnitude as those observed for the corresponding allele in YRI. For example, two genes, SDSL and HDHD5, each have four cross-population RA eQTLs that have similar effects on gene expression in both EUR and YRI (Figure 4B). Thus, despite the limitations of the cross-population eQTL data, these results suggest that some RAs influence gene regulation in Eurasian individuals.
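The cross-population logic described above (an RA eQTL in EUR whose corresponding allele is also an eQTL for the same gene in YRI, with the same direction of effect) can be illustrated with a small filter. This is a sketch under stated assumptions, not the authors' procedure: the `EqtlResult` record, the `concordant_eqtls` function, the significance cutoff `alpha`, and all variant and gene identifiers below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EqtlResult:
    variant: str    # variant identifier (placeholder IDs below)
    gene: str       # target gene (placeholder names below)
    beta: float     # effect of the RA on expression; the sign gives direction
    p_value: float  # association P-value within one population


def concordant_eqtls(eur: List[EqtlResult], yri: List[EqtlResult],
                     alpha: float = 0.05) -> List[Tuple[str, str]]:
    """Return (variant, gene) pairs significant in both populations
    with the same direction of effect. alpha is an assumed cutoff."""
    yri_by_key = {(r.variant, r.gene): r for r in yri}
    shared = []
    for result in eur:
        match = yri_by_key.get((result.variant, result.gene))
        if match is None:
            continue  # pair not tested in YRI
        both_significant = result.p_value < alpha and match.p_value < alpha
        same_direction = result.beta * match.beta > 0
        if both_significant and same_direction:
            shared.append((result.variant, result.gene))
    return shared


# Made-up results: only var1/GENE_A is significant and concordant in both.
eur_results = [
    EqtlResult("var1", "GENE_A", -0.5, 1e-6),
    EqtlResult("var2", "GENE_B", 0.3, 1e-4),
    EqtlResult("var3", "GENE_C", 0.4, 1e-5),
]
yri_results = [
    EqtlResult("var1", "GENE_A", -0.4, 1e-3),
    EqtlResult("var2", "GENE_B", -0.2, 1e-3),  # opposite direction
    EqtlResult("var3", "GENE_C", 0.4, 0.2),    # not significant in YRI
]
shared_pairs = concordant_eqtls(eur_results, yri_results)
```

A sign check captures only direction of effect; comparing magnitudes, as the text also does, would additionally require effect sizes measured on a comparable scale in both populations.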
Post by Admin on Aug 6, 2020 21:11:40 GMT
RAs can influence expression independent of NDAs

To determine whether RAs directly influence expression in EUR individuals independently of linked NDAs, we functionally dissected the regulatory activity of four cross-population RA eQTL. These alleles associate with the expression of HDHD5 (also known as CECR5), a hydrolase domain containing protein that is expressed in diverse tissues. It is located in a region of chromosome 22 associated with Cat Eye Syndrome (CES), a rare disease associated with chromosomal abnormalities in 22q11 with highly variable clinical presentation that often includes multiple malformations affecting the eyes, ears, anus, heart, and kidneys (45). The HDHD5 locus contains a 2 kb region, which in introgressed Europeans carries an NDA that is in perfect LD with four RAs that are cross-population eQTLs for HDHD5 (Figure 4C).
We performed luciferase reporter assays in LCLs using four different combinations of the NDA and RAs within this 2 kb region (Figure 4D, Table S5). In each assay, we compared the activity of each combination of alleles to the activity driven by a vector with a minimal promoter but no insert. A reporter construct with the European version of this sequence without introgression (EUR-EUR) drove significant expression above baseline (∼2.0x vector with no insert, P < 0.01, t-test). We compared this activity to constructs synthesized to carry the RAs with the associated NDA (NDA-RA), the RAs without the NDA (EUR-RA), and the NDA without the RAs (NDA-EUR). Both RA-containing sequences had significantly lower luciferase activity, and there was no significant difference in the activity of the NDA-RA and the EUR-RA sequences (Figure 4D). Thus, as predicted by the cross-population eQTL data, the RAs are associated with expression levels independently of the NDA, and the RA-containing sequences have lower activity than sequences without the RAs.
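The normalization used in the reporter comparison above, where each construct's activity is expressed relative to the vector with a minimal promoter but no insert, can be sketched as follows. This is an illustration only: the function name `fold_over_vector` and the replicate readings are made up and are not the measured values (the text reports only approximate fold changes).

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def fold_over_vector(construct_signal: Sequence[float],
                     vector_signal: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample SD of construct activity as fold over the empty vector.

    Each construct replicate is divided by the mean empty-vector signal.
    At least two construct replicates are needed for the SD.
    """
    baseline = mean(vector_signal)
    if baseline <= 0:
        raise ValueError("empty-vector baseline must be positive")
    folds = [signal / baseline for signal in construct_signal]
    return mean(folds), stdev(folds)


# Made-up luminescence readings, three replicates each (arbitrary units).
empty_vector = [100.0, 110.0, 90.0]
readings = {
    "EUR-EUR": [210.0, 195.0, 205.0],  # no introgressed alleles
    "NDA-EUR": [200.0, 190.0, 215.0],  # NDA only
    "EUR-RA": [105.0, 98.0, 102.0],    # RAs only
    "NDA-RA": [99.0, 104.0, 101.0],    # all introgressed variants
}
fold_changes = {name: fold_over_vector(values, empty_vector)
                for name, values in readings.items()}
```

Expressing every construct against the same empty-vector baseline makes the four allele combinations directly comparable; testing the differences would then use a t-test on the replicate-level values, as described in the text.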
To ascertain whether the conservation of activity patterns we demonstrated at the HDHD5 locus could be attributed more specifically to any of the four RAs, we analyzed previously collected MPRA data from LCLs (46). Only one of the four cross-population RA eQTL (rs71312076) showed significant regulatory effects (RA:EUR allelic skew=2.122, P=6.6e-3, FDR=0.034) compared to the non-reintroduced allele (Figure 4E). These effects were observed on the non-introgressed European reference background, further demonstrating the ability of this RA locus to influence regulation independent of NDAs.
Together, these results provide three orthogonal lines of evidence (cross-population eQTL, luciferase reporter, and MPRA) implicating RAs in the reintroduction of regulatory effects in the HDHD5 locus. Importantly, both our luciferase assays and the MPRA data show that the functional contribution of RAs within a European genomic context is not dependent on the introgressed haplotype in which it occurs. Therefore, these data, along with the eQTL status of this region in YRI, demonstrate that Neanderthal introgression restored an allele lost in Eurasians that influences gene regulation.
DISCUSSION

Here we demonstrate that hundreds of thousands of ancient alleles are present in modern Eurasians due exclusively to archaic admixture between Neanderthals and AMHs (Figure 1A). We first show that, like NDAs, RAs are associated with many traits and are enriched on haplotypes with regulatory effects in some tissues. We further show that RAs can have gene regulatory functions that are not dependent upon linked NDAs. While the interpretation of the phenotypic effects of Neanderthal introgression on AMHs has generally focused on NDAs, our results argue that RAs have the potential to independently affect gene regulation and therefore must also be considered in analyses of archaic admixture.
The evolutionary histories of RAs are likely diverse. While most RAs were probably lost in the Eurasian OOA bottleneck, Eurasian subpopulations subsequently experienced distinct demographic events that could have led to population-specific RAs. For example, East Asians are estimated to have had both a substantially smaller ancestral effective population size than Europeans and a greater frequency of archaic introgression events (32). These factors would increase our power to detect RAs within East Asians, because ancient hominin alleles would have had more opportunities both to be lost and to occur exclusively within introgressed regions. Conversely, in southern Europe, where more recent gene flow out of Africa has occurred, the power to detect RAs will be decreased due to ancient alleles being reintroduced outside of introgressed haplotypes (47). We expect that more sophisticated simulations and probabilistic modeling could enable the identification of additional RAs.
The regulatory and phenotypic effects of RAs are difficult to disentangle from those of NDAs, due to their high LD. Our detailed in vitro analyses of different combinations of alleles at the HDHD5 locus (Figure 4D) provide a roadmap for experimental characterization of other RAs. Analysis of known regulatory elements suggests that at least 10% (19,882) of RAs overlap gene regulatory elements (Figure S8). Therefore, we anticipate that as MPRAs, eQTL analyses, and GWAS are performed in more diverse populations and tissues, more functional RAs will be identified. In the future, it will also be informative to compare the functional effects of RAs with other alleles restored to Eurasian populations more recently by direct migration from Africa (22, 48).
Given our in vitro demonstration that RAs can restore ancestral gene regulatory functions lost in Eurasian populations, the enrichment we observe for RAs relative to NDAs in some GTEx tissues—the brain in particular—is provocative (Figure 3B). These observations are consistent with previous results regarding the gene regulatory effects of introgressed alleles. Brain tissues have enrichment for Neanderthal eQTL (24), and there is significant allele-specific down regulation of haplotypes carrying Neanderthal alleles in the brain and testes (42). Our results add to this picture, suggesting that RAs may contribute to the regulatory architectures of some tissues.
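The tissue-level enrichment referred to here (Figure 3B) was assessed with a hypergeometric test followed by Bonferroni correction. A minimal sketch of such a test is given below. How the authors mapped their counts onto the test's parameters is not stated in the text, so the setup used here (all introgressed eQTLs as the population, RA eQTLs as "successes", one tissue's introgressed eQTLs as the draws) is an assumption, as are the function names and the toy counts.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Computed exactly from binomial coefficients; math.comb returns 0 for
    impossible combinations, so out-of-support terms vanish on their own.
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(max(k, 0), upper + 1))
    return tail / total


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy counts (not from the paper): 3,000 introgressed eQTLs overall, 960 of
# them RAs (RA:NDA ratio of about 0.47); one tissue with 200 introgressed
# eQTLs, 90 of them RAs. 48 tests, one per GTEx tissue.
p_raw = hypergeom_upper_tail(90, 3000, 960, 200)
p_adjusted = bonferroni(p_raw, 48)
```

The upper tail tests for more RA eQTLs than expected; depletion, as reported for mucosal tissues and salivary gland, would use the lower tail instead.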
Several non-exclusive evolutionary scenarios may explain these observations. First, the depletion of NDAs relative to RAs on introgressed haplotypes with gene regulatory functions could be a result of previously demonstrated selection against NDAs in some tissues (42). This selection would deplete tissue-specific regulatory regions of NDA-rich introgressed haplotypes; indeed, the two tissues with known allele-specific down regulation of Neanderthal alleles, brain and testes, are among those enriched for RAs compared to NDAs. Second, the patterns we see could result from positive or balancing selection acting to retain beneficial RAs. Under this scenario, archaic admixture restored alleles with beneficial regulatory functions that were lost to Eurasian populations, and these RAs contributed to the maintenance of some introgressed haplotypes. The third possibility is that both RAs and NDAs on introgressed haplotypes are functional and influence selective pressures on the haplotypes. In this case, the presence of RAs could counterbalance mildly deleterious effects of associated NDAs, and thus buffer some introgressed haplotypes from purifying selection. Importantly, these explanations are not mutually exclusive, and the reality is likely some combination of all of them.