Post by Admin on Sept 17, 2018 18:24:35 GMT
Positive selection in SSA

We examined highly differentiated SNPs between European and African populations, as well as among African populations, to gain insights into loci that may have undergone selection in response to local adaptive forces (Supplementary Methods). To account for confounding due to Eurasian admixture, we also conducted analyses after masking Eurasian ancestry (Supplementary Methods and Supplementary Note 6). On examining locus-specific Europe–Africa differentiation, enrichment of loci known to be under positive selection was observed among the most differentiated sites (P = 1.4 × 10^−31). Furthermore, there was statistically significant enrichment for gene variants among these, indicating that this differentiation is unlikely to have arisen purely from random drift (P = 0.0002). Additionally, we found no evidence for background selection as the primary driver of differentiation among these loci (Supplementary Note 7).

[Figure: M-indices of Orang Asli who are homozygous ancestral for SLC24A5 & SLC45A2]

In addition to genes known to be under positive selection (for example, SLC24A5, SLC45A2 and OCA2 (refs 19, 20), LARGE (ref. 21) and CYP3A4/5) (Supplementary Fig. 3), we found evidence of differentiation in novel gene regions, including one implicated in malaria (for complement receptor 1, CR1) (Extended Data Fig. 8). CR1 carries the Knops blood group antigens and has previously been implicated in malaria susceptibility (ref. 22) and severity (ref. 23), with evidence suggesting positive selection in malaria-endemic regions (ref. 24) (Extended Data Fig. 8). We also identified highly differentiated variants within genes involved in osmoregulation (ATP1A1 and AQP2) (Extended Data Fig. 8). Deregulation of AQP2 expression and loss-of-function mutations in ATP1A1 have been associated with essential and secondary hypertension, respectively (refs 25, 26).
Climatic adaptive changes in these gene regions could potentially provide a biological basis for the high burden of hypertension and differences in salt sensitivity observed in SSA (ref. 27).

In contrast, overall differentiation among African populations was modest (maximum masked FST = 0.19) (Supplementary Fig. 4) and only 56/1,237 sites remained in the tail distribution after masking (Supplementary Methods, Supplementary Table 6). This suggests that a large proportion of differentiation observed among African populations could be due to Eurasian admixture, rather than adaptation to selective forces (Supplementary Note 6). Genes known to be under selection were notably enriched among the most differentiated loci after masking of Eurasian ancestry (P = 2.3 × 10^−16). Among the 56 loci robust to Eurasian ancestry masking (Supplementary Table 6), we identified several loci known to be under selection (Extended Data Fig. 8), including a highly differentiated variant (rs1378940) in the CSK gene region implicated in hypertension in genome-wide association studies (GWAS) (ref. 28). The major allele of rs1378940 among Africans was in complete linkage disequilibrium with the risk allele of the GWAS SNP rs1378942 (ref. 29), with the frequency of this allele highly correlated with latitude (r = −0.67), providing support for local adaptation in response to temperature as a possible mechanism for hypertension (Supplementary Fig. 5) (refs 30, 31).

[Figure: Distribution of M-indices of Senoi with and without derived alleles for SLC24A5 and SLC45A2. (A) Senoi samples with SLC24A5 and SLC45A2 ancestral alleles. (B) Senoi with derived alleles for either SLC24A5 or SLC45A2, or a combination of both. (C) Senoi with the derived allele of SLC24A5. (D) Senoi with the derived allele of SLC45A2. doi:10.1371/journal.pone.0042752.g003]
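To make the differentiation measure quoted above (FST) more concrete, here is a minimal Python sketch of a per-SNP FST computed from allele frequencies. It assumes Hudson's estimator, which may not be the estimator the authors used (their choice is described in the Supplementary Methods, not reproduced here). The function name and the toy frequencies are hypothetical.

```python
def hudson_fst(p1, n1, p2, n2):
    """Per-SNP Hudson FST (an assumed estimator, not necessarily the paper's).

    p1, p2: allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population.
    """
    # Numerator: squared frequency difference, corrected for sampling noise.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that two alleles drawn from different
    # populations differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")


# Toy SNPs (made-up frequencies): (name, freq in pop 1, freq in pop 2).
toy_snps = [("snpA", 0.95, 0.05), ("snpB", 0.50, 0.45), ("snpC", 0.30, 0.70)]

# Rank SNPs from most to least differentiated.
ranked = sorted(toy_snps, key=lambda s: hudson_fst(s[1], 200, s[2], 200), reverse=True)
for name, p1, p2 in ranked:
    print(name, round(hudson_fst(p1, 200, p2, 200), 3))
```

Sites in the upper tail of such a ranking are the "most differentiated" candidates. The enrichment tests and ancestry masking described in the text are separate steps that this sketch does not attempt.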
Post by Admin on Sept 19, 2018 18:33:08 GMT
Comparing populations residing in endemic and non-endemic infectious disease regions (Supplementary Methods), we identified several loci associated with infectious disease susceptibility and severity. As well as the known sickle-cell locus related to malaria, this approach identified additional signals for genes potentially under selection, including the PKLR region (ref. 32), RUNX3 (ref. 33), the haptoglobin locus, CD163 (ref. 34), IL10 (refs 35, 36), CFH, and the CD28-ICOS-CTLA4 locus (Supplementary Table 7 and Extended Data Fig. 8) (ref. 37). Similar comparisons for Lassa fever identified the known LARGE gene, as well as candidates associated with viral entry and immune response, including in the Histocompatibility Leukocyte Antigen region, DC-SIGN/DC-SIGNR (ref. 38) (also known as CD209/CLEC4M), RNASEL, CXCR6, IFIH1 (ref. 39) and OAS2/3 regions (Supplementary Table 7). For trypanosomiasis, we identified APOL1 (ref. 40), as well as several loci implicated in immune response and binding to trypanosoma, including FAS, FASLG (refs 41, 42), IL23R (ref. 43), SIGLEC6 and SIGLEC12 (Supplementary Table 7) (ref. 44). For trachoma, we identified signals in ABCA1 and CXCR6, which may be important for the growth of the parasite and host immune response, respectively (Supplementary Table 7) (refs 45, 46).

[Figure 3: Improvement in imputation accuracy with the AGVP WGS panel.]

To assess the utility of a larger and more diverse African reference panel for imputation, we generated a panel integrating the 1000 Genomes Project phase I and AGVP WGS panels (Supplementary Methods and Supplementary Note 9). Using this integrated panel, we observed marked improvements in imputation accuracy across the whole range of the allele frequency spectrum in specific populations poorly represented by the 1000 Genomes Project panel (Fig. 3 and Supplementary Note 11). These findings suggest that even common haplotypes in some SSA populations may not be sufficiently captured by existing panels, limiting our power to examine associations of common variants with disease.
Importantly, given the specificity of the improvement in imputation accuracy, we infer that targeted sequencing of divergent populations representing a broad spectrum of haplotypes across Africa, including HG and North/East African haplotypes, rather than widespread population sequencing is likely to provide a more efficient strategy to improve imputation accuracy and a practicable GWAS framework in Africa.

We compared the utility of existing chip designs (2.5M Illumina) and ultralow-coverage WGS designs (0.5×, 1×, 2× coverage) to determine the optimal design for African GWAS. Sensitivity for common variation was >90% at all sequencing depths (Supplementary Note 12). Examining the effective sample size for a fixed budget (ref. 50), we found the effective sample size was greater for all ultralow-coverage WGS and chip array designs compared with 4× WGS. When computational costs were accounted for (Supplementary Note 12), the HumanOmni2.5M array provided the greatest effective sample size, supporting the development and large-scale use of efficient genotype arrays in Africa, where these have been underutilized.

We therefore sought to evaluate a potential chip design to tag common variation across a wider range of African populations (Supplementary Note 13). Importantly, we show that an array with one million genetic variants could capture >80% of common variation (minor allele frequency >5%) across the genome (Extended Data Fig. 10). These analyses suggest that designing a pan-African genotype array to effectively capture common genetic variation across Africa is feasible, and could greatly facilitate large-scale genomic studies in Africa.
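The "effective sample size for a fixed budget" comparison above can be illustrated with a small Python sketch. It assumes the common approximation that effective sample size is the number of samples affordable under the budget multiplied by the squared correlation (r²) between called and true genotypes. The exact procedure used by the authors is in Supplementary Note 12 and is not reproduced here. Every cost and r² value below is a made-up placeholder, not a figure from the paper.

```python
def effective_sample_size(budget, cost_per_sample, r2):
    """Approximate effective sample size under a fixed budget.

    Assumes N_eff = N * r2, where N is the number of samples the budget
    buys and r2 is the genotype accuracy of the design.
    """
    n_samples = int(budget // cost_per_sample)
    return n_samples * r2


# Hypothetical designs: (cost per sample, r2). Placeholder values only.
designs = {
    "wgs_4x": (600.0, 0.95),
    "wgs_1x": (150.0, 0.85),
    "array_2.5M": (120.0, 0.92),
}

budget = 300_000.0  # hypothetical budget in arbitrary currency units
scores = {name: effective_sample_size(budget, cost, r2)
          for name, (cost, r2) in designs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The point of the sketch is the trade-off, not the numbers: a cheaper design with slightly lower accuracy can still yield more effective samples than a more accurate but costlier one.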
Post by Admin on Sept 21, 2018 18:25:02 GMT
Discussion

The marked haplotype diversity within Africa has important implications for the design of large-scale medical genomics studies across the region, as well as studies of population history and evolution. In this context, the AGVP is a resource that will facilitate a broad range of genomic studies in Africa and globally. Although Africa is the most genetically diverse region in the world, we provide evidence for relatively modest differentiation among populations representing the major sub-populations in SSA, consistent with recent population movement and expansion across the region beginning around 5,000 years ago (the Bantu expansion; ref. 8). Although the history of the Bantu expansion is probably complex, assessments of population admixture can provide new insights. We note historically complex and regionally distinct admixture with multiple HG and Eurasian populations across SSA, including ancient HG and Eurasian ancestry in West and East Africa and more recent complex HG admixture in South Africa. As well as explaining genetic differentiation among modern populations in SSA, these admixture patterns provide genetic evidence for early back-to-Africa migrations, the possible existence of extant HG populations in western Africa (compatible with archaeological evidence; ref. 15), and patterns of gene flow consistent with the Bantu expansion, including genetic assimilation of populations resident across the region.

[Figure: Haplogroup T-M184]

This admixture also has important implications for the assessment of differentiation and positive selection in Africa. Accounting for these elements, we have identified loci under positive selection that are linked with hypertension, malaria, and other pathogens. This provides a proof-of-concept for the ability of geographically widespread genetic data within Africa to identify loci under selection related to diverse environments.
Our evidence for the broad transferability of genetic association signals and their statistical refinement has important implications for medical genetic research in Africa. Importantly, we highlight that such studies are feasible and can be enabled through the development of more efficient genotype arrays and diverse WGS reference panels for accurate imputation of common variation. In this context, we describe a framework for a new pan-African genotype array that could directly facilitate large-scale genomic studies in Africa. A critical next step is the large-scale deep sequencing of multiple and diverse populations across Africa, which should be integrated with ancient DNA data. This would enable us to identify and understand signals of ancient admixture, patterns of historical population movements, and to provide a comprehensive resource for medical genomic studies in Africa. Nature volume 517, pages 327–332 (15 January 2015) doi:10.1038/nature13997
Post by Admin on Dec 13, 2018 18:01:55 GMT
The history of southern Africa involved interactions between indigenous hunter–gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter–gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger–Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to ∼900–1,800 y ago and show that it had the largest demographic impact in Khoisan populations that speak Khoe–Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to ∼2,700–3,300 y ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa. West Eurasian Ancestry in the Ju|’hoan_North. We previously observed that the Ju|’hoan_North, although the least admixed of all Khoisan populations, show a clear signal of admixture when using a test based on the decay of admixture linkage disequilibrium (LD) (3). The theoretical and practical aspects of historical inference from admixture LD have since been examined in greater detail (6); we thus reevaluated this signal in the Ju|’hoan_North using the software ALDER v1.0 (6). 
In particular, we were interested in identifying the source of the gene flow by comparing weighted LD curves computed using different reference populations. This is possible because theory predicts that the amplitude of these curves (i.e., the average level of weighted LD between sites separated by 0.5 centiMorgans) becomes larger as one uses reference populations that are closer to the true mixing populations. Loh et al. (6) additionally showed that this theory holds when using the admixed population itself as one of the reference populations. We thus computed weighted LD curves in the Ju|’hoan_North, using the Ju|’hoan_North themselves as one reference population and a range of 74 worldwide populations as the other, and examined the amplitudes of these curves (Fig. 1A). The largest amplitudes are obtained with European populations as references (Fig. 1A); taken literally, this would seem to implicate Europe as the source of admixture (although Middle Eastern populations are also among the best proxies). The estimated date for this gene flow is 43 ± 2 generations [1,290 ± 60 y, assuming 30 y per generation (7)] before the present, consistent with our previously estimated date (3). This date is well before the historical arrival of European colonists to the region.

[Fig. 1. Identifying sources of admixture using LD. In each panel, we computed weighted LD curves with ALDER v1.0 using a test population as one reference and a panel of other populations as the second reference.]

We next tested the robustness of this result. We confirmed that this observation is consistent across panels of SNPs with varied ascertainment (SI Appendix, Fig. S2). We then considered hunter–gatherer populations from other regions of Africa. In particular, we performed the same analysis on the Biaka (Fig. 1B) and Mbuti (SI Appendix, Fig. S3) from central Africa.
As expected, the inferred source of admixture in these populations is a sub-Saharan African population (most closely related to the Yoruba, a Niger–Congo-speaking agriculturalist group from Nigeria).

A signal of west Eurasian ancestry in the Ju|’hoan_North should be identifiable by allele frequencies as well as by LD. We thus tested the population tree [Chimp,[Ju|’hoan_North, [Han, French]]] using an f4 statistic (8, 9). This tree fails with a Z-score of 4.0. On smaller subsets of SNPs, the evidence is weaker, explaining why we had not noticed it previously (on the sets of SNPs ascertained in a Ju|’hoan individual, in a French individual, and in a Yoruba individual; the corresponding Z-scores are missing from this copy). We thus conclude that there is a signal in both allele frequencies and linkage disequilibrium that the Ju|’hoan_North admixed with a population more closely related to western Eurasian (i.e., European or Middle Eastern) rather than eastern Eurasian populations, and that this signal is absent from hunter–gatherer populations in central Africa.
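The allele-frequency tree test described above can be sketched in Python as follows. The sketch assumes the statistic is an f4 statistic of the form f4(Outgroup, Test; C, D), the mean over SNPs of (p_Out − p_Test)(p_C − p_D), with a delete-one-block jackknife for the standard error. The function names, block count, and simulated frequencies are all hypothetical, and the simulation is a toy model rather than the authors' data or pipeline.

```python
import math
import random


def f4_with_z(p_out, p_test, p_c, p_d, n_blocks=50):
    """Return (f4, Z) using a delete-one-block jackknife over contiguous SNP blocks."""
    terms = [(o - t) * (c - d) for o, t, c, d in zip(p_out, p_test, p_c, p_d)]
    size = len(terms) // n_blocks
    used = terms[: size * n_blocks]  # drop the remainder so blocks are equal-sized
    total = sum(used)
    estimate = total / len(used)
    # Leave-one-block-out estimates.
    loo = [(total - sum(used[i * size:(i + 1) * size])) / (len(used) - size)
           for i in range(n_blocks)]
    mean_loo = sum(loo) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((x - mean_loo) ** 2 for x in loo)
    return estimate, estimate / math.sqrt(var)


def simulate(n_snps, admix, seed=1):
    """Toy frequencies for the tree [Out, [Test, [C, D]]] with optional D -> Test gene flow."""
    rng = random.Random(seed)
    clip = lambda p: min(1.0, max(0.0, p))
    p_out, p_test, p_c, p_d = [], [], [], []
    for _ in range(n_snps):
        anc = rng.uniform(0.2, 0.8)        # ancestral frequency
        shared = rng.gauss(0, 0.05)        # drift shared by C and D
        c = clip(anc + shared + rng.gauss(0, 0.05))
        d = clip(anc + shared + rng.gauss(0, 0.05))
        t = clip(anc + rng.gauss(0, 0.05))
        p_out.append(clip(anc + rng.gauss(0, 0.05)))
        p_test.append((1 - admix) * t + admix * d)  # admix = fraction of D-related ancestry
        p_c.append(c)
        p_d.append(d)
    return p_out, p_test, p_c, p_d


f4_tree, z_tree = f4_with_z(*simulate(20000, admix=0.0))
f4_adm, z_adm = f4_with_z(*simulate(20000, admix=0.2))
print("tree-like: Z =", round(z_tree, 2))
print("admixed:   Z =", round(z_adm, 2))
```

When the tree holds, the statistic has expectation zero and the Z-score stays small. Gene flow from a D-related population into the test population shifts it away from zero, which is the sense in which the tree "fails".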
Post by Admin on Dec 14, 2018 17:56:35 GMT
Signal of West Eurasian Relatedness Is Shared Throughout Southern Africa. We next examined whether this signal of relatedness to west Eurasia is present in other Khoisan populations. For each Khoisan population, we used ALDER to compute weighted LD decay curves using the test population as one reference and either the French or the Yoruba as the other reference. We included the central African Mbuti and Biaka populations as negative controls. In all Khoisan populations, the amplitude of the LD decay curve is larger when using the French as a reference than when using the Yoruba as a reference (Fig. 2A). In contrast, for the Mbuti and Biaka, the larger amplitude is seen when using the Yoruba as a reference (Fig. 2A).

[Fig. 2. Relationship with west Eurasia is shared by all Khoisan populations. We generated weighted LD decay curves in each Khoisan (or central African hunter–gatherer) population.]

A striking observation that emerges from this analysis is that in many of the southern African populations the inferred mixture times depend substantially on the second population used as a reference (Fig. 2B). Under a model of admixture from a single source population, the decay rate of the LD curve does not depend on the reference population used (6); this suggests that there are at least two separate non-Khoisan sources of ancestry in some of these Khoisan populations. In contrast, for the central African Mbuti and Biaka the inferred times do not depend on the reference used.

[Fig. 3. LD evidence for multiple waves of mixture in the G‖ana. We computed 990 (45 choose 2) weighted LD curves in the G‖ana and fit two models: one with a single admixture event, and one with two admixture events.]

Estimating Parameters of Multiple Admixture Events. Motivated by the above observations, we designed a method to estimate dates of multiple admixture events in the history of a population (related ideas have been explored by Myers et al.*).
We extended the population genetic theory of Loh et al. (6) to the case where a population has experienced multiple episodes of population admixture from different sources (SI Appendix). In this situation, the extent of admixture LD in the population is no longer a single exponential curve as a function of genetic distance, but instead is a mixture of exponential curves. Using a range of reference populations, we can thus formally test for the presence of multiple waves of mixture and estimate the dates of these mixture events (SI Appendix). We validated this approach using coalescent simulations of three pairs of mixture dates chosen to span the scenarios that our data suggest are relevant to southern and eastern Africa (SI Appendix). The simulations indicate that our method has reasonable but not perfect power; depending on the pair of dates we simulated, we successfully detected both events in between 50% and 90% of simulated cases.

To illustrate the intuition behind this method, in Fig. 3 we plot one of the weighted LD curves calculated in the G‖ana. Under a model with a single admixture event, the mean admixture date in the G‖ana is estimated as [value missing] generations, identical to the date obtained by Pickrell et al. (3). However, it is visually apparent that this model is a poor fit to the data (Fig. 3). Indeed, we find that adding a second mixture event significantly improves the fit (minimum Z-score on the two admixture times of 2.8). The two inferred mean admixture times in the G‖ana are 4 ± 1 and 39 ± 6 generations ago.

This method additionally estimates amplitudes of the LD decay curves for each pair of populations on each mixture time, which are a function of the relationship between the reference populations and the true source populations. These amplitudes can be used to infer the references closest to the true mixing populations.
However, if a source population is itself admixed, under some conditions this method will identify a population related to one of the ancestral components of the source population instead of the source population itself (SI Appendix). By examining these amplitudes, we conclude that the west Eurasian ancestry in the G‖ana entered the population through the older admixture event (Fig. 3). Because of the caveat noted above, however, we cannot distinguish between two historical scenarios with this method: direct gene flow from a west Eurasian population and gene flow from a west Eurasian-admixed population.
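The contrast between the single-event and two-event models described above can be sketched in Python. The sketch assumes a weighted LD curve of the form a(d) = A1·exp(−n1·d) + A2·exp(−n2·d), with d in Morgans and n1, n2 in generations, and it omits the affine term and the jackknife used by ALDER. The amplitudes, the distance grid, and the grid-search fitting are hypothetical simplifications; only the dates (4 and 39 generations) and the 30 y per generation conversion come from the text.

```python
import math

# Genetic distances from 0.5 cM to 20 cM, expressed in Morgans.
dists = [0.005 * k for k in range(1, 41)]

# Noise-free synthetic curve with two admixture dates (amplitudes are made up).
A1_TRUE, A2_TRUE = 2e-4, 6e-4
curve = [A1_TRUE * math.exp(-4 * d) + A2_TRUE * math.exp(-39 * d) for d in dists]


def sse_one(n):
    """Residual sum of squares for one event at n generations (amplitude by least squares)."""
    e = [math.exp(-n * d) for d in dists]
    amp = sum(y * ei for y, ei in zip(curve, e)) / sum(ei * ei for ei in e)
    return sum((y - amp * ei) ** 2 for y, ei in zip(curve, e))


def sse_two(n1, n2):
    """Residual sum of squares for two events; amplitudes from the 2x2 normal equations."""
    e1 = [math.exp(-n1 * d) for d in dists]
    e2 = [math.exp(-n2 * d) for d in dists]
    s11 = sum(a * a for a in e1)
    s12 = sum(a * b for a, b in zip(e1, e2))
    s22 = sum(b * b for b in e2)
    b1 = sum(y * a for y, a in zip(curve, e1))
    b2 = sum(y * b for y, b in zip(curve, e2))
    det = s11 * s22 - s12 * s12
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (b2 * s11 - b1 * s12) / det
    return sum((y - a1 * x1 - a2 * x2) ** 2 for y, x1, x2 in zip(curve, e1, e2))


# Grid search over integer dates from 1 to 60 generations.
best_one = min(range(1, 61), key=sse_one)
best_two = min(((n1, n2) for n1 in range(1, 61) for n2 in range(n1 + 1, 61)),
               key=lambda pair: sse_two(*pair))


def to_years(generations, se, years_per_generation=30):
    """Convert a date and its standard error from generations to years."""
    return generations * years_per_generation, se * years_per_generation


print("single-event date:", best_one, "residual:", sse_one(best_one))
print("two-event dates:", best_two, "residual:", sse_two(*best_two))
print("in years:", to_years(4, 1), to_years(39, 6))
```

A single exponential forced onto a two-date curve settles on a compromise date and leaves structured residuals, which mirrors the poor fit the authors describe for the G‖ana. The significance test on the second event (the Z-score of 2.8) is not reproduced by this sketch.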