Post by Admin on Nov 4, 2021 20:52:17 GMT
The time and place of European admixture in Ashkenazi Jewish history
James Xue, Todd Lencz, Ariel Darvasi, Itsik Pe’er, and Shai Carmi
Abstract
The Ashkenazi Jewish (AJ) population is important in genetics due to its high rate of Mendelian disorders. AJ appeared in Europe in the 10th century, and their ancestry is thought to comprise European (EU) and Middle-Eastern (ME) components. However, both the time and place of admixture are subject to debate. Here, we attempt to characterize the AJ admixture history through a careful application of new and existing methods to a large AJ sample. Our main approach was based on local ancestry inference, in which we first classified each AJ genomic segment as EU or ME, and then compared allele frequencies along the EU segments to those of different EU populations. The contribution of each EU source was also estimated using GLOBETROTTER and haplotype sharing. The time of admixture was inferred based on multiple statistics, including ME segment lengths, the total EU ancestry per chromosome, and the correlation of ancestries along the chromosome. The major source of EU ancestry in AJ was found to be Southern Europe (≈60–80% of EU ancestry), with the rest being likely Eastern European. The inferred admixture time was ≈30 generations ago, but multiple lines of evidence suggest that it represents an average over two or more events, pre- and post-dating the founder event experienced by AJ in late medieval times. The time of the pre-bottleneck admixture event, which was likely Southern European, was estimated at ≈25–50 generations ago.
Author summary
The Ashkenazi Jewish population has resided in Europe for much of its 1000-year existence. However, its ethnic and geographic origins are controversial, due to the scarcity of reliable historical records. Previous genetic studies have found links to Middle-Eastern and European ancestries, but the admixture history has not been studied in detail yet, partly due to technical difficulties in disentangling signals from multiple admixture events. Here, we present an in-depth analysis of the sources of European gene flow and the time of admixture events by using multiple new and existing methods and extensive simulations. Our results suggest a model of at least two events of European admixture. One event slightly pre-dated a late medieval founder event and was likely from a Southern European source. Another event post-dated the founder event and likely occurred in Eastern Europe. These results, as well as the methods introduced, will be highly valuable for geneticists and other researchers interested in Ashkenazi Jewish origins.
Post by Admin on Nov 5, 2021 1:22:27 GMT
Introduction
Ashkenazi Jews (AJ), numbering approximately 10 million worldwide [1], are individuals of Jewish ancestry with a recent origin in Eastern Europe [2]. The first individuals to identify as Ashkenazi appeared in Northern France and the Rhineland (Germany) around the 10th century [3]. Three centuries later, Ashkenazi communities emerged in Poland, but the source(s) of migration are not completely clear. The Ashkenazi communities in Poland grew rapidly, and by the 20th century numbered in the millions and had spread widely across Europe [2].
Due to the relative scarcity of relevant historical records, the ethnic origins of present-day Ashkenazi Jews are debated [2], and in such a setting, genetic data provides crucial information. A number of recent studies have shown that Ashkenazi individuals have genetic ancestry intermediate between European (EU) and Middle-Eastern (ME) sources [4–8], consistent with the long-held theory of a Levantine origin followed by partial assimilation in Europe. The estimated amount of accumulated EU gene flow varied across studies, with the most recent ones, employing genome-wide data, converging to a contribution of around 50% of the AJ ancestry [4, 7, 9].
Despite these advances, little is known about the identity of the European admixing population(s) and the time of the admixture events [2, 10]. Speculations abound, due to the wide geographic dispersion of the Jewish populations since medieval times, but with very few historical records to support any claim [2]. Further complicating the picture is an Ashkenazi-specific founder event that took place less than a millennium ago, as manifested by elevated frequencies of disease mutations [11, 12], reduced genetic diversity [13, 14], and an abundance of long tracts of identity-by-descent [9, 15, 16]. Results from our recent study [9] were not decisive regarding the relative times of the European admixture and the founder event, calling for a more in-depth investigation.
A number of previous population genetic studies have attempted, sometimes implicitly, to “localize” the Ashkenazi genomes to a single geographic region or source population [4–6, 17]. However, such approaches may be confounded by the mixed EU and ME Ashkenazi ancestry, which necessarily implies the existence of multiple sources. Here, we overcome this obstacle, following studies in other populations [18, 19], by performing a preliminary step of local ancestry inference (LAI), in which each locus in each Ashkenazi genome is assigned as either EU or ME. Following LAI, the source population of the European and Middle-Eastern “sub-genomes” can be independently localized.
We begin our analysis by testing the ability of available LAI software to correctly infer ancestries for simulated EU/ME genomes. Proceeding with RFMix, we apply LAI to Ashkenazi SNP array data, and use a maximum likelihood approach to localize, separately, the EU and ME sources. We correct bias introduced by the method using simulations, and show that it is robust to potential errors in LAI. We also employ other methods based on allele frequency divergence between Ashkenazi Jews and other populations, although they turn out to be less informative. To estimate the time of admixture, we first use the lengths of EU and ME tracts and the decay in ancestry correlation along the genome. We further introduce a new method for dating admixture times based on the genome-wide EU or ME ancestry proportions. We again remove bias from all methods using simulations. We integrate these results with an analysis of identity-by-descent (IBD) sharing both within AJ and between AJ and other populations. Finally, we compare our estimates to those produced by the GLOBETROTTER suite [20–22]. Our results suggest that the European gene flow was predominantly Southern European (≈60–80%), with the remaining contribution either from Western or (more likely) Eastern Europe. The time of admixture, under a model of a single event, was estimated at ≈30 generations ago. However, this admixture time is likely the average of at least two distinct events. We propose that admixture with Southern Europeans pre-dated the late medieval founder event, whereas the admixture event in Eastern Europe was more recent.
Post by Admin on Nov 5, 2021 3:10:13 GMT
Results
Data collection
SNP arrays for Ashkenazi Jewish individuals were available from the schizophrenia study reported by Lencz et al., 2013 [23] (see also [24]). SNP arrays for European and Middle-Eastern populations were collected from several sources (Table 1). All genotypes were uniformly cleaned, merged, and phased (Methods), resulting in 2540 AJ, 543 Europeans, and 293 Middle-Easterners genotyped at 252,358 SNPs. Note that while there are additional studies in these populations, we restricted ourselves to (publicly available) Illumina array data to guarantee a sufficient number of remaining SNPs after merging all datasets. We divided the European genomes into four regions: Iberia, North-Western Europe (henceforth Western Europe), Eastern Europe, and Southern Europe (Italy and Greece). The Middle-Eastern genomes were divided into three regions: Levant, Southern Middle-East, and Druze. See Table 1 for further details and S1 Fig for a PCA plot [25] supporting the partition into the indicated regions.
Table 1. The populations and datasets used in our analysis.
Region | Sub-region | Populations included | Count
Ashkenazi | | | 2540
Europe | West-EU | Orcadian; French; CEU; GBR | 217
Europe | East-EU | Belarusian; Lithuanian; Ukrainian; Polish; Russian | 112
Europe | South-EU | Italians: Tuscan, Abruzzo, Sicilian, Bergamo; Greek | 162
Europe | Iberia | | 52
Middle-East | Levant | Palestinian; Lebanese; Jordanian; Syrian | 146
Middle-East | South-ME | Egyptian; Bedouin; Saudi | 77
Middle-East | Druze | Israeli and Lebanese | 70
Sources, Ashkenazi: Lencz et al., 2013 [23] (Illumina HumanOmni1-Quad).
Sources, Europe: Behar et al., 2010 [6] (Illumina 610k, 650k); Behar et al., 2013 [5] (Illumina 610k, 650k, 660k, 730k, 1M); HGDP [26] (Illumina 650k); 1000 Genomes [27] (Illumina Omni 2.5M).
Sources, Middle-East: Behar et al., 2010 [6] (Illumina 610k, 650k); Behar et al., 2013 [5] (Illumina 610k, 650k, 660k, 730k, 1M); HGDP [26] (Illumina 650k); Haber et al., 2013 [28] (Illumina 610k, 660k).
Post by Admin on Nov 5, 2021 21:05:12 GMT
Inferring the place of admixture using local ancestry inference
Calibration of the local ancestry inference method
In local ancestry inference (LAI), each region of the genome of each admixed individual is assigned an ancestry from one of the reference panels. After evaluating the performance of a number of LAI tools on admixture between closely related populations (S1 Text section 1), we selected RFMix [29], which is based on a random forest classifier for each genomic window followed by smoothing by a conditional random field. When running RFMix, we did not iterate over the inference process using the already classified individuals (the Expectation-Maximization step), as we found that accuracy did not improve (Methods) and we wanted to avoid bias due to the widespread haplotype sharing in AJ. We also did not filter SNPs by the quality of their local ancestry assignment, as we found that such filtering substantially biases downstream inferences (S1 Text section 1). Finally, we downsampled the reference panels to balance the sizes of the European and Middle-Eastern groups, as well as to balance the number of genomes from each European region (Methods). Running RFMix on the AJ genomes with our EU and ME reference panels and summing up the lengths of all tracts assigned to each ancestry, the genome-wide ancestry was ≈53% EU and ≈47% ME, consistent with our previous estimate based on a smaller sequencing panel [9]. Our simulations suggested that the accuracy of LAI for an EU-ME admixed population is only ≈70%, much lower than the near-perfect accuracy observed for cross-continental admixture (e.g., [29–33]). The local ancestry assignment is nevertheless non-random, and therefore, with proper accounting for errors (below), can be informative on the place and time of admixture events.
Geographic localization of the EU component of the AJ genomes
Following the deconvolution of segments of EU and ME ancestry, we focused on the regional ancestry of the European segments.
We initially followed refs. [18, 19] and attempted to apply PCAMask to the EU subset of the AJ genomes. However, PCAMask’s results were inconsistent across runs and parameter values (see S1 Text section 2 and [34]). We therefore developed a simple naïve Bayes approach. We first thinned the SNPs to ensure linkage equilibrium between the remaining SNPs. We then computed the allele frequencies of the SNPs in the four EU sub-regions: Southern EU, Western EU, Eastern EU, and Iberia. Then, for each haploid chromosome, we computed the log-likelihood that the European-assigned part of the chromosome came from each of the four regions, where the likelihood is the product, over SNPs, of the regional frequencies of the observed alleles. The inferred source of each chromosome was the EU region with the maximum likelihood for that chromosome. Initial inspection of the results revealed that Iberia had consistently lower likelihood than the other regions. Since the Iberian panel was the smallest and sample sizes had to be balanced across regions, we removed the Iberian genomes from the reference panel, thereby increasing the sample size for the other regions (Methods). To determine whether the true ancestry could theoretically be recovered given a single European source, we generated simulated chromosomes using genomes not included in the RFMix reference panel. Each simulated chromosome was a mosaic of segments from Middle-Eastern and European genomes, and segment lengths were exponentially distributed, according to the expected parameters of a symmetric admixture event occurring 30 generations ago (Methods). In each simulation experiment, the identity of the European source region was varied, and the proportion of chromosomes inferred to have each EU region as their source was calculated. We found that the true EU source region had the highest proportion of classified chromosomes in all cases (Fig 1).
This result indicates that localization of the European source is feasible, despite the noise and bias in local ancestry inference between closely related populations such as Middle-Easterners and Europeans.
Fig 1. Simulation results for our localization pipeline. In each row, admixed genomes were simulated with sources from the Levant (50%) and one European region (50%). Columns correspond to the inferred proportion of the chromosomes classified as each potential source. The source of each chromosome was chosen as the one that maximizes the likelihood of observing the alleles designated by RFMix as European.
For AJ, we found that Southern Europe was the most likely EU source for the largest proportion of the AJ chromosomes. Specifically, 43.2% of the AJ chromosomes had Southern EU as their most likely source, 35.4% had Western EU, and 18.8% had Eastern EU (the proportions do not sum to exactly 100%, as we also allowed chromosomes to be classified as Middle-Eastern). These results imply that Southern Europe was the dominant source of European gene flow into AJ. We observed that in simulations of admixed genomes, the Middle-Eastern regional source could also be recovered by running the same localization pipeline. Applying that pipeline to the AJ genomes, we identified the Levant as the most likely ME source: the proportion of chromosomes classified as Levantine was 51.6%, compared to 21.7% and 22.2% classified as Druze and Southern ME, respectively. While these results indicate a sizeable contribution of ancestry from Southern Europe and the Levant, we stress that these quantities do not directly correspond to the proportion of ancestry contributed by each source. We attempt to infer those proportions in the next section.
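The naïve Bayes classification step described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `classify_chromosome`, the clipping constant `eps`, and the random toy frequencies are all assumptions made for illustration, and the sketch presumes that SNPs have already been thinned and that an LAI tool has supplied the European mask.

```python
import numpy as np

def classify_chromosome(alleles, eu_mask, region_freqs, eps=1e-3):
    """Pick the region maximizing the log-likelihood of the EU-assigned alleles.

    alleles      : 0/1 array, one allele per thinned SNP on a haploid chromosome
    eu_mask      : boolean array, True where LAI assigned European ancestry
    region_freqs : dict mapping region name -> per-SNP frequency of allele 1
    eps          : frequency clipping to avoid log(0) (an assumption of this sketch)
    """
    mask = np.asarray(eu_mask, dtype=bool)
    a = np.asarray(alleles)[mask]
    loglik = {}
    for region, freqs in region_freqs.items():
        p = np.clip(np.asarray(freqs, dtype=float)[mask], eps, 1.0 - eps)
        # Log of a product of independent per-SNP Bernoulli probabilities
        loglik[region] = float(np.sum(a * np.log(p) + (1 - a) * np.log(1.0 - p)))
    best = max(loglik, key=loglik.get)
    return best, loglik

# Toy data: random frequencies, for illustration only (not real populations)
rng = np.random.default_rng(0)
n_snps = 2000
region_freqs = {name: rng.uniform(0.05, 0.95, n_snps)
                for name in ("South-EU", "West-EU", "East-EU")}
eu_mask = rng.random(n_snps) < 0.5                      # hypothetical LAI calls
alleles = (rng.random(n_snps) < region_freqs["South-EU"]).astype(int)

best_region, logliks = classify_chromosome(alleles, eu_mask, region_freqs)
print(best_region, logliks)
```

In this toy setting the regional frequencies are drawn independently, so the sources are far easier to tell apart than real European populations, whose allele frequencies are highly similar. That similarity is why the proportions of classified chromosomes are treated in the text as summary statistics rather than as direct ancestry estimates.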
Inferring the proportion of ancestry contributed by each EU and ME region
To quantitatively estimate the contribution of each subcontinental European region, we used the above-mentioned proportions of chromosomes classified to each EU region as summary statistics, and matched them to simulations in which the proportions of ancestry contributed by each region are known. Specifically, we performed 4-way admixture simulations between individuals of Levantine, Southern European, Eastern European, and Western European origin. In these simulations, we fixed the Levantine admixture proportion to 50% and varied the proportions of the different European regions. We then used a grid search to find the ancestry proportions that best fit the observed fraction of AJ chromosomes classified as each ancestry. The simulation results (Fig 2) suggest that the European component of the AJ cohort comprises 34% Southern EU, 8% Western EU, and 8% Eastern EU ancestry, expressed as fractions of the total ancestry. This analysis thus suggests that roughly 70% (34 of 50 percentage points) of EU ancestry in AJ is Southern European. Using bootstrapping (S1 Text section 3), the 95% confidence interval of the Southern EU ancestry was [33,35]% and that of Eastern EU was [8,9]%. However, bootstrapping does not account for any systematic biases, which in this case are of larger magnitude (S1 Text section 3 and below).
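The matching of summary statistics to simulations can be sketched as a small grid search. Everything below is a toy stand-in: `TOY_CONFUSION` contains made-up numbers, and `simulate_classified_fractions` is a hypothetical placeholder for the full pipeline (simulate admixed chromosomes, run LAI, classify each chromosome). The sketch only shows the search logic, and checks that it recovers proportions planted in the toy simulator.

```python
import numpy as np

# Made-up values: row i says how chromosomes carrying source i tend to be
# classified (columns: South-EU, West-EU, East-EU). Not taken from the paper.
TOY_CONFUSION = np.array([
    [0.60, 0.25, 0.15],   # true South-EU
    [0.30, 0.50, 0.20],   # true West-EU
    [0.25, 0.25, 0.50],   # true East-EU
])

def simulate_classified_fractions(eu_props):
    """Hypothetical simulator: true EU shares -> expected classified fractions."""
    w = np.asarray(eu_props, dtype=float)
    return (w / w.sum()) @ TOY_CONFUSION

def grid_search(observed, total_eu=50, step=1):
    """Scan (south, west, east) percentages summing to total_eu; minimize squared error."""
    best, best_err = None, np.inf
    for south in range(0, total_eu + 1, step):
        for west in range(0, total_eu - south + 1, step):
            east = total_eu - south - west
            pred = simulate_classified_fractions([south, west, east])
            err = float(np.sum((pred - observed) ** 2))
            if err < best_err:
                best, best_err = (south, west, east), err
    return best, best_err

# Plant known proportions in the toy simulator, then try to recover them
observed = simulate_classified_fractions([34, 8, 8])
best_props, best_err = grid_search(observed)
print(best_props, best_err)
```

In a real analysis each grid point would require a fresh round of simulation and LAI, which is far more expensive than the matrix product used here.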
Post by Admin on Nov 6, 2021 5:15:45 GMT
Fig 2. Inference of the proportion of Ashkenazi ancestry derived from each European region. We simulated admixed chromosomes with European and Middle-Eastern ancestries, where the ME ancestry was fixed to the Levant region and to 50% of the overall ancestry. We then varied the sources of the remaining European ancestry to determine which ancestry proportions most closely match the AJ data. In (A), the simulated EU components were Southern and Western EU. For each given proportion of Southern EU ancestry, we used our LAI-based pipeline to compute the proportion of chromosomes classified as Southern European. The best match to the proportion of thus classified chromosomes observed in the real AJ data (red dot) was found when the true simulated Southern EU ancestry was 31% of the total. In (B), the same simulation procedure was repeated, except that the simulated EU components were Southern and Eastern EU. The inferred proportion of Southern EU ancestry in AJ was 37%. (C) We fixed the Southern EU contribution to 34%, the average of its estimates from (A) and (B), and varied the remaining 16% between Western and Eastern EU. The simulations suggest that the closest match to the real results is at roughly equal contribution (8%) from Western and Eastern EU.
To estimate the magnitude of the minor ME components, we repeated a procedure similar to that used for the European component. Specifically, we simulated admixed genomes in which the European ancestries were fixed to the proportions inferred above (34% Southern EU, 8% Western EU, and 8% Eastern EU), and varied the proportion of Levant vs Druze ancestry and then Levant vs Southern ME ancestry. The best match to the AJ data was obtained (in both cases) when the Levant ancestry was almost entirely exclusive (45% out of the total 50% ME ancestry; the magnitude of the minor components was close to zero also when we simulated 50% Southern EU ancestry).
This result supports a predominantly Levantine origin for the ME ancestry in AJ, and justifies using the Levantine genomes for the ME ancestry in our simulations. In S1 Text section 3, we describe simulations that demonstrate the robustness of our pipeline to changing the proportion of simulated Levantine ancestry, including Iberia in the reference panel, and excluding from the panel the true Middle-Eastern and/or European ancestral sources.
Inferring the time of admixture using local ancestry inference
Mean segment length
Consider a model of a “pulse” admixture between two populations, t generations ago, where the first population has contributed a fraction q of the ancestry. The mean length (in Morgans) of segments coming from the second source is 1/(qt) [35]. In the case of AJ, where the source populations are EU and ME, we estimated q above (EU ancestry fraction) to be ≈53%. Therefore, the mean ME segment length is expected to be informative on the time of admixture t. The mean ME segment length was ≈14 cM; however, we noticed that in simulations, the RFMix-inferred segment lengths were significantly overestimated. To correct for that, we used simulations to find the admixture time that yielded RFMix-inferred segment lengths that best matched the real AJ data. We fixed the ancestry proportions to the ones inferred above for AJ (50% ME, 34% Southern EU, 8% Western EU, and 8% Eastern EU), and varied the admixture time. We then plotted the RFMix-inferred ME segment length vs the simulated segment lengths (Fig 3). The simulated mean segment length that corresponded to the observed AJ value was around 6.6 cM, implying an admixture time of ≈29 generations ago (bootstrapping 95% confidence interval: [27,30] generations).
Fig 3. Inferring the AJ admixture time using the lengths of admixture segments. The mean length of RFMix-inferred Middle-Eastern segments is plotted vs the mean simulated length, which is inversely related to the simulated admixture time. The red dot corresponds to the observed mean segment length in the real AJ data.
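The arithmetic behind the segment-length estimate can be checked with a short sketch. It inverts the pulse-model relation (mean ME segment length = 1/(qt) Morgans) to get t = 1/(q × length). The function name `admixture_time` is an assumption, and the simulation at the end is an idealized check that ignores chromosome ends and LAI error, not the authors' simulation procedure.

```python
import numpy as np

def admixture_time(mean_me_segment_cM, q_eu):
    """Invert mean length = 1/(q*t); the length is converted from cM to Morgans."""
    return 1.0 / (q_eu * mean_me_segment_cM / 100.0)

q = 0.53                                   # EU ancestry fraction from the text
t_corrected = admixture_time(6.6, q)       # simulation-corrected length: about 28.6
t_naive = admixture_time(14.0, q)          # raw RFMix-inferred length: about 13.5
print(round(t_corrected, 1), round(t_naive, 1))

# Idealized check: under a pulse t generations ago, ME segment lengths are
# exponential with rate q*t per Morgan, so the estimator should recover t.
rng = np.random.default_rng(0)
true_t = 30
lengths_morgans = rng.exponential(scale=1.0 / (q * true_t), size=100_000)
t_hat = admixture_time(lengths_morgans.mean() * 100.0, q)
print(round(t_hat, 2))
```

Using the uncorrected 14 cM would place the admixture at roughly 13 generations ago, less than half the corrected estimate, which shows how strongly the overestimation of segment lengths by LAI would bias the date without the simulation-based correction.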