Genetic Roots of the Ashkenazi Jews Jan 11, 2019 18:36:49 GMT
Post by Admin on Jan 11, 2019 18:36:49 GMT
Fig 1. Simulation results for our localization pipeline.
In each row, admixed genomes were simulated with sources from the Levant (50%) and one European region (50%). Columns correspond to the inferred proportion of the chromosomes classified as each potential source. The source of each chromosome was chosen as the one that maximizes the likelihood of observing the alleles designated by RFMix as European.
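The classification rule in the Fig 1 caption can be shown with a minimal sketch. This is not the pipeline's actual code. It assumes haploid 0/1 allele calls, independent sites, and known per-site allele frequencies for each candidate source, and it marks sites outside RFMix-designated European segments as None. The function names and toy frequencies are hypothetical. Each chromosome is assigned to the source with the highest log-likelihood, and the assignments are tallied into per-source proportions of the kind reported in the next paragraph.

```python
import math
import random

def log_likelihood(alleles, freqs, eps=1e-3):
    """Log-likelihood of one haploid chromosome under a candidate source.

    alleles: list of 0/1 calls, or None where the site lies outside a
             segment designated European (such sites are ignored).
    freqs:   the source's per-site frequency of allele 1.
    Sites are treated as independent (an assumption of this sketch).
    """
    ll = 0.0
    for a, p in zip(alleles, freqs):
        if a is None:
            continue
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        ll += math.log(p) if a == 1 else math.log(1.0 - p)
    return ll

def classify(alleles, sources):
    """Name of the source that maximizes the likelihood of the alleles."""
    return max(sources, key=lambda name: log_likelihood(alleles, sources[name]))

def source_proportions(chromosomes, sources):
    """Fraction of chromosomes for which each source is the most likely."""
    counts = dict.fromkeys(sources, 0)
    for chrom in chromosomes:
        counts[classify(chrom, sources)] += 1
    return {name: c / len(chromosomes) for name, c in counts.items()}

if __name__ == "__main__":
    rng = random.Random(1)
    n_sites = 300
    # Toy allele frequencies for three hypothetical European sources.
    sources = {name: [rng.random() for _ in range(n_sites)]
               for name in ("Southern EU", "Western EU", "Eastern EU")}
    # Chromosomes drawn from "Southern EU", with ~30% of sites masked
    # to mimic positions outside European-designated segments.
    chroms = [[None if rng.random() < 0.3 else int(rng.random() < p)
               for p in sources["Southern EU"]] for _ in range(50)]
    print(source_proportions(chroms, sources))
```

Treating sites as independent ignores linkage disequilibrium, so the absolute likelihoods are overconfident; the sketch only uses the ranking of sources.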
For AJ, we found that Southern Europe was the most likely EU source for the largest proportion of the AJ chromosomes. Specifically, 43.2% of the AJ chromosomes had Southern EU as their most likely source, 35.4% had Western EU, and 18.8% had Eastern EU (the proportions do not sum exactly to 100%, as we also allowed chromosomes to be classified as Middle-Eastern). These results suggest that Southern Europe was the dominant source of European gene flow into AJ.
We observed in simulations of admixed genomes that the Middle-Eastern regional source could also be recovered by running the same localization pipeline. Applying that pipeline to the AJ genomes, we identified the Levant as the most likely ME source: the proportion of chromosomes classified as Levantine was 51.6%, compared to 21.7% and 22.2% classified as Druze and Southern ME, respectively.
While these results indicate a sizeable contribution of ancestry from Southern Europe and the Levant, we stress that these quantities do not directly correspond to the proportion of ancestry contributed by each source. We attempt to infer those proportions in the next section.
Fig 2. Inference of the proportion of Ashkenazi ancestry derived from each European region.
To estimate the magnitude of the minor ME components, we repeated a procedure similar to that used for the European component. Specifically, we simulated admixed genomes in which the European ancestry was fixed at the proportions inferred above (34% Southern EU, 8% Western EU, and 8% Eastern EU), and varied first the proportion of Levant vs. Druze ancestry and then that of Levant vs. Southern ME ancestry. In both cases, the best match to the AJ data was obtained when Levantine ancestry accounted for almost all of the ME ancestry (45% out of the total 50%; the minor components were also close to zero when we simulated 50% Southern EU ancestry). This result supports a predominantly Levantine origin for the ME ancestry in AJ, and justifies using the Levantine genomes to represent the ME ancestry in our simulations.
Consider a model of a “pulse” admixture between two populations t generations ago, where the first population contributed a fraction q of the ancestry. The mean length (in Morgans) of segments coming from the second source is 1/(qt). In the case of AJ, where the source populations are EU and ME, we estimated q (the EU ancestry fraction) above to be ≈53%. Therefore, the mean ME segment length is expected to be informative about the admixture time t. The mean ME segment length was ≈14 cM; however, we noticed in simulations that the RFMix-inferred segment lengths were substantially overestimated. To correct for this, we used simulations to find the admixture time that yielded RFMix-inferred segment lengths best matching the real AJ data. We fixed the ancestry proportions to those inferred above for AJ (50% ME, 34% Southern EU, 8% Western EU, and 8% Eastern EU) and varied the admixture time. We then plotted the RFMix-inferred mean ME segment length against the simulated mean segment length (Fig 3). The simulated mean segment length that corresponded to the observed AJ value was around 6.6 cM, implying an admixture time of ≈29 generations ago (95% bootstrap confidence interval: [27, 30] generations).
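The pulse-model relation can be checked numerically with a short sketch (the helper names are hypothetical). It assumes a Markov approximation: ancestry junctions fall along a chromosome as a Poisson process with rate t per Morgan, and each new piece comes from the first source with probability q. A second-source run then spans a geometric number of pieces with mean 1/q, each of mean length 1/t, which gives 1/(qt). Inverting the formula with q = 0.53 and the simulation-corrected 6.6 cM gives t ≈ 28.6, consistent with the ≈29 generations above, whereas plugging in the uncorrected ≈14 cM would give t ≈ 13.5. This illustrates why the correction for the RFMix length bias matters.

```python
import random

def mean_segment_length(q, t):
    """Expected length (Morgans) of second-source segments after a pulse
    admixture t generations ago in which the first source contributed q."""
    return 1.0 / (q * t)

def admixture_time(q, mean_len):
    """Invert mean_len = 1/(q t) to recover t (mean_len in Morgans)."""
    return 1.0 / (q * mean_len)

def simulate_run_lengths(q, t, length, rng):
    """Lengths of complete second-source runs on one long chromosome.

    Sketch assumption: junctions fall as a Poisson process with rate t per
    Morgan, and each new piece is drawn from source 1 with probability q.
    A source-2 run ends at the first piece drawn from source 1.
    """
    runs = []
    pos, run_start, current = 0.0, 0.0, None
    while pos < length:
        ancestry = 1 if rng.random() < q else 2
        if ancestry != current:
            if current == 2:
                runs.append(pos - run_start)  # close the source-2 run
            current, run_start = ancestry, pos
        pos += rng.expovariate(t)
    return runs  # the last run, truncated by the chromosome end, is dropped

if __name__ == "__main__":
    q = 0.53
    print(admixture_time(q, 0.066))  # 6.6 cM, simulation-corrected
    print(admixture_time(q, 0.14))   # 14 cM, uncorrected RFMix mean
    runs = simulate_run_lengths(q, 29, 2000.0, random.Random(0))
    print(sum(runs) / len(runs), mean_segment_length(q, 29))
```

The long simulated chromosome (2000 Morgans) only serves to collect many runs so that their mean can be compared with 1/(qt); truncation at the chromosome end then has a negligible effect.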
Fig 3. Inferring the AJ admixture time using the lengths of admixture segments.
The mean length of RFMix-inferred Middle-Eastern segments is plotted vs the mean simulated length, which is inversely related to the simulated admixture time. The red dot corresponds to the observed mean segment length in the real AJ data.
Beyond mean segment lengths, the proportion of ancestry per chromosome that descends from each ancestral population is also informative about the time of admixture [36, 37]: the longer the time since admixture, the smaller the variance of these proportions across chromosomes. While ancestry proportions contain less information than segment lengths, they are potentially more robust to misidentification of segment boundaries. Building on models from refs. [35, 38, 39], we derived a new analytical expression for the distribution of ancestry proportions (for either phased or unphased data) given the initial admixture proportions and the admixture time (Methods). This led to a maximum likelihood estimator of the admixture time and the initial proportions. For admixture between highly diverged populations, the method is expected to work well for intermediate admixture times (e.g., 10 < t < 100 generations), as we demonstrated using simulations in which the true segment boundaries were known (S2 Fig).
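To illustrate why the variance of per-chromosome ancestry proportions shrinks with time, the sketch below uses a simple two-state Markov approximation, not the new expression derived in the Methods, which is not reproduced here. The assumption is that junctions occur at rate t per Morgan and each piece comes from source 1 with probability q, which is consistent with the 1/(qt) mean segment length given earlier. The ancestry indicators at two positions a distance d apart then have covariance q(1 - q)exp(-t d), and integrating over a phased chromosome of L Morgans gives Var = 2q(1 - q)(tL - 1 + exp(-tL))/(tL)^2. The mean stays at q while the variance decays roughly as 1/t. The function names are hypothetical.

```python
import math
import random
import statistics

def ancestry_variance(q, t, L):
    """Variance across chromosomes of the source-1 ancestry fraction for a
    chromosome of L Morgans, under a two-state Markov approximation in which
    ancestry is re-drawn at junctions occurring at rate t per Morgan."""
    x = t * L
    return 2.0 * q * (1.0 - q) * (x - 1.0 + math.exp(-x)) / x ** 2

def simulate_fraction(q, t, L, rng):
    """Source-1 ancestry fraction of one simulated chromosome (same model)."""
    pos, covered = 0.0, 0.0
    while pos < L:
        end = min(pos + rng.expovariate(t), L)
        if rng.random() < q:  # this piece descends from source 1
            covered += end - pos
        pos = end
    return covered / L

if __name__ == "__main__":
    rng = random.Random(2)
    for t in (10, 30, 100):
        sims = [simulate_fraction(0.5, t, 1.5, rng) for _ in range(2000)]
        print(t, statistics.pvariance(sims), ancestry_variance(0.5, t, 1.5))
```

The printed simulated and analytic variances should agree closely and fall as t grows, which is the signal the ancestry-proportion estimator exploits.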
Fig 4. The Probability Density Function (PDF) of ancestry proportions in AJ and in simulations.
To apply our method to AJ, we used the LAI results and summed the lengths of European and Middle-Eastern segments on each chromosome. However, our simulations showed that for Southern EU/ME admixture, the correlation between true and inferred ancestry proportions is only r² ≈ 0.11 (S3 Fig); therefore, we could not apply our method directly. To correct for the distortion of the distribution due to local ancestry inference, we again used EU/ME admixture simulations and matched the variance of the AJ distribution to that of genomes simulated under admixture times between 10 and 60 generations. Given a 4-way admixture model (Middle-Eastern, Southern EU, Eastern EU, and Western EU with proportions 50:34:8:8 (%), respectively), the best fit to the AJ data was obtained with an admixture time of 32 generations (Fig 4; 95% bootstrap confidence interval: [31, 37] generations), close to the time inferred above from the mean segment lengths.
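The variance-matching step and its bootstrap interval can be sketched as a grid search over t = 10 to 60 generations. In the actual analysis, the simulated variance came from RFMix-processed simulations under the 4-way model. Here a hypothetical stand-in, markov_variance (the Markov-approximation variance for an assumed 1.5-Morgan chromosome), plays that role. Resampling chromosomes with replacement is also an assumption about how the bootstrap was done, since the text does not say.

```python
import math
import random
import statistics

def markov_variance(t, q=0.5, L=1.5):
    """Hypothetical stand-in simulator: ancestry-fraction variance under the
    two-state Markov approximation for a chromosome of L Morgans."""
    x = t * L
    return 2.0 * q * (1.0 - q) * (x - 1.0 + math.exp(-x)) / x ** 2

def best_matching_time(observed_var, simulated_var, times=range(10, 61)):
    """Grid point whose simulated variance is closest to the observed one."""
    return min(times, key=lambda t: abs(simulated_var(t) - observed_var))

def bootstrap_interval(fractions, simulated_var, n_boot=200, seed=0):
    """Percentile 95% interval, resampling chromosomes with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        best_matching_time(
            statistics.pvariance(rng.choices(fractions, k=len(fractions))),
            simulated_var)
        for _ in range(n_boot))
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]
```

With real data, simulated_var would be replaced by a function returning the variance of LAI-inferred ancestry proportions in genomes simulated at each t, so that the matching absorbs the distortion introduced by local ancestry inference.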