Mal'ta Boy (ANE): Origins of Native Americans Jan 14, 2019 18:46:28 GMT
Post by Admin on Jan 14, 2019 18:46:28 GMT
(A) Admixture coefficients plotted for dataset ‘GenoChip + Illumina arrays’. Abbreviated names of admixture components are shown on the left as follows: SAM, South American; NAM, North American; ESK, Eskimo (Beringian); SEA, South-East Asian; SIB, Siberian; NEU, North European; ME, Middle Eastern; CAU, Caucasian; SAS, South Asian; OCE, Oceanian; AFR, African. The Ket-Uralic (‘Ket’) admixture component appears at K ≥ 11, and admixture coefficients are plotted for K = 4, 10, 11, and 19. Although K = 20 demonstrates the lowest average cross-validation error, the Ket-Uralic component splits in two at this K value; therefore K = 19 was chosen for the final analysis. Only populations containing at least one individual with >5% of the Ket-Uralic component at K = 19 are plotted, and individuals are sorted according to values of the Ket-Uralic component. Admixture coefficients for the Saqqaq ancient genome are shown separately on the right, and those for two reference Kets and two Ket individuals from this study are shown on the left. (B) Average cross-validation (CV) error graph with standard deviations plotted. Ten-fold cross-validation was performed. The graph has a minimum at K = 20. (C) Color-coded values of the Ket-Uralic admixture component at K = 19 plotted on the world map using QGIS v.2.8. The maximum value in each population is taken, and only values >5% are plotted. The top five values of the component are shown in the bottom left corner, and the value for Saqqaq is shown on the map.
‘Ket-Uralic’ admixture component
Using the GenoChip SNP array [20], we genotyped 130 K ancestry-informative markers in the Ket, Selkup, Nganasan, and Enets populations (Suppl. file S1). Following the exclusion of first-, second-, and third-degree relatives among the individuals genotyped in this study (Suppl. file S1, Suppl. Fig. 4.1), we merged the GenoChip array data with the published SNP array datasets to produce a worldwide dataset of 90 populations and 1,624 individuals, focused on Siberia and America (Suppl. Table 2). The intersection dataset, containing 32,189 SNPs (Suppl. Table 1), was analyzed with ADMIXTURE [37] (Fig. 1). At K ≥ 11, ADMIXTURE identified a characteristic component for the Ket population (Suppl. Information Section 5). This component reached its global maximum of nearly 100% in Kets, closely followed by Selkups from this study (up to 81.5% at K = 19), the reference Selkups (up to 48.5%) and the Enets (up to 22.6%). The difference between the Selkups from this study and the reference Selkups [21] can be attributed to the much closer geographic proximity of the former population to the settlements of Kets, with whom they have a long history of cohabitation and mixture [2,10].
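The merging step described above keeps only markers present on every array. A minimal sketch of that intersection follows, assuming each dataset is represented simply as a list of SNP identifiers; the dataset names and rsIDs are hypothetical, and real merges also need allele and strand harmonization, which is omitted here.

```python
def shared_snps(datasets):
    """Return the sorted SNP IDs that occur in every dataset.

    `datasets` maps a dataset name to an iterable of SNP identifiers.
    """
    id_sets = [set(ids) for ids in datasets.values()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


# Hypothetical toy marker lists, for illustration only.
TOY_DATASETS = {
    "genochip": ["rs1", "rs2", "rs3", "rs5"],
    "published_array_a": ["rs2", "rs3", "rs4"],
    "published_array_b": ["rs2", "rs3", "rs5", "rs6"],
}

TOY_SHARED = shared_snps(TOY_DATASETS)
```

Only markers genotyped in all inputs survive, which is why an intersection of several arrays can be much smaller than any single array.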
The ‘Ket’ component occurred at high levels (up to ~20%) in four Turkic-speaking populations of the Altai region: Shors, Khakases, Altaians, and Teleuts. Notably, the Altai region was populated by Yeniseian-speaking people before they were forced to retreat north (Suppl. Information, Section 2). Lower levels of the ‘Ket’ component, from 5% to 15%, were observed in the following geographic regions (in decreasing order): the Volga-Ural region, Central and South Asia, East Siberia and Mongolia, and North Caucasus. The ‘Ket’ component also occurred at a low level in Russians (up to 7.1%), Finns (up to 5.4%), and, remarkably, in the Saqqaq ancient genome from Greenland (7.2%, see below).
In order to verify and explain the geographic distribution of the ‘Ket’ admixture component, we have performed ADMIXTURE analysis on two additional datasets, differing in population (Suppl. Table 2) and marker (Suppl. Table 1) selection (see Suppl. Information, Section 5). In summary, we suggest the existence of an admixture component with a peculiar geographic distribution, observed in some previous studies but not discussed there [17,18]. In addition to the Kets, this component is also characteristic of Samoyedic-speaking and Ugric-speaking people of the Uralic language family: Selkups, Enets, Nenets, Khanty, and Mansi, with the notable exception of Samoyedic-speaking Nganasans. The proportion of the ‘Ket-Uralic’ admixture component correlated strongly with the worldwide frequency of mitochondrial haplogroup U4 (Pearson’s correlation coefficient up to 0.8 and a corresponding p-value of 7 × 10⁻⁸) and with the frequency of Y-chromosomal haplogroup Q in Eurasian populations (correlation coefficient up to 0.9 and p-value 2 × 10⁻⁷) (Suppl. Information, Section 10).
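For readers unfamiliar with the correlation reported above, here is a minimal sketch of Pearson's correlation coefficient between a per-population admixture proportion and a haplogroup frequency. The numbers are invented for illustration and are not values from the study; the p-value calculation is not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-population values (one entry per population).
TOY_COMPONENT = [0.95, 0.60, 0.20, 0.07, 0.02]   # admixture proportion
TOY_HAPLOGROUP = [0.30, 0.20, 0.10, 0.05, 0.01]  # haplogroup frequency

TOY_R = pearson_r(TOY_COMPONENT, TOY_HAPLOGROUP)
```

A coefficient near 1 means the two quantities rise and fall together across populations; it does not by itself indicate a shared origin.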
Kets in the context of Siberian populations
In order to study the relationship of Kets and other Siberian populations with the relevant ancient genomes, we have constructed three additional datasets: one based on the Ket genome sequences and the HumanOrigins array SNP data [22], and two based on genome sequences only (Suppl. Tables 1 and 2). The Ket and Selkup populations were closely related according to multiple analyses (see the ADMIXTURE plot in Fig. 1, PCA plots in Suppl. Figs. 6.3, 6.6, TreeMix tree in Fig. 2, and outgroup f3 statistics [38] in Suppl. Fig. 7.2). Nganasans appeared as the closest relatives of Kets according to the statistic f3(Yoruba; Ket, X): the statistic for Nganasans was significantly different from that of the second-best hit (Suppl. Fig. 7.2). In general, outgroup f3 statistics f3(Yoruba; Test, X) were tightly correlated between the Kets, the Selkups, and the Nganasans, with Pearson’s correlation coefficients ranging from 0.96 to 0.999 (Suppl. Information, Section 7), suggesting that these populations form a closely related group. In line with these results, Nganasans, Kets, Selkups, and Yukaghirs formed a clade in a maximum likelihood tree constructed with TreeMix on a HumanOrigins-based dataset of 194,750 SNPs (Fig. 2).
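The outgroup f3 statistic used above can be sketched as the average, over SNPs, of the product of allele-frequency differences between the outgroup and each of the two test populations; a larger value indicates more shared genetic drift. This is a simplified illustration with invented allele frequencies and hypothetical population labels. Published analyses typically apply a finite-sample correction and estimate standard errors by block jackknife, both of which are omitted here.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Uncorrected f3(outgroup; A, B): mean of (o - a) * (o - b) over SNPs."""
    n = len(outgroup)
    if n == 0 or not (n == len(pop_a) == len(pop_b)):
        raise ValueError("need non-empty frequency lists of equal length")
    total = sum((o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b))
    return total / n


def rank_by_shared_drift(outgroup, target, others):
    """Rank populations in `others` by f3(outgroup; target, X), highest first."""
    scores = {name: outgroup_f3(outgroup, target, freqs)
              for name, freqs in others.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical allele frequencies at four SNPs.
TOY_OUTGROUP = [0.1, 0.5, 0.9, 0.3]
TOY_TARGET = [0.4, 0.8, 0.6, 0.6]
TOY_OTHERS = {
    "pop_x": [0.5, 0.7, 0.5, 0.7],  # drifts away from the outgroup with the target
    "pop_y": [0.2, 0.5, 0.8, 0.3],  # stays close to the outgroup
}

TOY_RANKING = rank_by_shared_drift(TOY_OUTGROUP, TOY_TARGET, TOY_OTHERS)
```

In this toy case `pop_x` ranks first because its frequencies depart from the outgroup in the same direction as the target's at every SNP.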
In our ADMIXTURE analyses (Fig. 1A, Suppl. Fig. 5.4), the Saqqaq Paleo-Eskimo individual featured the following components: Beringian, Siberian, and South-East Asian. Thus, Saqqaq Paleo-Eskimo has mostly Beringian ancestry (similar to modern Eskimo, Inuits, Aleutians, Koryaks, etc.): see outgroup f3 statistics and associated Zdiff scores in Suppl. Figs. 7.17–7.19, migration edges modelled with TreeMix in Fig. 3, and the ADMIXTURE results in the original study [16]. Beringian ancestry in Saqqaq is combined with considerable Siberian ancestry: 32% or 28% as a sum of Siberian ADMIXTURE components in this study (Fig. 1A, Suppl. Fig. 5.4); ~25% according to ADMIXTURE analysis in the original study (Rasmussen et al. 2010) [16]; from 31% to 57% according to f4 statistic ratios calculated with various outgroups (Suppl. Information, Section 8, Suppl. Table 6). This ‘core Siberian’ component in Saqqaq is apparently most closely related to modern Nganasans [16] (Suppl. Fig. 7.17) and to the Nganasan-related clade in general (see a TreeMix tree in Fig. 2). The Kets are the only representatives of this clade in the genome-based datasets in this study. According to the pairwise correlation between outgroup f3 statistics (the method used in Allentoft et al. [6]), Kets are closer to Saqqaq than are Nivkhs, Altaians, Buryats, and Yakuts (Suppl. file S2). According to Euclidean distances in the ten-dimensional space of principal components on the HumanOrigins dataset, Kets were closer to Saqqaq than were Nganasans, Selkups, Yukaghirs, and the other populations (Fig. 4). However, the outgroup f3 statistics f3(Yoruba; Saqqaq, Ket) in many cases were not significantly different from f3(Yoruba; Saqqaq, other Siberian population): e.g., see |Zdiff| scores < 3 for f3(Yoruba; Saqqaq, Nivkh) (Suppl. Figs. 7.18, 7.19). The same result was produced with f4(Saqqaq, Yoruba; Nivkh, Ket): the absolute Z-score was lower than 2 (Suppl. Fig. 8.18A,B).