Post by Admin on Jul 11, 2020 19:23:41 GMT
Origins of Western Eurasian genetic signatures in South Asians
The presence of Western Eurasian ancestry in many present-day South Asian populations south of the central steppe has been used to argue for gene flow from Early Bronze Age (~3000 to 2500 BCE) western steppe pastoralists into the region (42, 43). However, direct influence of Yamnaya or related cultures of that period is not visible in the archaeological record, except perhaps for a single burial mound of contested age at Sarazm in present-day Tajikistan (44, 45). Moreover, linguistic reconstruction of the protoculture, combined with the archaeological chronology, points to a Late Bronze Age (~2300 to 1200 BCE) rather than an Early Bronze Age arrival of the Indo-Iranian languages in South Asia (16, 45, 46). Debate therefore persists as to how and when Western Eurasian genetic signatures and Indo-European (IE) languages reached South Asia.
To address these issues, we investigated whether the Western Eurasian signal in South Asians could derive from sources other than Yamnaya and Afanasievo (Fig. 1). Both the Early Bronze Age steppe pastoralists (Yamnaya and Afanasievo) and the Late Bronze Age Sintashta and Andronovo groups carry substantial amounts of Eastern European hunter-gatherer (EHG) and Caucasus hunter-gatherer (CHG) ancestry (1, 2, 7). The Late Bronze Age groups, however, can be distinguished by an additional genetic component acquired through admixture with European Neolithic farmers during the formation of the Corded Ware complex (1, 2), which reflects a secondary push from Europe eastward through the forest-steppe zone.
We characterized four samples from southern Turkmenistan dating to Namazga period III (~3300 BCE). In our PCA, the Namazga_CA (Copper Age) individuals fall in an intermediate position between the Iran Neolithic and western steppe clusters (Fig. 2). Consistent with this, the Namazga_CA individuals carry significantly more EHG-related ancestry than Neolithic individuals from Iran (Iran_N) [D(EHG, Mbuti; Namazga_CA, Iran_N), Z = 4.49], and we cannot reject a two-population qpAdm model in which Namazga_CA ancestry derives from a mixture of Neolithic Iranians and EHG (~21% EHG; P = 0.49).
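The D statistics quoted above can be illustrated with a short sketch. Assuming per-SNP allele frequencies for the four populations are available as NumPy arrays, the code below computes D(W, X; Y, Z) with the ratio-of-sums form described for ADMIXTOOLS and a Z score from a delete-one block jackknife. The function names are hypothetical, and the equal-sized SNP blocks are a simplification of the weighted jackknife over genomic blocks used in published analyses; this is not the code behind the values reported here.

```python
import numpy as np


def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (ratio of sums over SNPs)."""
    num = np.sum((w - x) * (y - z))
    den = np.sum((w + x - 2 * w * x) * (y + z - 2 * y * z))
    return num / den


def d_stat_jackknife(w, x, y, z, n_blocks=20):
    """Return (D, Z) using a delete-one jackknife over equal-sized contiguous SNP blocks."""
    n = len(w)
    d_full = d_stat(w, x, y, z)
    loo = []
    for block in np.array_split(np.arange(n), n_blocks):
        keep = np.ones(n, dtype=bool)
        keep[block] = False  # leave this block out
        loo.append(d_stat(w[keep], x[keep], y[keep], z[keep]))
    loo = np.asarray(loo)
    # Standard (unweighted) delete-one jackknife standard error
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return d_full, d_full / se


# Synthetic example (not real data): Y is 80% Z plus 20% W-related ancestry
rng = np.random.default_rng(0)
n_snps = 20_000
w = rng.uniform(0.05, 0.95, n_snps)  # plays the role of EHG
x = rng.uniform(0.05, 0.95, n_snps)  # plays the role of the outgroup (Mbuti)
z = rng.uniform(0.05, 0.95, n_snps)  # plays the role of Iran_N
y = 0.8 * z + 0.2 * w                # plays the role of Namazga_CA
d, z_score = d_stat_jackknife(w, x, y, z)
print(f"D = {d:.4f}, Z = {z_score:.1f}")
```

A positive D here means W shares more alleles with Y than with Z, which is the pattern read from D(EHG, Mbuti; Namazga_CA, Iran_N) as extra EHG-related ancestry in Namazga_CA.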
Although CHG contributed to Copper Age steppe individuals (e.g., Khvalynsk, ~5150 to 3950 BCE) and substantially to the Early Bronze Age Yamnaya and Afanasievo (1, 2, 7, 47), we find no evidence of CHG-specific ancestry in Namazga. Despite the adjacent placement of CHG and Namazga_CA on the PCA plot, D(CHG, Mbuti; Namazga_CA, Iran_N) does not deviate significantly from 0 (Z = 1.65), in agreement with the ADMIXTURE results (Fig. 3 and fig. S14). Moreover, a three-population qpAdm model with Iran Neolithic, EHG, and CHG as sources yields a negative admixture coefficient for CHG, which is not a meaningful ancestry proportion. Thus, although we cannot entirely exclude a minor CHG contribution, the steppe-related admixture most likely reached the Namazga population before the Copper Age or came from unadmixed EHG-related sources. This is consistent with the upper temporal bound set by the date of the Namazga_CA samples (~3300 BCE).
In contrast, the Iron Age (~900 to 200 BCE) individual from the same region as Namazga (sample DA382, labeled Turkmenistan_IA) lies closer to the steppe cluster in the PCA plot and does carry CHG-specific ancestry. However, it also carries the European farmer–related ancestry typical of Late Bronze Age steppe populations (1–3, 47) [D(Neolithic European, Mbuti; Namazga_CA, Turkmenistan_IA), Z = −4.04], suggesting that its steppe admixture came from Late rather than Early Bronze Age steppe populations.
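Why a negative coefficient counts against a source can be shown with a simplified stand-in for qpAdm. qpAdm itself also tests, with a jackknife covariance, whether the f4 statistics are consistent with the proposed number of sources (that test yields the P values quoted above); this sketch does not attempt that. Assuming allele-frequency arrays for a target, candidate sources, a base outgroup R0, and other outgroups Rj, it summarizes each population P by f4(P, R0; Rj, R0), which is linear in P, and solves by least squares for mixture weights constrained to sum to 1. The statistic choice and all function names are illustrative assumptions. Nothing forces the weights to be non-negative, so a source the data do not support can receive a negative weight and the model is then treated as infeasible.

```python
import numpy as np


def f4_profile(p, r0, rights):
    """f4(P, R0; Rj, R0) for each outgroup Rj; linear in the frequencies of P."""
    return np.array([np.mean((p - r0) * (rj - r0)) for rj in rights])


def fit_mixture_weights(target, sources, r0, rights):
    """Least-squares weights with sum(w) == 1; individual weights are NOT forced >= 0."""
    y_t = f4_profile(target, r0, rights)
    ys = [f4_profile(s, r0, rights) for s in sources]
    base = ys[-1]
    # If the weights sum to 1: y_t - base = sum_{k < n} w_k * (y_k - base)
    a = np.column_stack([y - base for y in ys[:-1]])
    head, *_ = np.linalg.lstsq(a, y_t - base, rcond=None)
    return np.append(head, 1.0 - head.sum())


def is_feasible(weights, tol=1e-9):
    """Admixture proportions must be non-negative to be meaningful."""
    return bool(np.all(np.asarray(weights) >= -tol))


# Synthetic example (not real data): target = 79% "Iran_N" + 21% "EHG"
rng = np.random.default_rng(1)
n = 5_000
iran, ehg, chg, r0 = rng.uniform(0.05, 0.95, (4, n))
noise = rng.uniform(0.05, 0.95, (4, n))
rights = [0.7 * ehg + 0.3 * noise[0], 0.7 * iran + 0.3 * noise[1],
          0.7 * chg + 0.3 * noise[2], noise[3]]
target = 0.79 * iran + 0.21 * ehg

w2 = fit_mixture_weights(target, [iran, ehg], r0, rights)
w3 = fit_mixture_weights(target, [iran, ehg, chg], r0, rights)
print(np.round(w2, 3), is_feasible(w2))
print(np.round(w3, 3), is_feasible(w3))
```

With noise-free synthetic data the spurious third source gets a weight of about zero; with real, noisy data such a source can come out negative, as reported for CHG above.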
In a PCA focused on South Asia (Fig. 2B), the first dimension corresponds approximately to a west-east axis and the second to a north-south axis. The Andamanese Onge, previously used to represent the Ancient South Asian component (12, 42), lie near the lower right. Contemporary South Asian populations are spread along both the east-west and north-south gradients, reflecting three major ancestry components in South Asia: West Eurasian, South Asian, and East Asian. Because the Namazga_CA individuals lie at one end of the West Eurasian/South Asian axis and are geographically close to South Asia, we tested this group as a potential source in a set of qpAdm models for South Asian populations (Fig. 6).
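A genotype PCA of this kind can be sketched as follows, assuming a matrix of diploid genotypes coded as 0, 1, or 2 copies of one allele (individuals in rows, SNPs in columns). Each SNP is centered on its mean and scaled by sqrt(2p(1 − p)), a common normalization for genotype PCA, and component scores come from a singular value decomposition. The function name is hypothetical; ancient-DNA studies commonly compute axes from present-day individuals and project low-coverage ancient samples onto them, which this minimal sketch does not do.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Return PC scores for an (individuals x SNPs) matrix of 0/1/2 genotypes."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # allele frequency per SNP
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    x = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Synthetic example (not real data): two strongly differentiated groups of 20
rng = np.random.default_rng(2)
n_snps = 500
p_a = rng.uniform(0.05, 0.95, n_snps)
p_b = 1.0 - p_a
geno = np.vstack([
    rng.binomial(2, p_a, size=(20, n_snps)),
    rng.binomial(2, p_b, size=(20, n_snps)),
])
scores = genotype_pca(geno)
print(scores[:20, 0].mean(), scores[20:, 0].mean())  # opposite signs on PC1
```

The sign of each component is arbitrary, so only the relative placement of groups along an axis, as in the west-east and north-south reading of Fig. 2B, carries meaning.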
Fig. 6. Summary of the four qpAdm models fitted for South Asian populations.
For each modern South Asian population, we fitted qpAdm models using ancient groups as sources and report the first model that could not be rejected, in the following priority order: (1) Namazga_CA + Onge; (2) Namazga_CA + Onge + Late Bronze Age steppe (Steppe_MLBA); (3) Namazga_CA + Onge + Xiongnu_IA; and (4) Turkmenistan_IA + Xiongnu_IA. Xiongnu_IA is used here as a proxy for East Asian ancestry. South Asian Dravidian speakers can be modeled as a mixture of Onge and Namazga_CA, whereas IE speakers require an additional source related to Late Bronze Age steppe groups. Tibeto-Burman and Austro-Asiatic speakers instead require an East Asian rather than a Steppe_MLBA source.
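The reporting rule in the caption (take the first model, in priority order, that qpAdm does not reject) amounts to a short loop. In this sketch the P values would come from separate qpAdm runs; the rejection threshold of 0.05 is an assumption, since the threshold is not stated here, and "Steppe_MLBA" stands for the Late Bronze Age steppe source. The function name and the P values in the example are hypothetical.

```python
from typing import Dict, Optional, Tuple

Model = Tuple[str, ...]

# Priority order from the Fig. 6 caption
MODELS: Tuple[Model, ...] = (
    ("Namazga_CA", "Onge"),
    ("Namazga_CA", "Onge", "Steppe_MLBA"),
    ("Namazga_CA", "Onge", "Xiongnu_IA"),
    ("Turkmenistan_IA", "Xiongnu_IA"),
)

ALPHA = 0.05  # assumed rejection threshold (not stated in the text)


def first_unrejected(p_values: Dict[Model, float], alpha: float = ALPHA) -> Optional[Model]:
    """Return the first model in priority order with P >= alpha; None if all are rejected or untested."""
    for model in MODELS:
        p = p_values.get(model)
        if p is not None and p >= alpha:
            return model
    return None


# Hypothetical P values for one population (not results from this study)
example = {MODELS[0]: 0.001, MODELS[1]: 0.32, MODELS[2]: 0.60}
print(first_unrejected(example))  # ('Namazga_CA', 'Onge', 'Steppe_MLBA')
```

Model 2 is reported even though model 3 also fits, because the loop stops at the first unrejected model; this is what makes the priority order part of the result.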
We cannot reject a two-population qpAdm model using Namazga_CA and Onge for nine modern southern, predominantly Dravidian-speaking populations (Fig. 6, fig. S36, and tables S16 and S17). In contrast, for seven populations belonging to the northernmost Indic- and Iranian-speaking groups, this two-population model is rejected, but a three-population model that adds a Late Bronze Age steppe source is not. Last, for seven southeastern Asian populations, six of them Tibeto-Burman or Austro-Asiatic speakers, the three-population model with Late Bronze Age steppe ancestry is rejected, but a model in which that steppe source is replaced with an East Asian source, represented by the Late Iron Age (~200 BCE to 100 CE) Xiongnu nomads from Mongolia (Xiongnu_IA) (3), is not. Interestingly, for two northern groups, the only tested model we could not reject used the Iron Age individual from the Zarafshan Mountains (Turkmenistan_IA) and Xiongnu_IA as sources. These findings are consistent with the positions of the populations in PCA space (Fig. 2B) and are further supported by the ADMIXTURE analysis (Fig. 3), with two minor exceptions: in both the Iyer and the Pakistani Gujar, ADMIXTURE shows a minor Late Bronze Age steppe ancestry component (fig. S14) that the qpAdm approach does not detect. Additionally, D statistics document admixture along the West Eurasian and East Asian clines in all South Asian populations (fig. S37).
We therefore find that ancestry deriving from four separate major sources fully accounts for the population history of present-day South Asians (Figs. 3 and 6): one ancient South Asian source, one from Namazga or a related population, a third from Late Bronze Age steppe pastoralists, and a fourth from East Asia. These sources explain the western ancestry of some Dravidian populations that lack CHG-specific ancestry, and they fit the observation that wherever CHG-specific ancestry and considerable EHG ancestry occur together, European Neolithic ancestry is also present (Fig. 3). This pattern implicates Late Bronze Age steppe admixture into South Asia, rather than admixture from the Early Bronze Age Yamnaya and Afanasievo. A Late Bronze Age arrival of IE-associated steppe ancestry is also more consistent with the archaeological and linguistic chronology (44, 45, 48, 49). It therefore appears that the Yamnaya- and Afanasievo-related migrations had no direct genetic impact in South Asia.