Genetic Origins of the Kalash People

Admin
Administrator

Posts: 72,922

Post by Admin on Sept 13, 2021 20:33:07 GMT

Exploring European ancestry among the Kalash population: a mitogenomic perspective

Abstract
With a population of around 4 000 individuals, the Kalash people have lived in the Hindu-Kush mountain valleys of present-day northern Pakistan for centuries. Because of their enigmatic origin and fair, European-like complexion, the genetic history of this ethnic group has previously been investigated using different markers. To date, however, their maternal genetic architecture has not been systematically dissected using high-resolution complete mitochondrial genomes (mitogenomes), leaving their maternal genetic history, and especially their matrilineal connection with Europeans, unclear. To address this issue, we analyzed mitogenome data from 34 Kalash samples together with 6 075 individuals from across Eurasia. Our results indicated an exclusively western Eurasian origin of the Kalash people, represented by eight haplogroups. Among these haplogroups, J2b1a7a and R0a5a (accounting for ~50% of the Kalash gene pool) displayed in situ differentiation in the Kalash and could be traced to the Mediterranean region. Age estimations suggested that these haplogroups arose in the Kalash population ~2.26 and ~3.01 thousand years ago (kya), respectively, a time frame consistent with the invasion of the region by Alexander III of Macedon. One possible explanation for the maternal genetic contribution from Europeans to the Kalash people is the involvement of women in the foreign campaigns of ancient Greek warfare, followed by a founder effect. Our study thus sheds important light on the genetic origin of the Kalash community of Pakistan.

DEAR EDITOR,
The Kalash or Kalasha people are an ancient Indo-European-speaking indigenous group with a unique culture and traditions, living exclusively in the Hindu-Kush mountain range of present-day northern Pakistan. The enigmatic origin of the Kalash and their distinct European-like appearance (e.g., lighter skin tone and blue eyes), together with certain customs and beliefs, have so far reinforced their claim to be descendants of Greeks who arrived with the invasion of the region by Alexander III of Macedon (Cacopardo, 2011). Over the past several decades, various genetic studies have investigated the genetic structure and history of the Kalash people, in particular their genetic connection with western Eurasians. For example, several studies have indicated that this ethnic group originated from either the Middle East or Europe, followed by a population bottleneck (Qamar et al., 2002; Rosenberg et al., 2002). It is also debated whether the Kalash were genetically isolated for more than 10 thousand years (Ayub et al., 2015) or received genetic admixture from western Eurasia between 990 and 210 BCE (Hellenthal et al., 2014). Moreover, the possible genetic connection between Greeks and the Kalash remains controversial (Cacopardo, 2011; Firasat et al., 2007; Mansoor et al., 2004; Qamar et al., 2002).

Many previous genetic studies have been based on nuclear genome or Y chromosome data, whereas the maternal genetic structure of the Kalash has only been dissected using mitochondrial DNA (mtDNA) restriction fragment length polymorphism (RFLP) and control region variation (Quintana-Murci et al., 2004), which greatly limits our understanding of the maternal genetic landscape of this ethnic group. Therefore, whether there is a substantial maternal genetic contribution from Europeans to the Kalash, and when this genetic contact was established, remain unclear.

To provide more insight into the genetic history of the Kalash from a matrilineal perspective, we collected and analyzed available complete mitochondrial genome (mitogenome) data from 34 Kalash individuals (25 from the CEPH Human Genome Diversity Project (HGDP) panel (Cann et al., 2002) and nine from this work), as well as 6 075 individuals sampled from Europe and Asia (Figure 1A; Supplementary Table S1). As shown in our results, a total of eight mtDNA haplogroups were identified in the Kalash, including R0a, U4a1, J2b1a, U2e1h, H2a1a, U4b1a4, T2a1a, and U2e2a1, all of which derive exclusively from the Eurasian macro-haplogroup R, an observation in agreement with a previous study (Quintana-Murci et al., 2004). Comparison of the maternal composition between the Kalash and other Eurasian populations (Supplementary Table S1) showed that most of the haplogroups identified in the Kalash were substantially shared with the neighboring Dardic group (Kho), as well as being ubiquitous in other western Eurasians (Figure 1B), indicating a western origin of this ethnic group. This is consistent with previous studies based on both uniparental markers and whole-genome data (Hellenthal et al., 2014; Qamar et al., 2002; Quintana-Murci et al., 2004). Phylogeographic analysis based on all available complete mitogenomes retrieved from the online platform MitoTool (http://mitotool.kiz.ac.cn/) (Fan & Yao, 2011) as well as from published literature further suggested that most haplogroups identified in the Kalash, such as R0a, U2e1h, U4a1, H2a1a, T2a1a, and U2e2a1, had sub-branches (e.g., R0a5a, U2e1h1, U4a1f, H2a1a3, etc.) restricted to northern Pakistan and shared by the Kalash and other Indo-European-speaking populations in the area (Supplementary Figure S1; Supplementary Table S2). Interestingly, the Kalash individuals were distributed sporadically in the terminal positions of the sub-branches, strongly suggesting traces of recent gene flow from other groups into the Kalash (Supplementary Figure S1). Moreover, these haplogroups were also prevalent in the Mediterranean region (e.g., U2e2a1, J2b1a1, and R0a) or in the Eurasian Steppe (e.g., H2a1a, T2a1a, U2e1h, U4a1, and U4b1a4), and thus possibly reached the Hindu-Kush region in different periods and further introgressed into the Kalash through recent gene flow.

Post by Admin on Sept 14, 2021 2:17:31 GMT

Figure 1
Sample locations, distribution of haplogroups identified in Kalash people, and phylogeographic structure of haplogroup J2b1a
A: Geographic locations of populations with complete mitogenome sequences in Pakistan and surrounding countries are shown in the inset. Number of mitogenomes available from each region is proportional to color intensity in respective regions defined in figure legends (comprising 6 075 complete mitogenome sequences; see Supplementary Table S1). B: Schematic tree showing eight west Eurasian haplogroups identified in Kalash and their frequency in other local and west Eurasian populations. C: Phylogeographic reconstruction using median-joining network for haplogroup J2b1a from complete mitogenomes (comprising 106 complete mitogenome sequences belonging to haplogroup J2b1a and five ancient mitogenomes belonging to J2b1; see Supplementary Table S3). Each circle represents one individual sample, unless represented by a number in the circle. Dotted line shows case in which J2b1a7a and J2b1a7b emanate from root of J2b1a independently, with position 16274 (in italics) serving as a parallel mutation on both branches. Mutated positions are shown on branches with different colors for each type of mutation, as seen in legend. Specific clade shared between Sardinians and Kalash is enclosed in red circle; Kalash and Pashtun samples are shown in italics on different branches of node. Geographic affiliations of samples are shown in different colors, as defined in legends. Red circles represent ancient mitogenomes included in network construction. R or Y indicate heteroplasmic states. @ represents reverse mutation, < represents parallel mutation on branches.

Unlike the above lineages, in which the Kalash samples were distributed sporadically across different branches, haplogroup J2b1a had a sub-branch (defined by a non-synonymous transition at position 11204 and tentatively named J2b1a7a) occupied by six Kalash and two Pashtun individuals, the latter belonging to a neighboring group previously shown to have a limited European connection based on a Y chromosome study (Firasat et al., 2007). Further phylogeographic analysis showed that the root types of J2b1a7a were predominantly found in the Kalash, whereas a Pashtun individual was positioned on one terminal branch, indicating in situ differentiation of this lineage in the region and a subsequent spread into the Pashtuns. Importantly, J2b1a7a shared substitution 16274 with its sister haplogroup from Europe (nine Sardinians; defined by substitutions 15319 and 16213 and tentatively named J2b1a7b) (Figure 1C; Supplementary Table S3), indicating a close genetic connection between the Kalash and Europeans. Together with the relatively high proportion of J2b1a7a in the Kalash samples (17.6%), this haplogroup sheds important light on the European ancestry of this ethnic group.
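
The 17.6% figure quoted above follows from the counts given in the text: six Kalash carriers of J2b1a7a among the 34 Kalash mitogenomes analyzed. A minimal sketch of that arithmetic (the function name is hypothetical and not taken from the paper):

```python
def haplogroup_frequency(carriers: int, sample_size: int) -> float:
    """Return the percentage of a sample that carries a given haplogroup."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= carriers <= sample_size:
        raise ValueError("carriers must lie between 0 and sample_size")
    return 100.0 * carriers / sample_size


# Counts stated in the text: 6 Kalash J2b1a7a carriers out of 34 Kalash samples.
kalash_j2b1a7a = haplogroup_frequency(6, 34)
print(f"J2b1a7a in Kalash: {kalash_j2b1a7a:.1f}%")
```

With only 34 samples, a single individual shifts the frequency by about three percentage points, which is worth keeping in mind when reading the ~50% figure for J2b1a7a and R0a5a combined.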

Moreover, because position 16274, shared between the Kalash and Sardinians, is hypervariable, it is also possible that the two lineages J2b1a7a and J2b1a7b derived independently from the root of J2b1a, with 16274 arising as a parallel mutation on both branches. We therefore turned our attention to the ancestral node, J2b1a. Notably, the majority (74%) of J2b1a samples, as well as its ancient root type J2b1, were found in Europe, especially in Sardinia (Figure 1C; Supplementary Table S3). This evidence therefore implies that J2b1a originated in Europe (probably around the Mediterranean region), in agreement with a previous study (Pala et al., 2012). Additional support comes from the observation of haplogroup J2b1a in bones of ancient Europeans (Figure 1C; Supplementary Table S3). Further age estimations using the mitogenome rate (Soares et al., 2009) revealed that the major haplogroup J2b1a can be traced back to 10.59±1.28 kya, a time frame within the Neolithization and Bronze Age processes in the Mediterranean region (Marcus et al., 2020), whereas the Kalash branch (J2b1a7a) dates to 2.26±1.44 kya, reflecting a recent split from its European counterpart followed by independent differentiation in the Hindu-Kush region. Similarly, haplogroup R0a5a, with root types found around the Mediterranean region and a coalescent age of ~3.01±1.5 kya in the Kalash, would also have been introduced into the Kalash gene pool during this recent period. Taken together, ~50% of the Kalash maternal genetic components were derived from haplogroups J2b1a7a and R0a5a, thus documenting recent genetic introgression (likely from the ancestors of modern Sardinians) into the Kalash, around the time when migration to Sardinia from the northern and eastern Mediterranean regions was active (starting ~1 000 BCE) (Fernandes et al., 2020).
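
The age estimates above rest on a molecular clock. One common estimator (assumed here for illustration; the text does not say exactly which procedure was used) averages the number of mutations separating each sampled mitogenome from the root type of the haplogroup and multiplies that average by a calibrated number of years per mutation. The sketch below shows only this uncorrected, linear version. The mutation counts are invented, and the years-per-mutation value is a placeholder, not the calibrated rate of Soares et al. (2009).

```python
def mean_mutations_from_root(mutations_from_root):
    """Average number of mutations between each sample and the root type."""
    if not mutations_from_root:
        raise ValueError("need at least one sample")
    return sum(mutations_from_root) / len(mutations_from_root)


def linear_age_years(mutations_from_root, years_per_mutation):
    """Uncorrected clock estimate: mean mutation count times years per mutation."""
    return mean_mutations_from_root(mutations_from_root) * years_per_mutation


# Hypothetical data: six samples, most still carrying the root type.
toy_counts = [0, 0, 1, 1, 0, 2]
# Placeholder rate for illustration only; NOT a calibrated value.
PLACEHOLDER_YEARS_PER_MUTATION = 3000.0

age = linear_age_years(toy_counts, PLACEHOLDER_YEARS_PER_MUTATION)
print(f"toy age estimate: {age / 1000:.2f} kya")
```

A young lineage yields a small average mutation count, so one extra mutation in a handful of samples moves the estimate substantially. This is consistent with the wide error margins reported for J2b1a7a and R0a5a.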

Interestingly, this genetic connection is consistent with the close genetic affinity found between Sardinians and the Kalash in studies based on eye-color informative single nucleotide polymorphisms (SNPs) (Walsh et al., 2011), and thus probably underlies the similarities in physical features between the Kalash and Europeans, e.g., lighter complexion. Moreover, given that the age of J2b1a7a falls within the period of the Macedonian advance towards northern Pakistan (327 BCE) (Olivieri et al., 2019), and given the existence of J2b1c, J2b1a1, and J2b1a3 (sister and sub-type lineages of J2b1a) in ancient and modern Greeks (Lazaridis et al., 2017; Pala et al., 2012), together with evidence of eastern Mediterranean immigrants in South Asia (Harney et al., 2019), it is also possible that this genetic connection was mediated by the Greeks. In fact, according to historical records, a limited number of women participated in the foreign campaigns of ancient Greek warfare (Loman, 2004), making it likely that women also took part in this occupation and thus contributed to the Kalash gene pool. This scenario is further supported by evidence of human mobility towards mainland Greece and islands such as Sardinia, especially from the Mediterranean, via both sea and land routes during the Mesolithic and even more recent times (Demand, 2012; Fernandes et al., 2020; Marcus et al., 2020). However, the absence of J2b1a in other regions that were once part of Alexander’s empire (especially Greece), as well as its prevalence in the Sardinian and Kalash people, should not be ignored. One possible explanation is that few women accompanied Alexander’s campaigns into other regions, or that their genetic contribution was diluted by later demographic events. Additionally, genetic isolation followed by bottlenecks in both Sardinians (Di Gaetano et al., 2014) and the Kalash (Ayub et al., 2015) likely contributed to the increase of this lineage in these two regions. Moreover, the limited number of mitogenome sequences reported from Greece so far could also account for this observation. Further studies will be required to clarify whether this maternal genetic connection between the Kalash and Sardinians was mediated by Greek expansion.

In summary, our analysis detected European ancestry (probably from around the Mediterranean) within the Kalash people, dating to about 3.01±1.5 and 2.26±1.4 kya. This recent genetic contribution from Europe accounts for a substantial proportion (~50%) of the Kalash maternal gene pool and thus played an important role in its formation. Our study therefore sheds important light on the genetic history of the Kalash people of northern Pakistan.

Post by Admin on Sept 14, 2021 21:26:07 GMT

A genetic atlas of human admixture history

Published in final edited form as:
Science. 2014 Feb 14; 343(6172): 747–751.
doi: 10.1126/science.1243518

Abstract
Modern genetic data combined with appropriate statistical methods have the potential to contribute substantially to our understanding of human history. We have developed an approach that exploits the genomic structure of admixed populations to date and characterize historical mixture events at fine scales. We used this to produce an atlas of worldwide human admixture history, constructed using genetic data alone and encompassing over 100 events occurring over the past 4,000 years. We identify events whose dates and participants suggest they describe genetic impacts of the Mongol Empire, Arab slave trade, Bantu expansion, first millennium CE migrations in eastern Europe, and European colonialism, as well as unrecorded events, revealing admixture to be an almost universal force shaping human populations.

Diverse historical, archaeological, anthropological and linguistic sources of information indicate that human populations have interacted throughout history, due to the rise and fall of empires, invasions, migrations, slavery, and trade. These interactions can result in sudden or gradual transfers of genetic material, creating admixed populations. However, the genetic legacy of these interactions remains unknown in most cases, and the historical record is incomplete. We have developed an approach that provides a detailed characterization of the mixture events in the ancestry of sampled populations, based on genetic data alone.

Admixed populations should have segments of DNA from all contributing source groups (Fig. 1A), whose size decreases over successive generations due to recombination, and approaches have been developed to date admixture events by inferring the size of ancestry segments (1-5). Between-population frequency differences of individual alleles may provide information on ancestry sources (6, 7). Based on these principles we developed an integrated approach, using genome-wide patterns of ancestry to infer jointly both fine-scale information about groups involved in admixture, and its timing, allowing for the fact that migration and admixture events can occur at multiple times or involve numerous groups.

Last Edit: Sept 14, 2021 21:26:44 GMT by Admin

Post by Admin on Sept 15, 2021 0:47:46 GMT

Fig. 1
Ancestry painting and admixture analysis of simulated admixture
(A) We illustrate a simulated event 30 generations ago between Brahui (80%, red) and Yoruba (20%, yellow), resulting in admixed individuals having DNA segments from each source (bottom). The true sources are then treated as unsampled. (B) CHROMOPAINTER’s painting of the same region (yellow=Africa, green=America, red=Central-South-Asia, blue=East-Asia, cyan=Europe, pink=Near-East, black=Oceania), showing haplotypic segments (“chunks”) shared with these groups. Our model fitting narrows the donor set largely to Central-South Asia and Africa, generating a “cleaned” painting. (C) Coancestry curves (black line) show relative probability of jointly copying two chunks from red (Balochi; FST =0.003 with Brahui) and/or yellow (Mandenka; FST =0.009 with Yoruba) donors, at varying genetic distances. The curves closely fit an exponential decay (green line) with rate 30 generations (95% CI: 27-33). The positive slope for the Balochi – Mandenka curve (middle) implies these donors represent different admixing sources. (D) GLOBETROTTER’s source inference, with black diamonds indicating sampled populations with greatest similarity (FST≤0.001 over minimum) to true sources, white circles other sampled populations. Red and yellow circles, with areas summing to 20% and 80%, respectively, show inferred haplotypic make-up of the two admixing sources.

Our approach gains power and resolution by using alleles at multiple successive SNPs (haplotypes) (8). Given a focal population within a larger dataset containing many such groups, the chromosomes of individuals in this population share ancestors with those in other populations, resulting in shared “chunks” of DNA. We used CHROMOPAINTER (8) to decompose each chromosome as a series of haplotypic chunks, each inferred to be shared with an individual from one of the other groups, and colored (or painted) by this group (Fig. 1B). If the focal population is admixed, the changing colors along a chromosome noisily reflect true, but unknown, underlying ancestry (Fig. 1B), and so can be used to learn details of the source group(s) involved. To do this, we model haplotypes within each unsampled source group as being found across a weighted mixture of sampled “donor” populations (9). If a source group is genetically relatively similar to a single sampled population, then this population will dominate the inferred mixture. If there is no close proxy for the admixing group in the sample, especially likely for ancient admixture events or sparsely sampled regions, several donor populations will be needed to approximate its pattern of haplotype sharing. The focal population is then automatically a haplotypic mixture of the combined donors, because it is a mixture of the source groups. Inferring the reduced set of groups within the mixture allows us to produce a “cleaned” painting (Fig. 1B) using only these groups.

To assess the evidence for admixture and date events, informally we measure the scale at which the “cleaned” painting changes along the genome. Specifically, we produce a “coancestry curve” for each pair of donor populations, which plots genetic distance x against a measure of how often a pair of haplotype chunks, separated by distance x, come from each respective population (Fig. 1C), analogously to ROLLOFF curves (4), and averaging over uncertain, and typically computationally estimated, haplotypic phase (9). In theory, given a single admixture event, ancestry chunks inherited from each source have an exponential size distribution, resulting in an exponential decay of these coancestry curves (9). The rate of decay in all curves will be equal to the time in generations since admixture (Fig. 1C) (4, 9, 10), allowing estimation of this date: steeper decay corresponds to older admixture. Such a decay distinguishes true admixture from ancient spatial structure, and should only occur in recipient, but not donor, groups involved in non-reciprocal admixture events. We test for evidence to reject (p<0.01) a no-admixture null model, i.e. no exponential decay in (normalized) coancestry curves, via bootstrapping (9). Multiple admixture times result in a mixture of exponentials (9); so if admixture is detected, we test for evidence of multiple admixture times (e.g. two episodes of admixture, or more continuous admixture over a longer period; empirical p<0.05 in simulations), comparing the fit of a single exponential decay rate versus a mixture of rates.
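
The dating idea described above can be illustrated with a toy fit. For a single admixture event the coancestry curve is modeled as decaying exponentially with genetic distance, and the decay rate per Morgan equals the number of generations since admixture. The sketch below is not GLOBETROTTER: it assumes a noise-free curve of the form baseline + A·exp(−g·d), a known baseline, and a simple log-linear least-squares fit. All names and numbers are hypothetical.

```python
import math


def fit_generations(distances_cm, curve, baseline=1.0):
    """Estimate generations since admixture from one coancestry curve.

    Assumes curve[i] = baseline + A * exp(-g * d_i), with d_i in Morgans
    and A > 0, so that log(curve - baseline) is linear in d with slope -g.
    """
    xs = [d / 100.0 for d in distances_cm]           # centimorgans -> Morgans
    ys = [math.log(c - baseline) for c in curve]     # requires curve > baseline
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx                                # slope is -g


# Toy curve generated with g = 30 generations, sampled every 0.5 cM up to 20 cM.
true_g = 30.0
distances = [0.5 * i for i in range(1, 41)]
toy_curve = [1.0 + 0.4 * math.exp(-true_g * d / 100.0) for d in distances]

print(f"estimated generations: {fit_generations(distances, toy_curve):.1f}")
```

Real curves are noisy and may be a mixture of several exponentials, which is why the method described in the text relies on bootstrapping and on comparing single-rate against multi-rate fits rather than on a single regression like this one.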

The curve heights (intercepts) provide complementary information to deconvolve the number and genetic composition of the ancestral sources prior to admixture (11). Fitted curves for all pairs of donor groups (Fig. 1C shows three examples) specify a pairwise intercept matrix which, following normalization, we decompose using a series of eigenvectors. Analogous to the standard use of eigenvector decomposition in principal components analysis (PCA) in genetics to estimate relative ancestry source contributions for different individuals (12), the eigenvectors allow us to estimate the relative contribution to different admixing sources (e.g. source 1 vs. source 2) for each different donor group (9). Also as for PCA, admixture between K distinct source populations produces K-1 significant eigenvectors (13), and we test for three or more admixing sources by testing (empirically) for evidence of two or more such eigenvectors (p<0.05) (9). Following iterative modeling to improve results, this allows us to attempt to “reverse” the admixture process (Fig. 1D) and infer the haplotypic makeup of admixing source groups as well as admixture date(s), in our method GLOBETROTTER.

To test our approach under diverse single, complex and no-admixture scenarios, incorporating many of the complexities – such as unsampled or admixed donor groups – likely to be present in real data, we simulated admixture scenarios involving real (but hidden to our analysis) human populations (4, 9) and populations generated under a coalescent framework (14) incorporating inferred (15-18) past demographic events. Admixture was simulated between 7 and 160 generations (200-4,400 years, assuming 28 years per human generation (19)) ago, with admixture fractions 3%-50%, and genetic differentiation (FST) between the admixing groups varying from 0.018 (similar to Europe vs Central Asia) to 0.185 (similar to West Africa vs. Europe). Results are detailed online (Fig. S3-7; Tables S1, S5). All populations simulated without admixture, including those with long-term migration, showed no admixture evidence (p>0.1). Power to detect admixture (p<0.01) when present was 94%, and 95% of our 95% bootstrapped confidence intervals contained the true admixture date, including cases with two distinct incidents of admixture or multiple groups admixing simultaneously. Inferred source accuracy was very high (9), with e.g. the mixture representation predicting a haplotype composition more correlated to the true, typically unsampled, source population than to any single sampled population >80% of the time. However source accuracy was lower for admixing sources contributing only 5% of DNA, with around 40% of such scenarios yielding elevated (>25%) rates of falsely inferring multiple admixture times and/or admixing groups. Further testing demonstrated robustness of GLOBETROTTER, in simulations and real data, to haplotypic phase inference approach used, inclusion/exclusion of particular chromosomes, genetic map chosen to provide genetic distances, and the presence of population bottlenecks since admixture, while GLOBETROTTER admixture dating was improved relative to ROLLOFF (4, 9).

Nevertheless, there are a number of settings which we believe are challenging for our approach. First, although the admixing sources need not be sampled – often impossible due to genetic drift, extinction, or later admixture into the sources themselves - source inference is improved when more similar extant groups are sampled, and GLOBETROTTER may miss events where we lack any extant group that can separate sources. Second, sampling of several genetically very similar groups can mask admixture events they share. Similarly, a caveat is that where genuine, recent bidirectional gene flow has occurred, admixture fractions are difficult to define and interpret. However, date estimation is predicted to still be useful, and in real data the majority of our inferred events do not appear to be bidirectional in this manner. Third, even in theory our approach finds it challenging to distinguish distinct continuous “pulses” of admixture and continuous migration over some timeframe (9), due to the difficulty of separating exponential mixtures (20). If the time frame were narrow, we expect to infer a single admixture time, within the range of migration dates. Where we infer two admixture dates, in particular with the same source groups, the exponential decay signal could also be consistent with more continuous migration, and so we conservatively refer to this as admixture at “multiple dates”. Finally, we only attempt to analyze populations with signals consistent with at most 3 groups admixing, and infer at most two admixture times, and we can provide only less precise inference of sources for the weaker/older admixture signal in these complex cases (9).

Post by Admin on Sept 16, 2021 0:11:32 GMT

Using GLOBETROTTER, we analyzed 1,490 individuals from 95 worldwide human groups (Table S6, Fig. S11) (9), comprising 17 newly genotyped groups (21), 53 from the Human Genome Diversity Panel (HGDP) (22) and 25 from other sources (23, 24), filtered to 474,491 autosomal SNPs. We phased the individuals using IMPUTE2 (9, 25) and used fineSTRUCTURE (8) to verify homogeneity within labeled populations, identify genetically similar and clustered groups, and to remove outlying individuals (Figs. S12-S14; Tables S10-11). Of the 95 populations, 80 showed evidence (p<0.01) of admixture, although nine could not be characterized by our approach (Table S8). More than half of these have evidence of multiple waves of admixture (p<0.05), and estimated admixture times vary from <10 generations, to >150 generations (Fig. 2). We present individual results, for each population, via an interactive map online (26). We tested consistency of our results against a previous analysis of the 53 groups within the HGDP (11), which identified 34 groups with evidence of recent admixture. We identify (p<0.01) admixture evidence in all 34 cases (with multiple event evidence in 15 cases), and obtain 95% admixture date CIs narrower than, but consistent with, those estimated using ROLLOFF (9, 11). For 10 of 19 HGDP groups lacking previous support for recent admixture, GLOBETROTTER also identifies no events: in the remaining populations admixture is inferred as occurring between genetically similar sources (FST<0.02), a challenging setting where simulations suggest our method is more powerful (9).

Fig. 2
Overview of inferred admixture for 95 human populations
(A) Coancestry curve for the Maya for Spanish donor group (inferred as closest to minor admixing source), with green fitted line showing inferred exponential decay curve and a corresponding recent admixture date (with 95% CI). (B, C) As A, but showing the Druze and Kalash respectively, with different indicated donors (donors indicated are proxies for minor admixing source, inferred as closest to Yoruba and Germany/Austria, respectively) and with successively older admixture. (D) On the map (locations approximate in densely sampled regions), shapes (see legend) indicate inference: no admixture, a single admixture event, or more complex admixture; colors indicate fineSTRUCTURE clustering into 18 clades (Table S11, Figs. S12-S13). Inferred date(s), 95% CIs are directly below the map, with two inferred admixing sources (dots and vertical bars) shown below each date (see example for simulation of Fig. 1 at left). For multiple admixture times, these two sources correspond to the more recent event; for multiple groups, they reflect the strongest admixture “direction”. Colored dots above each bar indicate clades best representing the major (top) and minor (bottom) sources. The bar is split at the inferred admixture fraction (horizontal line, fractions < 5% shown as 5%). Each bar section indicates inferred donor group haplotypic make-up, colored as the map, for one source. Shaded boxes on the inferred admixture times denote events referred to in the text, specifically 1. European colonization of the Americas (1492CE-present; hot pink), 2. Slavic (500-900CE; pink), Turkic (500-1100CE; maroon) migrations, 3. Arab slave trade (650-1900CE; cyan), 4. Mongol Empire (1206-1368CE; purple), and 5. Khmer Empire (802-1431CE; orange).

In several instances, GLOBETROTTER clarifies or extends previous genetic analyses. For example, a previous study (27) inferred admixture in the Maya, with best source populations the Mozabites from North Africa and the Native American Surui, speculating based on historical events that this might actually represent a mixture of European, West African and Native American ancestry sources. GLOBETROTTER inferred admixture between three groups in the Maya dating to around 1670CE (9 generations ago) (28) (Figure 2A; Fig. 2D, hot pink box 1), with distinct sources from Europe (most genetically similar to the Spanish), West Africa (the Yoruba) and the Americas (the Pima, the nearest sampled group in the Americas). A different method – which aims to detect, but not date, admixture - concluded that Cambodians trace ~16% of their DNA to a group equally related to modern-day Europeans and East Asians (29). GLOBETROTTER infers a ~19% contribution from a similar source - related to modern-day Central, South and East Asians - and an ~81% contribution from a source related specifically to modern-day Han and Dai, the latter a branch of the Tai people who entered the region in historical times (30) (Fig. 2D, orange box 5). Further, this event dates to 1362CE (1194-1502CE), a period spanning the end of the Indianized Khmer empire (802-1431CE) (30), one of the most powerful empires in Southeast Asia whose fall was hypothesized to relate to a Tai influx (30).
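
The text quotes both a generation count and a calendar year for the Maya event (9 generations, around 1670CE), and elsewhere states a generation time of 28 years. One conversion that reproduces the quoted year is 1950 minus 28 × (g + 1), i.e. treating sampled individuals as born around 1950 and counting one extra generation. That convention is an assumption here, since the excerpt gives it only through a note reference, so the sketch below is illustrative rather than a statement of the authors' exact procedure.

```python
YEARS_PER_GENERATION = 28      # generation time stated in the text
ASSUMED_REFERENCE_YEAR = 1950  # assumption: approximate birth year of sampled individuals


def admixture_year(generations_ago: float) -> float:
    """Convert generations since admixture to a calendar year (CE).

    Uses reference_year - years_per_generation * (g + 1); the '+ 1' and the
    1950 reference year are assumptions chosen to match the Maya example.
    """
    return ASSUMED_REFERENCE_YEAR - YEARS_PER_GENERATION * (generations_ago + 1)


print(admixture_year(9))  # Maya example from the text
```

Under the same assumed convention, each additional generation moves the inferred date 28 years further into the past, so a confidence interval a few generations wide spans roughly a century.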

A comparison with the historical record becomes progressively more difficult for older episodes. Even when events are well attested, their exact genetic impacts (if any) are rarely if ever known, motivating our approach. Nevertheless, we have identified nine groups of populations showing related events, incorporating almost all (19/20) with the strongest GLOBETROTTER admixture evidence (9). Results are presented as online maps (26). Some events appear to match well with particular historical occurrences such as the Bantu Expansion into Southern Africa (9). Events affecting a group of seven populations (Fig. 2D, purple box 4) correspond in time to the rapid expansion led by Genghis Khan and the subsequent Mongol empire (1206-1368CE) (31), one of the most dramatic events in human history. These populations, including the Hazara (32, 33), Uygur (34) and the Mongola themselves, were sampled from within the range of the Mongol empire and show an admixture event dating within the Mongol period, with one source closely genetically related to the Mongola that progressively decreases in proportion westward, to 8% in the Turkish (Fig. 2D).

Seventeen populations from the Mediterranean, Near East and countries bordering the Arabian Sea (Fig. 2D, blue box 3) show signals of admixture from sub-Saharan Africa, with most recent dates in the range 890-1754CE (Fig. 2B,D). We interpret these signals, consistent with overlapping results of previous studies (4, 20), as resulting from the Arab expansion and slave trade, which originated around the 7th century (35). Our event dates are highly consistent with this, but also imply earlier sub-Saharan African gene flow into e.g. the Moroccans. The highest-contributing sub-Saharan donor is West African for all 12 Mediterranean populations, and an East or South African Bantu-speaking group for all 5 Arabian Sea populations (Fig. 2D), confirming genetically different sources for these slave trades (35).

A population group centered around Eastern Europe shows signals of complex admixture. FineSTRUCTURE did not fully separate groups from this region, suggesting “masked” shared events might be present. We therefore repainted them excluding each other as donors: we performed similar re-analyses of five additional geographic regions for the same reason (Table S16; Figs. S16-21). The easterly Russians and Chuvash both show evidence (p<0.05) of admixture at more than one time (Fig. 2D), at least partially predating the Mongol empire, between groups with ancestry related to Northeast Asians (e.g. the Oroqen, Mongola and Yakut) and Europeans, respectively (Table S16). Six other European populations (Fig. 2D, pink/maroon box 2) independently show evidence following the repainting for similar admixture events involving more than two groups (p<0.02) at approximately the same time (Fig. 3). CIs for the admixture time(s) overlap, but predate the Mongol empire, with estimates from 440-1080CE (Fig. 3). In each population, one source group has at least some ancestry related to Northeast Asians with ~2-4% of these groups’ total ancestry linking directly to East Asia. This signal might correspond to a small genetic legacy from invasions of peoples from the Asian steppes (e.g. the Huns, Magyar and Bulgars) during the first millennium CE (36). The other two source groups appear much more local. One is more North-European in the repainting (when we exclude other East European groups as donors) and is largely replaced by northern Slavic-speaking groups in our original analysis (Fig. 2D; Table S12). The other source is more southerly (e.g. Greece, West Asians). This local migration could explain a recent observation of an excess of IBD sharing in Eastern Europe (including in the Greeks, in whom we infer admixture involving a group represented by Poland, at the same time) that was dated to a wide range between 1,000 and 2,000 years ago (37). We speculate that these events may correspond to the Slavic expansion across this region at a similar time, perhaps related to displacement caused by the Eurasian steppe invaders (38).