The New York Times recently published an article on the Kalash, an ancient ethnic group living high in the remote mountains of Pakistan's Hindu Kush. For centuries this light-skinned, pagan people has claimed to be the long-lost descendants of Alexander the Great's world-conquering armies, which invaded the region in the fourth century B.C. The animist Kalash are outwardly different from the darker-skinned Pakistani Muslims who live in the lowlands below them, so the claim seemed plausible. But there has been no proof of it, and speculation about Greek admixture was dismissed by Toomas Kivisild et al. (2003), who stated that “some admixture models and programs that exist are not always adequate and realistic estimators of gene flow between populations ... this is particularly the case when markers are used that do not have enough restrictive power to determine the source populations ... or when there are more than two parental populations. In that case, a simplistic model using two parental populations would show a bias towards overestimating admixture”.
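The bias described in that quotation can be illustrated with a small sketch. It assumes the textbook two-source estimator, in which the share of source A is read off from how far the mixed group's marker frequency lies between the two assumed parents. The function name and all frequencies below are hypothetical and chosen only for illustration; they are not taken from Kivisild et al.

```python
def two_source_estimate(p_mixed, p_source_a, p_source_b):
    """Two-parent admixture estimate: inferred share of source A in the mixed group."""
    return (p_mixed - p_source_b) / (p_source_a - p_source_b)


# Hypothetical marker frequencies in three possible parental populations.
p_a, p_b, p_c = 0.30, 0.00, 0.15

# Hypothetical true ancestry: nothing from A, 80% from B, 20% from an unmodelled C.
true_shares = {"a": 0.0, "b": 0.8, "c": 0.2}
p_mixed = (true_shares["a"] * p_a
           + true_shares["b"] * p_b
           + true_shares["c"] * p_c)

# The model only knows about A and B, so C's contribution is credited to A.
estimate = two_source_estimate(p_mixed, p_a, p_b)
print(f"true share of A: {true_shares['a']:.0%}, estimated: {estimate:.0%}")
```

Under these assumed numbers the two-parent model credits source A with a share it never contributed, because the marker actually arrived through the third population.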
Moreover, haplogroup E1b1b, which is commonly found in the Greek population (up to 35%), originated in North Africa, and Greek admixture dating back 2,300 years would not have resulted in the emergence of a light-skinned, mixed-race people in the Hindu Kush. Haplogroup E is the most common Y-DNA haplogroup in Africa, and its frequency reaches 35% in Greece and 44% among Kosovar Albanians, but it is nonexistent in the Kalash population and extremely rare in the Indian subcontinent. It spread to Europe in the Aegean Bronze Age, and until the Slavic arrivals in the Balkans, most ancient Greeks were of North African descent. The sub-Saharan African admixture rate in the modern Greek population (35%) is twice that of the Iraqis (17%), but the Greeks were traditionally considered white because of their cultural affinity with Europe.
India as an Incubator of Early Genetic Differentiation of Modern Humans Moving out of Africa
Phylogeographic patterns of the Y chromosome and mtDNA support the concept that the Indian subcontinent played a pivotal role in the late Pleistocene genetic differentiation of the western and eastern Eurasian gene pools. All non-Africans, including Indian populations, have inherited a subset of African mtDNA haplogroup L3 lineages, differentiated into groups M and N. Although the frequency of haplogroup M and its diversity are highest in India (Majumder 2001; Edwin et al. 2002), there is no phylogenetic evidence yet from the mtDNA coding region demonstrating that its presence in Africa is due to a back migration. Also, the lack of L3 lineages other than M and N in India and among non-African mitochondria in general (Ingman et al. 2000; Herrnstadt et al. 2002; Kivisild et al. 2002) suggests that the earliest migration(s) of modern humans already carried these two mtDNA ancestors, via a departure route over the horn of Africa (i.e., the southern route migration [Nei and Roychoudhury 1993; Quintana-Murci et al. 1999; Stringer 2000]). More specifically, the ubiquity in India of diverse branches sharing the characteristic 12705T and 16223C transitions (table 2), suggests that the N branch had already given rise to its daughter clade R, which later, in eastern Asians, differentiated into clusters B and R9 (Kivisild et al. 2002) and in western Asia gave rise to haplogroups HV, TJ, and U (Macaulay et al. 1999). The coalescence time of major M subclusters in the Indian subcontinent, which are comparable in diversity and even older than most eastern Asian and Papuan haplogroup M clusters (Forster et al. 2001), suggests that the Indian subcontinent was settled soon after the African exodus (Kivisild et al. 1999b, 2000) and that there has been no complete extinction or replacement of the initial settlers.
In a similar way, Indians show the presence of diverse lineages of the three major Eurasian Y-chromosomal haplogroups C, F, and K, although they have obviously lost the fourth potential founder, D. The presence of several subclusters of F and K (H, L, R2, and F*) that are largely restricted to the Indian subcontinent is consistent with the scenario that the coastal (southern route) migration(s) from Africa carried the ancestral Eurasian lineages first to the coast of Indian subcontinent (or that some of them originated there). Next, the reduction of this general package of three mtDNA (M, N, and R) and four Y-chromosomal (C, D, F, and K) founders to two mtDNA (N and R) and two Y-chromosomal (F and K) founders occurred during the westward migration to western Asia and Europe. After this initial settlement process, each continental region (including the Indian subcontinent) developed its region-specific branches of these founders, some of which (e.g., the western Asian HV and TJ lineages) have, via continuous or episodic low-level gene flow, reached back to India. Western Asia and Europe have thereafter received an additional wave of genes from Africa, likely via the Levantine corridor, bringing forth lineages of Y-chromosomal haplogroup E, for example (Underhill et al. 2001b), which is absent in India.
Gene Flow from Eastern Asia
Although both Indian and eastern Asian populations share, at the interior phylogenetic level, two major trunks of the mtDNA tree (haplogroups M and N), their subsequent branching into boughs and limbs is different (Bamshad et al. 2001; Kivisild et al. 2002): <2% of Indians, whether with tribal or caste affiliation, can trace their maternal ancestry back to eastern Asian–specific (Kivisild et al. 2002) branches (Kivisild et al. 1999a; Bamshad et al. 2001). Analogously, the subclades of the Y-chromosomal clusters C, F, and K do not overlap in southern and eastern Asia. The major continental eastern Asian clade O was virtually absent both in tribal and caste populations, although one particular O subcluster, defined by M95, has been reported in three other tribes of Andhra Pradesh (Ramana et al. 2001) and in castes and tribes of Tamil Nadu (Wells et al. 2001). The frequency of M95 is highest in Austro-Asiatic speakers, Burmese-Lolo, and the Karen of Yunnan, China (Su et al. 1999, 2000) and is virtually absent (1/984) in central Asia (Wells et al. 2001). Its irregular distribution from India to Yunnan might possibly be related to the equally uneven spread of the Austro-Asiatic speakers.
Indian RPS4Y711T chromosomes (clade C), like their Indonesian counterparts (Underhill et al. 2001a), cannot be apportioned between clusters specific to eastern Asian (/M217) and Oceanic populations (/M38, /M210). Given the high hierarchical position of the C clade in the Y binary tree, its wide distribution in the eastern hemisphere, and its high STR variability in India (fig. 3), it seems plausible that the original spread of C was associated with the southern route migration. Although haplogroup C displays idiosyncratic occurrences in Europe (Semino et al. 2000), its presence at 5% in India (perhaps its most reliable westernmost distribution) suggests that the RPS4Y mutation originated in or arrived with the earliest immigrants. Invoking back migrations to India as an explanation is unwarranted, since the absence of derivative RPS4Y lineages common in eastern Asia and Oceania suggests that these differentiations happened after RPS4Y lineages had already transited the subcontinent. Furthermore, the MX1 data distinguish the Indians from the Oceanian population, in which RPS4Y occurs frequently.
Gene Flow from Western Asia, Europe, and Central Asia
Indians virtually lack the HIV-1–protective Δccr5 allele (Majumder and Dey 2001) that is frequent in Europe, western Asia, and central Asia, implying either that this allele arose very recently in Europe or that there has not been substantial gene flow to India from the northwest. Western Eurasian–specific mtDNA haplogroups occur at low frequencies in Indian caste populations (Kivisild et al. 1999a; Bamshad et al. 2001) and are virtually absent among the tribes (Roychoudhury et al. 2001; present study). Southern and western Asian–specific U2i and U7 lineages, which are rare or absent in Europe, however, are found occasionally in the tribes. The copresence of most haplogroup U subclusters (U1–U8) in populations around the Middle East (Macaulay et al. 1999) suggests that the differentiation of haplogroup U occurred mostly west of India. If the ancestor of haplogroup U was brought to the Middle East via northern Africa by the northern route migration, a hypothesis still awaiting support from genetic data, then the presence of haplogroup U in India would be due to an early, Upper Palaeolithic migration from western Asia. Alternatively, one might consider the scenario that all western Eurasian mtDNA variation stems from the coastal southern route migration and that U had already differentiated from R in southern Asia, where it survived only in U2i (and perhaps U7) descendants. Interestingly, mtDNA haplogroup U7 (Richards et al. 2000), like Y-chromosomal clade L (Underhill et al. 2000), is also found, though at low frequencies, in western Asia and occasionally in Mediterranean Europe. The differences in STR modal haplotypes of the L clade between the Caucasus (Weale et al. 2001) and India point to their independent expansions from two distinct founder populations. Given the deep time depth of U7 (Kivisild et al. 1999a), it is possible that this east-to-west link predates the Last Glacial Maximum.
The most common Y-chromosomal lineage among Indians, R1a, also occurs away from India in populations of diverse linguistic and geographic affiliation. It is widespread in central Asian Turkic-speaking populations and in eastern European Finno-Ugric and Slavic speakers and has also been found less frequently in populations of the Caucasus and the Middle East and in Sino-Tibetan populations of northern China (Rosser et al. 2000; Underhill et al. 2000; Karafet et al. 2001; Nebel et al. 2001; Weale et al. 2001). No clear consensus yet exists about the place and time of its origins. From one side, it has been regarded as a genetic marker linked with the recent spread of Kurgan culture that supposedly originated in southern Russia/Ukraine and extended subsequently to Europe, central Asia, and India during the period 3,000–1,000 b.c. (Passarino et al. 2001; Quintana-Murci et al. 2001; Wells et al. 2001). Alternatively, an Asian source (Zerjal et al. 1999) or a deeper Palaeolithic time depth of ∼15,000 years before present for the defining M17 mutation has been suggested (Semino et al. 2000; Wells et al. 2001). Interestingly, the high frequency of the M17 mutation seems to be concentrated around the elevated terrain of central and western Asia. In central Asia, its frequency is highest (>50%) in the highlands among Tajiks, Kyrgyz, and Altais and drops down to <10% in the plains among the Turkmenians and Kazakhs (Wells et al. 2001; Zerjal et al. 2002). Our low STR diversity estimate of haplogroup R1a in central Asians is also consistent with the low diversities found by Zerjal et al. (2002) and suggests a recent founder effect or drift being the reason for the high frequency of M17 in southeastern central Asia. In Pakistan, except for the Hazara, who are supposedly recent immigrants in the region, the frequency of M17 was similarly high in the upper and lower courses of the Indus River valley (Qamar et al. 2002). 
The frequency of R1a drops from ∼30% in eastern provinces to <10% in the western parts of Iran (Quintana-Murci et al. 2001). Both Pakistanis and Iranians showed STR variances as high as those of Indians, when compared with the lower values in European and central Asian populations. Unexpectedly, both southern Indian tribal groups examined in this study carried M17. The presence of different STR haplotypes and the relatively high frequency of R1a in Dravidian-speaking Chenchus (26%) make M17 less likely to be the marker associated with male “Indo-Aryan” intruders in the area. Moreover, in two previous studies involving southern Indian tribal groups such as the Valmiki from Andhra Pradesh (Ramana et al. 2001) and the Kallar from Tamil Nadu (Wells et al. 2001), the presence of M17 was also observed, suggesting that M17 is widespread in tribal southern Indians. Given the geographic spread and STR diversities of sister clades R1 and R2, the latter of which is restricted to India, Pakistan, Iran, and southern central Asia, it is possible that southern and western Asia were the source for R1 and R1a differentiation.
Compared with western Asian populations, Indians show lower STR diversities at the haplogroup J background (Quintana-Murci et al. 2001; Nebel et al. 2002) and virtually lack J*, which seems to have higher frequencies in the Middle East and East Africa (Eu10 [Nebel et al. 2001]; Ht25 [Semino et al. 2002]) and is common also in Europe (Underhill et al. 2001b). Therefore, J2 could have been introduced to northwestern India from a western Asian source relatively recently and, subsequently, after comingling in Punjab with R1a, spread to other parts of India, perhaps associated with the spread of the Neolithic and the development of the Indus Valley civilization. This spread could then have also taken with it mtDNA lineages of haplogroup U, which are more abundant in the northwest of India, and the western Eurasian lineages of haplogroups H, J, and T.
The Caste and Tribe Distinction
The example of phylogenetic reconstruction of mtDNA haplogroup M2 showed that individuals from populations of different geographic origin and social status in India share the same branches of the tree. Similarly, since there is no grouping according to language families among the caste groups (Bamshad et al. 2001), no clusters of considerable time depth seem to be rank-specific to Indian tribal or caste groups. Phenomena like the upward social mobility of caste women (Bamshad et al. 1998) could have introduced some tribal genes to the castes more recently, but, given the relatively low proportion of the tribal population size today, recent unidirectional gene flow can be assumed to be a minor modifying force in the formation of the genetic profile of the caste population.
“Gothra” is an identity carried by male lineage in India from time immemorial. The lack of clear distinction between Indian castes and tribes was shown by Ramana et al. (2001), using a two-dimensional PC plot of Y-chromosome haplogroups. The close clustering of Chenchus with the caste groups in our MDS analysis (fig. 4) supports this finding. However, substantial heterogeneity observed in the haplogroup frequencies of the tribes and their generally lower haplotype and haplogroup diversity (e.g., the wide range in frequencies of major clades C*, J, F*, O, and R1a in tribal groups of this study) (Ramana et al. 2001; Wells et al. 2001) suggests that conclusions about Indian prehistory cannot be based on the examination of one or a few groups.
Although, on a general scale, we can argue for largely the same prehistoric genetic inheritance in Indian tribal and caste populations, this does not refute the existence of genetic footprints laid down by known historical events. This would include invasions by the Huns, Greeks, Kushans, Moghuls, Muslims, English, and others. The political influence of Seleucid and Bactrian dynastic Greeks over northwest India, for example, persisted for several centuries after the invasion of the army of Alexander the Great (Tarn 1951). However, we have not found, in Punjab or anywhere else in India, Y chromosomes with the M170 or M35 mutations that together account for >30% in Greeks and Macedonians today (Semino et al. 2000). Given the sample size of 325 Indian Y chromosomes examined, however, it can be said that the Greek homeland (or European, more generally, where these markers are spread) contribution has been 0%–3% for the total population or 0%–15% for Punjab in particular. Such broad estimates are preliminary, at best. It will take larger sample sizes, more populations, and increased molecular resolution to determine the likely modest impact of historic gene flows to India on its pre-existing large populations.
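The 0%–3% figure for the total population can be roughly reproduced with a simple calculation. The sketch below is one plausible reconstruction, not the method stated in the paper: it assumes that each sampled chromosome independently carries a marker with probability m × f, where m is the source contribution and f = 0.30 is the marker frequency in the source population, and it asks for the largest m still compatible (at an assumed 5% level) with seeing no carriers at all. The function name and the choice of alpha are illustrative.

```python
def max_contribution(n_sampled, marker_freq_in_source, alpha=0.05):
    """Largest source contribution m consistent with zero carriers in the sample.

    Assumes P(zero carriers) = (1 - m * marker_freq_in_source) ** n_sampled
    and solves for the m at which that probability drops to alpha.
    """
    per_chromosome = 1.0 - alpha ** (1.0 / n_sampled)
    return min(1.0, per_chromosome / marker_freq_in_source)


# 325 Indian Y chromosomes, none carrying M170 or M35; the markers are taken
# at 30% in the source, the lower end of the ">30%" quoted in the text.
bound = max_contribution(325, 0.30)
print(f"upper bound on contribution: {bound:.1%}")
```

Under these assumptions the bound comes out at about 3%, in line with the range quoted above. The same formula gives a wider interval for any smaller regional subsample, which is consistent with the broader range reported for Punjab.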
Kivisild, Toomas, et al. "The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations." The American Journal of Human Genetics 72.2 (2003): 313-332.