For years the residents of the remote northwestern Chinese village of Liqian have believed they were special. Many of the villagers have Western characteristics, including green eyes and blonde hair, leading some experts to suggest that they may be the descendants of a lost Roman legion that settled in the area. Now DNA testing of the villagers has shown that almost two thirds of them are of Caucasian origin.
In 53BC, after Crassus was defeated by the Parthians and beheaded near what is now Iran, stories persisted that 145 Romans were captured and wandered the region for years. The town's link with Rome was first suggested by a professor of Chinese history at Oxford in the 1950s.
Oxford professor Homer Dubs believes the group travelled east, were captured by the Chinese and founded Liqian in 36BC. Prof Dubs theorised that they made their way as a mercenary troop eastwards, which was how a troop 'with a fish-scale formation' came to be captured by the Chinese 17 years later.
It has been suggested that some made their way east to today's Uzbekistan and later enlisted with the Hun chieftain Jzh Jzh against the Chinese Han Dynasty. He said the 'fish-scale formation' was a reference to the Roman 'tortoise', a phalanx protected by shields on all sides and from above.
Yang Gongle, a professor at Beijing Normal University, said there has not been sufficient proof to link the villagers with the ancient Romans. According to Yang's research, Liqian County was established in 104 BC, half a century earlier than the proposed arrival of the Roman soldiers. He also noted that the fish-scale formation had nothing to do with the Roman legion's famous 'testudo' strategy.
The Liqian people, officially recognized as Han Chinese by the People's Republic of China, live in small villages in Yongchang County, Gansu province, China. Many of them have light-colored hair and Caucasian features, which differ sharply from those of Han Chinese and most ethnic minorities. In recent years, the Liqian people have become widely known through the controversial hypothesis that they descend from ancient Roman mercenaries. In 1955, Homer H. Dubs proposed that some Roman soldiers captured by the Parthians after Crassus's defeat at Carrhae in 53 B.C. were eventually hired as mercenaries by a Hun warlord on the western frontier beyond the boundaries of the Han Empire, and were then captured by the Chinese and allowed to form their own city, based on the Roman model.
The hypothesis has been adopted by some scholars but disputed by many historians, and several decades later it remains hotly debated. In the absence of direct evidence, examining the paternal genetic contribution seems particularly necessary. To test this hypothesis, we surveyed more than 12 Y chromosome binary polymorphisms using PCR (polymerase chain reaction), PCR-RFLP (restriction fragment length polymorphism), and DHPLC (denaturing high-performance liquid chromatography) methods, and 12 short tandem repeat (Y-STR) loci using the PowerPlex Y System, in 227 male individuals representing four Chinese populations: Liqians, Yugurs, Uygurs and Tibetans. In comparison with worldwide populations, the following results were obtained.
1. Eleven Y-SNP haplogroups and 75 Y-STR haplotypes were observed among 87 unrelated Liqian males. At the haplogroup level, the Liqians presented low genetic diversity, with a single predominant haplogroup, O3-M122 (71.3%). When 12 fast-evolving Y-STRs were used, the genetic diversity of the Liqians was higher than 0.98. In the present study, 77% of Liqian Y chromosomes were restricted to East Asia. Unexpectedly, the frequency of haplogroup O-M175, an East Asian-specific haplogroup, is higher in the Liqian people than in most populations in North China.
2. Principal component (PC) analysis based on Y-SNPs and multidimensional scaling (MDS) analysis based on Y-STRs suggest that the Liqians are closely related to Chinese populations, especially Han Chinese populations, whereas they deviate greatly from Central Asian and West Eurasian populations. The positions of populations within some clusters correspond well to their predefined assignments to specific regional groups. One further conclusion can be drawn from these analyses: despite the ascertainment bias in the binary markers, the PC result based on Y-SNPs is consistent with the MDS result based on Y-STRs. In addition, further phylogenetic analysis confirmed the genetic affinity between the Liqian and Han Chinese populations.
3. PC and MDS analysis identified two old populations, Han Chinese and Mongolians, that showed a close genetic relationship to the Liqians. In the subsequent admixture analysis, these two populations were assumed to be the parental populations of the Liqians, and the Liqians were regarded as a hybrid population. Admixture proportion analysis suggested that the genetic contribution from Han Chinese amounts to 78% in the Liqians.
4. The Liqians and the Yugurs, regarded as kindred populations with common origins, show an underlying genetic difference in the median-joining network and admixture proportion analysis.
5. The Liqian population shows close affinities to its geographic neighbors. This is confirmed by the Mantel test (r = 0.646, P = 0.003), which shows a strong and highly significant partial correlation between genetics and geography among the populations in our study.
6. Statistically, the Liqians showed a non-significant genetic difference from Han Chinese in North China and a significant genetic difference from other Eurasian populations.
7. When we compared Liqian minimal haplotypes (the 9-locus "minimal haplotype") with worldwide data in YHRD, most Liqian haplotypes were found in East Asia and South Asia. Only two matches were found in Europe, but they belonged to the East Asia-specific haplogroup O-M122. This incompatible result probably originated from recurrent mutation of fast-evolving Y-STRs.
Overall, a Roman mercenary origin could not be accepted as historical truth according to paternal genetic variation, and the current Liqian population is more likely to be a subgroup of the Chinese Han majority. Our studies provide genetic evidence for the origin of the Liqian people and enrich the human genetic database. The 12 Y-STR polymorphism markers are highly discriminating in the Chinese Liqian population, and they may be powerful for paternity testing and personal identification.
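The abstract reports low diversity at the haplogroup level and a diversity above 0.98 for the 12 Y-STRs. As a rough illustration of why slow-evolving and fast-evolving markers can give such different values for the same men, the sketch below computes Nei's unbiased diversity, h = n/(n-1) * (1 - sum of squared frequencies). The abstract does not state which estimator was used, so this choice is an assumption, and the sample data are hypothetical toy values, not the study's data.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list with one hashable entry per sampled chromosome
    (a haplogroup label or a tuple of STR repeat counts).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy sample: one haplogroup dominates, so diversity is low.
toy_haplogroups = ["O3"] * 7 + ["C"] * 2 + ["R1a"]

# Hypothetical toy STR haplotypes (repeat counts at three loci):
# almost every chromosome is distinct, so diversity is high.
toy_str_haplotypes = [
    (14, 12, 23),
    (14, 13, 23),
    (15, 12, 24),
    (14, 12, 23),
    (16, 12, 22),
]

h_snp = haplotype_diversity(toy_haplogroups)
h_str = haplotype_diversity(toy_str_haplotypes)
print(f"haplogroup diversity: {h_snp:.3f}")
print(f"STR haplotype diversity: {h_str:.3f}")
```

When every sampled chromosome carries a different haplotype the estimator returns exactly 1, which is why a value such as 0.98 says little about population history on its own.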
In August 2013, Andronovo culture remains, including 17 tombs and 3 ritual sites, were discovered in Xinjiang Uyghur Autonomous Region, which shows that the Uyghurs are culturally and genetically linked to the Andronovo culture. The Andronovo people belonged to R1a/Z93 mixed with Q1a2a1/L54, and the R1a frequency in the Uyghurs amounts to 28.6%. On the other hand, R1a1 accounts for only 1.1% of Liqian villagers in western China, and 71.3% of them belong to O3 (Zhou et al. 2007). Other West Eurasian haplogroups such as P*(xR1a) and J* were found at 8% and 1.1% respectively. The frequency of P*(xR1a) varied from 26% in the east to 45% in the south of Norway. Up to 10% of the Liqian villagers thus carry West Eurasian haplogroups, which may explain why some are born with Caucasian physical features, and they may also have steppe ancestry linking them to the Andronovo people from Russia.
The first extensive analysis of Y variation in the Liqians was carried out in our study. This allowed us to compare our data with those previously reported in worldwide populations in order to investigate the origin and evolution. (Figure caption: internal numbers are bootstrap values (%) for 1,000 replicates; the Tunisian population was regarded as the outgroup of the Liqian people.) The results based on Y-SNPs are consistent with those obtained with Y-STRs, which reinforces our findings. In the present study, it was unexpected that 71.3% of the Y chromosomes of the Liqians belonged to haplogroup O3-M122, which is an East Asian-specific haplogroup (Su et al. 1999; Shi et al. 2005). The O3-M122 frequency in the Liqians was by far the highest observed among populations in northern China (Xue et al. 2006; Su et al. 1999, 2000b; Shi et al. 2005). Since the Han dynasty, the Liqians have been living in northwestern China, which is well established by historical records (Song et al. 2005). Combined with the historical records, the haplogroup distribution suggested that most of the Liqian Y chromosomes could be traced back to northern China. The result was also reflected in PC and MDS analysis, which indicated that the Liqians were genetically close to Chinese populations, especially Han Chinese populations in various regions. The result was confirmed in admixture and phylogenetic analysis (Table 2, Fig. 5), suggesting a strong Han Chinese paternal influence on the Liqian gene pool. The Mantel test suggests that the Liqian people, wherever they originated, must have had an extensive gene exchange with the local people.
Besides the Han Chinese people, Mongolians and Yugurs in China are genetically related to the Liqian people. Previous studies demonstrated that the male lineages of the Mongolians spread rapidly in a large part of Asia (Zerjal et al. 2002, 2003). The Hexi region in northern China, where the Liqians have settled, was controlled by the Mongols in the 13th and 14th centuries. It is reasonable that the Mongolians would have an impact on the Liqian gene pool. In our study, a small Mongolian contribution was observed in the Liqian gene pool. The Yugurs, described as a kindred population of the Liqians, were genetically close to the Liqians in PC and MDS plots, which is compatible with the admixture analysis. The paternal genetic contribution indicated that the Liqian and the Yugur populations have similar contribution proportions from the Han Chinese and Mongolian populations.
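The idea of a contribution proportion from two parental populations can be illustrated with a simple least-squares estimator: if the hybrid frequency of each haplogroup is modelled as m * p1 + (1 - m) * p2, then m = sum((h - p2) * (p1 - p2)) / sum((p1 - p2)^2). This is only a sketch of the general principle; the text does not say which admixture method the authors used, and the frequencies below are hypothetical values constructed so that the estimator returns 0.78, echoing the Han Chinese contribution reported in the abstract.

```python
def admixture_proportion(hybrid, parent1, parent2):
    """Least-squares estimate of parent1's contribution to the hybrid.

    Each argument maps haplogroup label -> frequency in that population.
    Model (an assumption): hybrid = m * parent1 + (1 - m) * parent2.
    """
    labels = set(hybrid) | set(parent1) | set(parent2)
    num = 0.0
    den = 0.0
    for label in labels:
        h = hybrid.get(label, 0.0)
        p1 = parent1.get(label, 0.0)
        p2 = parent2.get(label, 0.0)
        num += (h - p2) * (p1 - p2)
        den += (p1 - p2) ** 2
    if den == 0:
        raise ValueError("parental populations have identical frequencies")
    return num / den


# Hypothetical haplogroup frequencies, for illustration only.
toy_han = {"O3": 0.55, "C": 0.10, "other": 0.35}
toy_mongolian = {"O3": 0.10, "C": 0.50, "other": 0.40}
# Built as 0.78 * toy_han + 0.22 * toy_mongolian.
toy_hybrid = {"O3": 0.451, "C": 0.188, "other": 0.361}

m_han = admixture_proportion(toy_hybrid, toy_han, toy_mongolian)
print(f"estimated contribution from parent 1: {m_han:.2f}")
```

With real data the estimate depends heavily on how well the chosen parental samples represent the true source populations, which is one reason such proportions are usually reported with confidence intervals.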
Nevertheless, the underlying genetic difference between the Liqian and Yugur populations was explored in a median-joining network and neighbor-joining tree (Fig. 5, Fig. 6). It may result from long-term isolation by distance. In addition, the Yugurs are strictly endogamous and live in autonomous regions of the central province of Gansu, while the Liqians live together with Han Chinese. Overall, the genetic difference between the Liqian and Yugur populations is statistically non-significant. In PC and MDS analysis, the Liqian population is clearly distinguished from Central Asian and West Eurasian populations. The result is incompatible with the historical hypothesis that the Liqian people derived from ancient Roman soldiers, which probably included mercenaries from West Asia, as described by Huang et al. (1990). When Liqian haplotypes were compared with worldwide populations in the YHRD, no matches were found exclusively in West Eurasian populations, and only two matches were shared by European populations and East Asian populations. The two Liqian haplotypes are present in West European populations, but they belong to the East Asian-specific haplogroup O3. This discrepancy probably arose from the high mutation rate of Y-STRs.
It seemed that the two Y chromosomes are more likely to be Asian lineages than European lineages. Failure to find an apparent link between the Liqian people and ancient Roman soldiers in this study might be either because long-distance migration and intermarriage have erased earlier genetic signatures or because the Liqians are just a general population in north China. Moreover, it is noteworthy that a small proportion of Liqian people with mixed racial traits are not necessarily associated with ancient Roman soldiers. Along the ancient Silk Road in north China, it is common to see people with Caucasian morphological traits, which is also a classical trait of Chinese minority ethnic groups in Xinjiang (like the Uygurs). Therefore, we cannot trace a Liqian origin only from morphological traits. As described above, the Liqians are closely related to Chinese populations, especially the Han Chinese in north China. In addition, the Liqian and Yugur populations are likely to be kindred populations. No obvious signature of Roman soldier origin is observed in the Liqian paternal gene pool. A Roman mercenary origin for the Liqian people is likely to be nothing more than an interesting theory. In order to reveal genetic landscapes of the Liqians completely, complementary autosomal and mtDNA studies have to be carried out in future work.
These Koreans who are naturally born with light eyes may be endowed with the R1a haplogroup commonly found in Russia. Figure 2C shows that the overall frequency of R1a and P*(xR1a) is 2-3% in Korea and there are genetic ties between the Uyghurs in western China, who are known for Caucasian looks, and ethnic Koreans. Moreover, the origins of the Yemaek tribes in Korea may be traced back to the Altai region, where Russia, China and Mongolia come together, and the ancient Korean tribes weren't indigenous to the Korean peninsula. The Songhua River basin in northern China was once a home to the Yemaek tribes, where they interacted with the nomadic steppe herders, who developed a sophisticated kurgan culture.
13% of Mongolians also belong to R1a and this Mongolian girl looks half Russian. Recent genetic evidence suggests that a group of ancient Russians called Yamna steppe herders from the Pontic-Caspian steppe historically migrated to Central Asia, leaving their genetic footprints on Central Asian populations.
Approximately 1000 males from 27 East Asian populations were typed with 61 Y-chromosomal markers, and we first describe the basic properties of this data set. The 45 binary markers identified 31 haplogroups (including paragroups) in the sample, while the 15 STRs defined 730 different haplotypes (Figure 1, Table 1; see also supplemental Table 1 at www.genetics.org/supplemental/). Population diversities ranged from 0.60 to 0.94 for binary markers and from 0.84 to 1.00 for STRs (Table 2). There was considerable variation in the distribution of lineages between populations, but this did not correspond to the major ethnic distinction in the area, which is between the Han Chinese (>80% of the combined populations of China, Mongolia, Korea, and Japan) and the other populations. AMOVA analysis showed that only 1.8 and 0.5% of variation lay between Han and non-Han populations using binary and STR markers, respectively, and neither of these values was significantly greater than zero. There were, however, major geographical differences. Figure 2 shows that, despite the overall predominance of haplogroup O (56%), specific haplogroups were concentrated in each geographical region: C and N in the north; P and J in the west; O2b in the east; and O1*, O2*, and O3d in the south. We therefore wished to identify the most important elements of the geographical pattern in an objective way.
Figure 1.— Phylogeny of Y-chromosomal haplogroups detected in this study.
We based the subsequent analyses on the STR data unless otherwise indicated because of the problems in interpreting data from preascertained binary markers. SAMOVA analysis (Dupanloup et al. 2002) identifies, for a prespecified number of groups of populations, the geographical groups that are most differentiated from one another. Application of this method to the East Asian Y-STR data set using two or three groups distinguished small numbers of unusual populations, a finding that is readily understood from the high frequencies of the “star cluster” (Zerjal et al. 2003) and “Manchu cluster” (Xue et al. 2005) lineages in some northern populations, and reflects extreme expansions of individual patrilines within historical times. The use of four groups provided the most informative subdivision, with a cluster of six southern populations distinguished in addition to some of the northern ones (Figure 3A). This pattern corresponds well to the north–south distinction seen with classical markers and shows that, in this respect, the Y-chromosomal variation is typical of that on other chromosomes. The division of the sample into more groups led to further subdivisions in the south (e.g., Figure 3B). Spatial autocorrelation analysis (Bertorelle and Barbujani 1995), based on the binary marker variation, produced correlograms that indicated significant clinal patterns or long-distance differentiation (not shown). The north–south haplogroup structure is therefore a continuum rather than a sharp bipartite division. To understand it further, we have explored the characteristics of the populations in more detail, concentrating on the 22 non-Han populations because of the spread of the Han during historical times (Wen et al. 2004).
Figure 2.— Geographical distributions of Y-chromosomal haplogroups. (A) Populations sampled. (B–F) Haplogroup frequencies: circle area is proportional to sample size and sector area to haplogroup frequency. (B–E) Haplogroups are sorted into those showing predominantly northern (B), western (C), southern (D), and eastern (E) distributions. (F) The overall frequency of the most common haplogroup, O.
Figure 3.— SAMOVA analysis illustrating the geographical divisions identified when four (A) or six (B) groups are specified.
A simple property of a population is the variation it contains, and this can be expressed in a number of ways. A widely used measure, diversity, is so high when 15 STRs are used that the differences between populations are small (Table 2) and difficult to interpret. Reducing the number of STRs to an arbitrary four or three (Table 2, supplemental Figure 1 at www.genetics.org/supplemental/) produces a wider range of diversity values, and these are notably higher in the north than in the south. An alternative measure of variation within a population, average squared distance (ASD), shows a similar pattern. BATWING analysis allows demographic parameters of the populations to be explored. Using a model where the population size remains constant for a period and then begins to expand exponentially, we estimated, for each population, posterior values of (1) the effective population size during the constant period, Nposterior; (2) the time at which growth began; (3) the rate of growth per generation, α; and (4) the time to the most recent common ancestor (TMRCA) of the population (Table 2). We again noted substantial variation with latitude. Median Nposterior was higher in the north, the expansion began earlier, the rate of growth was slower, and the TMRCA was longer. Although all of these variables correlated significantly with latitude when examined individually in regression analyses (Table 3), the highest was with expansion time (adjusted R2 = 0.68), compared with 0.40 for the next highest, ASD. Unsurprisingly, a stepwise multiple regression analysis identified expansion time as the best predictor of north–south distance, and only α increased this significantly to reach an adjusted R2-value of 0.75. Thus earlier expansion time in the north and, to a lesser extent, more rapid expansion in the south, account best for the observed north–south differences. 
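As a rough illustration of the ASD measure mentioned above, the sketch below computes the average squared difference in repeat number, taken over all pairs of chromosomes within a population and over all loci. The passage does not specify whether ASD was computed between pairs or against a reference haplotype, so the pairwise form is an assumption, and the two toy populations are hypothetical.

```python
from itertools import combinations


def average_squared_distance(haplotypes):
    """Mean squared repeat-count difference over all pairs and all loci.

    `haplotypes` is a list of equal-length tuples of STR repeat counts.
    The pairwise within-population form is an assumption of this sketch.
    """
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes")
    n_loci = len(haplotypes[0])
    total = sum((a - b) ** 2 for h1, h2 in pairs for a, b in zip(h1, h2))
    return total / (len(pairs) * n_loci)


# Hypothetical toy populations typed at three STR loci.
toy_north = [(14, 12, 23), (16, 12, 21), (13, 14, 24), (15, 11, 22)]
toy_south = [(14, 12, 23), (14, 12, 23), (15, 12, 23), (14, 13, 23)]

asd_north = average_squared_distance(toy_north)
asd_south = average_squared_distance(toy_south)
print(f"ASD north: {asd_north:.3f}")
print(f"ASD south: {asd_south:.3f}")
```

Unlike haplotype diversity, which saturates near 1 once most haplotypes are distinct, ASD keeps growing as haplotypes drift further apart in repeat number, which is why it can still separate populations when 15 STRs are used.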
We display the expansion times as a contour plot in Figure 4, where the consistent difference between north and south is apparent. Figure 4 suggests, however, that the highest correlation of population expansion may not be with distance due north–south, but with distance along an axis tilted slightly northwest–southeast, and further examination showed that a tilt of ∼10° in fact gave the highest R2-value (0.71 compared with 0.69).
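The axis-tilt comparison can be sketched as follows: project each population's coordinates onto an axis rotated away from due north, regress expansion time on the projected position, and compare R2 across tilt angles. The sketch treats latitude and longitude as flat planar coordinates, and the site coordinates and expansion times are hypothetical values generated to lie exactly along a 10° tilted axis; none of this is the paper's data or procedure, and it does not reproduce the reported R2-values.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def project(lat, lon, tilt_deg):
    """Position along an axis tilted `tilt_deg` from due north toward the west.

    Latitude and longitude are treated as planar coordinates (an assumption).
    """
    t = math.radians(tilt_deg)
    return lat * math.cos(t) - lon * math.sin(t)


# Hypothetical (latitude, longitude) pairs for six populations.
toy_sites = [(45.0, 120.0), (40.0, 105.0), (35.0, 115.0),
             (28.0, 100.0), (23.0, 110.0), (20.0, 102.0)]

# Hypothetical expansion times (KYA), built to be exactly linear along
# an axis tilted by 10 degrees.
toy_times = [20.0 + 0.5 * project(lat, lon, 10.0) for lat, lon in toy_sites]


def fit(tilt_deg):
    positions = [project(lat, lon, tilt_deg) for lat, lon in toy_sites]
    return r_squared(positions, toy_times)


best_tilt = max(range(0, 31), key=fit)
print(f"R2 due north: {fit(0):.3f}")
print(f"best tilt: {best_tilt} degrees, R2 = {fit(best_tilt):.3f}")
```

A scan like this finds the best-fitting direction but not whether the improvement is meaningful; a difference as small as the one reported in the text would need a formal comparison before being interpreted.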
Figure 4.— Contour plot showing the distribution of expansion times. Demographic expansion began earlier in the north than in the south.
The distribution of Y-chromosomal haplogroups in East Asia has been extensively documented (e.g., Jin and Su 2000; Karafet et al. 2001; Deng et al. 2004), but these observations have raised questions about the relationship of northern and southern populations that remain unanswered. Su et al. (1999) typed 19 binary markers, 12 of which were chosen because they were already known to be variable in East Asia, and found higher diversity in the south than in the north and that the northern lineages were a subset of the southern ones, leading them to suggest that the northern populations were derived from the south by northward migrations. In contrast, Karafet et al. (2001) used a larger set of 52 binary markers ascertained mainly because of their variation in worldwide populations and discovered higher diversity (mean pairwise differences) in the north and that the northern lineages were not a subset of the southern ones. They concluded that a contribution to the northern populations from Central Asia was likely. The use of preascertained binary markers introduces a bias into estimates of diversity, but STRs are essentially free of this bias because they are variable in all populations. In our samples, STR diversity and ASD measurements were higher in the north than in the south (Table 2), a finding that is not easily reconciled with a largely or exclusively southern origin for the northern populations. It has been suggested that some populations, such as Hui, Uygurs, and Mongolians, have recent admixture with Central Asia and so reliance on them may give a false impression (Shi et al. 2005), but our findings are common to most populations from the north (Table 2).
Figure 5.— Effect of artificial mixing of population data on estimated expansion time. Median values are plotted, together with their 95% confidence intervals.
Our most striking observation was the demographic contrast between north and south, which was explained largely by the variation in the start of population expansion (Tables 2 and 3; Figure 4). Despite the simplified demographic model and wide confidence intervals in the BATWING estimates (Table 2), the median values exhibit a simple and striking pattern: all of the northern estimates lie between 22 and 34 KYA, while all of the southern estimates are between 12 and 18 KYA. These suggest that the northern populations started to expand before the LGM (∼18–21 calendar KYA), while the southern populations started to expand after it. These time estimates are calibrated against historical events (Zhivotovsky et al. 2004) and so do not depend on the assumption of a particular male generation time, but nevertheless are uncertain, and so any interpretation based on them must be regarded with caution. Importantly, however, they are affected little by extensive admixture (Figure 5) and in such a case reflect the earlier expansion time. While extreme northern latitudes were inhospitable to early humans, Siberia has an extensive Upper Paleolithic archaeological record (Kuzmin and Orlova 1998) and a highly productive environment stretched across Asia. This showed an abundance of large animals and has been called the “Mammoth Steppe” (Guthrie 1990). Expansion times calculated in the same way for the Central Asian populations described by Zerjal et al. (2002), excluding those showing recent severe bottlenecks, lay between 24 (13–45) and 36 (16–74) KYA, like those of the northern populations from East Asia. We therefore propose that this cold but rich environment allowed the demographic expansion of populations who learned to exploit the profuse animal resources, and these people contributed in sufficient numbers to the ancestry of the northern populations we have tested to leave a signature in their paternal lineages. 
In contrast, this environment did not extend to the southern region, and the populations based there expanded only after the end of the LGM as the climate became warmer and more stable. The large-scale use of underground tubers is thought to have begun in the south as early as 15 KYA (Tong 2004), and it is notable that population expansion was subsequently more rapid there. The survival of this distinct demographic signature provides further evidence for the genetic differentiation between north and south and lack of extensive gene flow, leading to a genetic boundary seen initially in classical marker studies (Xiao et al. 2000).