|
Post by Admin on Feb 18, 2015 14:31:59 GMT
Fig. 1. Maximum-likelihood tree of 75 populations. A hypothetical most-recent common ancestor (MRCA) composed of ancestral alleles as inferred from the genotypes of one gorilla and 21 chimpanzees was used to root the tree. Branches with bootstrap values less than 50% were condensed.Population identification numbers (IDs), sample collection locations with latitudes and longitudes, ethnicities, language spoken, and size of population samples are shown in the table adjacent to each branch in the tree.Linguistic groups are indicated with colors as shown in the legend.Several genome-wide studies of human genetic diversity focusing primarily on broadcontinental relationships, or fine-scale structure in Europe, have been published recently (1–8).We have extended this approach to Southeast Asian (SEA) and East Asian (EA) populations by using the Affymetrix GeneChip Human Mapping50K Xba Array. Stringently quality-controlledgenotypes were obtained at 54,794 autosomal single-nucleotide polymorphisms (SNPs) in 1928individuals representing 73 Asian and two non-Asian HapMap populations (9). Apart from developing a general description of Asian population structure and its relation to geography, language,and demographic history, we concentrated on uncovering the geographic source(s) of EA and SEA populations.We first performed a Bayesian clustering procedure using the STRUCTURE algorithm (10)to examine the ancestry of each individual. Each person is posited to derive from an arbitrary num-ber of ancestral populations, denoted by K. We ran STRUCTURE from K = 2 to K = 14 using both the complete data set and SNP subsets to excludethose in strong linkage disequilibrium (Fig. 1 and figs. S1 to S13). At K = 2 and K = 3, all SEA and EA samples are united by predominant member-ship in a common cluster, with the other cluster(s)corresponding largely to Indo-European (IE) and African (AF) ancestries. At K = 4, a componentmost frequently found in Negrito populations that is also shared by all SEA populations emerges,suggesting a common SEA ancestry. Each value of K beyond 4 introduces a new component that tends to be associated with a group of populations united by membership in a linguistic family,by geographic proximity, by a known history of admixture, or, especially at higher Ks, by membership in a small population isolate. The results obtained using frappe (11), a maximum-likelihood–based clustering analysis, showed a general concordance with those of STRUCTURE (figs. S14to S26). These analyses show that most individuals within a population share very similar ancestry estimates at all Ks, an observation that is consistent also with a phylogeny relating individuals (fig. S27) based on an allele-sharing distance (12). Therefore, we proceeded to evaluate the relationships among populations. A maximum-likelihood tree of populations, based on 42,793SNPs whose ancestral states were known (Fig.1), showed that all the SEA and EA populations make up a monophyletic clade that is supportedby 100% of bootstrap replicates. This pattern remained even after data from 51 additional populations and 19,934 commonly typed SNPs from a recent study were integrated into the tree (fig.S28). These observations suggest that SEA and EA populations share a common origin.STRUCTURE/frappe and principal components analyses (PCA) (13) (Figs. 1 and 2 and figs.S1 to S26) identify as many as 10 main population components. Each component corresponds largely to one of the five major linguistic groups(Altaic, Sino-Tibetan/Tai-Kadai, Hmong-Mien,Austro-Asiatic, and Austronesian), three ethnic categories (Philippine Negritos, Malaysian Negritos,and East Indonesians/Melanesians) and two small population isolates (the Bidayuh of Borneo andthe hunter-gatherer Mlabri population of centraland northern Thailand). Fig. 2. Analysis of the first two PCs. (A) 1928 individuals representing all 75populations. (B) 1868 individuals representing 74 populations (excluding YRI). (C) 1471 individuals representing 58 populations (excluding all Indians,CN-UG, TH-MA, AX-ME, and Negritos from Malaysia). (D) 1235 individuals representing 44 populations (excluding Philippine Negritos, PI-MA, and East Indonesians)The STRUCTURE results (Fig. 1 and figs. S1 to S13), population phylogenies (Fig. 1 and figs. S27 and S28), and PCA results (Fig. 2) all show that populations from the same linguistic group tend to cluster together. A Mantel test confirms the correlation between linguistic and genetic affinities (R2 = 0.253;P< 0.0001with 10,000 permutations), even after controlling for geography (partial correlation = 0.136; P <0.005 with 10,000 permutations). Nevertheless, we identified eight population outliers whose linguistic and genetic affinities are inconsistent [Affymetrix-Melanesian (AX-ME), Malaysia-Jehai (MY-JH9(Negrito), Malaysia-Kensiu (MY-KS) (Negrito),Thailand-Mon (TH-MO), Thailand-Karen (TH-KA), China-Jinuo (CN-JN), India-Spiti (IN-TB),and China-Uyghur (CN-UG); see table S3]. These linguistic outliers tend to cluster with their geo-graphic neighbors or [especially evident in the principal component (PC) plots of Fig. 2] occupy an intermediate position between their geographic neighbors and the more-distant members of their linguistic group. These patterns are consistent either with substantial recent admixture among the populations (14–16), a history of language replacement(17), or uncertainties in the linguistic classifications themselves (for example, the controversial Altaic family, which groups Korean and Japanese with Uyghur).Considerable gene flow among Asian populations was observed among subpopulations in these clusters, including those groups believed to practice endogamy based on linguistic, cultural,and ethnic information. In fact, most populations studied, even at lower Ks, show evidence of admixture in the STRUCTURE analyses. For example, the Han Chinese have grown to be-come the largest ethnic group today in a demographic expansion that has occurred mostly within historical times. STRUCTURE reveals that the six Han Chinese population samples in our study show varying degrees of admixture(Fig. 1 and figs. S1 to S26) between a northern Altaic cluster and a Sino-Tibetan/Tai-Kadaicluster, which most frequently appears in the ethnic groups sampled from southern China and northern Thailand. Finally, most of the Indian populations showed evidence of shared ancestry with European populations, which is consistent with the recent observations (18) and our understanding of the expansion of Indo-European–speaking populations (Fig. 1 and figs.S1 to S26). The geographic source(s) contributing to EA populations have long been debated. One hypothesis suggests that all SEA and EA populations derive primarily from a single initial migration,which entered the continent along a southern,largely coastal route (19, 20). Another hypothesis argues for at least two independent migrations into East Asia, first along a southern route, fol-lowed later by a series of migrations along a more northern route that served to bridge European and EA populations, but with little contribution to populations in Southeast Asia (20). The topology of a maximum-likelihood tree (Fig. 1 and fig. S28) displays a largely south-to-north ordering of the populations, and a plot of the first two PCs(Fig. 2) similarly orients most populations according to their geographic coordinates. The average value of the first PC is highly correlated with the latitude at which the populations were sampled(R2 = 0.79, P < 0.0001). Such a pattern could result simply from isolation-by-distance (IBD), as suggested by Ding et al. (21), although a recent study failed to detect IBD in East Asia with data from the Human Genome Diversity Project (22).In an effort to distinguish between long-term historical divergence and the effects of IBD, we applied partial and multiple Mantel tests to the data (23) [see supporting online material (SOM)text for details]. The primary approach was to ascertain the differential correlation between genetic distance, geographical distance, and a group indicator matrix as an indication of prehistoric population divergence. The partial correlation co-efficient of genetic and geographic distances was 0.228 (P < 0.0006), after controlling for the group indicator matrix (inferred from STRUCTURE/frappe analyses), whereas the partial correlation of the genetic and group indicator matrices was 0.403 (P < 0.0001) after controlling for geography. The superior association between genetic distance and the group indicator matrix as measured by the correlation coefficients suggests that prehistorical population divergence is the favored model over IBD in explaining the data (24). This conclusion is supported by simulation studies that also suggest that the observed patterns cannot be explained by simple IBD effects alone (see SOM text for details). Fig. 3. Analysis of haplotype diversity, haplotype sharing, and population phylogeny. (A) Haplotype diversity versus latitudes. Haplotypes were estimated from combined data, and diversity was measured by heterozygosity of haplotypes. HSa, b, c, and d and the corresponding colors show the percentages of EA group haplotypes in each class: HSa, found in CSA only; HSb, found in neither CSA nor SEA; HSc, found in both CSA and SEA; HSd, found in SEA only. Latitudes(y axis) for groups were obtained from the center of sample collection locations.Circled numbers are as follows: 1, Indonesian; 2, Malay; 3, Philippine; 4, Thai; 5,Southern Chinese minorities; 6, Southern Han Chinese; 7, Japanese and Korean;8, Northern Han Chinese; 9, Northern Chinese minorities; and 10, Yakut. Haplotype heterozygosity of each group was estimated from 100-kb bins and taking together all haplotypes within each group. R2 for the regression line is 0.91 (P <0.0001). (B) Haplotype sharing analysis for EA populations and groups. YKT,Yakut; N-CM, Northern Chinese minorities; N-HAN, Northern Han Chinese;JP-KR, Japanese and Korean; S-HAN, Southern Han Chinese; S-CM, Southern Chinese minorities; EA, East Asian. (C) Phylogeny of group private haplotypes.EA private haplotypes: haplotypes found only in EA samples; SEA private haplotypes: haplotypes found only in SEA samples; CSA private haplotypes:haplotypes found only in CSA samples; Shared haplotypes: haplotypes found in all EA, SEA, and CSA samples; African haplotypes were used as outgroup. (D)Maximum-likelihood tree of 29 populations. The tree is based on data from19,934 SNPs. Bootstrap values were based on 100 replicates. Only values on splitting of African and non-African, European and Oceanian and Asian, and Oceanian and Asian are shown.To further refine the analysis, we looked to haplotype organization to limit the effect of fluctuations in single-nucleotide determinations and to increase the resolution around genetic diversity.The IBD model predicts a correlation of genetic distance with geographical distance but not genetic diversity and geographic distance (24). By contrast, we found (Fig. 3A) that haplotype diversity is strongly correlated with latitude (R2 =0.91, P < 0.0001), with diversity decreasing from south to north, which is consistent with a loss of diversity as populations moved to higher latitudes. In estimating the contribution of SEA and Central-South Asian (CSA) haplotypes to the EA gene pool by haplotype sharing analyses (16), we found that more than 90% of haplotypes in EA populations could be found in SEA and CSA populations, of which about 50% were found in SEA and EA only and 5% found in CSA only (Fig. 3B,see also SOM text). Phylogenetic analysis of private haplotypes indicates greater similarity be-tween EA and SEA populations relative to EA and CSA populations (Fig. 3C). These observations suggest that the geographic source(s) contributing to EA populations were mainly from SEA populations, with rather minor contributions from CSA and that this clinal structure of EA populations arose from prehistoric population divergence rather than IBD or gene flow from CSA populations.On the basis of increased cultural, linguistic,and genetic diversity, the origins of SEA populations are thought to be more complex than the origins of those to their north. Notably, the Negritos of the Philippines and Malaysia differ from neighboring populations in aspects of their physical appearance, prompting intense speculation about models of human settlement in Southeast Asia. The two-wave hypothesis, which suggests that ancestral Negrito populations settled in South-east Asia, Australia, and Oceania before a more northerly migration originating in or near the Middle East, and spreading both toward Europe and Northeast Asia via Central Asia (25), has been sup-ported by phylogenetic trees constructed from data on a limited number of protein markers (24, 25).The topology of our population trees, both with and without the data from additional European and Asian populations discussed in (1), is in-consistent with regard to this genetic similarity of European and EA populations (Figs. 1 and 3D). Instead, on the basis of variation at a large number of independent SNPs, we observed that there is substantial genetic proximity of SEA and EA populations (fig. S28). An identical pattern is seen in the population tree of Li et al. (1) based on all of their 642,690 SNPs. Our forward-time simulation results under extreme ascertainment scenarios (SOM text) show that the observed phylogeny is not the result of ascertainment bias.Simulation studies also suggest that substantial levels of migration between populations after their initial separation are unlikely to distort the topology of the phylogeny (SOM text). HUGO Pan-Asian SNP Consortium. "Mapping human genetic diversity in Asia." Science 326.5959 (2009): 1541-1545.
|
|
|
Post by Admin on Mar 3, 2015 13:24:29 GMT
More than 800 million men living today are descended from just eleven men, including the ruthless Mongolian leader Genghis Khan, according to new research. Geneticists have been able to find eleven distinctive sequences in Y-chromosomes - the chunk of DNA that is only carried by men - that are persistent in modern populations in Asia. By systematically analysing the DNA of more than 5,000 men, they have been able to trace these male lineages to their approximate 'founding fathers'. They found that along with Khan, who is reputed to have sired hundreds of children as his hoards cut a swathe across much of Asia, they traced ten other lineages. These are thought to originate from the Middle East to Southeast Asia between 2100BC and 1100AD. They found that 37.8 per cent of the 5,000 men they tested belonged to one of these eleven lineages. If this is reflected in the entire Asian population, then it could mean around 830 million men living in Asia currently owe their Y-chromosomes to one of these eleven men. Among them is a lineage that has previously been attributed to a Chinese ruler called Giocangga, who died in 1583 and whose grandson founded the Qing Dynasty that ruled China between 1644 and 1912. Giocangga is thought to have had many children with his wives and concubines and is the direct male ancestor of more than 1.5 million men. The researchers also found that another of the lineages appears to have population clusters that are concentrated along the Silk Road trading route and date back to around 850AD. This suggests they may have their origins among the powerful rulers who dominated the steppes where the route passed - the Khitan, Tangut Xia, Juchin, Kara-Khitan and Mongol empires. The researchers suggest that Abaoji, Emperor Taizu of Liao and the Great Khan of the Khitans, who died in 926AD in the Khitan area of China, is a possible candidate for the father of this lineage. Jobling’s team made a systematic search for genetic founders by analysing the Y chromosomes of more than 5,000 men from 127 populations spanning Asia; they focused on that region because lots of data were available and there was already evidence of such lineages. The team identified 11 Y-chromosome sequences that were each shared by more than 20 of the 5,321 genomes. The researchers used DNA differences in the shared sequences, which accumulate over time from random mutations, to determine approximately when the founder of the lineage lived. They tracked back the geographical origins of the lineages by assuming that the founding men had lived in the regions where their genotypes were most prevalent and diverse. Genghis Khan’s paternal lineage again stood out, as did Giocangga’s, Jobling’s team reports in the European Journal of Human Genetics1. The other nine lineages originated throughout Asia, from the Middle East to southeast Asia, dating to between 2100 bc and ad 700. Jobling warns that these dates come with huge margins of error, but he notes that the estimates for the lineages attributed to Khan and Giocangga are very close to those of past studies. The founders who lived between 2100 bc and 300 bc existed in both sedentary agricultural societies and nomadic cultures in the Middle East, India, southeast Asia and central Asia. Their dates coincide with the emergence of hierarchical, authoritarian societies in Asia during the Bronze Age, such as the Babylonians. Three lineages dating to more recent times were all linked to nomadic groups in northeast China and Mongolia. These included the lineages linked to Genghis Khan and Giocangga, plus a third line dating to around ad 850. All three lineages seem to have expanded westwards, possibly along the Silk Road trade route. Historians have documented a series of polities based in inner Asia between 200 bc and the eighteenth century, such as the Qing Dynasty. Jobling says that these civilizations could have fostered dominant male lineages after the sons of a fecund founder decamped to satellite outposts, where they, in turn, fathered powerful descendants. The researchers identify several candidates for the lineage dating to ad 850, but say that more research is needed. Recovering DNA from the candidate or or a long-dead descendant would be the ultimate proof. Nature doi:10.1038/nature.2015.16767
|
|
|
Post by Admin on Mar 8, 2015 13:15:26 GMT
We generated genome-wide data from 69 Europeans who lived between 8,000–3,000 years ago by enriching ancient DNA libraries for a target set of almost 400,000 polymorphisms. Enrichment of these positions decreases the sequencing required for genome-wide ancient DNA analysis by a median of around 250-fold, allowing us to study an order of magnitude more individuals than previous studies1, 2, 3, 4, 5, 6, 7, 8 and to obtain new insights about the past. We show that the populations of Western and Far Eastern Europe followed opposite trajectories between 8,000–5,000 years ago. At the beginning of the Neolithic period in Europe, ~8,000–7,000 years ago, closely related groups of early farmers appeared in Germany, Hungary and Spain, different from indigenous hunter-gatherers, whereas Russia was inhabited by a distinctive population of hunter-gatherers with high affinity to a ~24,000-year-old Siberian6. By ~6,000–5,000 years ago, farmers throughout much of Europe had more hunter-gatherer ancestry than their predecessors, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but also from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ~75% of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ~3,000 years ago, and is ubiquitous in present-day Europeans. These results provide support for a steppe origin9 of at least some of the Indo-European languages of Europe. The first migration (early Neolithic) is already uncontroversial, but the paper includes data from Spanish early farmers that are also Sardinian- and LBK-like. The "Sardinian" Iceman was no fluke. It is now proven that not only the LBK but also the Spanish Neolithic came from the same expansion of Mediterranean populations which survives in Sardinia. The authors write: Principal components analysis (PCA) of all ancient individuals along with 777 present-day West Eurasians4 (Fig. 2a, SI5) replicates the positioning of present-day Europeans between the Near East and European hunter-gatherers4,20, and the clustering of early farmers from across Europe with present day Sardinians3,4,27, suggesting that farming expansions across the Mediterranean to Spain and via the Danubian route to Hungary and Germany descended from a common stock. The second migration went into eastern Europe: The Yamnaya differ from the EHG by sharing fewer alleles with MA1 (|Z|=6.7) suggesting a dilution of ANE ancestry between 5,000-3,000 BCE on the European steppe. This was likely due to admixture of EHG with a population related to present-day Near Easterners, as the most negative f3-statistic in the Yamnaya (giving unambiguous evidence of admixture) is observed when we model them as a mixture of EHG and present-day Near Eastern populations like Armenians (Z = -6.3; SI7). The EHG (Eastern European Hunter-Gatherers) are likely Proto-Europeoid foragers and the Yamnaya (a Bronze Age Kurgan culture) were a mixture of the EHG and something akin to Armenians.The "attraction" of later groups to the Near East is clear in the PCA: hunter-gatherers on the left side, the Near East (as grey dots) on the right side, and Neolithic/Bronze Age/modern Europeans in the middle. The second migration may very well be related to the Uruk expansion and the presence of gracile Mediterranoids and robust Proto-Europeoids in the Yamna: The Yamna population generally belongs to the European race. It was tall (175.5cm), dolichocephalic, with broad faces of medium height. Among them there were, however, more robust elements with high and wide faces of the proto-Europoid type, and also more gracile individuals with narrow and high faces, probably reflecting contacts with the East Mediterranean type (Kurts 1984: 90). Middle Neolithic Europeans from Germany, Spain, Hungary, and Sweden from the period ~4,000-3,000 BCE are intermediate between the earlier farmers and the WHG, suggesting an increase of WHG ancestry throughout much of Europe. We estimate that these two elements each contributed about half the ancestry each of the Yamnaya (SI6, SI9), explaining why the population turnover inferred using Yamnaya as a source is about twice as high compared to the undiluted EHG. The estimate of Yamnaya related ancestry in the Corded Ware is consistent when using either present populations or ancient Europeans as outgroups (SI9, SI10), and is 73.1 ± 2.2% when both sets are combined (SI10). [...] The magnitude of the population turnover that occurred becomes even more evident if one considers the fact that the steppe migrants may well have mixed with eastern European agriculturalists on their way to central Europe. Thus, we cannot exclude a scenario in which the Corded Ware arriving in today’s Germany had no ancestry at all from local populations. But there is another component present in modern Europe, the West_Asian which is conspicuous in its absence in all the ancient samples so far. This component reaches its highest occurrence in the highlands of West Asia, from Anatolia and the Caucasus all the way to the Indian subcontinent. [...] Nonetheless, some of the legacy of the earliest Indo-European speakers does appear to persist down to the present day in the genomes of their linguistic descendants, and I predict that when we sample later (post 5-4kya) individuals we will finally find the West_Asian piece that is missing from the European puzzle. This pattern is also seen in ADMIXTURE analysis (Fig. 2b, SI6), which implies that the Yamnaya have ancestry from populations related to the Caucasus and South Asia that is largely absent in 38 Early or Middle Neolithic farmers but present in all 25 Late Neolithic or Bronze Age individuals. This ancestry appears in Central Europe for the first time in our series with the Corded Ware around 2,500 BCE (SI6, Fig. 2b, Extended Data Fig. 1). These results can be explained if the new genetic material that arrived in Germany was a composite of two elements: EHG and a type of Near Eastern ancestry different from that which was introduced by early farmers (also suggested by PCA and ADMIXTURE; Fig. 2, SI5, SI6). We shall see, in our survey of prehistoric European racial movements, 8 that the Danubian agriculturalists of the Early Neolithic brought a food-producing economy into central Europe from the East. They perpetuated in the new European setting a physical type which was later supplanted in their original home. Several centuries later the Corded people, in the same way, came from southern Russia but there we first find them intermingled with other peoples, and the cul-tural factors which we think of as distinctively Corded are included in a larger cultural equipment. [...] On the basis of the physical evidence as well, it is likely that the Corded people came from somewhere north or east of the Black Sea. The fully Neolithic crania from southern Russia which we have just studied include such a type, also seen in the midst of Sergi's Kurgan aggregation. Until better evidence is produced from elsewhere, we are entitled to consider southern Russia the most likely way station from which the Corded people moved westward. Haak, Wolfgang, et al. "Massive migration from the steppe was a source for Indo-European languages in Europe." Nature (2015).
|
|
|
Post by Admin on Mar 20, 2015 14:06:49 GMT
The team, led by Dr Peter Donnelly of the Wellcome Trust Centre for Human Genetics in Oxford, UK, analyzed the DNA of people from rural areas of the UK, whose four grandparents were all born within 80 km of each other. Because a quarter of our genome comes from each of our grandparents, Dr Donnelly and his colleagues were effectively sampling DNA from these ancestors, allowing a snapshot of UK genetics in the late 19th century. The scientists found that the samples could be grouped into 17 clusters, which coincided strongly with geographical areas. According to the scientists, “the majority of eastern, central and southern England is made up of a single, relatively homogeneous, genetic group with a significant DNA contribution from Anglo-Saxon migrations (10-40 percent of total ancestry). This settles a historical controversy in showing that the Anglo-Saxons intermarried with, rather than replaced, the existing populations.” “There are separate genetic groups in Cornwall and Devon, with a division almost exactly along the modern county boundary.” “The population in Orkney emerged as the most genetically distinct, with 25 percent of DNA coming from Norwegian ancestors. This shows clearly that the Norse Viking invasion (9th century) did not simply replace the indigenous Orkney population.” There is no obvious genetic signature of the Danish Vikings, who controlled large parts of England from the 9th century. “The Welsh appear more similar to the earliest settlers of Britain after the last ice age than do other people in the UK. The scientists said there was not a single Celtic genetic group. “In fact the Celtic parts of the UK (Scotland, Northern Ireland, Wales and Cornwall) are among the most different from each other genetically. For example, the Cornish are much more similar genetically to other English groups than they are to the Welsh or the Scots.” “There is genetic evidence of the effect of the Landsker line – the boundary between English-speaking people in south-west Pembrokeshire and the Welsh speakers in the rest of Wales, which persisted for almost a millennium.” Today, few Britons have ancestors from just one local region of the United Kingdom, so it is hard to identify patterns of genetic variation specific to any one place. But Donnelly and his team found 2,039 Britons of European ancestry who lived in rural areas and knew that their four grandparents were all born within 80 kilometres of each other. Since these volunteers’ DNA was a mosaic of their grandparents’, who themselves were to known be strongly linked to one British region in the late nineteenth century, Donnelly hoped to find genetic variation that clustered neatly with their grandparents' geographic location. So it proved: a statistical model lumped participants into 17 groups based only on their DNA, and these groupings matched geography. People across central and southern England fell into the largest group, but many groupings were more isolated, such as the split between Devonians and Cornish in Britain’s southwest. People who trace their ancestry to the Orkney Islands, off the northeast coast of Scotland, fell into three distinct categories. They are likely so differentiated because the islands made it hard for different populations to mingle. As well as geographic barriers like these, the patchwork was formed by migrations into and around Britain, Donnelly says. The team analysed the genomes of 6,209 people from continental Europe to understand their ancestors’ contributions to Britons’ ancestry. This confirmed the flow of Anglo-Saxons from present-day Germany into Britain after the departure of the Romans in 410 ad. They interbred with local residents instead of replacing them wholly, as some historians and archaeologists have suggested. Danish Vikings, who occupied Britain between the 700s and 1100s ad, by contrast, left little signature in most Britons’ genomes. Nature doi:10.1038/nature.2015.17136
|
|
|
Post by Admin on Apr 19, 2015 15:43:45 GMT
Phonemic data were compiled by M.R. (the Ruhlen database); for 2,082 languages with complete phoneme inventories and referenced sources in this database, we annotated each language with geographic coordinates (Fig. 1A) and the number of speakers reported (24). We also analyzed PHOIBLE (PHOnetics Information Base and Lexicon) (25), a linguistic database with phoneme inventories for 968 languages. For 139 globally distributed populations in the Ruhlen database (114 in PHOIBLE), we matched each population’s genetic data to the phoneme inventory of its native language (20), producing novel “phoneme–genome datasets” that allow joint analysis of genes and languages. Fig. 1. Procrustes-transformed PCs for all phonemes and regional axes of phonemic and genetic differentiation. (A) Locations of 2,082 languages in the Ruhlen database. Phoneme inventory size of each language is indicated by the color bar. We performed Procrustes analyses to compare the first two PCs of phonemic data (B and C) and genetic data (D) to the geographic locations of languages/populations (P < 10−5 for all three comparisons after 100,000 permutations). The mean Procrustes-transformed PC values (B) for phonemes in the Ruhlen database (t0 = 0.57), (C) for phonemes in PHOIBLE (t0 = 0.52), and (D) for allele frequencies (t0 = 0.69) are displayed in each geographic region. Circle size corresponds to number of languages (B and C) or populations (D). (E) For the Ruhlen phoneme–genome dataset, pairwise geographic distance matrices were projected along different axes (calculated at 1° intervals); within each region, the rotated axis of geographic distance that was most strongly associated (greatest Mantel r) with phonemic distance (black arrows) and genetic distance (gray dashed arrows) is shown. Thinner arrows (Europe, East Asia, South America) indicate nonsignificant associations. Black dots indicate population locations for the Ruhlen phoneme–genome dataset. With the exception of North America, axes of phonemic differentiation and genetic differentiation are similar in most regions (North America: 78° difference; other regions: mean difference 16°). To compare the signatures of human demographic history on genetic variation and phoneme inventories, we used Procrustes analyses to compare principal components (PCs) for both data types with sample geographic locations and determined whether phonemic and genetic distance are more correlated than expected from geographic distance alone. We also developed a new method for identifying regional axes of linguistic and genetic differentiation and tested whether the origin of the human expansion out of Africa can be detected from the geographic distribution of the numbers of phonemes in languages (phoneme inventory sizes). Conflicting predictions exist for the effects of geographic isolation and population contact on language evolution (e.g., refs. 26⇓⇓–29); we tested these by comparing phoneme inventories according to language density at varying radii. We also quantified the extent to which phoneme evolution can be modeled along genetic, geographic, and cognate-based phylogenies. With these joint analyses, we tested whether phonemes and alleles carry signatures of ancient population divergence and recent human migrations, and we identified demographic processes that have different effects on phonemes and alleles. Over a series of radial distances, we assessed the effect of geographic isolation on phonemes in each language by comparing the phoneme inventories of each language and its neighbors. For languages that have fewer than or equal to the median number of neighboring languages within a radius of k kilometers (“fewer neighbors”), we observed a small but significant increase in phoneme inventory size as well as significantly higher phonemic distance between geographically close languages for many values of k (Fig. 2); this trend was also observed within Africa, Central/South Asia, East Asia, and Oceania (SI Appendix, Fig. S4). In areas with greater language density, phonemes were on average more similar between languages than in areas with fewer neighbors (SI Appendix, Fig. S5). In addition, languages with fewer neighbors had significantly higher variance in both phoneme inventory size and phonemic distance (Ansari–Bradley P < 2 × 10−3); this trend was also significant within Africa, Central/South Asia, East Asia, North America, and Oceania (SI Appendix, Fig. S6). Fig. 2. The effect of geographic isolation on phonemes. Languages with fewer neighbors (less than or equal to median number of neighbors) had significantly more phonemes (Wilcoxon rank-sum test) than languages with more neighbors for all tested radii in the Ruhlen database (black line) and for radii < 175 km in PHOIBLE (red line). Examples are shown as inset boxplots: within a radius of 75 km for languages in PHOIBLE, the median number of neighbors was three languages; we observed slightly but significantly more phonemes in languages with zero to three neighbors than in languages with four or more neighbors (red boxplot inset). Similarly, within a radius of 125 km for languages in the Ruhlen database, there was a small but significant increase in the number of phonemes for languages with the median number of neighbors (8) or fewer (black boxplot inset). The analyzed languages did not evolve independently: neighboring languages are often in the same family and related languages might share more phonemes. To address this, we compared phonemic distance with geographic distance to each language, separately for languages in the same language family and in different families. Geographic distance was significantly positively correlated with phonemic distance; this was true both for language pairs within the same family and for language pairs in different families within the same geographic area. Associations significantly different from zero (P < 10−3) were positive for 99% of within-family comparisons and 87% of between-family comparisons. There was no significant difference in this relationship for languages in the same and different language families (Wilcoxon P = 0.22) (Fig. 3C). When two languages were geographically near, they tended to share more phonemes even if they were not closely related, suggesting a relationship between phonemes and geography both within and between language families. Fig. 3. Best-fit linear regressions of phoneme inventory size on geographic distance. For both databases, the best-fit geographic center was located in northern Europe, roughly equidistant from Oceania and South America, grouping two regions with small phoneme inventories and producing a significantly negative slope. This finding suggests that phonemes do not show a strong signature of ancient population divergence. (A) Regression from the best-fit of 4,210 geographic centers on the Earth for languages in the Ruhlen database (see SI Appendix, Fig. S7 for PHOIBLE). (B) Using the median number of phonemes within each family, the best-fit geographic center for language families in PHOIBLE remained in northern Europe (see SI Appendix, Fig. S7 for Ruhlen). Geographic regions are indicated by color as in A, but y-axis scales differ. (C) Phonemic distance increases with geographic distance, even for languages in different families. For significant correlations between phonemic distance and geographic distance, the slope of the regression line for both within-family and between-family comparisons (y axis) was positive the vast majority of the time, and the distributions of these slopes were not significantly different from one another (Wilcoxon P = 0.22). The Signature of Ancient Population Divergence on Genes and Languages. Global genetic and phonemic patterns were not universally concordant: the most genetically polymorphic populations [top fifth percentile for number of microsatellite alleles observed (20)] are all in Africa, whereas the largest phoneme inventories in the Ruhlen database (top 5% of 2,082 languages, corresponding to at least 43 phonemes) (SI Appendix, Table S4) were globally distributed, predominantly in Africa (41 languages), Asia (32 languages), and North America (18 languages). Similarly, in PHOIBLE the languages with the most phonemes (top 5% of 968 languages, corresponding to at least 54 phonemes), were mainly in Africa (29 languages), Asia (12 languages), and North America (7 languages). These distributions suggest that population divergence across large distances might have affected phonemic and genotypic variation differently. Ancient population divergence is evident in human genetic diversity, which decreases with distance from southern Africa, a signature of the serial founder effect (13, 39, 40). Parallel patterns of decreasing diversity out of Africa have been reported for the partially vertically transmitted human pathogen Helicobacter pylori (41) and in human morphometric data (42). Inference of the human expansion out of Africa has also been attempted using categorical phoneme inventories (15), although phonemes are not necessarily lost after a population bottleneck. The conclusions from Atkinson (15) that language expansion followed a serial founder effect out of Africa and that phoneme inventory size was significantly correlated with current speaker population size (as in ref. 43) have both generated much debate (e.g., refs. 16⇓⇓–19, 25, 28, and 44⇓–46). Using both databases of phoneme inventories, we tested whether ancient human population divergence out of Africa left a similar signature on phonemes to that on genes. Fig. 4. Estimating ancestral phoneme states with cognate-based, geographic, and genetic trees. (A) Phylogeny of Indo-European languages (47) with presence of the phoneme /ʈ/ indicated by gray circles at each tip. Based on the tree topology and branch lengths, the probability of phoneme presence at interior nodes was predicted by ancestral character estimation (63). The amount of gray in the bar at each node represents the probability of phoneme presence, with white representing absence. The green rectangle highlights the low probability (2.84 × 10−3) of the presence of phoneme /ʈ/ in the ancestor to Romance languages, as shown by the lack of gray at that node. The orange rectangle highlights the probability of /ʈ/ presence in the ancestor to Indo-Aryan languages (~1). (B) Phylogeny of Indo-European populations constructed with genetic data from ref. 20. (C) Neighbor-joining tree of geographic distances between Indo-European-speaking populations. As in A, the presence of /ʈ/ in the language spoken by a given population is indicated in B and C by gray circles, and the probability of this phoneme’s presence at interior nodes (predicted by ancestral character estimation) is shown by the amount of gray at each node. For all three trees, the phoneme /ʈ/ was estimated to be likely absent in the language ancestral to the Romance languages (indicated by a mostly white bar inside each green rectangle) and likely present in the language ancestral to the Indo-Aryan languages (orange rectangle). (D) Examples of phonemes in the Ruhlen database were grouped by their relative rate of change from high (red) to low (blue) as predicted by the ancestor character estimation algorithm with all three trees. Predictions of relative rates of phoneme change were consistent among all pairs of the three trees (Spearman’s ρ ≥ 0.73, P ≤ 4.9 × 10−15). To compare the Ruhlen database and PHOIBLE with previous studies (15⇓⇓–18, 25), we regressed phoneme inventory size on geographic distance from 4,210 geographic centers on Earth (2, 13) and tested for a linear decrease in number of phonemes with distance to each center. For both databases, the geographic center with the most support for this model (lowest Akaike Information Criterion, AIC) was in northern Europe (Fig. 3) (Ruhlen 67.6684°, 36.2°; PHOIBLE 77.1614°, 16.4°); the distance between these centers is 1233.5 km. A decrease in number of phonemes with distance from Eurasia has been observed before (16). Although our analysis identifies a Eurasian center as the best-fit origin, we do not claim that a serial founder effect is an appropriate model for language expansion: phoneme inventory size is a coarse summary statistic, and phoneme loss does not necessarily occur with reduced population size or geographic isolation. Rather, the identified location is roughly equidistant from most languages in Oceania and South America, effectively grouping these regions of generally small phoneme inventory size to produce a significantly negative slope. Furthermore, the 2,082 points in the regression are not independent: many represent closely related languages (Fig. 3A). To reduce this dependence, we repeated the regression analysis using the mean or median values for the independent and dependent variables within each language family (Fig. 3B). As with individual languages, the best-fit origin was found in Northern Europe for the within-family mean and median values for both the Ruhlen database and PHOIBLE (SI Appendix, Fig. S7 and Table S5). For an Indo-European linguistic tree (47), a genetic tree of Indo-European-speaking populations, and a neighbor-joining tree of the geographic distances between language locations, we estimated the probability of phoneme presence at two internal nodes. Fig. 4 A–C illustrates the results of ancestral character estimation for an example phoneme, /ʈ/. We then compared these ancestral character estimates to the phoneme inventories of well-studied ancient languages for which primary sources exist: we used Vulgar Latin phonemes to approximate the phoneme inventory ancestral to modern Romance languages (48, 49) and Vedic Sanskrit phonemes to approximate the phoneme inventory ancestral to modern Indo-Aryan languages (50). For phoneme inventories in both databases, the cognate-based phylogeny (47), a geographic tree, and a genetic phylogeny gave similar predictions of the phoneme inventories of Vulgar Latin and Vedic Sanskrit (Table 1). The prediction of phoneme presence/absence with the ancestral character estimation algorithm was consistent with published sources for 67–88% of phonemes. Of the phonemes in published inventories that were accurately predicted by ancestral character estimation, most (53–94%) were predicted by multiple trees (SI Appendix, Fig. S8). In addition, each tree gave similar estimates for relative rates of phoneme change (Fig. 4D). Differences among populations in both phonemes and allele frequencies were strongly correlated with geographic distance. Furthermore, phonemes showed an association with geographic distance regardless of language classification but did not show the strong signatures of ancient population divergence found in genetic data. This suggests that phoneme inventories are affected by recent population processes and thus carry little information about the distant past (e.g., ref. 23); in contrast to genes, phoneme inventories in our analyses did not follow the predictions of a serial founder effect out of Africa. We also pinpoint where differences between genes and languages occur, both geographically and by characteristics of the surrounding populations. Our findings suggest that geographic isolation has different effects on genes and phonemes. Languages with fewer neighboring languages were more phonemically different from their neighbors than those with more neighbors, and geographically isolated populations may gain phonemes while losing genetic variation. In addition, ancestral phoneme inventories estimated along genetically, geographically, and lexically determined phylogenies produced similar results (Table 1). Creanza, Nicole, et al. " A comparison of worldwide phonemic and genetic variation in human populations." Proceedings of the National Academy of Sciences (2015): 201424033.
|
|