Post by Admin on Feb 8, 2014 23:20:13 GMT
Consensus Neighbor-Joining Tree of Populations. The thickest edges have at least 95% bootstrap support, and edges of intermediate thickness have at least 75% support. If all of the groups subtended by an edge have majority membership in the same cluster in Figure 2A (or only plurality membership in the cases of Hazara, Makrani, and Uygur), the edge is drawn in the same color as was used for the cluster.
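The caption's drawing rule can be written as a small function. This is a minimal sketch only: the `Edge` class, the function names, the cluster color labels, and the fallback color for edges whose groups fall in different clusters are all hypothetical, because the caption does not say how such edges are colored. The plurality exception for Hazara, Makrani, and Uygur is assumed to be handled upstream, when each group's cluster label is assigned.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the caption's drawing rule; names and colors are illustrative.

@dataclass
class Edge:
    bootstrap: float                     # bootstrap support for the edge, in percent
    group_clusters: list[Optional[str]]  # cluster label of each population below the edge;
                                         # None = that population has no majority cluster


def edge_thickness(bootstrap: float) -> str:
    """Three line weights, using the caption's 95% and 75% thresholds."""
    if bootstrap >= 95:
        return "thick"
    if bootstrap >= 75:
        return "intermediate"
    return "thin"


def edge_color(group_clusters: list[Optional[str]], default: str = "black") -> str:
    """Use the cluster color only if every subtended population maps to the same cluster."""
    labels = set(group_clusters)
    if group_clusters and None not in labels and len(labels) == 1:
        return group_clusters[0]
    return default  # assumed fallback; the caption does not name one


def edge_style(edge: Edge) -> tuple[str, str]:
    return edge_thickness(edge.bootstrap), edge_color(edge.group_clusters)


print(edge_style(Edge(97.0, ["blue", "blue"])))    # ('thick', 'blue')
print(edge_style(Edge(80.0, ["blue", "orange"])))  # ('intermediate', 'black')
```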
A study (Rosenberg et al. 2006) showed that the Kalash people are a distinct, aboriginal population with only minor contributions from outside peoples; in a cluster analysis, the Kalash formed a cluster of their own. The cluster tree shows that modern Europeans descended from an ancestral group closely associated with the Kalash. It can therefore be reasonably concluded that the A111T mutation first occurred in an ancestral group of the Kalash population and was subsequently passed on to other Middle Eastern populations, such as the Druze and Palestinians, and then to the modern European populations located at the end of the blue-coloured Kalash branch of the cluster tree (e.g. Italian, French), which is consistent with the aforementioned migration patterns. Moreover, another genetic study (Firasat et al. 2007) analysed Y-chromosome haplogroup frequencies among Kalash individuals and found a mixture of west and east Eurasian haplogroups. Haplogroups L (25%) and H (20.5%) originated in prehistoric South Asia, while R1a (18.2%), R* (6.8%), R1* (6.8%), G (18.2%) and J2 (9.1%) are commonly found in Europe and the Middle East. The Kalash people hold a special place in human evolution as the first group of people of Asian descent with the A111T mutation that gave rise to light-skinned modern Europeans; they are the "lost white tribe" of northwestern Pakistan from whom all modern European populations originated.
A Kalash girl in Pakistan
Divergent natural selection caused by differences in solar exposure has resulted in distinctive variations in skin color between human populations. The derived light skin color allele of the SLC24A5 gene, A111T, predominates in populations of Western Eurasian ancestry. To gain insight into when and where this mutation arose, we defined common haplotypes in the genomic region around SLC24A5 across diverse human populations and deduced phylogenetic relationships between them. Virtually all chromosomes carrying the A111T allele share a single 78-kb haplotype that we call C11, indicating that all instances of this mutation in human populations share a common origin. The C11 haplotype was most likely created by a crossover between two haplotypes, followed by the A111T mutation. The two parental precursor haplotypes are found from East Asia to the Americas but are nearly absent in Africa. The distributions of C11 and its parental haplotypes make it most likely that these two last steps occurred between the Middle East and the Indian subcontinent, with the A111T mutation occurring after the split between Europeans and East Asians.
To estimate the age of the A111T mutation, we used a molecular clock approach. We first determined the rate of mutation in the combined C and D regions from the number of differences between the human and chimpanzee reference sequences. In this alignment, nucleotide changes within 4 nt of gaps were excluded to remove potential biases caused by misalignment. For calibration, we used 6 million years, the midpoint of the range of estimates (5–7 million years) for the divergence time between human and chimpanzee identified by Kumar et al. (2005), and assumed equal mutation rates (per year) in the human and chimpanzee lineages. We then counted the number of single-nucleotide differences from the modal haplotype for each C11-D4–containing chromosome in the 1000 Genomes dataset, using all reported variants. Each chromosome provides an imprecise estimate of the time since the origin of the haplotype; values were averaged over individual populations or over the entire sample. Because accumulation of mutations in a single lineage is independent of population size, this procedure requires no demographic assumptions or data. Although A111T is subject to selection, we assume only that subsequent mutations are neutral. Our approach to dating selective sweeps differs from that used by Rozas et al. (2001) and Meiklejohn et al. (2004), which counts the number of affected sites and underestimates the coalescence time unless each sampled lineage is independent. In contrast, our estimate is unbiased in the presence of nonindependent samples. Because the most frequent C11 + D4 variant carrying an additional mutation occurs 36 times in 1013 chromosomes, the independence condition is clearly not met.
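The calibration and averaging steps can be illustrated with a short sketch. It assumes, as the text does, a 6-million-year human–chimpanzee split with equal per-year rates on both lineages, so the human–chimpanzee difference count spans 2 × 6 million years of single-lineage time. The mutation sets and the difference count below are hypothetical toy values, not data from the study, and the function names are illustrative. For contrast, `distinct_site_age` counts each variable site once; it is a simplified stand-in showing why site counting falls short when chromosomes share mutations, not a reimplementation of the Rozas et al. (2001) or Meiklejohn et al. (2004) estimators.

```python
from statistics import mean

# Calibration from the text: human-chimp divergence midpoint of 6 million years,
# with equal per-year mutation rates on both lineages.
DIVERGENCE_YEARS = 6_000_000


def years_per_difference(human_chimp_diffs: int) -> float:
    # Both lineages accumulate changes after the split, so the observed
    # human-chimp differences reflect 2 * DIVERGENCE_YEARS of lineage time.
    return 2 * DIVERGENCE_YEARS / human_chimp_diffs


def per_chromosome_age(mutation_sets: list[set[str]], human_chimp_diffs: int) -> float:
    """Average over chromosomes of (differences from modal haplotype) x years per difference."""
    scale = years_per_difference(human_chimp_diffs)
    return mean(len(s) * scale for s in mutation_sets)


def distinct_site_age(mutation_sets: list[set[str]], human_chimp_diffs: int) -> float:
    """Simplified contrast: each variable site counted once, divided by sample size."""
    scale = years_per_difference(human_chimp_diffs)
    distinct_sites = set().union(*mutation_sets)
    return len(distinct_sites) / len(mutation_sets) * scale


# Toy data (hypothetical): derived mutations carried by each of eight chromosomes
# relative to the modal haplotype; "s1" is shared by four chromosomes.
chromosomes = [set(), {"s1"}, {"s1"}, set(), {"s1", "s2"}, {"s3"}, set(), {"s1"}]
HUMAN_CHIMP_DIFFS = 1000  # hypothetical difference count for the aligned region

print(per_chromosome_age(chromosomes, HUMAN_CHIMP_DIFFS))  # 9000.0
print(distinct_site_age(chromosomes, HUMAN_CHIMP_DIFFS))   # 4500.0
```

In the toy data, mutation s1 sits on four of the eight chromosomes, loosely analogous to one C11 + D4 variant occurring 36 times among 1013 chromosomes. Counting it once halves the apparent age here, whereas the per-chromosome average counts it on every lineage that carries it.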
Diminished variation in the genomic region around SLC24A5 in the HapMap CEU (European ancestry) sample led us to ask what the haplotypes associated with the A111T allele looked like, and how, when, and where they might have arisen. We therefore investigated haplotypes spanning this genomic region (Figure 2). The haplotypes are described in the context of four contiguous subregions defined by blocks of linkage disequilibrium, here designated A (49 kb), B (20 kb), C (78 kb), and D (49 kb) (Figure 2). Blocks B, C, and D together encompass the region of diminished variation in CEU. Analysis of the core subregion C, which includes SLC24A5, yielded 46 haplotypes in HapMap Phase 3 populations (Table S3 and Table S4). The 11 haplotypes with individual abundances >0.5%, which we designate C1 through C11, collectively comprise 93–98% of the total in each population (Figure 3, Table 1, and Table 2). A single haplotype, C11, accounts for 97% of all instances of the A111T variant of SLC24A5. Most of the haplotypes with frequencies <0.5% appear to be products of recombination between more frequent haplotypes. Analysis of common haplotypes found in HGDP and other samples (Li et al. 2008; Behar et al. 2010) yielded results matching those derived from HapMap samples (Table S5 and Table S6), including the equivalence between haplotype C11 and the derived allele of rs1426654. This finding is consistent with a common origin for A111T worldwide. Analysis of data from the 1000 Genomes Project indicated that haplotypes defined on the basis of 16 SNPs corresponded to sets of closely related haplotypes (Figure 4 and File S1). Common haplotypes defined using 1000 Genomes data differed from the ancestral state at 27–48 positions. The number of haplotypes detected depends on the number of polymorphisms used to define them; with inclusion of lower frequency variants, an increasing fraction of chromosomes corresponds to rare haplotypes (Table S7).
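The frequency cutoff used to name C1 through C11 can be sketched as: count identical allele strings, keep those above 0.5%, and report the fraction of chromosomes they cover (the 93–98% figure in the text is this kind of coverage, computed per population). The allele strings and counts below are invented toy data for a single pooled sample, and the function name is hypothetical; the sketch ignores phasing error and missing genotypes.

```python
from collections import Counter


def common_haplotypes(haplotypes: list[str], min_freq: float = 0.005):
    """Return {haplotype: frequency} for haplotypes above min_freq, plus their combined coverage."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    # most_common() lists haplotypes from most to least frequent
    common = {h: n / total for h, n in counts.most_common() if n / total > min_freq}
    # Sum integer counts first to avoid floating-point drift in the coverage figure
    coverage = sum(counts[h] for h in common) / total
    return common, coverage


# Toy sample (hypothetical): 1000 phased chromosomes as allele strings over four SNPs.
sample = ["AAGT"] * 600 + ["AGGT"] * 300 + ["GAGC"] * 95 + ["GGGC"] * 4 + ["AAGC"] * 1

common, coverage = common_haplotypes(sample)
print(common)    # {'AAGT': 0.6, 'AGGT': 0.3, 'GAGC': 0.095}
print(coverage)  # 0.995
```

The two rare strings (0.4% and 0.1%) fall below the cutoff, which mirrors how the text sets aside low-frequency haplotypes, many of which it attributes to recombination.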
Phylogenetic relationships between haplotypes determined using 1000 Genomes data (Figure 4) were equivalent to those deduced from HapMap phase 2 data. The branches comprising C1, C2, C3, C4, and C5-C11 are early diverging clades. The most abundant and extensive lineage includes three branches: C5, C6-C7, and C9-C11. Within the C9-C11 branch, the C9 cluster is ancestral to C10, but nearly all instances of C9 include additional polymorphisms that distinguish them from common ancestors with C10. In contrast, the most commonly observed C10 variant is ancestral to C11.
Haplotypes C3 and C11 share the derived allele of SNP c1 (Table 1), suggesting the possibility of recombination. To test this possibility, we examined SNPs not genotyped in HapMap Phase 3. Haplotype C11 carries the ancestral alleles of SNPs rs12441154 and rs57108441, whereas C10 carries the derived alleles, a pattern readily explained by a single crossover between C3 and C10 (Figure 6). In support of this notion, the B-region haplotype associated with C11, B6, is also the one most commonly associated with C3; conversely, 96% of C10 haplotypes are associated with B-region haplotypes other than B6 (68% with B2; File S2).
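The crossover test in this paragraph amounts to asking whether C11 can be written as the left part of one parental haplotype joined to the right part of the other. Below is a minimal exact-match sketch with a hypothetical function name and toy 0/1 allele strings (0 = ancestral, 1 = derived, SNPs in map order). It does not use the real C3, C10, or C11 allele patterns, and it ignores genotyping error, later mutations, and phasing uncertainty.

```python
def single_crossover_breakpoints(parent_a: str, parent_b: str, child: str) -> list[tuple[str, int]]:
    """Find positions i where child = one parent's first i SNPs + the other parent's remaining SNPs."""
    if not (len(parent_a) == len(parent_b) == len(child)):
        raise ValueError("haplotypes must cover the same ordered SNPs")
    hits = []
    for i in range(1, len(child)):
        if parent_a[:i] + parent_b[i:] == child:
            hits.append(("a|b", i))  # left segment from parent_a, right segment from parent_b
        if parent_b[:i] + parent_a[i:] == child:
            hits.append(("b|a", i))  # left segment from parent_b, right segment from parent_a
    return hits


# Toy allele strings (hypothetical), 0 = ancestral, 1 = derived.
c3_like = "10110"
c10_like = "01001"
print(single_crossover_breakpoints(c3_like, c10_like, "10101"))  # [('a|b', 3)]
print(single_crossover_breakpoints(c3_like, c10_like, "11111"))  # []
```

An empty list means no single crossover between the two parents reproduces the child exactly. Where the parents share alleles over a run of SNPs, several consecutive breakpoints are reported, marking the interval in which the crossover could have occurred.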
Can we determine whether recombination involving C3 preceded or followed the mutation that created A111T? Models in which the recombination or mutation occurred first produce the same end product but proceed through different intermediates, corresponding to C26 or C22, respectively (Table S3). Rare haplotypes matching both potential C11 precursors were found. Because either could have been produced by recombination subsequent to the origin of C11, their occasional occurrence is not informative. However, an evolutionary argument strongly suggests an order of events (Figure S4). If the crossover predated the mutation, the predicted intermediate (C26) would not have experienced positive selection on the basis of lighter pigmentation (Figure S4A). Selection for decreased skin pigmentation would cause the predominant haplotype containing A111T to be C11, as is observed. On the other hand, if the A111T mutation preceded the crossover, the intermediate haplotype (C22) would be predicted to experience the same selective pressure as C11 (Figure S4B). Because C11 is derived by recombination between C3 and C22 in this model, C22 would be expected to predominate over C11, unless C11 had a selective advantage over C22. This outcome is not what is observed. Rather, the frequency of C22 is only approximately 1% that of C11. Furthermore, association with diverse B-region haplotypes rather than one makes it most likely that the existing instances of C22 are the result of recombination after the formation of C11 rather than relicts of a precursor to C11. We conclude that the crossover most likely preceded the A111T mutation.
The world distribution of core region haplotypes, together with their phylogenetic relationships, suggests which haplotypes likely originated in Africa and which most likely arose outside of Africa. As expected from the near fixation of A111T in Europe, the C11 clade predominates there, and all other haplotypes are rare. Of the remaining 10 common core haplotype groups, all ancestral at rs1426654, eight clearly have their origins in Africa (Figure 3B, Figure 4, and Table S4). Three early diverging haplotypes, C1, C2, and C4, are rare outside of Africa and clearly originated there. In the lineage containing the majority of haplotypes, each of the three branches (containing C5, C6-C7, and C8-C11) gives strong evidence of having originated in Africa. C5 reaches its greatest abundance in West Africa and is rare outside of Africa. Within the other two branches, C6 and C9, which are the most common haplotypes in Africa, are also common worldwide, whereas C7 is abundant in East Asia and much less common but widespread in Africa. Consideration of the relationships among haplotype variants (Figure 4) indicates that C6, C7, and C9 (but not C8) dispersed out of Africa and have diverse descendants that originated, and are still present, in East Asia. Among these descendants is C10, which is abundant in East Asia (and the New World) but extremely rare in Africa (0.5% in LWK). Haplotype C3 represents the final early diverging lineage (Figure 4). Although the lineage containing this haplotype must have originated in Africa, C3 is rare in Africa (1.0% in MKK) but widely distributed in East Asia, the New World, and Oceania. The distributions of C3 and C10 are most consistent with an origin outside of Africa and subsequent introduction into Africa by migrations such as those documented by uniparental markers (Richards et al. 2006).
The preceding analysis is consistent with a wide range of possible dates for the origin of A111T, including the period before the initial colonization of Europe by anatomically modern humans >40 thousand years ago (kya) (Mellars 2006). An estimate for the date of origin of A111T based on microsatellites (Beleza et al. 2012) places the origin at 19 kya (95% confidence interval 6–38 kya) for a dominant model, or 11 kya (95% confidence interval 1–56 kya) for a more plausible additive model. To create an independent estimate, we applied a molecular clock approach to 1000 Genomes data using the combined C and D subregions. Because the proportions of different classes of nucleotide substitutions in the C11 + D4 variants and in the human-chimpanzee alignment are not significantly different (χ² = 4.42, df = 5, P = 0.49; Table S15), we combined these classes for analysis. For the combined population samples, before making corrections for undercounts in the source data, we obtained an estimate of 7.8 kya for the most recent common ancestor of the C11 + D4 haplotype combination (Table 3). The corresponding 95% confidence limits are 4.8–12.2 kya, whereas uncorrected estimates derived from individual European samples or the combined New World samples (also of European origin) ranged from 5.2 to 10.4 kya (Table 3). These values are clearly underestimates as a result of low sequence depth (1000 Genomes Project Consortium 2012). Adjustment for undercounting is substantial, increasing the estimated age for the combined samples to 12.4 (95% confidence interval 7.6–19.2) kya. If mutation rates in recent humans are lower than predicted from the human-chimpanzee divergence (Scally and Durbin 2012), true ages will be even older. Our adjusted dates overlap those previously reported (Beleza et al. 2012) and are also consistent with the lower limit for the origin of A111T set by the finding that the Alpine “iceman” dated to 5.3 kya was homozygous for this variant (Keller et al. 2012).
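Two numbers in this paragraph can be rechecked with a few lines of arithmetic. The first function evaluates the upper tail of a chi-square distribution with 5 degrees of freedom using the standard closed form for odd degrees of freedom (standard library only), and it reproduces P ≈ 0.49 for a statistic of 4.42. The second step simply divides the adjusted estimates by the uncorrected ones. The resulting ratios (about 1.57 to 1.59) only describe the published numbers; they should not be read as the authors' correction procedure, which the paragraph does not specify.

```python
import math


def chi2_sf_df5(x: float) -> float:
    """P(X > x) for a chi-square variable with 5 degrees of freedom.

    Closed form for odd df:
    Q(x; 5) = erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * (sqrt(x) + x**1.5 / 3)
    """
    tail = math.erfc(math.sqrt(x / 2))
    correction = math.sqrt(2 / math.pi) * math.exp(-x / 2) * (math.sqrt(x) + x ** 1.5 / 3)
    return tail + correction


print(round(chi2_sf_df5(4.42), 2))  # 0.49

# Published age estimates in kya: uncorrected vs. adjusted for undercounting.
uncorrected = {"point": 7.8, "lower": 4.8, "upper": 12.2}
adjusted = {"point": 12.4, "lower": 7.6, "upper": 19.2}
ratios = {k: round(adjusted[k] / uncorrected[k], 2) for k in uncorrected}
print(ratios)  # {'point': 1.59, 'lower': 1.58, 'upper': 1.57}
```

The closed form avoids a SciPy dependency; with SciPy available, `scipy.stats.chi2.sf(4.42, 5)` would serve the same purpose.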
This date range implies an origin clearly preceding the Neolithic transition in Europe. These dates are later than the initial colonization of Europe but are consistent with an A111T origin before or after post-glacial population expansions.
The precursors to C11, haplotypes C3 and C10, are common in East Asia and the New World (Figure S5), but the distribution of C11 indicates that these locations are not likely sites for the origin of C11 or its immediate precursor. Similarly, B6 haplotypes not associated with C11 are widely distributed in East Asia and the New World (data not shown). The paucity of C3 and C10 among existing African haplotypes suggests that both events leading to the origin of C11 took place outside this continent. Our dating for this haplotype is consistent with a non-African origin. The most likely location for the origin of C11 is, therefore, within the region in which it is fixed or nearly so. As both models for the origin of C11 imply that C3 and C10 were present in ancestors of Europeans, the observed and inferred distributions of these autosomal haplotypes are consistent with the single-out-of-Africa hypothesis derived using uniparental markers. Although a non-African origin for C11 is clear, near fixation of this haplotype over a wide geographical region prevents strong inferences regarding a precise location of origin. Existing data are consistent with a model in which the C11 precursor did not extend outside the geographical region in which C11 is now nearly fixed, a conclusion subject to the caveat of limited haplotype sampling in some neighboring regions, such as India. With sufficiently strong positive selection for C11, it is possible that this haplotype could have originated anywhere within its current range and spread via local migration. However, selection acting in concert with major population migrations would have facilitated a much more rapid dispersal. Archeological, mitochondrial, and Y-chromosomal data suggest that multiple dispersals shaped the current populations of Europe and the Middle East (Soares et al. 2010).
Canfield, Victor A., et al. "Molecular Phylogeography of a Human Autosomal Skin Color Locus Under Natural Selection." G3: Genes|Genomes|Genetics 3.11 (2013): 2059–2067.