Post by Admin on Jun 12, 2016 22:47:33 GMT
The pigmentation variation in Europeans and East Asians is attributed to specific SNPs in the region of OCA2 and HERC2 (Donnelly et al. 2012). It's unclear whether vitamin D deficiency has anything to do with the light skin alleles observed in Europeans and East Asians, as is commonly claimed. The light skin mutation common in East Asia is OCA2 (rs1800414), while OCA2 (rs4778138) is highly concentrated in Scandinavia (47–100%). The light skin allele of rs1800414 (C) is found almost exclusively in East and Southeast Asia (up to 76%). The three-SNP haplotype system (rs4778138, rs4778241, and rs7495174) is associated with blue eyes, and there is no known biological advantage to having blue eyes. Mutations in OCA2 are known to cause oculocutaneous albinism type 2, and our ancestors may have gone through a process of skin whitening similar to albinism.
Mutations in OCA2 are known to cause oculocutaneous albinism type 2. However, the gene is also known to play a role in variation in normal pigmentation. In European populations, it is primarily associated with blue irises. Several sites in and around OCA2 have been reported to be the functional variant or to be tightly linked to the functional variant leading to blue eyes. These sites include a three-SNP haplotype (rs4778138, rs4778241, rs7495174) and four individual SNPs, rs1129038, rs12913832, rs916977, and rs1667394 (Duffy et al. 2007; Sturm et al. 2008; Kayser et al. 2008; Sulem et al. 2007; Mengel-From et al. 2010; Walsh et al. 2010). Four of the SNPs (rs1129038, rs12913832, rs916977, rs1667394) are actually located in introns of the Hect Domain and RCC1-like Domain 2 gene (HERC2 [MIM 605837]), which is located 10 kb upstream of OCA2. These are thought either to be located in or near an upstream regulatory region of OCA2 or to be in linkage disequilibrium (LD) with functional elements in HERC2 and affect a possible HERC2 regulation of OCA2. The actual function of HERC2 is unknown but it shows homology to known E3 ubiquitin-protein ligases. One of the HERC2 SNPs (rs1667394) has been associated with blond hair in Europeans (Sulem et al. 2007). Specific polymorphisms and the haplotypes are illustrated in Fig. 1; all 21 SNPs studied are listed in Table 2. The derived allele of another SNP at OCA2, rs1800407, has been associated with green/hazel eyes in Europeans (Branicki et al. 2009). Rs1800407 is an arginine to glutamine missense mutation (Arg419Gln) found in exon 13 of the OCA2 gene. Sturm et al. (2008) concluded that the derived allele of rs1800407 increased the penetrance of the blue eye phenotype associated with the derived allele of rs12913832.
The derived allele at a missense SNP (rs1800414, His615Arg) in exon 19 of OCA2 has been reported to be specific to East Asia (Yuasa et al. 2007; Anno et al. 2008). Edwards et al. (2010) showed an association between the derived allele of rs1800414 (C, 615Arg) and lighter skin pigmentation in a sample of individuals of East Asian ancestry from Canada and confirmed their results using an independent sample of Han Chinese. Here we present our results on the global distributions of haplotypes and specific SNPs in the region of OCA2 and HERC2, genes that have been implicated in pigmentation variation in Europeans and East Asians. We also examine the LD between the SNPs and haplotypes of interest. Finally, we use long-range haplotype tests to show that OCA2 is or has been under selection in Europe and the derived allele of rs1800414 is, or has been, under selection in East Asia.
Fig. 2
Global frequencies of blue-eye associated haplotypes
The three haplotype systems we define here are shown in Fig. 1 and Table 3. Duffy et al. (2007) previously identified a three-SNP haplotype system (rs4778138, rs4778241, and rs7495174) associated with blue eyes; for the purpose of this paper, we will refer to this system as BEH1, blue-eye associated haplotype #1. The blue-eye associated allele of BEH1 is ACA, the fully derived haplotype. Sturm et al. (2008) reported that rs12913832 is associated with blue eyes. Since rs1129038 is in nearly complete LD with rs12913832 in all populations, we defined these two SNPs as a haplotype system referred to as BEH2, blue-eye associated haplotype #2. The blue-eye associated allele of BEH2 is TG, both derived alleles. In the HGDP populations, BEH2 will consist of rs12913832 only since rs1129038 is not present in that dataset. We also typed an SNP that occurs between rs12913832 and rs1129038; however, it has not been associated with pigmentation, and is monomorphic on the blue-eye associated allele of BEH2 and was therefore not included in BEH2. Two other SNPs, rs916977 and rs1667394, have previously been associated with blue eyes (Kayser et al. 2008; Sulem et al. 2007). In our data, with the exception of a low frequency haplotype in Africa, rs916977 and rs1667394 are in nearly complete LD. Therefore, we treat them as another haplotype system, BEH3, blue-eye associated haplotype #3. The blue-eye associated allele of BEH3 is CA, again the derived haplotype. In the HGDP populations BEH3 will consist of rs1667394 only since rs916977 is not present in the data set.
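As a rough illustration of the three haplotype systems just defined, the sketch below records the blue-eye associated allele of each system (ACA, TG, CA, as stated above) and reports which systems a phased chromosome carries it at. The function names, the dictionary layout, and the toy chromosomes (including the non-blue-eye allele strings) are hypothetical and are not taken from the paper's data.

```python
# Blue-eye associated alleles of the three haplotype systems, as stated in the text.
BLUE_EYE_ALLELES = {"BEH1": "ACA", "BEH2": "TG", "BEH3": "CA"}


def blue_eye_systems(chromosome):
    """Return the BEH systems at which this chromosome carries the blue-eye allele.

    `chromosome` maps a system name to its phased allele string (assumed layout).
    """
    return [
        name
        for name, allele in BLUE_EYE_ALLELES.items()
        if chromosome.get(name) == allele
    ]


def blue_eye_allele_frequency(chromosomes, system):
    """Frequency of the blue-eye allele of `system` among chromosomes typed for it."""
    typed = [c[system] for c in chromosomes if system in c]
    if not typed:
        return None  # system not typed in this sample
    return sum(h == BLUE_EYE_ALLELES[system] for h in typed) / len(typed)


# Toy chromosomes for illustration only; the non-blue-eye strings are made up.
toy_sample = [
    {"BEH1": "ACA", "BEH2": "TG", "BEH3": "CA"},
    {"BEH1": "ACA", "BEH2": "CA", "BEH3": "CA"},
    {"BEH1": "GTG", "BEH2": "CA", "BEH3": "CA"},
    {"BEH1": "GTG", "BEH2": "CA", "BEH3": "TG"},
]

for name in BLUE_EYE_ALLELES:
    print(name, blue_eye_allele_frequency(toy_sample, name))
```

Keeping each system as a single allele string sidesteps the question of SNP order within a haplotype, which the excerpt does not spell out.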
The distributions of the blue-eye associated alleles at the three haplotyped systems are presented in Fig. 2, each haplotype in contour plots, and all three grouped by population in a histogram. The actual frequencies are presented in supplemental material and in ALFRED. The blue-eye associated alleles at all three BEHs have their highest frequencies in Northwestern Europe, and the TG allele at BEH2 is essentially observed only in Europe; the ACA allele of BEH1 and the CA allele at BEH3 are at their highest frequencies in Europe, particularly in Northern and Western Europe, and have much lower frequencies elsewhere. In most of Central and East Asia, these alleles have frequencies of <20% but reach frequencies of 40% and higher in the Americas.
Fig. 4
Global rs1800414 derived-allele distribution and frequencies.
Our data confirm that the putative light skin allele of rs1800414 (C) is found almost exclusively in East and Southeast Asia, at frequencies ranging from 0 to 76% (Fig. 4), with higher levels in eastern East Asia (62–76.1%) than in Southeast Asia (0–54.3%) and Western China (15.5–37.5%). Outside of East and Southeast Asia, the C allele is only found in low frequencies in the Adygei, Chuvash, and Hungarians in Europe (>1–3.6%), the Yakut in Siberia (8.8%), and the Micronesians in the Pacific Islands (4.2%).
Fig. 10
Selection results at rs1800414 in East Asia.
In East Asia we see strong evidence for selection at the C allele of rs1800414 using the REHH test in both the constant population size model (Fig. 10a, b) and the bottleneck with an expansion model (supplemental Fig. 5). Interestingly, we also get significant REHH values at all three BEHs but the haplotypes that contain the ancestral alleles are the ones showing evidence of selection (supplemental Fig. 6). This result is likely due to the fact that the C allele of rs1800414 occurs on the same chromosome as these haplotypes in East Asia (supplemental Fig. 7). As with our European population samples we divided the East Asians into three groups: Western China, East Asia, and Southeast Asia. We see there is strong evidence of selection for the C allele of rs1800414 in all three population groups (supplemental Fig. 8). In both Western China and Southeast Asia, the frequency of the derived allele of rs1800414 is <50%, so we were able to use the nHS test on these populations. Using the nHS test we see strong evidence of selection for the derived allele of rs1800414 in both the Western China and Southeast Asian groups (Fig. 10d, e).
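For readers unfamiliar with the REHH test mentioned above, here is a minimal sketch of extended haplotype homozygosity (EHH) and the relative statistic built from it, as these are commonly defined. It is not the authors' pipeline: it ignores the demographic models and significance testing described in the paper, and the function names and toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH of chromosomes carrying `core_allele`, from the core site out to `end_index`.

    EHH is the chance that two randomly chosen carrier chromosomes are
    identical at every site in the interval.
    """
    lo, hi = sorted((core_index, end_index))
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return None  # need at least one pair of chromosomes
    identical_pairs = sum(comb(c, 2) for c in Counter(carriers).values())
    return identical_pairs / comb(n, 2)


def rehh(haplotypes, core_index, core_allele, end_index):
    """EHH of the core allele divided by the pooled EHH of all other core alleles."""
    lo, hi = sorted((core_index, end_index))
    core_ehh = ehh(haplotypes, core_index, core_allele, end_index)
    others = [h for h in haplotypes if h[core_index] != core_allele]
    # Pairs are only counted within the same non-core allele.
    total_pairs = sum(comb(c, 2) for c in Counter(h[core_index] for h in others).values())
    if core_ehh is None or total_pairs == 0:
        return None
    identical_pairs = sum(comb(c, 2) for c in Counter(h[lo:hi + 1] for h in others).values())
    other_ehh = identical_pairs / total_pairs
    if other_ehh == 0:
        return None  # ratio undefined when the other alleles show no homozygosity
    return core_ehh / other_ehh


# Toy phased haplotypes (made up); site 0 plays the role of the core SNP.
toy = ["CAA", "CAA", "CAA", "CAG", "TAA", "TAA", "TGA", "TGG"]
print(ehh(toy, 0, "C", 2), rehh(toy, 0, "C", 2))
```

A high REHH means the chromosomes carrying the core allele stay identical over a longer stretch than chromosomes carrying the alternative, which is the pattern expected when an allele has risen in frequency quickly.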
Global distribution of the light skin allele
We have shown that the C allele of the missense SNP rs1800414 is found almost exclusively in East Asia (Fig. 4). Within East Asia there is a general cline in the frequency of the C allele with the lowest frequencies in Western China, midrange frequencies in Southeast Asia, and high frequencies in Eastern East Asia. The major exception to this pattern is the Malaysians; in our small sample the derived allele is absent, but the Malays are an Austronesian group and they show similar frequencies to our other Austronesian populations (Micronesians and Samoans).
Selection in the OCA2-HERC2 region
We showed that the strongest signal of selection in Europe and Southwest Asia is at the TG allele of BEH2 and any signal seen at BEH1 and BEH3 is likely due to hitchhiking (Figs. 8, 9). Along with the distribution data, this strongly suggests that the TG allele of BEH2 is, contains, or is in strong LD with the blue eye causal mutation. It is possible that BEH2 is in the promoter region of OCA2 and the blue eye allele lowers the amount of OCA2 expressed either in the iris or globally.
This result also raises the question of why blue eyes would be under selection. Since there is no known biological advantage to having blue eyes, we think a likely answer is sexual selection: in Europe and Southwest Asia, individuals with blue eyes are, or were, preferred as mates. Another possible explanation is that the blue eye phenotype is not being selected for; rather the TG allele of BEH2 has another phenotype, such as lighter skin pigmentation, which is under selection.
In East Asia, we show that the C allele of the missense SNP rs1800414 is also under selection (Fig. 10). Again this result is not completely unexpected since this allele has been associated with lighter skin pigmentation in East Asians, and variants affecting skin pigmentation have previously been shown to be targets of selection (Edwards et al. 2010; Izagirre et al. 2006; Lao et al. 2007; Norton et al. 2007).
Fig. 5
LD at OCA2 and HERC2. This figure shows the LD in the OCA2/HERC2 region in 55 populations. SNPs 1–21 are ordered as in Table 2. A region of high LD is represented by red arrows using the default parameters in the agglomerative algorithm in HAPLOT (Gu et al. 2005): a region of high LD starts at r² = 0.4 and is extended as long as the average r² ≥ 0.3. The minimum r² for inclusion in a block is 0.1. If LD cannot be calculated for a SNP (e.g., it is fixed in that particular population), then a white space in the arrow is shown. On average, there are two regions of high LD, one near the East Asian “light skin” SNP (rs1800414) and one in the BEH region. The smallest regions are in Africa whereas the largest regions are in East Asia. In the Americas, there are three regions, one near rs1800414, one at BEH1, and one at BEH3.
Haplotypes and LD
We calculated pairwise r² for all 21 SNPs and illustrate regions of high LD using the HAPLOT program (Fig. 5). On average, globally we see two regions of high LD, though the sizes of each of these regions vary by population group. In Africa, the first region encompasses SNP 4 (rs12914687) through SNP 7 (rs2015343) and the second region encompasses SNP 16 (rs7494942) through SNP 21 (rs1667394). In Southwest Asia and Europe, both high LD regions are larger and the first is composed of SNP 3 (rs11074314) through SNP 8 (rs4778136), and the second is composed of SNP 12 (rs4778138) through SNP 21 (rs1667394). In Central Asia and the Pacific, the first region is the same as in Africa and the second region is the same as in Southwest Asia and Europe. In East Asia, the first high LD region extends from SNP 2 (rs1800414) to SNP 9 (rs746861) and the second region extends from SNP 10 (rs7170869) to SNP 21 (rs1667394). We actually see three regions of high LD in Native Americans, the first from SNP 3 (rs11074314) to SNP 8 (rs4778136), the second from SNP 9 (rs746861) to SNP 12 (rs4778138), and the third from SNP 18 (rs3935591) through SNP 21 (rs1667394). In Europe, the second region covers all three BEHs, and in East Asia, the first region includes rs1800414.
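The pairwise LD measure used here can be illustrated with a short sketch that computes the squared correlation between two biallelic sites from phased haplotypes. This is the standard textbook definition, not the HAPLOT implementation; the function name and the toy haplotype strings are hypothetical, and a site that is fixed in the sample returns `None`, since LD cannot be calculated for it.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between biallelic sites i and j.

    `haplotypes` is a list of phased allele strings of equal length.
    Returns None when either site is monomorphic in the sample.
    """
    n = len(haplotypes)
    # Pick one allele per site as the reference; r^2 does not depend on the choice.
    a = haplotypes[0][i]
    b = haplotypes[0][j]
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return None  # at least one site is fixed, so LD is undefined
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denominator


# Toy examples (made-up haplotypes): complete LD, no LD, and a fixed site.
print(r_squared(["AC", "AC", "GT", "GT"], 0, 1))
print(r_squared(["AC", "AT", "GC", "GT"], 0, 1))
print(r_squared(["AC", "AT"], 0, 1))
```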
Since the blue-eye associated alleles at all three BEHs are concordant in Europe and fall into that same high LD region in Europe, we analyzed the haplotypes of all seven SNPs together (Fig. 6). In this data set, we see that the TG allele of BEH2 always occurs on chromosomes that have the CA allele of BEH3 and almost always occurs on chromosomes with the ACA allele of BEH1. The ACA allele of BEH1 and the CA allele of BEH3 also usually occur on the same chromosomes; however, outside of Northwestern and Eastern Europe they do not always occur on chromosomes with the TG allele of BEH2. Whenever one of the blue-eye associated alleles does occur on a chromosome by itself, it is most likely to be the CA allele of BEH3.
Fig. 6
Haplotypes of the three BEHs. This figure shows the three BEHs as a single haplotyped system. The TG allele of BEH2 always occurs with the CA allele of BEH3 and usually occurs with the ACA allele of BEH1 (yellow). The CA BEH3 and ACA BEH1 alleles, however, do not always occur with the TG allele of BEH2. The ACA BEH1 allele and the CA BEH3 allele also usually occur together (pink and yellow)
We also looked at the haplotypes of the seven SNPs that compose the first high LD region in East Asians with respect to the derived allele of rs1800414 (Fig. 7). Here we see the derived allele of rs1800414 occurs on three haplotypes, though a vast majority occurs on a single haplotype (CACCACT). Of the remaining two haplotypes containing the derived allele of rs1800414, one differs from the most common haplotype at the last site and the other differs at the final four sites.
Fig. 7
Haplotypes containing the derived allele of rs1800414 in East Asians. This figure shows a seven-SNP haplotype in the “light skin” region of OCA2 in East Asians. The seven SNPs were chosen based on the first region of high LD in East Asians from Fig. 5. The C allele of rs1800414 is seen on three haplotypes, one of which (blue) accounts for a large majority of the chromosomes. The next most common haplotype (red) differs from the most common (blue) only at the seventh site. The least common (yellow) differs from the blue at the final four sites.
Distribution of blue-eye associated alleles
The frequencies of the blue-eye associated alleles of the three haplotype systems in the OCA2 and HERC2 genes are very similar in Northwestern and Eastern Europe, where all three have their highest frequencies (Fig. 2). This also holds true for homozygotes of the blue-eye associated alleles of these haplotypes (Supplemental Fig. 11). All three blue-eye associated alleles and homozygotes of these alleles are also present in Southern Europe and Southwest Asia at lower frequencies than those found in Northwestern and Eastern Europe; however, the frequencies of the TG allele of BEH2 and its homozygotes are lower than those of the ACA allele of BEH1 and the CA allele of BEH3. Outside of Europe, the blue-eye associated alleles of BEH1 and BEH3 are still common and homozygotes of these alleles are still seen, but the blue-eye associated allele of BEH2 is much rarer and homozygotes for it are virtually unseen.
Given the strong LD in Europe across all three haplotype systems, their association with the blue eye phenotype in Europe is understandable. However, these frequency data for other populations around the world, together with the essential restriction of blue eyes to Europe, show that the BEH1 and BEH3 haplotype systems and their component SNPs are not universal markers of blue eyes. The TG allele at BEH2 is the best marker for blue eyes and may even contain the causal allele, though the actual causative variant could be anywhere in the region of strong LD seen in European populations.
Hum Genet. 2012 May; 131(5): 683–696.