Genetic History of Tibetan Highlanders

Admin
Administrator

Posts: 73,561

Genetic History of Tibetan Highlanders Mar 19, 2022 20:27:33 GMT

Post by Admin on Mar 19, 2022 20:27:33 GMT

A Two-Wave “Admixture of Admixture” Model
Our data suggest that the Tibetan genome arose from a mixture of multiple ancestral gene pools, but that its ancestral composition is much more complicated, and its history can be traced back considerably earlier, than previously suspected. For instance, a recent study has suggested that “Tibetans are a mixture of ancestral populations related to the Sherpa and Han Chinese.”56 We propose a two-wave “admixture of admixture” (AoA) model which, despite its simplicity, helps explain the ancestral makeup and prehistory of Tibetans and Sherpas (Figure 9). An ancient wave of admixture occurred in a pre-LGM era >40,000 YBP, which could have resulted in the unique mosaic pattern of Paleolithic ancestries observed in Tibetan genomes. The ancient gene pool of the Tibetans originated from an ancient admixed population, SUNDer, a group of hybrids of ancient Siberians (modern humans) and several archaic populations, including Denisovan-like, Neanderthal-like, and most likely a few unknown non-modern human groups that have not yet been identified by archeological or genetic studies. The admixture events that eventually formed SUNDer could have occurred on the Tibetan Plateau or in lowland areas before SUNDer arrived on the plateau at least ∼40,000 YBP, before the LGM (∼26,500–19,000 YBP).49 Between ∼40,000 and ∼15,000 YBP, few new migrations occurred between the lowland and the plateau as a result of the LGM. However, from ∼15,000 to ∼9,000 YBP (Figure 4), many more migrations from the lowland to the plateau brought in modern human ancestry. A second, more recent wave of admixture therefore occurred between Paleolithic and Neolithic ancestries, probably resulting from a post-LGM migration to the plateau, most likely of a population that split from the common ancestor of Tibetans and Han Chinese.
The divergence of Tibetans and Sherpas occurred ∼11,000 to ∼7,000 YBP (Figure 4), no earlier than the divergence between Tibetans and Han Chinese; this timing does not support the hypothesis that Tibetans are a mixture of Sherpa and Han Chinese.

Figure 9
A Sketch Map for the Origins and Demographic History of Sherpas and Tibetans

This simplified model of the origins and evolutionary history of Sherpas and Tibetans is based on the observations and estimations from this study. The two dashed lines connecting HAN and TBN and connecting TBN and SHP represent possible gene flow between populations. Abbreviations are as follows: MRCA0, most recent common ancestor of modern human and archaic hominoids;14 MRCA1, most recent common ancestor of Eurasians; MRCA2, most recent common ancestor of HAN and TIB; MRCA3, most recent common ancestor of TBN and SHP; SUNDer: a tentative label for the early settlers who contributed ancient or archaic ancestry to present-day Tibetan highlanders.

Discussion
We provide compelling evidence for the co-existence of both Paleolithic and Neolithic ancestries, on a genome-wide scale, in the modern Tibetan gene pool, which supports genetic continuity between pre-LGM highland foragers and present-day Tibetans. We have explicitly revealed a prevalent non-AMH (non-anatomically-modern-human) ancestry in the Paleolithic lineages, significantly advancing our understanding of the genetic prehistory of human colonization of Tibet that previous Y-chromosomal and mtDNA studies had suggested.5, 6

The Paleolithic ancestries in the modern Tibetan gene pool include Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown archaic sequences, indicating that Tibet was a human melting pot where interbreeding occurred among different hominine groups before the LGM, although why prehistoric people settled on the environmentally inhospitable plateau is still unclear. The results of this study indicate that human colonization of the plateau and altitudinal adaptation began considerably earlier, and were more complicated, than previously suspected.

Non-AMH ancestries, despite being present in low proportions, constitute a substantial part of the Tibetan gene pool and have shaped the genetic architecture of present-day Tibetans and Sherpas. Notably, however, the Neolithic ancestries, which are dominant in contemporary Tibetans, might also contain non-AMH lineages acquired through introgression into the common ancestors of Tibetans and Han Chinese long before the two groups diverged.35, 38, 39 This explains why the overall spatial distributions of non-AMH-derived sequences are similar in TIB and HAN. The admixture pattern in Tibetan genomes is very complex but unsurprising, given that archaeological data already suggest an ancient initial occupation of the plateau followed by multiple migrations at different times and from different places, creating a complex, mosaic population history.1

Taking advantage of the whole-genome sequence data that we generated for both Tibetans and Han Chinese, we estimate that Tibetans diverged from Han Chinese with an average coalescence time of ∼15,000–9,000 YBP. This estimate is much earlier than the 2,750 years ago estimated by a recent study.4 Our estimate is less likely to be affected by the archaic sequences harbored in whole-genome data, which could confound estimates of population divergence, because non-AMH sequences were excluded from the genomic data of both Tibetans and Han Chinese in this analysis. This estimate therefore largely reflects the divergence of modern human ancestry in Tibetans and Han Chinese since the two populations split from their shared ancestral population. However, subsequent gene flow from other populations, and between the two populations, is expected to influence the estimate of population divergence; we were not able to fully evaluate and control for this here. Further efforts are also needed to elucidate the genetic relationship between Tibetans and Sherpas; for instance, the relationships between Sherpa and Tibetan groups reported by recent studies are controversial.56, 57, 58 Although we observed some degree of differentiation between the two groups, uncovering their population structure and inferring their demographic history will require larger sample sizes.

Post by Admin on Mar 20, 2022 19:22:12 GMT

Genetic signatures of high-altitude adaptation in Tibetans

Significance
The origin of Tibetans and the mechanisms by which they adapted to the high-altitude environment remain largely unknown. We conduct the largest genome-wide study in Tibetans to date. We detect signatures of natural selection at nine gene loci, two of which are strongly associated with blood phenotypes in present-day Tibetans. We further show the genetic relatedness of Tibetans to other ethnic groups in China and estimate the divergence time between Tibetans and Han. These findings provide important knowledge for understanding the genetic ancestry of Tibetans and the genetic basis of high-altitude adaptation.

Abstract
Indigenous Tibetan people have lived on the Tibetan Plateau for millennia. There is a long-standing question about the genetic basis of high-altitude adaptation in Tibetans. We conduct a genome-wide study of 7.3 million genotyped and imputed SNPs in 3,008 Tibetans and 7,287 non-Tibetan individuals of Eastern Asian ancestry. Using this large dataset, we detect signals of high-altitude adaptation at nine genomic loci, of which seven are novel. The alleles under natural selection at two of these loci [methylenetetrahydrofolate reductase (MTHFR) and EPAS1] are strongly associated with blood-related phenotypes, such as hemoglobin, homocysteine, and folate, in Tibetans. The folate-increasing allele of rs1801133 at the MTHFR locus has increased in frequency in Tibetans more than expected under a drift model, probably as a consequence of adaptation to high UV radiation. These findings provide important insights into the genomic consequences of high-altitude adaptation in Tibetans.

Genetic adaptation to a novel environment is a fundamental process for the survival and adaptation of a species. In humans, one of the most recent examples is adaptation to high altitude, such as in the Tibetan highlands. The Tibetan Plateau (TP; also known as the Qinghai–Tibet Plateau in China) has an average elevation of ∼4,000 m above sea level, where the oxygen concentration is ∼40% lower (1) and UV radiation is ∼30% stronger (2) than at sea level. The indigenous Tibetan people have developed a distinctive set of physiological characteristics to adapt to the extreme environmental conditions of the highlands (1). Previous population-based genetic studies have reported evidence that genetic variants at the EPAS1 and EGLN1 loci have been under positive natural selection (3–7). These variants are associated with variation in hemoglobin concentration (HGB) in Tibetans (3–5). The EPAS1 gene encodes the hypoxia-inducible factor-2α (HIF-2α) subunit of the HIF complex, a transcription factor involved in the body's response to hypoxia (8, 9). EGLN1 encodes PHD2, a major oxygen-dependent negative regulator of HIFs (10, 11). Apart from these two known genes with biological relevance to hypoxia adaptation (3–7, 12), several other candidate loci (e.g., PPARA and HBB) have been highlighted in recent studies (3, 4, 13–15). Genetic adaptation to high altitude, however, is likely to be a complex process involving a large number of genes that respond not only to hypoxia but also to other extreme environmental conditions, such as low temperature, high UV radiation, and insufficient food supply. If the strength of natural selection at these loci has been small to moderate, they would not have been detected in previous studies (3–7) with small sample sizes (typically n < 150).
In this study, we perform a large-scale genome-wide study to detect genetic signals of high-altitude adaptation in 3,008 Tibetans and 7,287 non-Tibetan individuals of Eastern Asian (EAS) ancestry. Using this large dataset, we identify signals of genetic adaptation.

Post by Admin on Mar 20, 2022 20:31:00 GMT

Results
Genetic Ancestry of Tibetans.
There were 3,717 subjects collected from two sites (Seda and Litang) in the TP in China (SI Appendix, Fig. S1). We extracted DNA from blood samples and performed genome-wide SNP genotyping assays using the Illumina CoreExome array, an SNP array with 264,909 tag SNPs with genome-wide coverage and 244,593 exome-focused SNPs (Materials and Methods). After standard quality control (QC) filtering of the genotype data, we retained 3,381 subjects and 287,691 SNPs (279,608 on autosomes), most of which were genome-wide tag SNPs.

We performed a principal component analysis (PCA) of the subjects using all 279,608 autosomal SNPs after stringent QC (Materials and Methods). There was no evidence of population stratification between the cohorts recruited from the two sites (SI Appendix, Fig. S2A), even though the Seda subjects were recruited from people who came from diverse regions of the TP to study or work at the Seda Larong Wuming Buddhist Institute, whereas the Litang subjects were recruited from nomadic people who have lived in Litang and the surrounding areas for many generations (Materials and Methods). We therefore combined the two cohorts for analysis. By projecting the principal components (PCs) estimated from our samples onto those from the 1000 Genomes Project (1000G), we showed that all of our subjects were of EAS ancestry (SI Appendix, Fig. S2B). On a finer scale, the subjects are stratified along the first PC (SI Appendix, Fig. S2C), consistent with the presence of a few hundred self-reported Han in the sample. We classified our subjects into three groups (Tibetans, Han, and possibly admixed) (Materials and Methods and SI Appendix, Fig. S2D) and removed the possibly admixed subjects, retaining 3,008 Tibetans and 373 Han for analysis.
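A minimal sketch of the PCA-and-projection step, under stated assumptions: genotypes are 0/1/2 allele counts in an individuals × SNPs NumPy array, each SNP is standardized by its reference-panel allele frequency (a common convention, not necessarily the study's exact QC or software), and `fit_pca` and `project` are hypothetical names. Reference individuals get PC scores from the SVD; new individuals are placed on the same axes using the reference frequencies and SNP loadings, which is how a sample can be compared against a panel such as 1000G.

```python
import numpy as np

def _standardize(genotypes, freqs):
    """Center 0/1/2 counts at 2p and scale by sqrt(2p(1-p)) per SNP."""
    return (genotypes - 2.0 * freqs) / np.sqrt(2.0 * freqs * (1.0 - freqs))

def fit_pca(ref_genotypes, n_components=2):
    """PCA of a reference panel (individuals x SNPs) via a thin SVD."""
    freqs = ref_genotypes.mean(axis=0) / 2.0
    keep = (freqs > 0.0) & (freqs < 1.0)              # drop monomorphic SNPs (zero variance)
    z = _standardize(ref_genotypes[:, keep], freqs[keep])
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components]                      # SNP weights defining each PC axis
    scores = u[:, :n_components] * s[:n_components]   # reference coordinates (= z @ loadings.T)
    return freqs, keep, loadings, scores

def project(new_genotypes, freqs, keep, loadings):
    """Place new individuals on the reference PC axes using reference frequencies."""
    z = _standardize(new_genotypes[:, keep], freqs[keep])
    return z @ loadings.T
```

For biobank-scale data, dedicated tools (for example EIGENSOFT's smartpca) are normally used; the dense SVD here is only practical for small illustrative matrices.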

We projected the PCs of our subjects onto the Chinese subjects from the Human Genome Diversity Project (HGDP) (16) to illustrate the genetic relatedness between Tibetans and other ethnic groups in China (Fig. 1A). Our results suggest that Tibetans are most closely related genetically to the Yi, Tu, and Naxi ethnic minority populations (Fig. 1A and SI Appendix, Table S1), consistent with these populations residing in regions neighboring the TP (Yi and Naxi people are mainly distributed in Yunnan and Sichuan provinces, and most Tu people reside in Qinghai province) (Fig. 1B).

Fig. 1.

PCA of genetic ancestry in Chinese populations using genome-wide SNP data. (A) Result from a PCA in a combined sample of 3,381 genetically confirmed Tibetan and Han from this study and 180 Chinese subjects (multiple ethnic groups) from the HGDP. PC1 and PC2 represent the first two eigenvectors from PCA. Note that one of the Yi subjects from the HGDP seems to be of Tibetan ancestry. (B) Distribution of the ethnic groups in China. The blue circles represent the main distribution areas of the ethnic populations in the HGDP, and the red circle represents the Tibetan population. Note that many of the populations, such as Han, Mongola, Tibetan, and Uygur, are distributed widely in a range of regions rather than the specific areas labeled on the map. The green triangles represent the two areas (Seda and Litang) from which our Tibetan subjects were recruited (SI Appendix, Fig. S1).

We estimated the divergence time between the Tibetan and Han populations using the conventional FST-based approach (17) (SI Appendix, Text S1). As described above, 3,008 Tibetan and 373 Han subjects collected from the TP remained after QC. We included in this analysis an additional set of 1,726 Han subjects (after QC) collected from the Eye Hospital of Wenzhou Medical University (WZ) (Materials and Methods). We used GCTA-GRM to remove cryptic relatedness in the Tibetan and Han samples (the Han sample being the combined set of 373 Han subjects from the TP and 1,726 Han subjects from WZ) at a relatedness threshold of 0.05, retaining 1,998 unrelated Tibetan and 2,059 unrelated Han subjects. PCA showed no genetic difference between WZ-Han and TP-Han (SI Appendix, Fig. S3), probably because most of the Han subjects, whether collected in the TP or in WZ, originally came from diverse regions of China. The genome-wide mean FST between Tibetans and Han was 0.012 [using the method of Weir and Cockerham (18) implemented in GCTA], consistent with the estimate obtained using the Han subjects from the HGDP (SI Appendix, Table S1). Given the genome-wide mean FST value (Materials and Methods), we estimated the divergence time between the Tibetan and Han populations to be 189 generations. Assuming an average generation time of 25 y, as in previous studies (3, 19), this estimate suggests that Tibetans and Han split about 4,725 y ago (189 × 25), ∼2,000 y earlier than the estimate from whole-exome sequencing data (3) but consistent with recent evidence from archeological studies (20, 21).
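The FST-to-time reasoning can be sketched under a pure-drift assumption, F_ST = 1 − (1 − 1/(2Ne))^t, which is one standard approximation; the exact formula of ref. 17 may differ. Two further assumptions: Hudson's FST estimator is swapped in here for simplicity (the study itself used Weir and Cockerham's estimator in GCTA), and all function names are hypothetical. Only the final conversion, 189 generations × 25 y = 4,725 y, is taken directly from the text.

```python
import math

def hudson_fst(p1, p2, n1, n2):
    """Hudson's F_ST estimator, combined across SNPs as a ratio of averages.

    p1, p2: per-SNP allele frequencies in the two samples.
    n1, n2: numbers of sampled chromosomes (2 x individuals) in each sample.
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # squared frequency difference minus a sampling-variance correction
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den

def divergence_generations(fst, ne):
    """Solve F_ST = 1 - (1 - 1/(2*Ne))**t for t (pure drift, no gene flow)."""
    return math.log(1.0 - fst) / math.log(1.0 - 1.0 / (2.0 * ne))

def generations_to_years(generations, years_per_generation=25):
    """Convert generations to years (25 y per generation, as assumed in the text)."""
    return generations * years_per_generation
```

The effective population size used by the study is given in its Materials and Methods and is not reproduced here, so `ne` is left as an input; `generations_to_years(189)` reproduces the 4,725-y figure.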

Last Edit: Mar 20, 2022 21:24:31 GMT by Admin

Post by Admin on Mar 20, 2022 22:00:24 GMT

Genome-Wide Analysis to Detect Genetic Signals of Adaptation.
To detect genetic signals of high-altitude adaptation, we used a mixed linear model-based leave-one-chromosome-out association (MLMA-LOCO) approach [implemented in the BOLT-LMM software tool (22)] to test for allele frequency differences between Tibetans and non-Tibetans of EAS ancestry (Materials and Methods). We investigated the statistical properties of the method using simulations (SI Appendix, Table S2). Similar approaches have been used in genome-wide association studies (GWASs) to control for population structure (22, 23). In the MLMA-LOCO model, the target SNP is fitted as a fixed effect, and all SNPs on the other chromosomes are fitted as random effects (details of the model are in Materials and Methods). The underlying assumption is that, under a drift model, the random effects follow a normal distribution with variance proportional to p0(1 − p0)FST, where p0 is the allele frequency in the ancestral population and FST is Wright's fixation index between the two derived populations (24). If the sample contains two diverged populations, SNPs on different chromosomes will be correlated even if neither population has been under natural selection, because of systematic allele frequency differences between the populations caused by cryptic relatedness in the samples, genetic drift, and/or possibly admixture with other populations (see below for examples). We can therefore correct for these interchromosome correlations by modeling all of the SNPs on the other chromosomes (as random effects, because the number of SNPs is usually larger than the sample size) when testing an SNP for association. To maximize power, we included in the analysis all of the subjects collected from the TP and WZ in China (3,008 Tibetans and 2,099 Han) and an additional set of 5,188 subjects of EAS ancestry from the Genetic Epidemiology Research on Aging (GERA) Study (25) in the United States (Materials and Methods).
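The leave-one-chromosome-out idea can be illustrated with a deliberately simplified sketch. It is not BOLT-LMM's algorithm, which avoids forming the relationship matrix explicitly and estimates its variance parameters from the data. Here the genetic relationship matrix (GRM) is built from SNPs on every chromosome except the target's, the variance ratio `h2` is fixed at an assumed value, and the target SNP is tested by generalized least squares. The outcome `y` codes population membership (1 = Tibetan, 0 = other EAS) as in the "case–control" comparison; the function names and the default `h2 = 0.5` are hypothetical.

```python
import math
import numpy as np

def grm(genotypes):
    """Genetic relationship matrix from 0/1/2 genotypes (individuals x SNPs)."""
    p = genotypes.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)                       # skip monomorphic SNPs
    z = (genotypes[:, keep] - 2.0 * p[keep]) / np.sqrt(2.0 * p[keep] * (1.0 - p[keep]))
    return z @ z.T / keep.sum()

def loco_test(genotypes, chrom, y, target, h2=0.5):
    """Test SNP `target` against y, modelling background structure with a GRM
    from all OTHER chromosomes. h2 is a fixed, assumed variance ratio.
    Returns (chi-square statistic, two-sided P value)."""
    n = len(y)
    k = grm(genotypes[:, chrom != chrom[target]])      # leave the target's chromosome out
    v = h2 * k + (1.0 - h2) * np.eye(n)                # covariance of y, up to a scale factor
    v_inv = np.linalg.inv(v)
    x = np.column_stack([np.ones(n), genotypes[:, target]])
    xtvx = x.T @ v_inv @ x
    beta = np.linalg.solve(xtvx, x.T @ v_inv @ y)      # GLS estimates (intercept, SNP effect)
    resid = y - x @ beta
    sigma2 = resid @ v_inv @ resid / (n - 2)           # residual scale
    var_snp = sigma2 * np.linalg.inv(xtvx)[1, 1]
    chi2 = beta[1] ** 2 / var_snp
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))      # P(|Z| > sqrt(chi2))
```

Because structure shared by all chromosomes is absorbed by the GRM term, a SNP whose frequency difference is merely typical of drift receives a much smaller statistic than one with an exceptional difference.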
Because the GERA-EAS subjects were genotyped on a different SNP array (Affymetrix Axiom), we imputed all of the genotype data to the 1000G reference panels using IMPUTE2 (26). Three ancestry outliers were excluded from analysis (SI Appendix, Fig. S4). To exclude SNPs with allele frequency differences between cohorts caused by potential batch effects, we performed a “control–control” analysis using the MLMA-LOCO approach to test for differences in allele frequency between TP-Han and a combined set of WZ-Han and GERA-EAS, and removed SNPs with P < 1 × 10^−6. We then performed a “case–control” analysis using the MLMA-LOCO approach to test for differences in allele frequency between Tibetans (“cases”: n = 3,008) and EAS subjects (“controls”: TP-Han, WZ-Han, and GERA-EAS; n = 7,287) and identified nine loci that passed the genome-wide significance level (P_MLMA-LOCO < 5 × 10^−8) (Fig. 2 and SI Appendix, Fig. S5). Of the nine loci, two (EPAS1 and EGLN1), which show the strongest signals in our analysis, are known (3–7), and the other seven are novel (Table 1 and SI Appendix, Fig. S6). Note that FGF10 was one of a set of genes that showed large population branch statistic (PBS) values (Tibetans vs. Han vs. Europeans) in a recent study (15). We show by linkage disequilibrium (LD) score regression analysis (SI Appendix, Text S2) that there is no inflation in the test statistic (the estimated regression intercept is 0.99 with an SE of 0.01, not significantly different from 1), suggesting that sample structure has been well controlled in the MLMA-LOCO analysis, as expected from theory (23). We further divided the data into the Seda and Litang subsets and reran the analysis in each subset (Materials and Methods). Although all nine loci remained highly significant, not all of them passed the genome-wide significance level in each subset (SI Appendix, Table S3).
This analysis shows the gain in power for detecting genetic signals of natural selection in a dataset with a large sample size. In addition, we performed conditional analyses (27, 28) at the nine genome-wide significant loci and found no evidence of multiple signals at any of them. We also performed the MLMA-LOCO analysis to detect signatures of genetic adaptation on the mitochondrial genome and did not observe any significant signal (SI Appendix, Fig. S7). We replicated a number of candidate gene loci reported in previous studies (3, 4): the replication rate after correcting for multiple testing was ∼35.7% (5/14), much higher than expected by chance (SI Appendix, Table S4).
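LD score regression rests on the relation E[χ²_j] = N h² ℓ_j / M + N a + 1, where ℓ_j is the LD score of SNP j, N the sample size, and M the number of SNPs; confounding such as uncontrolled population structure pushes the intercept (1 + N a) above 1. The sketch below fits the intercept by unweighted least squares and checks the reported intercept of 0.99 (SE 0.01) against 1 with a z-test. The real LDSC software uses weighted regression and a block jackknife for the SE, so this is only an illustration, and the function names are hypothetical.

```python
import math
import numpy as np

def ldsc_fit(chi2, ld_scores):
    """Unweighted least-squares fit of chi2_j = intercept + slope * l_j."""
    x = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(x, chi2, rcond=None)
    return intercept, slope

def intercept_z_test(intercept, se):
    """Two-sided z-test of H0: intercept == 1 (no inflation from confounding)."""
    z = (intercept - 1.0) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))
```

With the reported values, `intercept_z_test(0.99, 0.01)` gives z = −1 and P ≈ 0.32, i.e., no significant departure from 1.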

Fig. 2.

Genome-wide scan for genetic signatures of adaptation. Shown on the y axis are −log10 of P values from the tests of allele frequency difference between Tibetan Chinese (n = 3,008) and EASs (n = 7,287). The analysis was performed using the MLMA-LOCO method, which tests for difference in allele frequency between populations taking into account the difference caused by random drift. SNPs at the genome-wide significant loci are highlighted in red.


Table 1.
Nine genetic loci with signals of natural selection
Chromosome	SNP	bp	A1	A2	Frequency of A1 (Tibetan)	Frequency of A1 (EAS)	P value	FST	Nearest gene
1	rs1801133	11,856,378	A	G	0.238	0.333	6.3E-09	0.021	MTHFR
1	rs71673426	112,159,304	C	T	0.102	0.013	1.5E-08	0.100	RAP1A
1	rs78720557	198,096,548	A	T	0.498	0.201	4.7E-08	0.191	NEK7
1	rs78561501	231,448,497	A	G	0.599	0.135	6.1E-15	0.414	EGLN1
2	rs116611511	46,600,030	G	A	0.447	0.003	3.6E-19	0.570	EPAS1
4	rs2584462	100,324,464	G	A	0.211	0.549	3.9E-09	0.203	ADH7
5	rs4498258	44,325,322	T	A	0.586	0.287	1.7E-08	0.171	FGF10
6	rs9275281	32,662,920	G	A	0.095	0.365	1.1E-10	0.162	HLA-DQB1
12	rs139129572	123,178,478	GA	G	0.316	0.449	5.8E-09	0.036	HCAR2
P value indicates the P value from the MLMA-LOCO analysis. FST is the FST value between Tibetans and EASs. Nearest gene indicates the nearest annotated gene to the top differentiated SNP at each locus, except for EGLN1, which is listed because it is known to be associated with high-altitude adaptation; rs139129572 is an insertion/deletion (indel) variant with two alleles, GA and G. A1, allele 1; A2, allele 2.

Post by Admin on Mar 21, 2022 19:03:17 GMT

Associations of the Loci Under Natural Selection with Phenotypes in Tibetans.
Having identified nine genetic loci that have been under natural selection, we then asked whether these loci are associated with any phenotypes in Tibetans (n = 2,849). Ninety-one quantitative traits were measured on the Tibetan subjects (Materials and Methods), mainly morphological, blood biochemistry, and optometric measures (SI Appendix, Table S5). The phenotypic correlation matrix of these traits is shown in SI Appendix, Fig. S8. Most of the traits were highly heritable, with a substantial proportion of phenotypic variance explained by all SNPs in unrelated individuals (SI Appendix, Fig. S9 and Table S6). We then performed a GWAS in Tibetans using the MLMA-LOCO approach described above to control for population structure. We found that the methylenetetrahydrofolate reductase (MTHFR) and EPAS1 loci were associated with multiple traits (SI Appendix, Fig. S10), and five of these associations were significant after correcting for multiple testing (P_GWAS < 1.5 × 10^−4) (SI Appendix, Table S7). The MTHFR locus was strongly associated with folate (b = −0.34, P_GWAS = 6.5 × 10^−27) and homocysteine (b = 0.54, P_GWAS = 1.1 × 10^−69), where b is the effect size in SD units. This locus is known to be associated with homocysteine in Europeans (29). MTHFR is a key enzyme in the metabolic pathway of homocysteine and folate (30). The frequency of the homocysteine-increasing allele was lower in Tibetans (0.238) than in EASs (0.333) (Table 1), in line with the lower homocysteine level in Tibetans (mean = 21.8, SE = 0.3) than in Han (mean = 25.5, SE = 1.4), where SE is the standard error of the mean. EPAS1 is known to be associated with HGB (3, 5). Our results suggest that EPAS1 is strongly associated with HGB, red blood cell count, and hematocrit (SI Appendix, Table S7) and that the HGB-decreasing allele has been under very strong positive selection in Tibetans, with a frequency of 0.45 in Tibetans vs. 0.003 in EASs (Table 1).
It is also interesting to note that the ADH7 locus is associated with weight and body mass index (BMI) in Tibetans (P_GWAS = 7.1 × 10^−4 and 4.9 × 10^−4, respectively), although ADH7 is not a known BMI-associated locus in Europeans (31). However, these associations are not significant after correcting for multiple testing. The EGLN1 locus has previously been reported to be associated with HGB (4). We found that the association between EGLN1 and HGB was very weak (P_GWAS = 0.02, not significant after correcting for multiple testing) and that the effect size was stronger in males (−0.112 in SD units, SE = 0.046) than in females [−0.037, SE = 0.036; P_difference = 0.01, consistent with the result of a previous study (12)].

Discussion
We have performed a large-scale genetic study in 3,008 Tibetans and 7,287 non-Tibetans of EAS ancestry. We showed the genetic relatedness between Tibetans and a number of other ethnic groups in China and found that Yi, Tu, and Naxi people are genetically intermediate between Han and Tibetans (Fig. 1A). These people are also geographically distributed between the major residential areas of Han and Tibetans (Yi, Tu, and Naxi people reside at the eastern border of the TP) (Fig. 1B), suggesting potential routes by which people migrated from the east to the TP. There is no consensus on the divergence time between Tibetans and Han (32). The estimates from different genetic studies are often inconsistent [varying from 2,750 (3) to ∼8,000 (11, 12) and ∼30,000 y B.P. (33)], even for studies using the same method [9,000–15,000 (34) vs. 20,000–40,000 y B.P. (35)]. Our estimate, based on the largest genetic dataset of Tibetans so far, is that Tibetans and Han diverged ∼4,725 y B.P., which is consistent with the estimated permanent settlement time of ∼3,750–6,500 y B.P. from archaeological studies (20). Interestingly, a recent study (21) of archaeological crop remains unearthed in the northeastern TP estimated that the first village was established 5,200 y B.P., which is highly concordant with our estimate. However, there is an important caveat in interpreting estimates from population genetic analyses: if there is constant gene flow from the founder population into the TP after the initial settlement, the divergence time estimated by a population genetic analysis will be biased downward. Our estimate should therefore be interpreted as a lower limit of the permanent settlement time, implying that people likely settled the TP earlier than 4,725 y B.P.
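The downward bias can be illustrated with a toy island-model recursion, an assumption chosen for illustration rather than a model fitted in this study: each generation, drift adds 1/(2Ne) of the remaining heterozygosity to F_ST, and a fraction m of migrants, assumed to carry the source population's allele frequencies, scales it back by (1 − m)². Inverting the pure-drift formula on the resulting F_ST then yields a time shorter than the true one. Function names and parameter values are hypothetical.

```python
import math

def fst_with_migration(t, ne, m):
    """Island-model recursion F_t = [1/(2Ne) + (1 - 1/(2Ne)) F_{t-1}] (1 - m)^2, F_0 = 0."""
    f = 0.0
    for _ in range(t):
        f = (1.0 / (2 * ne) + (1.0 - 1.0 / (2 * ne)) * f) * (1.0 - m) ** 2
    return f

def naive_divergence_time(fst, ne):
    """Divergence time inferred by (wrongly) assuming pure drift with no gene flow."""
    return math.log(1.0 - fst) / math.log(1.0 - 1.0 / (2 * ne))
```

With m = 0 the inversion recovers the true time exactly; for example, with a true time of 200 generations and Ne = 5,000, any m > 0 makes `naive_divergence_time` return fewer than 200 generations, which is why the estimate is read as a lower limit.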

We applied the MLMA-LOCO method (27), as implemented in BOLT-LMM (22), to detect genetic signals of selection. Compared with prevailing methods (3, 5, 6), MLMA-LOCO performs statistical tests at a genome-wide significance level while controlling for locus-specific population differentiation and potential relatedness in the sample. As expected, the analysis using only unrelated individuals was, on average, less powerful than that using all individuals, but overall the results are highly consistent (SI Appendix, Fig. S11). Here we give an example of how MLMA-LOCO controls for locus-specific population differentiation. Three SNPs (on chromosomes 9, 20, and 22) showed strong signals in linear regression (5), FST (18, 24, 36, 37), or PBS (3) analyses but did not reach the genome-wide significance level in the MLMA-LOCO analysis (SI Appendix, Fig. S12), because they are located in regions with strong locus-specific population differentiation (SI Appendix, Fig. S13). Using the MLMA-LOCO method, we identified nine gene loci that have been under selection as a consequence of adaptation to high altitude (Fig. 2 and SI Appendix, Fig. S5), seven of which are novel. Notably, surprisingly few loci were identified given the large sample size of this study, consistent with a model of polygenic adaptation (38). The two known loci (EPAS1 and EGLN1) showed the strongest signals in our analysis. The top signal (EPAS1) remained highly significant in the analysis of a small subset of the data (150 Tibetans vs. 150 Han) (SI Appendix, Fig. S14), which explains why the EPAS1 locus could be detected in previous studies with small sample sizes (3, 5, 6).
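Why a very large frequency difference such as that at EPAS1 survives even a small sample can be seen with a naive 2 × 2 allelic chi-square test. This is a simplification: it ignores drift and structure (which MLMA-LOCO models), applies the Table 1 frequencies to a hypothetical sample of 150 vs. 150 individuals, and `allelic_chi2` is a hypothetical name.

```python
import math

def allelic_chi2(freq_a, freq_b, n_ind_a, n_ind_b):
    """Naive 2x2 allelic chi-square test from allele frequencies (2 alleles per person).
    Returns (chi-square statistic, P value with 1 degree of freedom)."""
    na, nb = 2 * n_ind_a, 2 * n_ind_b
    a = round(freq_a * na)          # A1 allele count in sample A
    b = na - a                      # A2 allele count in sample A
    c = round(freq_b * nb)          # A1 allele count in sample B
    d = nb - c                      # A2 allele count in sample B
    n = na + nb
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

With these inputs, the EPAS1 frequencies (0.447 vs. 0.003) give a P value far below 5 × 10^−8, whereas the MTHFR frequencies (0.238 vs. 0.333) give P ≈ 0.009, far from genome-wide significance; detecting loci like MTHFR required the large sample.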
We further found that genetic variants at three of these loci (MTHFR, EPAS1, and ADH7) were associated with several phenotypes in Tibetans, with MTHFR associated with folate and homocysteine levels and EPAS1 associated with HGB and hematocrit at an experiment-wise significance level (SI Appendix, Fig. S10). In addition, a previous study (4) suggested that the PPARA locus is associated with high-altitude adaptation and HGB level in Tibetans. In our study, the signal of selection at PPARA was not genome-wide significant (P_MLMA-LOCO = 9.1 × 10^−5 at the top SNP, rs149670586), and we found no evidence that rs149670586 is associated with HGB (P_GWAS = 0.20). There is a caveat in interpreting the MLMA-LOCO results. We found evidence of natural selection at nine gene loci by comparing allele frequencies between Tibetans and EASs under the null hypothesis of no natural selection, that is, population differentiation caused only by genetic drift and possibly admixture with other populations. This result, however, does not necessarily mean that the selection relates to hypoxia; it could reflect adaptation to any of the extreme environmental or pathological conditions on the TP. For example, the folate-increasing allele of SNP rs1801133 at the MTHFR locus (SI Appendix, Table S7) has increased in frequency in the Tibetan population more than expected under a drift model (Table 1), possibly as a consequence of adaptation to high UV radiation, because UV exposure can accelerate the degradation of folate (39).

In summary, we performed a large-scale genetic study in Tibetans. We showed the genetic relatedness between Tibetans and other ethnic groups in China and estimated the divergence time between Tibetans and Han (4,725 y B.P.). We identified genetic signatures of high-altitude adaptation at nine gene loci, seven of which are novel. These findings provide important insight into how the Tibetan genome has changed during high-altitude adaptation.