Post by Admin on Jul 15, 2022 19:49:19 GMT
A Late Pleistocene human genome from Southwest China
Summary
Southern East Asia is the dispersal center regarding the prehistoric settlement and migrations of modern humans in Asia-Pacific regions. However, the settlement pattern and population structure of Paleolithic humans in this region remain elusive, and ancient DNA can provide direct information. Here, we sequenced the genome of a Late Pleistocene hominin (MZR), dated to ∼14.0 thousand years ago, from Red Deer Cave in Southwest China; this hominin was previously reported to possess mosaic features of modern and archaic hominins. MZR is the first Late Pleistocene genome from southern East Asia. Our results indicate that MZR is a modern human who represents an early diversified lineage in East Asia. The mtDNA of MZR belongs to an extinct basal lineage of the M9 haplogroup, reflecting a rich matrilineal diversity in southern East Asia during the Late Pleistocene. Combined with the published data, we detected clear genetic stratification in ancient southern populations of East/Southeast Asia and some degree of south-versus-north divergence during the Late Pleistocene, and MZR was identified as a southern East Asian who exhibits genetic continuity with present-day populations. Markedly, MZR is linked deeply to the East Asian ancestry that contributed to First Americans.
Introduction
Both genetic and archaeological data support the early entry of anatomically modern humans (AMHs) into southern East Asia 65 to 50 thousand years ago (kya),1, 2, 3, 4, 5 and prehistoric south-to-north migration led to the current clinal structure of genetic diversity in East Asian populations.6, 7, 8 According to the unearthed archaeological sites, these earliest AMH settlers in southern East Asia (mainland Southeast Asia [MSEA] and southern China) were ancestral to the later Hòabìnhian hunter-gatherers, who flourished in the region until ∼4,000 years ago.9,10 Thus far, archaeological exploration in southern China has excavated numerous Late Paleolithic sites (∼50.0 to 11.0 kya),11, 12, 13, 14, 15, 16, 17 and the oldest 14C date (43.5 kya) associated with Hòabìnhian pebble and flake tools currently comes from Yunnan Province of Southwest China.18 Fortunately, some of the Late Paleolithic sites contain human remains dated to ∼30.0 to 11.5 kya.11,12,14,15,17 Among them, MZR (Mengzi Ren) from Malu Dong (Red Deer Cave) (∼14.0 kya, from Yunnan Province of Southwest China) and LLR (Longlin Ren) from Laomaocao Dong (∼10.5 kya, from Guangxi Province of southern China)19 were reported to possess mosaic features of AMHs and archaic hominins based on morphological characterization.12,20, 21, 22 Bordering the rainforests of MSEA, Yunnan is characterized by its high biogeographic and species diversity with palaeoendemism.23 Yunnan hosts more than 200 fossiliferous sedimentary basins documenting the evolutionary history of biodiversity, monsoon development, and regional elevation changes24 and comprises subtropical evergreen broad-leaved coniferous forest, ranking as one of the foremost centers of floristic endemism,25, 26, 27 as well as the most ethnically and linguistically diverse region in China (the 7th national census of China, 2020). Malu Dong (103°24′ E, 23°20′ N) is a partially mined cave fill located in southeastern Yunnan.
It was originally excavated in 1989,17 and a major sampling was carried out in 2008 by an international team.20 Approximately 30 pieces of hominin remains were unearthed in the cave, including a nearly complete cranium calotte (MLDG-1704, the specimen studied here) and a proximal femur (MLDG-1678).20,21 It is uncertain whether these hominin remains belong to the same individual. The calibrated radiocarbon dating sequence of the cave spans the intervals of 18,070–17,590 cal. yBP (calibrated years before present, 95% interval) to 13,415–13,165 cal. yBP (95% interval), and the hominin remains found in a series of deposits dated from 14,650–13,970 cal. yBP (95% interval) to 13,750–13,430 cal. yBP (95% interval).20,21 We attempted to directly date the exact MZR specimen used in this study, but unfortunately, due to the poor preservation and small quantities of samples, not enough collagen fractions were recovered for radiocarbon dating. Hence, the date of 14.0 kya for the MZR is not absolutely certain, although the dating sequence of the deposits containing hominin remains is rather narrow, supporting a date in the Late Pleistocene (>11.7 kya). Physical anthropological investigations suggest that the MZR hominin remains exhibit a combination of AMH and archaic-like traits12,20, 21, 22 (refer to the detailed morphological descriptions in the STAR Methods section). Overall, three plausible scenarios were proposed to explain the unique morphologic characteristics of MZR. First, MZR represents a late-surviving archaic hominin population even younger than the latest H. floresiensis (∼190–50 kya)28 in Asia. Second, the mosaic morphologies probably result from hybridization between AMHs and unknown archaic hominin species. 
Third, the unusual morphologies of MZR represent the retention of ancestral polymorphisms in Paleolithic AMHs.20, 21, 22,29 To investigate these alternative scenarios, ancient genome sequences recovered from hominin remains can serve as critical evidence in revealing the identity of MZR and the genetic diversity of the Late Pleistocene hominins in southern East Asia.
Post by Admin on Jul 16, 2022 17:09:38 GMT
Results
Both mitochondrial and nuclear genome sequences confirm that MZR is a Late Pleistocene AMH
We performed aDNA extraction and genome sequencing using the MZR cranium calotte (MLDG-1704; Figure 1). MZR is the first Late Pleistocene genome from southern East Asia (Figure 1A). Obtaining aDNA from low latitude areas in southern China is challenging due to warm and humid weather and acidic soil, which are not ideal for aDNA preservation. Additionally, among the unearthed MZR hominin remains, no ideal bone materials (such as petrous bone and teeth) were available for aDNA work, and we chose a fragment of the cranium calotte (MLDG-1704) for aDNA extraction (Figure 1B).
Figure 1. The Mengzi Ren (MZR) cranium in the context of the Late Pleistocene and Early Holocene sites with ancient genome data in East Asia
We employed a modified version of MYbaits (human whole-genome probes) to enrich human DNA molecules, which was previously used in studying ancient samples from Southeast Asia,30 together with single-stranded library-based U-selection enrichment and standard shotgun sequencing. The uracil selection enables physical separation of the uracil-containing DNA strands from the non-deaminated strands during DNA library preparation.31 In total, we performed 28 aDNA extractions using drilled bone powders from the MZR cranium and constructed 45 DNA libraries (double-stranded and single-stranded libraries). We first constructed 28 libraries without uracil-DNA glycosylase (UDG) treatment and performed small-scale sequencing. Then, we generated 17 UDG-treated libraries (so that the high cytosine deamination damage of aDNA can be repaired) for large-scale sequencing. We checked several features of the sequencing data (non-UDG libraries) to evaluate aDNA authentication.
The observed average fragment length is 64.07 and 93.56 bp for the pre-capture and post-capture reads, respectively, which was calculated by only including the ≥35 bp paired-end merged and mapped sequences with a mapping quality of ≥25 (Figure S1A). The estimated low endogenous DNA level (0.06% on average, 0.01%–0.40%) is consistent with the known features of aDNA (Data S1A). We then checked the terminal damage pattern using the sequence data from the non-UDG-treated libraries; the sequencing reads showed typical high G>A and C>T substitutions at the 3′ ends for the double-stranded and the single-stranded libraries, respectively. However, the expected damage pattern (a high C>T substitution) at the 5′ end is not obvious (Figures S1B and S1C), likely due to the PCR protocol used (STAR Methods). When we applied a two-round PCR protocol that employed the high-fidelity polymerase in the second-round PCR, we saw the expected damage patterns at both ends (Figure S1D). Together, these results support aDNA authentication of MZR. Among the 28 non-UDG libraries, the estimated rates of terminal damage ranged from 5.33% to 49.30%, with only three libraries having <10% rates. The estimated modern DNA contamination rates are 0.72% for nuclear DNA and 5.88% for mitochondrial DNA (mtDNA) (Data S1A and S1B), together indicating low-level modern DNA contamination. Given the validated aDNA authentication, we used the 17 UDG-treated libraries to conduct large-scale genome sequencing. We adopted stringent filtering of the clean reads so that only the high-quality sequences were retained. We first merged the forward and reverse read pairs to recover full-length sequences, and in total we obtained approximately 1.9 billion clean merged reads from the paired-end sequence data. We mapped them to the human reference genome hs37d5.
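As an aside on the terminal damage check described above: the idea can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline used in the study. The function name `terminal_substitution_rate` and the input format (gap-free reference/read string pairs in read orientation) are assumptions made for the example.

```python
def terminal_substitution_rate(alignments, ref_base, read_base, end="5p", max_pos=5):
    """Per-position rate of ref_base>read_base mismatches near one read end.

    alignments: iterable of (ref, read) strings of equal length, gap-free
    (a simplifying assumption for this sketch).
    end: "5p" counts positions from the read start, "3p" from the read end.
    Returns a list of rates; None where the reference base never occurs.
    """
    totals = [0] * max_pos
    hits = [0] * max_pos
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("reference and read must have equal length")
        if end == "3p":
            # Reverse both so that index 0 is the last base of the read.
            ref, read = ref[::-1], read[::-1]
        for pos in range(min(max_pos, len(ref))):
            if ref[pos] == ref_base:
                totals[pos] += 1
                if read[pos] == read_base:
                    hits[pos] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]


# Toy data (hypothetical): three short alignments.
toy = [("CACGG", "TACGA"), ("CCCGT", "CCTGT"), ("GACGG", "GACGG")]
c_to_t_5p = terminal_substitution_rate(toy, "C", "T", end="5p")
g_to_a_3p = terminal_substitution_rate(toy, "G", "A", end="3p")
```

In authentic non-UDG aDNA one would expect these rates to be highest at the outermost positions and to decay toward the read interior.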
We then trimmed the mapped full-length reads based on their terminal damage patterns (Figure S1E) (see STAR Methods section) and remapped the trimmed reads to the reference genome. The sequences with low mapping qualities (<25) were discarded. We used the filtered and merged bam file to perform genotype calling. Initially, we recovered 100.97 million base pairs (∼0.113× coverage of the nuclear genome) using snpAD.32 We also generated the “pseudohaploid” genotype using the pileupCaller program in sequenceTools (https://anaconda.org/bioconda/sequenceTools). Within the 1,240K SNP sets, we found that the genotyping results by snpAD and pileupCaller were highly consistent with each other (99.2% overlap) (Figures S1F and S1G). To further check the possible impact of artificial C to T substitutions caused by deamination damage, we performed principal component analyses (PCAs) among MZR and five modern Asian populations from the 1000 Genomes Project (1KG) (http://www.1000genomes.org) using either all SNPs (2,727,839) or transversion-only SNPs (876,456). We observed the same clustering pattern in the PCA maps derived from the two SNP sets (Figure S2). Additionally, similar levels of genetic affinity of MZR with modern East Asians were detected by f3 statistics33 (Data S1C). Thus, the impact of deamination-induced damage is negligible in the MZR genome data. MZR was identified as a female based on the mapping ratio between the Y chromosome and autosomes (NY/Nauto = 0.0026). Fortunately, due to the high copy numbers of mitochondria in the cell, we were able to obtain on average a 125.05× sequencing depth of the mitochondrial genome and recovered 95.84% (15,880/16,569) of the mtDNA genomic sites (Data S1A and S1D). Clearly, the MZR mtDNA belongs to the AMH lineage and is assigned to a basal macro-branch of the M9 haplogroup in the global human mtDNA tree (PhyloTree mt, Build 17) (http://www.phylotree.org/) (Figure 2A).
The M9 haplogroup comprises two macro-branches (M9a’b and E) in current human populations, and the MZR mtDNA represents the third macro-branch containing one private mutation (T16304C), which is extinct in current human populations (Figure 2A; Data S1D). Macro-branch M9a’b is currently distributed in mainland East Asia with a southern origin and northward expansion at approximately 18 to 28 kya.34 In contrast, macro-branch E is mainly distributed in island Southeast Asia (ISEA) and the Solomon Islands of Melanesia, reflecting the Neolithic expansion of Austronesian speakers35 (Figure 2A; Data S1E). Hence, the discovery of an extinct basal M9 lineage for MZR suggests a rich matrilineal diversity of human populations in southern East Asia during the Late Pleistocene.
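A note on the transversion-only control mentioned earlier in this section: deamination damage produces C>T (and the complementary G>A) changes, which are transitions, so restricting an analysis to transversion SNPs removes the sites that damage could mimic. A minimal sketch of such a filter follows; the tuple layout `(chrom, pos, ref, alt)` and the function names are assumptions for illustration, not the format used by the authors.

```python
# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transversion(ref, alt):
    """True if ref>alt is a purine<->pyrimidine change."""
    ref, alt = ref.upper(), alt.upper()
    return ref != alt and frozenset((ref, alt)) not in TRANSITIONS


def transversions_only(snps):
    """Keep only transversion sites from (chrom, pos, ref, alt) tuples."""
    return [s for s in snps if is_transversion(s[2], s[3])]


# Toy SNP list (hypothetical coordinates).
snps = [
    ("1", 100, "C", "T"),  # transition, could be mimicked by deamination
    ("1", 200, "A", "C"),  # transversion
    ("2", 300, "G", "A"),  # transition
    ("2", 400, "G", "T"),  # transversion
]
kept = transversions_only(snps)
```

If PCA or f3 results look the same on the full and the filtered SNP sets, as reported above, residual damage is unlikely to be driving them.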
Post by Admin on Jul 16, 2022 18:22:22 GMT
Figure 2. Genetic identity of MZR and her affinity with other ancient East Asians
Consistently, the nuclear genome sequences also support MZR as an AMH. In the PCA map covering both modern and ancient samples from East Asia, MZR falls in the variation range of modern humans and is close to southern East Asians (Figures 2B and S2). The southern East Asian affiliation of MZR contrasts with the known affinity of Tianyuan (40.0 kya)37 with northern East Asians (Figure 2B; Data S1F). Consistently, the TreeMix38 analysis indicates that among the major East Asia aDNA samples (40.0–7.0 kya), MZR clusters with Early Neolithic coastal southern East Asians (sEastAsia_EN, including Qihe3 [11.5 kya], Liangdao2 [7.5 kya], Baojianshan5 [7.4 kya], and Dushan4 [8.7 kya]),19,39 and they form the southern clade, clearly separated from the northern clade covering Early Neolithic coastal northern East Asians (nEastAsia_EN, including Bianbian [9.5 kya], Boshan [8.2 kya], and Xiaogao [8.6 kya]), Yumin (8.4 kya), DevilsCave (7.6 kya), and Amur (19.0 kya) (Figures 2C and S3). Therefore, the genome of MZR shows an affinity with southern East Asians of similar dates and implies that some degree of genetic divergence between southern and northern East Asians was likely present during the Late Pleistocene, though a low gene flow from MZR to nEastAsia_EN was detected (Figure 2C). This observation is also supported by the result of the D (MZR, UKY; ancient East Eurasia [≥7.0 kya], Mbuti) test, where a clinal south-to-north divergence in East Asia is detected. MZR shares more alleles with ancient southern East Asians (such as Liangdao2), while UKY is relatively close to northern East Asians (Figure 3A; Data S1G).
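For readers unfamiliar with the D test used above, here is a minimal sketch of a D statistic computed from per-site allele frequencies, following the commonly used form D(W, X; Y, Z) = Σ(w−x)(y−z) / Σ(w+x−2wx)(y+z−2yz). The function name and the plain-list inputs are assumptions for this example; the published analysis used dedicated software, and this sketch omits missing-data handling and standard errors.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-site allele frequencies in [0, 1].

    Positive values mean W shares more alleles with Y than X does
    (relative to Z); negative values mean the opposite.
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations need the same number of sites")
    num = 0.0
    den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        num += (wi - xi) * (yi - zi)
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


# Toy example (hypothetical frequencies): in D(MZR, UKY; test, Mbuti),
# the test population shares its derived allele with MZR only.
mzr = [1, 0, 1, 0]
uky = [0, 0, 1, 1]
test_pop = [1, 0, 1, 0]
mbuti = [0, 0, 0, 0]
d_value = d_statistic(mzr, uky, test_pop, mbuti)
```

In the layout used in the text, a positive D(MZR, UKY; X, Mbuti) would indicate that X is closer to MZR, and a negative value that X is closer to UKY.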
In addition, all the ancient samples (14.0–7.5 kya) in southern East Asia exhibit negative correlations with latitude when compared to modern Chinese, while most of the ancient samples (40.0–7.6 kya) in northern East Asia show no correlation, and two samples (Yumin and DevilsCave) show positive correlations with latitude (Figure 3B; Data S1H). This pattern is consistent with the above view that some degree of genetic divergence between southern and northern East Asians was present during the Late Pleistocene. Of note, the clustering pattern of LLR and Ikawazu (the 2.6 kya Jomon individuals from Japan) is slightly different from the previous report,19 possibly due to the high missing rates (69.14% and 18.09%, respectively) in the merged genome dataset of these two samples.
Figure 3. The Late Pleistocene south-versus-north population divergence in East Asia
Given the AMH identity of MZR, to evaluate the level of introgression from archaic humans, using African (Mbuti) as the control group (assuming no introgression), we performed an f4-ratio test with ADMIXTOOLS.33 The estimated introgression levels from archaic humans in the MZR genome are ∼1.29% for Denisovan ancestries and ∼1.27% for Neanderthal ancestries, similar to the reported introgression levels in current East Asians (1.20%–2.90% on average)37 (Data S1I). In addition, taking chimpanzees as the outgroup, we conducted the D (East Asian, MZR; Denisovan/Neanderthal, chimpanzee) test to compare the introgression-ratio difference between the MZR and modern East Asians, and no difference was detected (Data S1J). Consistently, LLR was also reported as an AMH with similar low introgression levels from archaic humans.19 Thus, these results tend to refute the proposed scenarios of either surviving archaic hominins or hybridization between AMHs and unknown archaic hominins for MZR and LLR.
Rather, the genome features of these two samples with “unusual” morphologies may reflect a rich diversity of Paleolithic AMHs living in southern East Asia. It should be noted that due to the low coverage of the sequenced MZR genome, we cannot completely rule out the possible existence of archaic alleles in the MZR genome introgressed from Neanderthal/Denisovan or unknown archaic hominins that may contribute to the morphological features of MZR. Collectively, both mtDNA and nuclear genome sequences demonstrate that MZR is an AMH. Her mtDNA represents an extinct basal lineage, and her nuclear genome harbors deeply diverged Asian AMH ancestries, reflecting a rich diversity of ancient populations during the Late Pleistocene in southern East Asia.
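The f4-ratio estimate of archaic ancestry mentioned in this section can be illustrated with a small sketch. The general idea is that the admixture proportion α is a ratio of two f4 statistics, α = f4(A, O; X, C) / f4(A, O; B, C), where X is the target, C an unadmixed control (Mbuti in the text), B a population related to the introgressing source, and A and O a second archaic genome and an outgroup. The exact population configuration used by the authors is not given in the text above, so the role assignment here is an assumption, as are the function names.

```python
def f4(a, b, c, d):
    """f4(A, B; C, D): mean of (a - b) * (c - d) over sites."""
    if not (len(a) == len(b) == len(c) == len(d)):
        raise ValueError("all populations need the same number of sites")
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def f4_ratio(a, o, x, c, b):
    """Admixture proportion alpha = f4(A, O; X, C) / f4(A, O; B, C)."""
    denominator = f4(a, o, b, c)
    if denominator == 0:
        raise ValueError("denominator f4 is zero; ratio undefined")
    return f4(a, o, x, c) / denominator


# Toy frequencies (hypothetical). The target X is built as an exact 2% / 98%
# mixture of B and C, so the estimator should recover alpha = 0.02.
alpha_true = 0.02
a = [0.9, 0.1, 0.8, 0.3]
o = [0.0, 0.0, 1.0, 0.0]
b = [0.8, 0.2, 0.7, 0.4]
c = [0.1, 0.6, 0.9, 0.2]
x = [alpha_true * bi + (1 - alpha_true) * ci for bi, ci in zip(b, c)]
alpha_hat = f4_ratio(a, o, x, c, b)
```

With a low-coverage genome such as MZR, estimates of this kind carry wide uncertainty, which is consistent with the caveat stated in the text.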
Post by Admin on Jul 16, 2022 21:48:08 GMT
Inferring population history of East Asians during Late Pleistocene based on the genomic data of MZR and other ancient samples
Decoding aDNA of geographically diverse human remains is highly informative in understanding population history. Compared to the systematic aDNA dissections in West Eurasia,41 aDNA studies in East Asia are still limited.19,30,39,42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 The reported Late Pleistocene studies of aDNA human genomes covered only northern China,37,49 and MZR is the first Late Pleistocene genome from southern East Asia. By integrating the genomes of current global populations and the published ancient genomes, we conducted a detailed genome structure analysis using the ADMIXTURE tools54 (Figures 4A and S4). Consonant with the PCA and TreeMix results, the major genetic component in MZR belongs to southern East Asians (light green in Figure 4A), and so does Qihe3 (11.5 kya) from Fujian.19 In contrast, Amur (19.0 kya) and Bianbian (9.5 kya) mainly possess a northern East Asian component, supporting the proposed south-north divergence of East Asians during the Late Pleistocene and also in line with a recent report.49 Of note, all early post-LGM (Last Glacial Maximum) Paleo-Siberians contain appreciable proportions of the Native American component (red), confirming Siberia as the outpost of the earliest migration to America (Figures 4A and S4A).
Figure 4
We next performed an f3 analysis33 to reveal the Late Pleistocene population relationship and MZR’s connection to Early Holocene and current East Asian populations. At least 20,000 SNPs were included when conducting pairwise comparisons, and the African population (Mbuti) was used as an outgroup.
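To make the outgroup f3 setup just described more concrete, here is a minimal sketch. It uses the simple estimator f3(O; A, B) = mean of (o−a)(o−b) over sites, skips sites with missing data, and refuses to report a value below a minimum SNP count, mirroring the 20,000-SNP threshold in the text. The function name, the use of `None` for missing data, and the omission of the finite-sample bias correction are all simplifying assumptions.

```python
def outgroup_f3(outgroup, pop_a, pop_b, min_snps=20000):
    """Outgroup f3(O; A, B): shared drift of A and B relative to outgroup O.

    Inputs are per-site allele frequencies, with None marking missing data.
    Larger values indicate more shared drift (closer affinity) between A and B.
    """
    products = [
        (o - a) * (o - b)
        for o, a, b in zip(outgroup, pop_a, pop_b)
        if o is not None and a is not None and b is not None
    ]
    if len(products) < min_snps:
        raise ValueError(
            f"only {len(products)} overlapping SNPs, need at least {min_snps}"
        )
    return sum(products) / len(products)


# Toy example (hypothetical frequencies); min_snps lowered for illustration.
mbuti = [0.0, 0.0, 1.0]
modern = [1.0, 0.5, None]  # third site missing in this population
mzr = [1.0, 1.0, 0.0]
f3_value = outgroup_f3(mbuti, modern, mzr, min_snps=2)
```

Ranking many populations by this value against MZR gives the kind of affinity ordering reported in Figure 4B.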
In line with the above results (Figures 2B, 2C, and 4A), the f3 (modern East Asians, MZR; Mbuti) test suggests that among the modern samples, MZR is closer to southern Chinese than to northern Chinese, while among the ancient samples, this contrast is less obvious though the highest f3 value still occurs in southern populations (Gongguan, 3.2 kya)48 (Figure 4B; Data S1K). Notably, although the geographic location of MZR is close to Southeast Asia, MZR shows significantly less affinity to both modern and ancient Southeast Asians (|Z| > 7.0; Data S1K and S1L), an indication of already structured and diversified ancient populations in southern East Asia, consistent with the mtDNA data (Figure 2A; Data S1D). In addition to the earliest southern settlement of AMHs in East Asia, ancient migration (40–18 kya) into East Asia via the “Northern Route” from West Eurasia was previously proposed. The “Northern Route” hypothesis would also explain where the subtle shared ancient north Eurasian (ANE) ancestry came from that is then also shared with Native Americans. In addition, the “Northern Route” may also contribute to the south-versus-north divergence of East Asians. This is supported by both archaeological and genetic evidence,55, 56, 57, 58 although the contribution of the “Northern Route” to current East Asians is relatively minor (6.78%) based on the Y chromosome data.56,59 To test the source of the “Northern Route,” we calculated population divergence levels using pairwise Fst, and we found that Central Asians and Siberians are the best candidates who show the lowest Fst values compared to East Asians, especially to Altaic speakers in northern China (Figure S5; Data S1M). 
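As an illustration of the pairwise Fst comparison mentioned above, the following sketch implements Hudson's estimator as a ratio of sums across SNPs. The text does not state which Fst estimator was used, so choosing Hudson's is an assumption, as are the function name and the use of a single sample size per population.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst between two populations, as a ratio of sums across SNPs.

    p1, p2: per-site allele frequencies in each population.
    n1, n2: number of sampled chromosomes per population (must exceed 1).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must have the same length")
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two chromosomes per population")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for sampling noise.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no polymorphic sites")
    return num / den


# Toy checks (hypothetical data): fixed differences give Fst = 1,
# identical frequencies give a value at or slightly below 0.
fst_fixed = hudson_fst([1.0, 0.0], [0.0, 1.0], n1=10, n2=10)
fst_same = hudson_fst([0.5, 0.5], [0.5, 0.5], n1=10, n2=10)
```

Lower values between two groups, such as those reported for Central Asians and Siberians versus East Asians, indicate less differentiation.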
The result is expected considering the geographic proximity of Central Asia and Siberia to northern China, and it is also in line with the inferred migratory route by the reported archaeological and genetic evidence.55,56 Lastly, the time series aDNA data can be used to track the emergence and spreading pattern of adaptive sequence variants. By utilizing the published aDNA data, we reconstructed the spatial-temporal distribution of an East Asian-specific variant (OCA2-His615Arg) that contributes to skin lightening due to local Darwinian positive selection (Figure 5, left panel).60, 61, 62, 63 It turned out that all the Late Pleistocene individuals (e.g., MZR, Tianyuan, Amur-33K, Amur-19K, and UKY) lack the derived allele (OCA2-615Arg). The first presence of the adaptive allele (OCA2-615Arg) was in Liangdao2 (7.5 kya) from coastal southern China,39 and it quickly elevated to medium frequency (25.67%, 29/113), mainly in coastal East Asia, and then spread to northern East Asia ∼3,500 years ago, and finally became dominant (∼60.00%) in current East Asians (Figure 5, left panel; Data S1N). This pattern suggests that the selective event in East Asians likely occurred in the Late Holocene epoch, coinciding with the proposed quasi-exponential population growth during that time.64, 65, 66, 67, 68 However, another East Asian-specific variant (EDAR-V370A)69 exhibits a distinct pattern (Figure 5, right panel; Data S1O).
Figure 5
Tracing the complex migratory histories of AMHs to the Americas
We applied the outgroup f3 (global Late Pleistocene/Early Holocene populations, MZR; Mbuti) test to determine the genetic affinity of MZR to global populations (Figure 6A; Data S1P–S1R).
Among the Late Pleistocene samples (45.0–11.7 kya), MZR exhibits the closest affinity with the Paleo-Siberian UKY (13.9 kya) (f3 = 0.2839) (closely related to First Americans40) and First Americans (maximum f3 = 0.2792), even closer than to Tianyuan (0.2445) and the Amur samples (Amur-33.0K [0.2409] and Amur-19.0K [0.2792]). The D tests also indicate that MZR/UKY and MZR/Amur-19.0K are cladal with respect to First Americans (Data S1Q), suggesting the East Asian contribution to Native Americans likely originated prior to the south-versus-north East Asian divergence. Moreover, we performed the D (MZR, X; First Americans, Mbuti) tests to compare the affinity of First Americans with MZR, the ancient coastal East Asians, and Paleo-Siberians (Figures 6B and S6A–S6D; Data S1Q), and we observed that First Americans (USR1, 11.4 kya; Spirit Cave, 11.0 kya; Los Rieles, 11.9 kya; Sumidouro, 10.4 kya) all exhibited higher affinity with MZR than with the late Hòabìnhian populations from Southeast Asia, the Jomon population (hunter-fishers) from Japan, and the pre-LGM Late Pleistocene UstIshim (45.0 kya), Tianyuan (40.0 kya), Salkhit (34.0 kya), Sunghir (33 kya), YanaOld (32.0 kya), and MA-1 (24.3 kya). Hence, by connecting with the Early Holocene coastal East Asians (Liangdao2) (Figure 3A) and nEastAsia_EN (Figures 2C and S3), MZR is linked deeply and indirectly to the East Asian ancestry that contributed to First Americans. Consistently, the post-LGM samples (Amur-14.1K, Amur-6.3K, KolymaRiver, DevilsCave, Shamanka, and UstBelaya) from the Amur region and the Far East region of Siberia all show a close affinity to First Americans (the |Z| scores of D [MZR, the listed post-LGM samples; First Americans, Mbuti] test ≈3 or >3) (Data S1Q), supporting the proposed migratory route by way of the Far East region of Siberia, and from there First Americans crossed the Bering Strait.
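The |Z| scores quoted for the D tests above come from dividing the statistic by a standard error, which for statistics of this kind is usually obtained with a block jackknife over contiguous genome segments. The sketch below uses a plain delete-one (unweighted) block jackknife on per-block numerators and denominators. This is a simplification: standard software applies a weighted variant, and the function name and inputs are assumptions made for this example.

```python
import math


def jackknife_z(block_nums, block_dens):
    """Z score of a ratio statistic via a delete-one block jackknife.

    block_nums, block_dens: per-block sums of the numerator and denominator
    terms (for example, of a D statistic), one entry per genomic block.
    Returns (estimate, standard_error, z).
    """
    n = len(block_nums)
    if n != len(block_dens) or n < 2:
        raise ValueError("need matching lists with at least two blocks")
    total_num = sum(block_nums)
    total_den = sum(block_dens)
    estimate = total_num / total_den
    # Recompute the statistic with each block left out in turn.
    leave_one_out = [
        (total_num - bn) / (total_den - bd)
        for bn, bd in zip(block_nums, block_dens)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)
    se = math.sqrt(variance)
    return estimate, se, estimate / se


# Toy example (hypothetical block sums).
est, se, z = jackknife_z([1.0, 2.0, 3.0], [10.0, 10.0, 10.0])
```

By the usual convention, |Z| of about 3 or more is treated as significant, which is the threshold referred to in the text.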
Post by Admin on Jul 17, 2022 17:37:46 GMT
Figure 6
To further test the contribution of the Late Pleistocene East Asians to the earliest Native Americans, we checked the spatial-temporal distribution of an East Asian-specific variant (EDAR-V370A) with an estimated occurrence of 30 kya based on genetic data of current populations.69 Among the ancient samples, the earliest presence of this variant was in Amur-19.0K from northern East Asia,49 followed by UKY-13.9kya, Amur-14.5K, and Amur-14.1K during the Late Pleistocene (Figure 5E; Data S1O). In America, the earliest presence was in LosRieles (12.0 kya) from coastal Chile of South America. Subsequently, EDAR-V370A was elevated to extremely high frequencies in both East Asia (89.41%, 76/85) and America (93.33%, 28/30) during the Early Holocene (11.6–5.0 kya) (Figure 5F) and continued to maintain high frequency in the Late Holocene (Figure 5G) and modern time (Figure 5H). Hence, the spatial-temporal distribution of EDAR-V370A supports a clear contribution of the Late Pleistocene East Asians to the earliest peopling of America. Consistently, for the OCA2-His615Arg variant, which occurred in the Late Holocene, we did not see its presence in either ancient or modern Native Americans (Figures 5A–5D). When all pre-Columbian Native Americans (≥500 years ago) were included in the f3 analysis, we did not observe obvious geographic bias of affinity with MZR; although northern Native Americans are slightly closer to MZR than southern Native Americans, the difference is not statistically significant (|Z| = 0.724) (Figure S6E; Data S1S). This is in line with the proposed bottleneck effect leading to genetic homogeneity of Native Americans,70 as well as the more infrequent population movement and replacement in the Americas than in Eurasia and Africa.71