Post by Admin on Apr 26, 2022 18:02:07 GMT
Methodological Considerations.
Because of the poor read quality and low sequencing depth for ancient samples, analysis of ancient DNA has primarily made use of haploid genomes in which the haplotype phase has been lost. However, the augmentation of ancient samples with modern reference genomes is increasingly making it possible to perform genotype imputation and haplotype phasing in ancient samples (53). Previous studies have used imputed diploid genotypes from ancient individuals to study demographic history and estimate phenotypes in ancient individuals (54–58). Our work is one of relatively few studies that use imputed genotypes in ancient samples to evaluate haplotype sharing within and between ancient and present-day individuals (55–57).
In this study, we encountered a scenario in which a modern population being examined for genetic continuity with ancient populations possesses admixture components that are not informative about the relationships of interest. Such scenarios can be addressed by performing analyses that disregard those admixture components. In our scenario, we sought to discern, within the component of genomic membership not assigned to European admixture, the relative contributions of clusters associated with different Indigenous populations (Fig. 7). The signature of similarity between present-day Muwekma Ohlone and a cluster with considerable membership in the ancient San Francisco Bay Area samples, together with the smaller signatures of other modern populations with this cluster, suggests the potential of the approach for other comparisons of ancient populations with modern admixed populations.
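One way to read "disregarding" an admixture component is to drop the corresponding cluster from an individual's membership coefficients and rescale the rest so that they sum to 1. The following is a minimal sketch of that arithmetic, not the authors' code; the cluster labels and the membership values are hypothetical.

```python
def renormalize_without(memberships, excluded):
    """Rescale cluster memberships after dropping the excluded clusters.

    memberships: dict mapping cluster label -> membership coefficient
                 (the values are expected to sum to about 1).
    excluded:    set of cluster labels to disregard, for example a
                 European-associated cluster.
    Returns a dict over the remaining clusters that sums to 1, or None
    when no membership is left to rescale.
    """
    kept = {k: v for k, v in memberships.items() if k not in excluded}
    total = sum(kept.values())
    if total <= 0:
        return None
    return {k: v / total for k, v in kept.items()}


# Hypothetical individual: 40% of membership in a European-associated
# cluster, the rest split between two Indigenous-associated clusters.
individual = {"european": 0.40, "cluster_a": 0.45, "cluster_b": 0.15}
rescaled = renormalize_without(individual, {"european"})
print(rescaled)
```

In this hypothetical case the remaining membership is 0.60, so cluster_a accounts for 0.45 / 0.60 of it and cluster_b for 0.15 / 0.60. An individual with all membership in the excluded cluster has nothing to rescale, which is why the function returns None in that case.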
Many ancient DNA studies in the Americas, and particularly those involving individuals from North America, have studied large-scale processes such as the initial peopling of the continents (19, 21, 22, 24) or subsequent major migration events (15, 16). As a result, enough ancient individuals have been sequenced to provide reference data for studies that focus on ancient genomics of a specific region, such as the Pacific Northwest (18) or the Caribbean (59, 60). Our study of ancient and present-day individuals from the San Francisco Bay Area contributes an example of the use of regionally focused ancient genomics to demonstrate how analysis of ancient and modern individuals can reveal changes in local population structure over time.
An important component of this study has been its community engagement process and coproduction of knowledge as part of increasing interest in partnerships between researchers and Indigenous communities to conduct genetic research (34, 36, 61)—including genetic research that involves Indigenous ancestors (35, 62). A distinctive feature in this case has been the participation of a tribal group in the initiative to pursue the project, in the selection of research questions, in archaeological excavation and ancient genomics involving sites in their historical lands, and in present-day genomic analysis with current tribal members. Hence, in addition to its scientific conclusions, the study provides a contribution to advancing community engagement models in Indigenous genomics. The study reaffirms the Muwekma Ohlone’s deep-time ties to the area, providing evidence that disagrees with linguistic and archaeological reconstructions positing that the Ohlone are late migrants to the region (37, 38). The results have also generated interest from tribal leadership in carrying out similar genomic investigations on ancestral remains from older sites in order to better document and understand the time depth of Ohlone population-genetic continuity in the San Francisco Bay region.
Materials and Methods
Ethics Approvals.
The study proceeded with significant community engagement at all stages (Community Engagement), under Institutional Review Board protocol no. 10538 from the University of Illinois at Urbana–Champaign, and it included informed consent from present-day members of the Muwekma Ohlone tribe. In addition, the Muwekma Ohlone Tribal Council approved the study, including the genomic analysis of community members and ancestral remains. The Tribal Council was consulted on the results and approved the manuscript for disseminating the study.
Principal Components Analysis.
We performed PCA with both the full set of 311 and the subset of 165 individuals, employing all 474,317 SNPs. For both datasets, we first estimated the covariance matrix of individual genotype vectors from genotype likelihoods (SI Appendix, Methods). We then used the eigen function in R to calculate eigenvectors, corresponding to principal components, and eigenvalues.
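The eigendecomposition step can be illustrated on a toy matrix. The study used the eigen function in R; the sketch below swaps in power iteration with deflation in plain Python, purely to show how eigenvectors (principal components) and eigenvalues come out of a symmetric covariance matrix. The 3 x 3 matrix is hypothetical, and the estimation of the covariance matrix from genotype likelihoods is not shown.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector.

    Assumes a symmetric positive semidefinite matrix, as a covariance
    matrix is, with a gap between the top two eigenvalues.
    """
    n = len(matrix)
    # Deterministic, uneven start vector, chosen so that it is unlikely
    # to be orthogonal to the leading eigenvector.
    vec = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue.
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


def top_components(cov, k):
    """Top k (eigenvalue, eigenvector) pairs, largest eigenvalue first."""
    work = [row[:] for row in cov]
    n = len(work)
    pairs = []
    for _ in range(k):
        value, vec = leading_eigenpair(work)
        pairs.append((value, vec))
        # Deflation: remove the component just found from the matrix.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


# Hypothetical covariance matrix for three individuals.
cov = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 0.5],
]
pairs = top_components(cov, 2)
trace = sum(cov[i][i] for i in range(len(cov)))
for value, vec in pairs:
    print(round(value, 6), [round(x, 6) for x in vec], round(value / trace, 4))
```

Each eigenvalue divided by the trace of the matrix gives the proportion of variance attributed to that component. The sign of an eigenvector is arbitrary, so a component may come out flipped relative to another implementation.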
Model-Based Clustering.
We used NGSadmix (63) to perform unsupervised model-based clustering on genotype likelihoods from the 85,659 SNPs that remained after LD pruning. For each tested number of clusters K, we performed the clustering 10 independent times, running NGSadmix with parameters -minMaf 0.05, -maxiter 10,000, and -tol 0.000001. We also included the parameter -minInd 35 for the full dataset of 311 individuals and -minInd 15 for the subset of 165 individuals. To evaluate the clustering solutions inferred by NGSadmix, we ran CLUMPP (64) with parameters DATATYPE 0, M 2, W 0, S 2, GREEDY_OPTION 2, and REPEATS 1000. Next, following Verdu et al. (65), we clustered the runs based on pairwise G′ values greater than 0.9. For the majority cluster of each K value, which contained the most runs, we reran CLUMPP with the same parameters to produce an averaged clustering solution for display in figures. Preferred choices for the value of K were obtained by use of evalAdmix (ref. 66; SI Appendix, Methods and Fig. S4).
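The grouping of replicate runs can be sketched as follows. The text says only that runs were clustered on pairwise G′ values greater than 0.9; linking runs transitively (connected components of the thresholded similarity graph) is an assumption made here for illustration, and the similarity matrix is hypothetical rather than CLUMPP output.

```python
def group_runs(similarity, threshold=0.9):
    """Group runs whose pairwise similarity exceeds the threshold.

    similarity: symmetric square matrix (list of lists) of pairwise
                similarity scores between runs.
    Runs are linked transitively, so two runs end up in the same group
    when a chain of above-threshold pairs connects them.
    Returns a list of groups (lists of run indices), largest first.
    """
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Largest group first; ties broken by the smallest run index.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


# Hypothetical pairwise similarities for five replicate runs at one K.
sim = [
    [1.00, 0.97, 0.95, 0.40, 0.42],
    [0.97, 1.00, 0.96, 0.41, 0.39],
    [0.95, 0.96, 1.00, 0.38, 0.40],
    [0.40, 0.41, 0.38, 1.00, 0.93],
    [0.42, 0.39, 0.40, 0.93, 1.00],
]
groups = group_runs(sim)
majority = groups[0]
print(groups, majority)
```

In this toy input, runs 0, 1, and 2 form the majority group, and those are the runs that would be passed to the averaging step.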
IBS Segment Sharing.
We identified IBS segments between pairs of samples in four steps (SI Appendix, Fig. S5). First, we estimated genotype likelihoods in the ancient and modern samples with ANGSD (67). Second, we phased and imputed genotypes from the genotype likelihoods with GLIMPSE (68). Third, we called IBS segments from the phased genotypes with hap-IBD (69). Fourth, in modern admixed individuals, we performed local ancestry assignment and identified IBS segments that lie on the Indigenous background, considering comparisons between modern samples and other modern samples, and between modern samples and ancient samples. This pipeline generated a list of IBS segments shared between ancient and modern individuals, restricting attention to the Indigenous-origin segments of the modern genomes. Further details appear in the SI Appendix, Methods and Fig. S6.
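The fourth step can be pictured as an interval intersection between shared segments and local ancestry tracts. The sketch below is one possible reading, not the authors' pipeline: it trims each segment to the parts that overlap Indigenous-ancestry tracts, whereas the text does not say whether segments were trimmed or kept only when they lie entirely within a tract. The coordinates are hypothetical, and both inputs are assumed to describe the same chromosome and haplotype, with no overlaps within either list.

```python
def restrict_to_tracts(segments, tracts):
    """Intersect shared segments with ancestry tracts.

    segments: list of (start, end) shared segments.
    tracts:   list of (start, end) tracts assigned to the ancestry of
              interest on the same chromosome and haplotype.
    Intervals within each list are assumed not to overlap one another.
    Returns the overlapping pieces in coordinate order.
    """
    segments = sorted(segments)
    tracts = sorted(tracts)
    pieces = []
    i = j = 0
    while i < len(segments) and j < len(tracts):
        start = max(segments[i][0], tracts[j][0])
        end = min(segments[i][1], tracts[j][1])
        if start < end:
            pieces.append((start, end))
        # Advance whichever interval ends first.
        if segments[i][1] < tracts[j][1]:
            i += 1
        else:
            j += 1
    return pieces


# Hypothetical coordinates, in arbitrary units.
shared_segments = [(10, 50), (70, 120)]
indigenous_tracts = [(0, 30), (40, 90), (100, 110)]
pieces = restrict_to_tracts(shared_segments, indigenous_tracts)
print(pieces)
```

A stricter variant would keep a segment only if a single tract contains it entirely; which rule applies here is described in the SI Appendix rather than in this section.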
Supporting Information
The following supplementary materials are available online at 10.1073/pnas.2111533119.