Post by Admin on Mar 12, 2020 21:54:56 GMT
The Divergence of Neandertal and Modern Human Y Chromosomes
Fernando L. Mendez
G. David Poznik
Sergi Castellano
Carlos D. Bustamante
Sequencing the genomes of extinct hominids has reshaped our understanding of modern human origins. Here, we analyze ∼120 kb of exome-captured Y-chromosome DNA from a Neandertal individual from El Sidrón, Spain. We investigate its divergence from orthologous chimpanzee and modern human sequences and find strong support for a model that places the Neandertal lineage as an outgroup to modern human Y chromosomes—including A00, the highly divergent basal haplogroup.
We estimate that the time to the most recent common ancestor (TMRCA) of Neandertal and modern human Y chromosomes is ∼588 thousand years ago (kya) (95% confidence interval [CI]: 447–806 kya). This is ∼2.1 (95% CI: 1.7–2.9) times longer than the TMRCA of A00 and other extant modern human Y-chromosome lineages. This estimate suggests that the Y-chromosome divergence mirrors the population divergence of Neandertals and modern human ancestors, and it refutes alternative scenarios of a relatively recent or super-archaic origin of Neandertal Y chromosomes. The fact that the Neandertal Y we describe has never been observed in modern humans suggests that the lineage is most likely extinct. We identify protein-coding differences between Neandertal and modern human Y chromosomes, including potentially damaging changes to PCDH11Y, TMSB4Y, USP9Y, and KDM5D. Three of these changes are missense mutations in genes that produce male-specific minor histocompatibility (H-Y) antigens. Antigens derived from KDM5D, for example, are thought to elicit a maternal immune response during gestation. It is possible that incompatibilities at one or more of these genes played a role in the reproductive isolation of the two groups.
Introduction
A central goal of human population genetics and paleoanthropology is to elucidate the relationships among ancient populations. Before the emergence of anatomically modern humans in the Middle Pleistocene ∼200 thousand years ago (kya),1 archaic humans lived across Africa, Europe, and Asia in highly differentiated populations. Modern human populations that expanded out of Africa in the Upper Pleistocene received a modest genetic contribution from at least two archaic hominin groups, the Neandertals and Denisovans.2, 3, 4, 5 Especially in light of hypothesized genetic incompatibilities between Neandertals and modern humans,6 it is important to characterize differentiation between their ancestral populations and to investigate potential barriers to gene flow.
When populations diverge from one another, each retains a subset of the variation that existed in the ancestral population. Consequently, sequence divergence times usually exceed population divergence times, and this effect is more pronounced when the ancestral effective population size was large. In humans, a large fraction of genetic diversity is due to ancient polymorphisms that arose long before the emergence of anatomically modern traits. As a result, Neandertal and modern haplotypes are often no more diverged than modern human sequences are among themselves.2 This fact complicates the search for introgressed genomic segments, but two features facilitate their detection.6, 7 First, due to low levels of polymorphism among Neandertals,5 introgressed sequences are often quite similar to those of the Neandertal reference. Second, these regions have elevated linkage disequilibrium due to the relatively recent date of admixture, ∼50 kya.8, 9, 10 Although introgressed Neandertal sequences have been identified in modern human autosomes and X chromosomes, no mitochondrial genome (mtDNA) sequences of Neandertal origin have been reported in modern humans, and Neandertal Y-chromosome sequences have not yet been characterized.
Because uniparentally inherited loci have much smaller effective population sizes than autosomal or X-linked loci, the expected differences between sequence and population divergence times are smaller. Therefore, studying these loci can help to delineate an upper bound for the time at which populations last exchanged genetic material. To date, five Neandertal individuals have been whole-genome sequenced to 0.1× coverage or higher,2, 5 but all were female. Full mtDNA sequences are also available for eight individuals from Spain, Germany, Croatia, and Russia,11, 12 but the relationship between Neandertal and modern human Y chromosomes remains unknown.
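The reasoning in the two preceding paragraphs can be made concrete with the textbook expectation for a simple isolation model. This assumes an idealized Wright-Fisher ancestral population of $N_A$ diploid individuals with a 1:1 sex ratio and no gene flow after the split; these are simplifying assumptions, not part of the original analysis. Two lineages sampled from populations that split $T$ generations ago cannot coalesce until both are back in the ancestral population, where the expected coalescence time equals the number of gene copies:

```latex
\begin{align*}
\mathbb{E}[T_{\text{seq}}] &= T + \mathbb{E}[T_{\text{anc}}], \\
\mathbb{E}[T_{\text{anc}}] &=
\begin{cases}
2N_A & \text{autosomes } (2N_A \text{ copies}), \\
\tfrac{3}{2}N_A & \text{X chromosome } (\tfrac{3}{2}N_A \text{ copies}), \\
\tfrac{1}{2}N_A & \text{Y chromosome or mtDNA } (\tfrac{1}{2}N_A \text{ copies}).
\end{cases}
\end{align*}
```

Under these assumptions, the expected excess of sequence divergence over population divergence for the Y chromosome is a quarter of the autosomal excess, which is why uniparental loci bound the time of last genetic exchange more tightly.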
In this work, we analyzed ∼120 kb of exome-captured Y-chromosome sequence from an ∼49,000-year-old (uncalibrated 14C)13 Neandertal male from El Sidrón, Spain.14 We compare it to the human and chimpanzee reference sequences and to the sequences of two Mbo individuals15 who carry the A00 haplogroup, the most deeply branching group known.16 We identify the relationship between the Neandertal and modern human Y chromosomes and estimate the time to their most recent common ancestor (TMRCA). We also examine coding differences and explore their potential significance for reproductive isolation.
Material and Methods
Sequence Data and Processing
We used the Y-chromosome sequences from the exome capture of a Neandertal from El Sidrón, Spain,14 and we downloaded the complete sequences of two A00 Y chromosomes.15 The Neandertal data included coding, non-coding, and off-target sequences, and all three sequences were mapped against the GRCh37 reference.14 Given that the two A00 sequences were closely related,15, 16 we merged them to increase coverage. We called bases for both the Neandertal and A00 sequences by using SAMtools mpileup (v.1.1),17 specifying input options to count anomalous read pairs (-A), recompute base alignment qualities (BAQ; -E), and filter out poor-quality bases (-Q 17) and poorly mapped reads (-q 20).
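For readers reproducing the base-calling step, the following sketch assembles the corresponding command lines from Python without running them. The file names and the contig name (chrY, which depends on how the BAM headers name the Y chromosome) are hypothetical, and pooling the two A00 samples with samtools merge before calling is an assumption about how the merge was done; the text does not say.

```python
import shlex

# Hypothetical paths; the text does not give the actual file names.
REFERENCE_FASTA = "GRCh37.fa"
NEANDERTAL_BAM = "el_sidron_exome.bam"
A00_BAMS = ["a00_sample1.bam", "a00_sample2.bam"]
MERGED_A00_BAM = "a00_merged.bam"


def merge_command(inputs: list[str], output: str) -> list[str]:
    """samtools merge: pool the reads of several BAM files into one."""
    return ["samtools", "merge", output, *inputs]


def mpileup_command(bam: str, reference: str, region: str = "chrY") -> list[str]:
    """samtools mpileup with the input options listed in the text."""
    return [
        "samtools", "mpileup",
        "-A",          # do not skip anomalous read pairs
        "-E",          # recompute BAQ on the fly
        "-Q", "17",    # minimum base quality
        "-q", "20",    # minimum mapping quality
        "-f", reference,
        "-r", region,  # assumed contig name; depends on the BAM header
        bam,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them (e.g. via subprocess.run).
    for cmd in (
        merge_command(A00_BAMS, MERGED_A00_BAM),
        mpileup_command(NEANDERTAL_BAM, REFERENCE_FASTA),
        mpileup_command(MERGED_A00_BAM, REFERENCE_FASTA),
    ):
        print(shlex.join(cmd))
```

Merging first gives mpileup a single pooled pileup column for A00; passing both BAMs directly would instead report each sample in its own column.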
We then identified overlapping regions and excluded coordinates with unusually high coverage, filtering out sites whose coverage exceeded the mean coverage plus five times the square root of the mean (Figure S1). Under a Poisson model, this cutoff would discard fewer than one genuine site per 10,000. Finally, we removed sites with inconsistent base calls, discarding those at which more than two reads differed from the consensus allele and those at which more than one third of the observed bases did not match the consensus. This filter should minimize the effects of postmortem DNA damage and of modern contamination.
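Both site filters are simple to state in code. The sketch below assumes per-site depth is Poisson-distributed around the mean, computes the coverage cutoff (mean plus five times the square root of the mean) and the expected fraction of genuine sites lost above it, and applies the base-consistency rule to a string of observed bases. The function names are hypothetical. For the mean depths tried here, the Poisson tail stays below 10^-4; at very low means (around 1.5×, for example) the discreteness of the Poisson distribution pushes it slightly above that, so the check is only informative for the depth actually observed.

```python
import math
from collections import Counter


def coverage_cutoff(mean_depth: float) -> float:
    """Maximum allowed depth: mean plus five times the square root of the mean."""
    return mean_depth + 5.0 * math.sqrt(mean_depth)


def poisson_upper_tail(lam: float, threshold: float) -> float:
    """P(X > threshold) for X ~ Poisson(lam), summed directly over the upper tail."""
    k = math.floor(threshold) + 1
    total = 0.0
    while True:
        # log pmf = k*log(lam) - lam - log(k!)
        term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        total += term
        if k > lam and term < 1e-18:  # terms only shrink once k > lam
            break
        k += 1
    return total


def passes_consistency_filter(bases: str) -> bool:
    """Keep a site only if at most two reads, and at most one third of all
    reads, disagree with the consensus (most common) base."""
    if not bases:
        return False
    _, consensus_count = Counter(bases).most_common(1)[0]
    mismatches = len(bases) - consensus_count
    return mismatches <= 2 and 3 * mismatches <= len(bases)


if __name__ == "__main__":
    for mean_depth in (2, 5, 10, 20, 50):
        cutoff = coverage_cutoff(mean_depth)
        print(mean_depth, round(cutoff, 1), f"{poisson_upper_tail(mean_depth, cutoff):.1e}")
    print(passes_consistency_filter("AAAAAAAG"))  # True: 1 of 8 reads disagrees
    print(passes_consistency_filter("AAAAGGG"))   # False: 3 reads disagree
```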
[Figure 1]
Tree Inference
Using the blastz file chrY.hg19.panTro4.net.axt.gz,18 we identified the subset of regions within which the human sequences align to the chimpanzee reference. This yielded a total of 118,643 base pairs (bp). In what follows, we refer to this set of sites as “filter 1.” We also identified a second, more restrictive, set of regions totaling 100,324 bp, “filter 2,” by further requiring that the alignment correspond to the chimpanzee Y chromosome rather than to another chimpanzee chromosome (Tables S1A and S1B).
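The construction of the two site sets can be sketched as interval arithmetic on the hg19 side of the alignment. This sketch assumes the standard UCSC axt layout (a summary line followed by the human and chimpanzee sequence lines, with 1-based inclusive coordinates on the primary assembly). It treats every position inside an alignment block as aligned, even where one sequence has a gap, and uses hypothetical function names and toy records. A real file would first be decompressed, for example with gzip.open(path, "rt").

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AxtBlock:
    human_chrom: str  # primary (hg19) chromosome
    human_start: int  # 1-based, inclusive
    human_end: int    # 1-based, inclusive
    chimp_chrom: str  # aligning (panTro4) chromosome


def parse_axt_headers(text: str) -> list[AxtBlock]:
    """Keep the summary line of each axt record and skip its two sequence lines.
    Summary fields: number, chrom, start, end, aligning chrom, start, end, strand, score."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    blocks = []
    for i in range(0, len(lines), 3):  # summary, human sequence, chimp sequence
        f = lines[i].split()
        blocks.append(AxtBlock(f[1], int(f[2]), int(f[3]), f[4]))
    return blocks


def merge_intervals(intervals: Iterable[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent closed intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def intersect(a: list[tuple[int, int]], b: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Intersect two sorted, merged lists of closed intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def total_bp(intervals: list[tuple[int, int]]) -> int:
    return sum(end - start + 1 for start, end in intervals)


def filter_regions(blocks, covered, require_chimp_y: bool) -> list[tuple[int, int]]:
    """Covered chrY sites inside alignment blocks ('filter 1'); with
    require_chimp_y=True, only blocks aligned to chimpanzee chrY ('filter 2')."""
    aligned = merge_intervals(
        (b.human_start, b.human_end)
        for b in blocks
        if b.human_chrom == "chrY" and (not require_chimp_y or b.chimp_chrom == "chrY")
    )
    return intersect(aligned, merge_intervals(covered))


if __name__ == "__main__":
    # Toy input: two hypothetical axt records and two covered intervals.
    AXT_TOY = (
        "0 chrY 100 109 chrY 200 209 + 5000\nACGTACGTAC\nACGTACGTAC\n\n"
        "1 chrY 150 159 chr7 300 309 + 4000\nTTTTGGGGCC\nTTTTGGGGCC\n"
    )
    blocks = parse_axt_headers(AXT_TOY)
    covered = [(95, 104), (155, 170)]
    print(total_bp(filter_regions(blocks, covered, require_chimp_y=False)))  # 10
    print(total_bp(filter_regions(blocks, covered, require_chimp_y=True)))   # 5
```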
For each position within these regions, we determined whether the Neandertal, A00, or both differed from the human reference sequence. We then used the corresponding chimpanzee allele as a proxy for the ancestral state in order to assign each mutation to the appropriate branch of the tree relating the four sequences (Figure 1A). In doing so, we discarded five sites: two at which the chimpanzee carries a third allele, one at which the chimpanzee carries a deletion, and two that were specific to A00 but supported by only a single read. Excluding these sites had little impact on our analyses.
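The branch assignment amounts to a small parsimony rule. The sketch below, with hypothetical function names and branch labels, assumes the topology supported in the abstract, ((reference, A00), Neandertal), chimpanzee, treats the chimpanzee base as ancestral, and discards sites with a chimpanzee deletion or a third allele, as in the text. The read-support check for A00-specific sites is not modeled.

```python
from collections import Counter
from typing import Optional

BASES = frozenset("ACGT")

# Labels for the assumed topology ((reference, A00), Neandertal), chimpanzee,
# keyed by which of the three human/Neandertal sequences carry the derived base.
BRANCH_LABELS = {
    ("reference",): "reference-specific",
    ("A00",): "A00-specific",
    ("Neandertal",): "Neandertal-specific",
    ("reference", "A00"): "modern-human ancestral (T_NM)",
    ("reference", "A00", "Neandertal"): "chimpanzee or shared human-Neandertal stem",
}


def assign_branch(ref: str, a00: str, nea: str, chimp: str) -> Optional[str]:
    """Place a single-site difference on the tree, using the chimpanzee base as
    a proxy for the ancestral state. Returns None for sites to discard."""
    site = (ref, a00, nea, chimp)
    if any(base not in BASES for base in site):
        return None  # e.g. a deletion ('-') in the chimpanzee
    if len(set(site)) > 2:
        return None  # a third allele leaves the ancestral state ambiguous
    derived = tuple(
        name
        for name, base in (("reference", ref), ("A00", a00), ("Neandertal", nea))
        if base != chimp
    )
    if not derived:
        return "invariant"
    # Remaining patterns (A00 or reference sharing a derived base with the
    # Neandertal only) conflict with the assumed topology.
    return BRANCH_LABELS.get(derived, "incompatible")


if __name__ == "__main__":
    # Toy sites (reference, A00, Neandertal, chimpanzee); not real data.
    sites = [("A", "A", "G", "A"), ("C", "C", "T", "T"), ("G", "A", "G", "G"), ("T", "C", "T", "G")]
    print(Counter(assign_branch(*s) for s in sites))
```

Tallying the labels over all retained sites gives the per-branch mutation counts that a tree figure of this kind summarizes.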
To estimate the TMRCA of the Neandertal and modern human Y chromosomes ($T_{NR}$), we decomposed this quantity (Figure 2) into the sum of the TMRCA of modern humans ($T_{AR}$) and the time separating the most recent common ancestor of modern humans from its common ancestor with the Neandertal lineage ($T_{NM}$):
$$T_{NR} = T_{AR} + T_{NM} = \alpha T_{AR}, \qquad \alpha \equiv 1 + \frac{T_{NM}}{T_{AR}}.$$
We then estimated $T_{AR}$ and used two methods to estimate $\alpha$.
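The decomposition is an identity: whatever value $\alpha$ takes, $T_{NR} = \alpha T_{AR}$. The sketch below shows one simple way $\alpha$ could be estimated from branch-specific mutation counts under a strict molecular clock; it is an illustrative assumption, not necessarily either of the two methods used here, and the function names and counts are hypothetical. It leaves out the Neandertal-specific branch because the ∼49,000-year-old sample stopped accumulating mutations at death, so that branch is shorter than $T_{NR}$. As a purely arithmetic check, the rounded abstract values (∼588 kya and $\alpha \approx 2.1$) imply $T_{AR} \approx 588/2.1 \approx 280$ kya; that is a back-calculation, not a reported estimate.

```python
def estimate_alpha(n_a00: int, n_ref: int, n_modern_stem: int) -> float:
    """alpha = 1 + T_NM / T_AR, estimated from branch-specific mutation counts.

    Under a strict molecular clock, the A00-specific and reference-specific
    branches each span T_AR and the modern-human ancestral branch spans T_NM,
    so the mutation rate and the number of sites cancel in the ratio."""
    t_ar_in_mutations = (n_a00 + n_ref) / 2
    if t_ar_in_mutations == 0:
        raise ValueError("need at least one A00- or reference-specific mutation")
    return 1 + n_modern_stem / t_ar_in_mutations


def t_nr(alpha: float, t_ar: float) -> float:
    """T_NR = T_AR + T_NM = alpha * T_AR."""
    return alpha * t_ar


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    alpha = estimate_alpha(n_a00=100, n_ref=100, n_modern_stem=110)
    print(round(alpha, 2))  # 2.1
    # Back-calculation from the rounded abstract values (588 kya, alpha ~ 2.1).
    print(round(588 / 2.1))  # 280
```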