|
Post by Admin on Jan 5, 2022 21:28:32 GMT
S3 - Adding archaic data and ancestral information The combined primary dataset of 471 modern human genomes was merged with two high-coverage archaic hominin genomes: a) Denisovan (Meyer et al., 2012). Downloaded from cdna.eva.mpg.de/denisova/VCF/hg19_1000g/b) Neanderthal (Prüfer et al., 2014). Downloaded from cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/Positions with missing or low-quality calls (marked as ‘LowQual’ in the original VCF files) in one of the archaic samples were excluded during the merging procedure. In addition, heterozygous archaic SNPs were also removed to improve phasing quality (but see later analyses). Both types of SNPs were masked for both archaic and modern individuals. These additional filters resulted in a dataset comprising 35,395,615 SNPs. A second merged dataset was produced by excluding missing and low-quality archaic calls but retaining SNPs that were heterozygous in archaic individuals. This dataset was used to cross validate our results and assess potential bias (if any) introduced by trimming archaic heterozygous positions. A third dataset was produced by merging the primary dataset containing 471 modern and 2 archaic genomes without heterozygous archaic SNPs with the Vindija Neanderthal genome data (Vindija33.19, downloaded from cdna.eva.mpg.de/neandertal/Vindija/VCF/Vindija33.19/). As with the main dataset, positions that were missing, low-quality or heterozygous in the Vindija individual were masked for all archaic and modern individuals. We added ancestral allele information to our dataset using the Ensembl Compara 71 database (ensembl_compara_71@ens-staging2:3306, MethodLinkSpeciesSet: 6 primates EPO (548)). S4 - Phasing and phasing assessment The combined dataset comprising both modern and archaic samples was phased using read aware phasing with SHAPEIT v. 2.r837 (Delaneau et al., 2013). Phase informative reads (PIRs) were extracted using software guidelines from both modern and archaic BAM files. The HapMap phase II b37 recombination map (Frazer et al., 2007) was used and the phasing was performed using the following arguments: –states 400 –window 0.5 –states-random 200. We assessed the performance of our phasing approach, which incorporated phase informative reads, by comparing our phased haplotypes to two experimentally phased individuals (Prüfer et al., 2014) that overlap with the Simons Genome Diversity Project genomes included in our dataset – WON,M (corresponding to sample B_Australian-4 in the Simons dataset, Illumina ID SS6004477) and BUR,E (corresponding to sample B_Australian-3 in the Simons dataset, Illumina ID SS6004478). These two samples come from Aboriginal Australians with unknown geographic origin, but containing no genetic signals of recent admixture with Papuan or European populations. For each of the two individuals, we applied the following procedure: • Divide the genome into 8 Kb windows that are non-overlapping and at least 50 bp from any chunk masked by the alignability mask • Retrieve the corresponding SNPs from the experimentally phased FASTA files and our computationally phased VCFs. The BWA algorithm (v0.7.16a) (Li, 2013) was used to confirm that the windows aligned correctly. • Mask sites that disagree or are missing in the 8 Kb chunks (due to different SNP calling procedures) • Discard chunks that contain ≤ 1 heterozygote site. • Count the number of heterozygous sites and the number of switch errors in the remaining chunks, and divide the totals to obtain a switch error rate. This approach accounts for the impact of solitary heterozygote sites and the alignability mask, which can hide switch errors (if an even number of switch errors occur within a masked region). Based on 11164 8 Kb windows for WON,M and 11245 windows for BUR,E, we calculated switch error rates of 3.7% and 4% respectively. These are expected rates (see supplementary text S9 in the Mallick et al., 2016 study). Note that singletons are randomly placed in our phasing procedure, unless they are part of a phase informative read, and will also act to disrupt haplotypes. Quantification and Statistical Analysis S5 - Dataset genetic diversity and SNP novelty As an exploratory analysis of our phased dataset, we retrieved lists of SNPs in the 1000 Genomes Project (Phase 3) (1000 Genomes Project Consortium et al., 2015), dbSNP (Build 150), ESP (ESP6500SI-V2-SSA137) (Tennessen et al., 2012) and ExAc (ExAcRelease 1.0) (Lek et al., 2016). We counted the number of SNPs defined when considering each population group separately, and determined how many were observed in one or more of these datasets, and how many were not observed. Across all of our newly reported samples, we observed 11,859,578 SNPs, of which 21% (2,525,213) were novel. Existing datasets are better able to capture variation in West ISEA (9.3% SNPs novel) than East ISEA (14.3%) or Papua (21.1%), likely because previously published datasets incorporate a large number of mainland Asian samples. S6 - Estimating Asian-Papuan admixture proportions To estimate Asian-Papuan admixture proportions in ISEA and Papua, we used local ancestry inference implemented in LOTER (Dias-Alves et al., 2018). LOTER has been shown to outperform similar methods, such as HAPMIX (Price et al., 2009) and RFMix (Maples et al., 2013), and its accuracy is greatest when admixture is more ancient than 150 generations. This time frame (ca 4500 ya) is directly relevant for Indonesian prehistory – linguistic, archaeological and genetic evidence all point to the spread of Austronesian speakers beginning 4000–4500 ya from Taiwan, reaching eastern Indonesia 3500–3000 ya (Hudjashov et al., 2017). We specified two reference datasets: Papuans from the current study (N = 72; see Table S1) and East Asians (N = 293). The East Asian reference dataset included 293 geographically diverse samples, specifically 43 samples from the Simons Genome Diversity Project, and 50 random samples from each of the five East Asian populations in the 1000 Genomes Project (CDX – Chinese Dai, CHS – Southern Chinese Han, CHB – Northern Chinese Han, KHV – Vietnamese Kihn and JPT – Japan). Samples from ISEA were analyzed separately using the Papuan and East Asian reference datasets. To infer local ancestry tracts in Papuan samples, we created an individual Papuan reference dataset for every single target genome by excluding the individual sample from the full Papuan reference set to avoid self-copying. Results are reported in Main Text Figure 1A and Table S2.
|
|
|
Post by Admin on Jan 5, 2022 22:30:58 GMT
S7 - Archaic block identification a. ChromoPainter We used ChromoPainter v.2 (CP) (Lawson et al., 2012) to detect archaic ancestry in the genomes of our present-day individuals. This method relies on phased haplotype data and describes each individual recipient chromosome as a mixture of genetic blocks from the set of predefined donor individuals. This process, known as chromosome ‘painting’, generates a matrix of copying vectors, which can be analyzed further. For each recipient haplotype, CP also outputs the expected probability of copying from each donor population at each SNP. CP uses an Expectation-Maximization (EM) algorithm to re-estimate the proportion of genetic material copied from each donor by using the previous estimates as a new prior under the model, and then iterating. We painted each of our modern non-African human genomes individually using a set of 35 sub-Saharan African genomes from SGDP, which represents modern non-Eurasian human ancestry, and each of the archaic samples separately. We used a two-step approach: 1. The initial run was performed with 10 EM steps to estimate prior copying probabilities for each individual and chromosome separately using the following command line arguments: -i 10 -in -iM. 2. Next, estimated prior copying probabilities were averaged across the genome for each individual. The main run was performed with a recombination scaling constant, global mutation (emission) probability and genome-wide average prior copying probability from the first step using the following command line arguments: -n Ne -M mu -p prior_copying_probability. Either archaic or modern ancestry was then assigned to individual SNPs using a probability threshold of 0.85. Unknown ancestry was assigned to SNPs with intermediate copying probability. b. Hidden Markov Model To explore the behavior of the CP approach, we additionally implemented a hidden Markov model (HMM, github.com/guysjacobs/archHMM). Our approach is inspired by that of Racimo and colleagues (Racimo et al., 2017, Seguin-Orlando et al., 2014), and we follow their notation where possible. Our approach differs from the Racimo method in its parameter estimation procedure and the transition probabilities. It also purposely ignores African-specific diversity. However, the principle of applying an HMM to detect tracts enriched for non-African derived alleles shared with an archaic hominin has been widely applied in several slight variations (e.g., Lu et al., 2016, Prüfer et al., 2014, Racimo et al., 2017, Seguin-Orlando et al., 2014 and others) and as such is a standard approach in archaic introgression block estimation. c. S∗ We used the S∗ method (Plagnol and Wall, 2006, Vernot et al., 2016, Wall et al., 2009), which seeks to detect introgressed haplotypes from archaic hominins without using an archaic reference genome, following the implementation of Vernot et al., 2016. S∗ is sensitive toward highly diverged sequences with high LD, and thus is adequate to detect hominin introgressed haplotypes without having a reference sequence of the hominin in question. To calculate empirical S∗ in our non-African samples, we used 35 sub-Saharan Africans as a reference population and analyzed one non-African sample at a time. We removed any non-segregating sites from our data. We used the HapMap phase II b37 recombination map (Frazer et al., 2007), and ancestral information from the Ensembl Compara 6 primates EPO (548) database. Empirical S∗ values were estimated in 50 Kb genomic windows. After calculating the S∗ value from the real dataset, we used simulations to assign statistical significance to our empirical estimates. We performed simulations using different combinations of the number of segregating sites and recombination rates from the previous step. We simulated a previously described demographic model (Henn et al., 2015) with the simulator ms (Hudson, 2002). For non-West Eurasians samples, we performed simulations with the East Asian demographic model as presented in the Vernot et al., 2016 study. We obtained a neutral distribution of S∗ without hominin introgression for 50 Kb regions using 60000 simulations for every combination of the number of segregating sites and recombination rate. The following ms command line was used: ms <Total_N> 60000 -I 3 <Africa_N> <WEurasia_N> <EAsia_N> -s <SegSites> -r 38.7∗ <recombination_rate> 50001 -n 1 2.20 -n 2 4.47 -n 3 6.53 -g 2 101.69 -g 3 146.31 -m 1 2 1.49 -m 2 1 1.49 -m 1 3 0.46 -m 3 1 0.46 -m 2 3 1.85 -m 2 3 1.85 -ej 0.029 3 2 -en 0.029 2 0.29 -em 0.029 1 2 8.93 -em 0.029 2 1 8.93 -ej 0.087 2 1 -en 0.23 1 1 -p 5 We then used the computationally efficient strategy employed to estimate the S∗ value for every region as in Vernot et al., 2016. Finally, we assigned a p value for each 50 Kb genomic window in every individual sample in our non-African dataset using the neutral distribution of S∗ from the simulations without archaic introgression. Genomic windows with a recombination rate less than 0.005 or more than 20.01 and/or a number of segregating sites less than 20 or more than 511 were excluded. To assess the overall fit of the neutral demographic simulations to the data, we built General Additive Models (GAM) of the 95th and 99th percentiles of simulated neutral S∗ as a function of the number of segregating sites and recombination rate. We established two thresholds of significance: 1. To estimate the overlap between CP, HMM and S∗, and define high confidence Denisovan blocks, we identified regions as introgressed if the S∗ in the real data is more than the 95th percentile of the simulated data distribution. 2. In addition, for the residual S∗ analysis, we identified a region as introgressed if the S∗ in the real data is more than the 99th percentile of the simulated data distribution. This yields a more conservative, high confidence S∗ set, which allows for more robust inferences of residual S∗ signal (see below).
|
|
|
Post by Admin on Jan 6, 2022 1:12:16 GMT
S8 - Introgression results from the three methods Details of the amount of introgression based on the different methods, and profiling of the methods, are shown in Table S3. The total amount of introgression detected by the three methods, per phased haploid genome, is shown in Table S3A. There is a clear correlation between the methods, with most archaic introgression detected by all three methods in Papua and East ISEA, and with least in West Eurasia. The total amount of archaic introgression detected by CP was slightly less than that detected by the HMM, which was in turn considerably less than that detected by S∗ with 95% confidence. We note that different methods are expected to extract different amounts of introgression signal (see e.g., Skov et al., 2018 Table 1 for a recent comparison). The relative enrichment of Denisovan signal in Papua over West Eurasia was greater for CP (6.4-fold) than the HMM (5.0-fold), based on all pairwise Papuan/West Eurasian comparisons (paired t test, T = 280.5, p ≈ 0.0; Cohen’s D = 1.49). As Denisovan introgression is not expected in West Eurasia, the clear excess in enrichment seen in CP over the widely applied HMM approach strongly supports CP – when used in the manner described above – as an effective method for detecting archaic hominin introgression. Nevertheless, as Table S3A shows, a substantial amount of Denisovan introgression was detected by both methods in West Eurasia, suggesting that the three methods are individually detecting a number of false positive introgressed blocks. We hypothesize that this phenomenon is driven by ‘spillover’ signal from Neanderthal introgression. Indeed, incomplete lineage sorting could lead Neanderthal-introgressed blocks to have greater genetic similarity to the Altai Denisovan than Altai Neanderthal; and even blocks that coalesce with the Altai Neanderthal before the Denisovan/Neanderthal common ancestor will often show greater similarity to the Altai Denisovan than humans due to Denisovan/Neanderthal common ancestry. As an initial check on this hypothesis, we categorized the genome into regions that only had evidence for Denisovan or Neanderthal introgression (from one or both of the HMM and CP methods); regions that had conflicting evidence for both Denisovan and Neanderthal introgression from the two methods; and regions with an unknown signal arising from S∗ (> 95% confidence), thus identifying a region as introgressed but with no support from the HMM and CP methods (Table S3B). As predicted, discounting ambiguous signal, there is now 12.5 times as much Denisovan introgression in Papua compared to West Eurasia. We used these observations – that higher confidence Denisovan blocks are supported by multiple methods, and that unexpected signal can be driven by spillover from Neanderthal introgressed blocks – to refine our Denisovan block set (STAR Methods S9 below). S9 - Refining archaic block sets a. Iterated filtering improves specificity Both CP and the HMM were run using the Denisovan and Neanderthal genomes independently. However, Neanderthals and Denisovans are believed to share common ancestry more recently than the human/Neanderthal/Denisovan common ancestor (e.g., 495 versus 657 kya in Malaspinas et al., 2016), increasing the probability of introgressing archaic blocks showing greater similarity to both Neanderthals and Denisovans than modern humans. Table S3B profiles the degree to which the archaic portion of genomes overlaps in different continental groups. Importantly, we note that while only very little of the West Eurasian genome is assessed as unambiguously Denisovan (that is, identified by at least one of the CP or HMM methods as Denisovan, but as Neanderthal by neither), a considerable amount is ambiguous – that is, identified by different methods as both Neanderthal and Denisovan. This suggests that looking at the overlap of methods, and actively removing ambiguous blocks, may be a promising way to obtain a high confidence set of Denisovan blocks. A logical approach to this is to iteratively discard Denisovan CP blocks that are either i) not supported by the other methods (Denisovan HMM and S∗) or ii) also identified as Neanderthal by CP. We can do this by requiring more than a minimum overlap between each Denisovan CP block and the supporting methods, or less than a maximum overlap with a Neanderthal CP block. However, it is not clear how much overlap is appropriate to ensure that ambiguous or weakly supported Denisovan introgression blocks are removed, but that sufficient signal remains for further analysis. We therefore sought parameters for an incremental filtering procedure that maximizes the excess Denisovan signal in Papuans against West Eurasians, the two samples expected to have highest and lowest Denisovan introgression, respectively (Main Text Figures 2A–2C). For each Denisovan CP block on a chromosome copy of an individual, the procedure progresses in three steps: 1. Discard the block if the proportion of its length that is overlapped by Denisovan HMM blocks on that chromosome copy of the individual is under IDeniHMM∈[1.0,0.75,0.5,0.25,0.1,0.05,0.001]. 2. If the block survived (1), discard it if the proportion of its length overlapped by S∗ windows for the individual is under IS∗∈[0.999,0.75,0.5,0.25,0.1,0.05,0.001,0.0]. 3. If the block survived (1) and (2), discard it if the proportion of its length overlapped by Neanderthal CP blocks on the chromosome copy of the individual is over INeanCP∈[0.999,0.95,0.75,0.5,0.25,0.1]. Here, IDeniHMM=1.0 indicates a requirement that the entire CP Denisovan block is entirely covered by a single HMM Denisovan block, or by the union of multiple HMM Denisovan blocks, while IDeniHMM=0.001 indicates that only 0.1% of the block needs to be covered. We specifically chose to use CP Neanderthal blocks to exclude ambiguous signal, rather than HMM Neanderthal blocks, in order to additionally exclude any regions that might be readily, and perhaps falsely, identified as non-human specifically by the CP method; for instance, due to local patterns of African variation. We used the bedtools suite v.2.27.0 (Quinlan and Hall, 2010) with the subtract command and -N flag and -f flags to perform this procedure. We sought to retrieve high confidence Denisovan blocks while still retaining enough Denisovan signal for further analysis. Noting that the total genome inferred as Denisovan in Papua relative to West Eurasia (Main Text Figure 2C) increases greatly even with minimal overlap requirements (IDeniHMM=0.001, IS∗=0.001 and INeanCP=0.999), we chose these to define our high confidence Denisovan block set. Thus, the refinement procedure is: 1. Starting with the Denisovan CP output, remove any block that is less than 0.1% overlapped by Denisovan HMM output. For example, for a 100 Kb Denisovan block identified by CP to be kept, we require at least 100 bp to be covered by Denisovan HMM output in the same chromosome copy of the individual as well. 2. Of the remaining blocks, remove any block that is less than 0.1% overlapped by S∗ output. 3. Of the remaining blocks, remove any block that is over 99.9% overlapped by Neanderthal CP output. After applying these filters, we are left with approximately 32.3 Mb of high-confidence Denisovan introgressed blocks per genome copy for a Papuan individual, from our original set of 59.2 Mb. For a West Eurasian, we are left with just 688 Kb, down from 9.5 Mb. There is now nearly 50 times as much Denisovan signal in Papuans, a profound enrichment from the 6 times in the original method output. Some genuinely Denisovan-like blocks may be found in West Eurasians due to a) introgressing blocks from a Neanderthal population that randomly coalesce with the sampled Denisovan before the sampled Neanderthal, b) incomplete lineage sorting dating to the common ancestor of modern humans and Denisovans, and c) limited migration between populations with known Denisovan ancestry (e.g., East and South Asians) and West Eurasians since introgression occurred. b. Filtering supported by SGDP introgression values The substantial increase in the enrichment of Denisovan signal in Papuans over West Eurasians following our iterated filtering approach is a strong indication that we are successful in removing spillover signal from Neanderthal introgression. This increased accuracy comes at the cost of decreased power to detect true Denisovan introgression. While the filtering method reduces the Denisovan signal in West Eurasia from 9.5 to 0.7 Mb (93%), the signal halves in Papua from 59.2 to 32.3 Mb, which is a greater signal depletion in absolute terms despite broadly similar proposed levels of Neanderthal introgression in both regions (Mallick et al., 2016). To profile this behavior in real data, we turn to genome-wide estimates of Denisovan and Neanderthal introgression reported for the SGDP samples in our dataset, calculated using counting statistics (Mallick et al., 2016). Genome-wide estimates of introgression have a benefit over haplotype inference in that they measure the average signal of introgression over the entire genome rather than pulling out small chunks; as a result, they are better able to estimate the overall level of introgression, and necessarily yield genome-wide proportions that are greater than the sum of excavated archaic sequence haplotypes. The SGDP samples represent global populations, including groups thought to harbor just Neanderthal introgression (West Eurasian), but also Asian and Oceanian groups with both Neanderthal and Denisovan ancestry, and so are ideally suited for a comparison of methods. We performed multiple linear regression using the Python statsmodels v.0.8.0 package (Seabold and Perktold, 2010) on the SGDP genome-wide estimates of Denisovan (SGDPDeni) and Neanderthal (SGDPNean) introgression (Table S1 in Mallick et al., 2016) as independent variables, and either the total Denisovan sequence identified for each sample by CP alone or our high confidence Denisovan introgressed sequence as the dependent variable. In this analysis, our introgression estimates are expressed as a proportion of introgressed sequence per individual genome, such that the units are the same in the two datasets. Given the high correlation coefficients, the factors in Table S3C provide an indication of the amount of spillover signal (significant correlations of our Denisovan introgression estimates with SGDPNean, or of our Neanderthal estimates with SGDPDeni) and the relative power of our methods to detect overall levels of introgression. We assume that the published introgression estimates from the SGDP data are accurate, which is supported by the strong consistency of their results with other published findings (e.g., the Denisovan introgression peak in Papua; more Neanderthal introgression in East Asians than West Eurasians as in Wall et al., 2013). We find evidence of considerable ‘spillover’ signal in the raw CP results. On a worldwide scale, SGDPNean significantly predicts CP Denisovan signal (factor 0.15), while SGDPDeni predicts CP Neanderthal signal (factor 0.19). In West Eurasia, where SGDPDeni is minimal (highest 0.2%, in populations with some Asian ancestry like the Saami), levels of SGDPNean alone are sufficient to predict our CP Denisovan signal with high accuracy (factor 0.21, r2 = 0.99); the spillover effect accounts for the entire signal. Studying the Denisovan CP signal in Papuans only – a population with high levels of both Denisovan and Neanderthal ancestry – is especially informative about the amount of spillover signal, with a SGDPNean factor of 0.28 only marginally smaller than the SGDPDeni factor of 0.36. Note, however, that the substantially greater amount of Denisovan than Neanderthal introgression in Papuans (4%–6% compared to ∼2%) means that the CP Denisovan signal is still largely composed of Denisovan introgression. The high spillover rate in the CP Denisovan signal and generally lower power to detect Denisovan than Neanderthal introgression reflect the more distant relationship between the Alai Denisovan and the introgressing Denisovan population, compared to the Altai Neanderthal and introgressing Neanderthal population. Spillover largely disappears when using the high confidence blocks. Worldwide, while SGDPNean and SGDPDeni remain significant predictors of our high confidence Denisovan and Neanderthal introgression signals, their factors drop from 0.15 to −0.01 and 0.19 to 0.04, respectively. There is minimal spillover when estimating our high confidence Denisovan signal in West Eurasia (SGDPNean factor falls from 0.21 to 0.03), and both SGDPNean and SGDPDeni are now only significant predictors of the high confidence signal for their corresponding archaic species. Our high confidence intersection method thus substantially reduces the false positive rate as reflected in the spillover signal when estimating genome-wide levels of introgression. There is also a substantial decrease in relative power – at a worldwide scale, from about 41% to 28% when searching for Denisovan introgression, and 56% to 39% when searching for Neanderthal introgression. The low overall power of methods that detect introgression blocks compared to genome-wide statistics is expected, as is the higher power to detect Neanderthal introgression due to the more recent ancestry between the Altai and introgressing Neanderthal than the Altai and introgressing Denisovan. The reduction in power when using the more conservative high-confidence signal is also expected – it reflects the cost of higher specificity. We note that in cases where Denisovan introgression is essentially absent (West Eurasia), a reduction in relative power is still observed for our high confidence Neanderthal signal due to spillover of Neanderthal introgression signal into the CP Denisovan block set. Thus, when there is strong evidence of only one introgression source, it may be better to avoid stringent block filtering. c. Distribution of high confidence introgression Table S3D shows the amount of remaining Denisovan signal after removing Neanderthal spillover per haploid phased genome. The results are particularly interesting because our trimming method leads to substantial depletion of the signal in Asian populations, as well as in West Eurasia. Previous work has suggested approximately 0.5% Denisovan introgression into South and East Asians, as compared to a 4%–6% contribution to Papuans (Browning et al., 2018, Mallick et al., 2016, Reich et al., 2010). The relatively small amount of Denisovan signal in non-Papuans strongly implies either that previous work has overestimated the Denisovan contribution outside of Papua, likely due to ambiguous Neanderthal spillover, or that the amount of Denisovan introgression into Papua is higher than previously thought. If the former is the case, a rough calculation based on a linear additional contribution from the introgression percentage, and assuming West Eurasia has 0% introgression and Papua 4%–6%, implies continent average Denisovan introgression levels of 0.17%–0.33% in mainland Asia. Performing a similar trimming procedure using Neanderthal, rather than Denisovan, blocks similarly yields estimates of Neanderthal introgression levels (Table S3E). We are also able to clearly detect the known excess Neanderthal signal in East Asians over West Eurasians, with East Asians, Siberians, Americans and Southeast Asians having about 45% more Neanderthal introgressed sequence than West Eurasians. This is highly consistent with previous estimates of a 40% increase (Wall et al., 2013), as is the placement of South Asians as intermediate between the groups. The standard deviation of the amount of archaic introgression within continental groups reflects variation in introgression levels both within and between continental subpopulations, which in turn reflect patterns of introgression into ancestral populations and more recent demography. We note that the variation in Neanderthal ancestry is greatest, and that the coefficient of variation is greatest, among West Eurasians compared to other groups. There is a higher coefficient of variation for Denisovan ancestry among East Asians and Siberians compared to Papuans. The highest coefficient of variation is in West ISEA, where it is driven by the inclusion of the Sulawesi population, which has relatively higher Papuan ancestry (13.4%, Table S2) and greater high-confidence Denisovan introgression (∼4.2 Mb) than more westerly groups.
|
|
|
Post by Admin on Jan 6, 2022 2:10:30 GMT
d. Denisovan signal follows Papuan ancestry We assessed the correlation between Papuan ancestry and Denisovan introgression in southeast Asia using the LOTER output and various measures of archaic introgression. Models were fit using Ordinary Least-squares and the Python package statsmodels v.0.8.0. We fitted the gradient of a linear model with point [1,1] fixed to ensure that a 100% Papuan individual has 100% the Denisovan introgression observed in Papua. We fitted both the raw CP Denisovan output (Main Text Figure 2D, left pane) and the high-confidence Denisovan blocks (Main Text Figure 2D, right pane). In each case, the correlation was strong (r2 = 0.98) and the linear fitting highly significant (p < 1 × 10−30). The linear fitting of the raw CP Denisovan blocks against Papuan ancestry had a gradient of 0.8 [0.79-0.81] implying an intercept of 0.20 [0.19-0.21]. A gradient close to 1, 1.01 [1.0-1.03], and a correspondingly low intercept of −0.01 [0.03-0.0] is observed for the high-confidence Denisovan blocks, consistent with the signal of Denisovan introgression being primarily related to Papuan ancestry in ISEA. Note that LOTER tends to predict a small proportion (< 5%) of Papuan ancestry in West ISEA, which matches the limited Denisovan introgression in the region (Table S2). This may imply that a considerable portion of Denisovan signal in East Asia more broadly is due to very low levels of Papuan ancestry, despite some additional introgression known to be specific to the region (see Browning et al., 2018 and our Main Text Figure 3). Comparing the linear regressions in Main Text Figure 2D, the predicted amount of Denisovan ancestry for a fully Asian population falls when using the high-confidence blocks in which the Neanderthal spillover signal has been removed. This emphasizes that Neanderthal introgression needs to be carefully controlled for when asking questions about Denisovan-specific ancestry. The village of Rampasasa, near the Liang Bua cave site at which Homo floresiensis (Brown et al., 2004) remains were found, and the island of Flores on which the site is located, do not emerge as regional anomalies; the Denisovan signal observed is closely predicted by their level of Papuan ancestry. This suggests that any introgression specific to this part of East ISEA would not be contributed by an archaic hominin on the Denisovan clade; we study the possibility of other local introgression signals from alternative hominin species below (e.g., S∗, STAR Methods S12 and S13, and Main Text Figure 7). S10 - Archaic mismatch analysis a. Mismatch against the Altai Denisovan genome An informative way of assessing the relationship of our high confidence Denisovan blocks to the Altai Denisovan genome, which was used to extract them, is by mismatch analysis. As longer blocks are better able to resolve a mismatch distribution (explored in Figure S2 and below), we extracted the 2000 longest blocks from each continental population. For each block with > 50 Kb of total unmasked sequence data (counts in Table S3F), we calculated the number of differences compared to the Denisovan reference (with Denisovan and Neanderthal heterozygous positions masked; see STAR Methods S3). We then calculated the effective block length by subtracting the portion of each block covered by the alignability mask from the total block length, and converted these values into a mismatch (difference/bp). As the number of differences per bp will be impacted by our masking of low quality and heterozygous archaic sites, quality control decisions (e.g., call rate > 99% filter) and the SNP calling protocol, it is not possible to directly translate mismatch distance into times based on a standard human genome mutation rate. We therefore chose to scale the observed mismatches by dividing the mismatches of each block by the average genome-wide mismatch rate between the Altai Denisovan and West Eurasian samples, mD. We chose West Eurasians as our baseline mismatch rate because that population has the lowest amount of Denisovan introgression. Note that the average genome-wide mismatch rate between Altai Denisovans and West Eurasians, assuming 0% Denisovan ancestry in West Eurasians, reflects both the divergence time of humans and the Neanderthal/Denisovan clade, and the ancestral population size of the common ancestor of humans, Neanderthals and Denisovans. We calculated the scale factor by summing the total number of mismatches between each of the 75 samples × 2 = 150 West Eurasian haploid genomes and the Altai Denisovan, and dividing this by 150 times the total length of unmasked sequence in the dataset. Plotting these mismatches by continent yields the distribution pictured in Main Text Figure 3C. The variation in the number of blocks > 50 Kb between continental groups reflects varying sample size, as well as the total amount of Denisovan introgression and the time since Denisovan introgression in different groups. Two primary patterns emerge. First, we replicate the results of Browning et al., 2018 in identifying a mainland Asian-specific peak with relatively low divergence to the Altai Denisovan. This peak is not limited to East Asian populations (in whom it was originally detected), but also extends into Siberia and the Americas. Interestingly, our American populations show some reduction in the extent of this peak, potentially suggesting that introgression was ongoing more recently than the divergence of American and East Asian populations. Second, there is a clear dual mismatch peak in the Papuan population that is not apparent in other groups, which instead show a single divergent mismatch peak with some fluctuation in its exact placement. Given these fluctuations, we sought to confirm that the twin Papuan peaks were not a result of a very small number of common blocks being overrepresented due to high frequency in the sample, which could emphasize stochasticity in the coalescent and mutation processes. For example, in an extreme case, if a set of 2000 blocks consists of just 14 blocks near fixation then only a few independent coalescent histories might be represented in the mismatch distribution. Focusing on the full set of 2000 largest Papuan blocks, we instead observed 226 entirely disjoint regions using the bedtools ‘merge’ function. Of these, 151 were > 0.5 Mb from any other region. Although the top ten most introgressed regions contributed 515 blocks, this leaves 216 disjoint regions with an average frequency of 4.8% in Papuans. The two highest frequency blocks, at 94/144 and 57/144 in the 72 (diploid) Papuan samples, lie outside both of the Papuan-specific mismatch peaks, the former having an intermediate mismatch and the latter an unusually large mismatch versus the Altai Denisovan. To assess the impact of using a different minimum block length cut-off, we determined the mismatch using thresholds from 0 to 260 Kb (Main Text Figure 3A). To ensure mismatch accuracy is not being impacted by the alignability mask, we additionally required that blocks contained total called sequence over the minimum cut-off. We did not calculate the mismatch when less than 50 entirely non-overlapping blocks are involved in the analysis. The twin peaks obtain resolution at approximately 130 Kb (based on 3556 blocks). As expected for small block sizes, the overall resolution is poor below 50 Kb. We also explored the role of the minimum block length cut-off for our combined Siberia and East Asia sample (Main Text Figure 3B) and East ISEA continental groups (Figure S2). Again, the East Asian and Siberian specific peak only resolves when a minimum block length cut off ≥ 50 Kb is applied. The two peaks observed in Papua do not clearly resolve for East ISEA, due to the reduced sample size and small Papuan ancestry leading to fewer observed blocks in this region. We hypothesize that the challenge in resolving mismatch peaks using short blocks is related to two factors. First, mutations are discrete. In our analysis, it might be typical to observe 0 or 1 mismatching base pairs in a small 1 Kb block, corresponding, respectively, to extremely low and extremely high mismatch compared to the average mismatch of larger, ‘higher resolution’ blocks. Equivalently, it is not possible to observe a theoretically expected mismatch of, for example, 0.0005 in a 1 Kb block as 1/1000 = 0.001 and 0/1000 = 0.0; a better approximation of the expected value is possible with larger blocks. Second, the ratio of stochasticity in the mutation process along the branches of a coalescent tree versus the difference in coalescent times driving two mismatch peaks is also an important variable. To explore these phenomena, we performed a simple simulation of expected mismatch for two populations diverging from a source population at two fixed times, t1 and t2. The distribution of mismatches corresponds to a Poisson with mean 2μlc where μ = 1.45 × 10−8 and is the mutation rate per base pair per generation, l is the block length and c is a coalescent time. The values of c are repeatedly drawn from the distribution of coalescence times – either an exponential distribution with location parameter t1 and mean t1 + 2Ne, or with location parameter t2 and mean t2 + 2Ne. In this way, we are able to incorporate random coalescence and mutation. Using the values of t1 = 9750, t2 = 12500 and Ne = 100 from our simulation fitting (see below) for the two Denisovan-like populations contributing to Papua, and t1 = 5000, t2 = 12500 and Ne = 100 for the two Denisovan-like populations contributing to East Asia, we confirm that longer blocks are required to detect two signals of divergence (Figure S2C). This is particularly true when the divergence times are less widely separated, as is the case for Papua. The value used here for the divergence time of t1 = 5000 generations for the Asian-specific Denisovan introgression signal is an approximation based on the location of that mismatch peak and not based on explicit modeling. This phenomenon also explains why we are able to detect the multiple peaks in East Asia and Siberia, despite having considerably shorter blocks in these populations. For example, the 2000 longest blocks in these populations have an average length of just 43 Kb and 35 Kb respectively, yet we are still able to detect two Denisovan mismatch peaks due to the probable difference in population divergence driving these of 7500 generations (5000 versus 12500). In contrast, the longest 2000 blocks have an average length of 263 Kb in Papuans, and such long blocks appear necessary to capture two populations diverging from the Altai Denisovan in a narrower time frame, approximately 2750 generations apart (9750 versus 12500). Among East ISEA, the longest 2000 blocks have an average length of 152 Kb; Figure S2C implies that this lack of resolution may be hiding the two peaks that we detect in Papua. We further assessed the hypothesis that the lack of a clearly bimodal East ISEA mismatch distribution could be related to our power to resolve the true mismatch distribution. To address this, we re-sampled Papuan blocks to mimic the distribution of block lengths in East ISEA based on rejection sampling (with replacement). We sampled 1000 sets of 2000 blocks, and generated mismatches against the Altai Denisovan. Plotting these mismatch distributions against the observed Papuan and East ISEA mismatch distributions (Figure S2B) demonstrates that the signal of a dual mismatch peak would be expected to be weak in our East ISEA sample based on the smaller number of long blocks available. b. Confirming the dual Denisovan mismatch signal Before proceeding to analyze possible causes of the dual Denisovan mismatch signal in Papuans, we sought to confirm that it is observed in block sets retrieved using a variety of methods. We first confirmed that the signal is apparent both in the raw CP Denisovan and HMM Denisovan output (Figures S3A and S3B) to verify that our iterative trimming approach is not causing the signal. The HMM tends to detect longer archaic blocks than CP, which incorporates sequence with greater divergence from the Altai Denisovan, consistent with its lower specificity (higher West Eurasian Denisovan signal in Table S3A). The signal remains (Figure S3B) when performing a similar trimming procedure to that described above on our HMM block set (see STAR Methods S9a), this time removing blocks that were i) less than 50% overlapped by CP Denisovan blocks, ii) less than 0.1% overlapped by S∗ windows, or iii) over 99.9% covered by Neanderthal HMM output. These criteria were chosen to obtain a high level of enrichment of Papuan Denisovan signal over West Eurasian Denisovan signal. We next considered the possibility that our masking of heterozygous sites in the archaic genomes to simplify phasing was causing the double peak. Briefly, in a tree of a single introgressing haplotype X and two haplotypes of the Altai Denisovan, Da and Db, there are two possible coalescent topologies – either the first coalescent event is between two Altai Denisovan haplotypes, (X,(Da,Db)), or between an Altai Denisovan haplotype and the introgressing haplotype, (Da,(Db,X)) or (Db,(Da,X)). As more homozygote mismatches are expected in the former case, our masking of sites that are heterozygous in one of the archaic genomes could generate a complex mismatch signal. To confirm that the masking of heterozygotes was not causing the multiple peaks, we re-phased the data retaining loci with archaic heterozygotes, and performed the CP and mismatch analysis on this data. This time, we trimmed the CP Denisovan set by simply removing blocks that were more than 99.9% overlapped by CP Neanderthal blocks. As before, the twin peaks are clearly visible (Figure S3A). c. No dual mismatch signal in Neanderthal blocks To better understand the potential demographic implications of the dual mismatch peaks observed in introgressed Denisovan blocks among Papuans, we generated similar mismatch distributions based on our 2000 longest high-confidence Neanderthal-specific introgressed blocks for each continental group. These were generated using the same trimming protocol described above (STAR Methods S9a), but starting with CP Neanderthal blocks and requiring overlap with both the Neanderthal HMM output and S∗, and only allowing limited overlap with CP Denisovan blocks (see Table S3E for continental distributions of these blocks). As before, we only used blocks with > 50 Kb of unmasked sequence data. A large number of blocks remained for all continental groups (1978–1999 blocks), and the average block length ranged from 168 Kb (America, with 27 samples) to 287 Kb (Papua, with 72 samples). As with the Denisovan introgressed chunks, the average length of the 2000 longest blocks reflects both sample size and introgression history; the higher levels of Neanderthal introgression in the continental groups translates to longer chunks being successfully extracted. The mismatch for each continental group is shown in Figure S3D. For ease of comparison, the curves are again scaled to the genome-wide average mismatch between West Eurasians and the Altai Denisovan. Interestingly, despite slight fluctuations, a single dominant Neanderthal peak is observed for all populations. This strongly suggests a) that any demographic cause of the dual mismatch peak relates to events occurring in the Denisovan population, rather than among Neanderthals or the Denisovan/Neanderthal common ancestor, and b) that the dual peak is not caused by a bioinformatic error or property of our methods (see further discussion below) that might give rise to a bimodal mismatch distribution against any archaic hominin. We confirmed the single mismatch peak using the HMM Neanderthal block set, removing blocks that were i) less than 50% overlapped by CP Neanderthal blocks, ii) less than 0.1% overlapped by S∗ windows, or iii) over 99.9% covered by Denisovan HMM output. This confirmed the unimodal mismatch distribution of Neanderthal introgressing blocks against the Altai Neanderthal (Figure S3E). As a final consistency check, we sought to make use of a second ancient Neanderthal genome, the Vindija Neanderthal (Prüfer et al., 2017). This sequence is known to be more closely related to the Neanderthal that introgressed into modern humans than the Altai Neanderthal, and thus may be better suited for extracting introgressed Neanderthal blocks (a 10%–20% increase is reported by Prüfer et al., 2017). We used CP to extract introgressing blocks from Papuan and East ISEA individuals, this time with either the Vindija Neanderthal genome or both Neanderthal genomes as our Neanderthal group. The mismatch of these blocks against the Altai Neanderthal again shows a unimodal distribution (Figure S3F). We additionally repeated our trimming procedure of the CP Denisovan blocks to create high-confidence block sets, now with CP Vindija blocks or blocks identified by CP as affiliated with either Vindija or the Altai Neanderthal removed. Again, the bimodal mismatch distribution is observed (Figure S3C). If the more divergent peak were Neanderthal spillover signal, then we would expect it to be reduced by removing more Neanderthal introgressed sequence; such behavior is not apparent.
|
|
|
Post by Admin on Jan 6, 2022 3:31:27 GMT
d. Assigning blocks to Denisovan ancestries To statistically confirm the bimodal distribution, we fitted a Gaussian and Gaussian mixture to the mismatch distribution of long (> 180 Kb; Main Text Figure 3A) Denisovan introgressed blocks identified in Papuans for 0.1 < MD < 0.23 using the Python package sklearn v. 0.19.1 (function sklearn.mixture.GaussianMixture; Pedregosa et al., 2011). We used blocks > 180 Kb because i) large blocks are needed to resolve a complex mismatch distribution (Figure S2C); ii) the mismatch distribution is relatively stable with a minimum block length of 180 Kb or more (Figure 3A); and iii) sufficient Papuan blocks (n = 1683) remain for downstream analysis using this minimum block length. The bimodal distribution was strongly supported (AIC = –5808.85, versus unimodal –5582.72). Note that the negative AIC values occur due to the high probability density in the range of mismatch values observed, with the probability density function of the Gaussian mixture model concentrated over a mismatch range less than 1; and that the difference between AIC scores rather than their values are relevant here. The Gaussians are distributed according to N(0.146, 0.018) and N(0.199, 0.015), and are weighted [58%, 42%] respectively, where N(μ,s) indicates a Normal distribution with mean μ and standard deviation s (Main Text Figure 4A). We additionally statistically confirmed that a bimodal mismatch distribution was supported for unique chunks > 180kb in Papuans. Here, we recorded the mismatch of sets of exactly overlapping chunks as their average mismatch, to limit the impact of high frequency chunks on the distribution. The bimodal distribution remains strongly supported (AIC = −2158.53, versus unimodal −2076.80). This confirms that the bimodal mismatch distribution is unlikely to be an artifact due to selection (i.e., a small number of introgressed regions with high frequency) or sampling effects. We used the Gaussian mixture model to assign blocks to either the less-divergent Denisovan component (D1) or the more divergent Denisovan component (D2), classifying as ambiguous any block with less than 80% support for one model over the other (i.e., 0.2 < p(D1) < 0.8). We were able to classify 1538 of 1683 blocks in this manner, 718 to D1 and 508 to D2; we did not classify blocks outside the 0.1 < mD < 0.23 bounds. These block sets were used in most subsequent analyses. When studying the demographic implications of the bimodal mismatch distribution, we studied either the full mismatch distribution (when retrieving split dates of Denisovan populations), or the entire set of high-confidence Denisovan blocks, classifying blocks with mD < 0.1 to D1 and blocks mD > 0.23 to D2. This allows us to maximize the information in those analyses. Spillover from D0 or Neanderthal – into D1 or D2 respectively – will be minimal as those introgression sources are extremely limited in our high confidence Denisovan block set. In total, only 6 blocks were assigned to both D1 and D2 in 6 different genomic regions. We suspected that these cases reflect occasional misclassification due to the stochasticity of the mutation process; most blocks in a region are consistent with the D1 or D2 classification, but some spill over to the other peak. We expect a negative correlation between block frequencies in this case. Calculating the frequency as in described in STAR Methods S12, we compared the correlation between D1 and D2 frequencies in our small sample of 6 overlapping blocks with 1000 randomly selected sets of 6 non-overlapping block pairs. We observed a weak negative correlation trend in the set of overlapping blocks (gradient = −1.15, lower than 92% of resampled sets), consistent with the probability of occasional block misclassification. A simple way to achieve a bimodality in a mismatch distribution is through a demographic model involving multiple sources of archaic introgression. Browning et al., 2018 recently inferred that two Denisovan-like ancestries introgressed into East Asians on the basis of a bimodal mismatch distribution. Before proceeding to analyze the mismatch distribution of Denisovan introgressed blocks in Papuans in this context, we sought to exclude other factors, including bioinformatic bias, and to confirm that the driving signal is consistent with introgression from two source populations on the Denisovan clade and not potential confounders. e. D1/D2 signal is not a bioinformatic bias The set of Papuan samples that we study come from three different sources – 30 newly generated sequences, 25 samples from Malaspinas et al., 2016 and 17 samples from Mallick et al., 2016. We used exactly the same pipeline for mapping and base calling on all three datasets, and while we did not observe any obvious differences between data from the three sources during our quality control steps, we thought it possible that the bimodal distribution could conceivably be caused by this batch effect. However, blocks assigned to D1 and D2 were common over all three sample sources, and the number of blocks assigned to D1 and D2 in sequences from the three sources showed no significant deviation from random expectations (χ2 = 0.0997; p = 0.95). While the high coverage of the Altai Denisovan genome argues for high confidence in SNP calls, a bimodal distribution of mismatch distances could be generated by certain quality biases. To rule this out, we performed two checks. First, a well-documented form of ancient DNA damage is cytosine deamination leading to C-to-T substitution (Hofreiter et al., 2001). If there were a strong bias in GC content in the genomic regions of the two block sets, with higher GC content in the D2 regions, then deamination of the ancient Altai Denisovan genome could increase the D2 mismatch between the modern (introgressing blocks) and ancient sequences. We therefore assessed GC content in the two Denisovan block sets using the UCSC Table Browser (GRCh37/hg19, Mapping and Sequencing, GC percent, gc5Base). The average GC content is very similar in both sets (39.75% for D1 and 38.75% for D2) and the distributions clearly overlap. Second, we confirmed that the number of Denisovan low-quality alleles does not differ greatly between the D1 and D2 blocks sets (D1: 0.122%, D2: 0.124%; of the 23.1 Mb and 19.2 Mb of D1 and D2 sequence identified, respectively). While low-quality SNP calls were masked in the dataset used to identify the two block sets (see STAR Methods S3), a strong bias in the amount of low-quality calls in the genomic regions covered by one block set could imply relevant biases in region quality. We did not observe this. These checks, along with the lack of a bimodal mismatch distribution when comparing introgressing blocks retrieved using the Altai Neanderthal genome against the Altai Neanderthal (Figures S3C–S3E) and the fact that our filtering process to retrieve high confidence Denisovan blocks requires that blocks overlap S∗ output (which does not use a reference archaic genome), suggest that genetic properties of the Altai Denisovan genome do not substantially contribute to the mismatch difference between D1 and D2 blocks. We then sought to determine whether the mismatch seen in the two block sets might reflect challenges related to alignability. The overall proportion of coverage by the alignability mask was extremely similar in the genomic regions covered by the two block sets (median: D1 = 6.55%, D2 = 6.52%), indicating that alignability challenges are unlikely to be leading to biases in sequence diversity. We then considered whether phasing errors might be higher in one block set than the other. For example, if a considerable amount of human sequence were erroneously incorporated into D2 blocks but not D1 blocks, this could drive up the D2 mismatch versus the Altai Denisovan. A simple calculation based on the Malaspinas et al., 2016 model of hominin evolution, with a split between a single introgressing Denisovan clade and the Altai Denisovan tD,DI = 11998 generations ago (ND = 13249) and the Denisovan/Human split tH,D = 20225 generations ago (NHA = 32671), suggests a considerable human input would be required for D2 blocks to be approximately 30% more divergent from the Altai Denisovan than D1 blocks. The approximate human component required is h where: (1−h)(2(tD,DI+2ND))+h(2(tH,D+2NHA))2(tD,DI+2ND)=1.3 such that h ≈0.25. This can be considered a conservative lower bound, in that many introgressed Denisovan haplotypes will survive into the larger Ne regime and hence have greater mismatch versus the Altai Denisovan. Such a pronounced phasing error is unexpected, and would be easily observed. As a check, we asked whether the average recombination rate differs between D1 and D2 genomic regions. A higher recombination rate in one set of blocks would be expected to lead to faster recombination between human and introgressing haplotypes and could complicate phasing. Additionally, recombination rate is expected to generally impact local genetic variation as it determines the linkage of neutral regions to any nearby selected variation. We compressed the sets of blocks in the two components to their respective overlapping subsets using the bedtools function ‘merge’, leaving 86 D1 regions and 68 D2 regions. For each region we calculated the average recombination rate using the same combined HapMap phase II b37 genetic map as used for phasing (Frazer et al., 2007); the distribution of recombination rate between the two chunk sets was not significantly different (standardized 2-sample Anderson-Darling test statistic –0.60, asymptotic p = 0.71). Ideally, we would directly profile switch point errors by comparing our Denisovan introgressed blocks to those retrieved using experimentally phased haplotypes. However, experimentally phased data are only available for two Australian samples in our dataset, and given the low Australian sample size, we only called D1 and D2 blocks in Papuans. We therefore searched for local variation in the mismatch within D1 and D2 blocks. For CP to call a block as Denisovan that has human sequence data incorporated into it, we expect that the human sequence data would need to be closely surrounded by real introgressing Denisovan sequence. If this were not the case, then CP is expected to simply bisect the Denisovan block. A Denisovan block containing human sequence is therefore expected to have a sharp increase in the mismatch against the Altai Denisovan and a sharp dip in the mismatch against a human genome toward the center of the block. We assessed this signal by dividing D1 and D2 blocks into thirds, and calculating the mismatch of each third against the Altai Denisovan and a West Eurasian sample. For this analysis, we used a minimally masked version of the dataset where the main phased data was combined with unphased data in which archaic heterozygous/low quality sites were not masked, and the call rate filter was not applied; heterozygote/homozygote mismatches in unphased regions were half-weighted to reflect the average 50% probability of applying to the block, and heterozygote-heterozygote mismatches were not counted. The West Eurasian was chosen to be sample LP6005441-DNA_G10, a Russian with approximately average high confidence Denisovan-specific introgression as compared to other West Eurasian samples. We second counted the number of blocks showing a pattern indicative of potential incorporation of human sequence – lower mismatch against the human sample and higher mismatch against the Altai Denisovan in the middle third. 3.2% of blocks were consistent with this pattern in D1, compared to 5.0% of blocks in D2 (non-significant Chi square statistic 1.87, p = 0.17). The patterns of mismatches over individual blocks suggest that incorporation of human sequence due to phasing errors is rare, and similarly rare in D1 and D2. We additionally confirmed that the tendency for Denisovan blocks to be called on both chromosome copies of an individual – the Denisovan haplotype homozygosity within that individual – was similar between D1 and D2 blocks (10.9% and 10.2%, corrected χ2 = 0.066, p = 0.80). Based on these consistency checks, we do not interpret the mismatch difference between D1 and D2 as being bioinformatic in origin.
|
|