Post by Admin on Apr 28, 2024 10:58:48 GMT
LoF variants and human knockouts
The JEWEL dataset allowed us to explore potentially clinically important protein-coding variants in Japan. In our analysis, we identified 18,481 LoF variants in 9045 genes, including 9780 LoF variants not registered in gnomAD or ToMMo (4.7K), with a substantial proportion of these being rare (Fig. 2A and table S11). These LoF variants are defined as variants that may cause premature stop codons (stop-gained), small indels that shift the coding sequence (frameshift), or variants that change the two nucleotides immediately adjacent to splice sites (splicing variants). Furthermore, we classified 177,112 synonymous variants and 306,923 missense variants, which affected 18,651 and 19,103 genes, respectively (Fig. 2B). Examination of LoF variants together with carriers’ UMAP values identified 32 and 37 LoF variants whose frequencies were significantly associated with UMAP1 and UMAP2, respectively (false discovery rate < 5%; see Materials and Methods and table S12). We noticed that individuals from the Northeast had the lowest average number of singleton coding variants compared to those from other regions (table S13). Since the sample size of the Northeast is smaller than that of other Hondo regions, we conducted a random resampling analysis and confirmed that this observation is likely not attributable to sample size (table S14). We speculate that other factors, such as demographic history, especially population expansion, may be influencing this observation. Despite regional differences, the ratio between singleton missense and singleton synonymous variants (dN/dS) across regions was consistently close to 2, which matches the ratio of de novo missense to synonymous variants reported in an in vivo study (53). Furthermore, consistent with observations in another report, this ratio negatively correlates with the AF, suggesting that many rare missense variants might be deleterious but remain in the gene pool (54).
To further test this idea, we calculated the missense risk score by integrating annotations from 30 different annotation tools (see Materials and Methods). We observed that the missense risk score increased as the AF decreased (P < 2.2 × 10^−16, Pearson correlation test). On average, singletons exhibited the highest risk scores (table S15). On the basis of the data above, missense variants that are rare in the general population could be prioritized for disease association analysis. This approach to prioritization could narrow down potential candidates, thereby increasing the likelihood of identifying a meaningful clinical connection.
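The frequency-based reasoning above can be illustrated with a minimal Python sketch. It assumes the four AF bins given in the Fig. 2 legend (treating a MAF of exactly 1% as "rare" is an assumption, since the legend leaves that boundary open). The function names `af_bin` and `pearson_r` and the toy numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def af_bin(allele_count, allele_number):
    """Assign a variant to an AF bin using the thresholds in the Fig. 2 legend."""
    minor_count = min(allele_count, allele_number - allele_count)
    if minor_count < 1:
        raise ValueError("variant is monomorphic in this sample")
    if minor_count == 1:
        return "singleton"
    maf = minor_count / allele_number
    if maf > 0.01:
        return "common"
    if maf >= 0.0001:  # 0.01% <= MAF <= 1% (the upper boundary is an assumption)
        return "rare"
    return "ultrarare"  # minor allele count > 1 and MAF < 0.01%


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy values only: a risk score that falls as AF rises gives a negative r.
toy_af = [0.0001, 0.001, 0.01, 0.1]
toy_score = [0.9, 0.7, 0.4, 0.2]
toy_r = pearson_r(toy_af, toy_score)
```

A negative `toy_r` corresponds to the direction reported in the text, where the risk score rises as the AF falls. A real analysis would also need the P value, which this sketch does not compute.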
Post by Admin on Apr 29, 2024 22:15:39 GMT
Fig. 2. LoF variants and human knockout in the JEWEL dataset. (A) Number of known and unregistered LoF variants compared with gnomAD database (v2.1.1) and ToMMo (4.7K). Variants are categorized into four AF bins. Common: MAF > 1%; rare: MAF < 1% and MAF ≥ 0.01%; ultrarare: minor allele count > 1 and MAF < 0.01%; singleton. (B) Cumulative number of genes affected by LoF, missense, and synonymous variants. (C) The average percentage of transcripts affected by LoF variants, categorized by the genes’ LOEUF deciles. Genes that are highly intolerant to functional variation, as indicated by lower LOEUF deciles, have fewer affected transcripts compared to genes that are more tolerant. Error bars are included to indicate SEs. (D) The histogram of normalized total bilirubin levels among individuals in the JEWEL cohort. The red line highlights an individual with compound heterozygous LoF variants in the ABCC2 gene, ranking third in the whole JEWEL dataset. This elevated level of total bilirubin is consistent with the clinical phenotype of Dubin-Johnson syndrome, which is caused by the inactivation of the ABCC2 gene. (E) The plot presents data on six individuals carrying LoF variants in the PTPRD gene. The identifier for each LoF variant, either rsID or variant ID, is displayed at the top. Blue boxes represent exons for different transcripts, while the red lines mark the locations of these LoFs. Individual IDs carrying the LoF variants are indicated at the bottom. A zoomed-out perspective of the plot is presented in fig. S9. (F) The shared phenotypes among three PTPRD LoF carriers for whom comprehensive clinical data are available (S1, S5, and S6), with the names of the phenotypes provided for reference.

JEWEL allowed us to further assess the potential applicability of LoF observed/expected upper-bound fraction (LOEUF) scores in the Japanese population.
The LOEUF score was introduced as a metric to quantify a gene’s tolerance to LoF variants, based on observed and expected counts of LoF variants in the gnomAD project (36). Given that individuals with EA ancestry constituted 7% of the gnomAD dataset, we were interested in testing whether the LOEUF score is applicable to JEWEL. We observed that genes in the lowest LOEUF decile bin (indicating the highest intolerance to LoF variants) were least affected by LoFs (fig. S8). This supports the utility of LOEUF scores in stratifying genes highly intolerant to LoF variants. However, a discrepancy was found in the number of genes affected by LoF variants in the top decile bins (fig. S8). Furthermore, we observed that the fraction of transcripts affected by LoF variants showed a significant positive correlation with LOEUF bins (Fig. 2C). Overall, these results support the generalizability of the LOEUF score while acknowledging that there might be room for improvement in relation to LoF-tolerant genes.
Pathogenic variants and human knockouts are highly valuable for clinical research and drug development and may reveal human genotype-phenotype connections. We identified 371 ClinVar-registered pathogenic variants and 1723 unreported LoF variants in genes harboring pathogenic variants in ClinVar (note S5). We searched for human knockouts, defined as homozygotes or compound heterozygotes for LoF variants. Inspection of annotations and manual curation identified 23 human knockouts that are likely to be clinically relevant. We noted a carrier of compound heterozygous LoF variants in the ABCC2 gene (see Materials and Methods and table S16). The LoF of this gene is known to cause Dubin-Johnson syndrome, an autosomal recessive liver disease related to hyperbilirubinemia (55, 56). The syndrome is typically benign, and patients exhibit an increase in total bilirubin in the blood, leading to chronic jaundice.
We obtained clinical history records and blood test results for this individual and confirmed the diagnosis of Dubin-Johnson syndrome and the clinical manifestation of hyperbilirubinemia (Fig. 2D). Furthermore, two of three individuals with homozygous LoF variants in GJB2, a gene associated with nonsyndromic sensorineural hearing loss, were confirmed to have hearing loss (57). These examples demonstrate that we can use JEWEL to identify likely underlying pathogenic variants responsible for diseases and to mine potentially clinically relevant genotype-phenotype connections. In addition to the conventional human knockout analyses presented above, we leveraged rich phenotyping data in JEWEL to examine individuals with heterozygous LoF variants in genes considered highly intolerant to LoF variants, as indicated by LOEUF scores. Focusing on genes that have multiple LoF variants, we identified six individuals with LoF variants in PTPRD, one of the top-ranked LOEUF genes (LOEUF = 0.11, rank = 271 among 19,704 genes), which encodes a receptor-like protein tyrosine phosphatase (Fig. 2E) (58). Detailed clinical information was obtained for three of the six individuals, who presented with several shared phenotypes, including myocardial infarction, kidney failure, hypertension, and drug eruption (Fig. 2F and table S17). The PTPRD gene has 13 transcripts, with most exons being identical and shared among multiple transcripts. However, only two transcripts were affected by LoF variants, which is significantly fewer than would be expected by chance (P = 0.005, permutation test; see Materials and Methods, Fig. 2E, and fig. S9). We searched the literature for reported human knockouts of PTPRD. A case report described a child carrying a homozygous microdeletion of PTPRD, which was suspected to be associated with intellectual disability, trigonocephaly, and hearing loss (59). In addition, Ptprd knockout mice exhibit preweaning lethality with incomplete penetrance (60).
Given these data and the low LOEUF score, disruption of PTPRD protein might be highly deleterious. However, if LoFs affect only a limited number of transcripts or if the affected transcripts are of lesser functional importance, then the consequences might be more tolerable. Further genome-wide scanning identified additional genes where LoF variants occurred in a restricted set of transcripts, including two more PTPR family genes in the lowest LOEUF bin, PTPRS (LOEUF = 0.25, P = 0.002) and PTPRM (LOEUF = 0.23, P = 0.009) (table S18). The results suggest that phenotypic impacts of certain LoFs may be mitigated, even in genes that are generally intolerant to LoF. However, other factors such as nonrandom sampling or inaccurate annotation of LoF transcripts should also be considered. Further studies using WGS from either the Japanese population or other populations are needed. As the examples above show, genetic information needs to be integrated with in-depth clinical data to understand the full spectrum of gene functions when potentially disrupted by LoF. These findings also suggest that tolerability to LoF should be evaluated not only at the gene level but also at the transcript level.
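The transcript-level argument above rests on a permutation test, whose exact procedure is described in the paper's Materials and Methods and is not reproduced in this thread. The following is only a sketch under an assumed null model: each LoF variant is placed uniformly at random on the union of exonic bases, and the P value is the share of permutations in which no more transcripts are hit than observed. The gene layout, coordinates, and function names are hypothetical.

```python
import random


def affected_transcripts(positions, transcripts):
    """Count transcripts that have at least one exon containing any position."""
    return sum(
        any(start <= pos <= end for pos in positions for start, end in exons)
        for exons in transcripts.values()
    )


def permutation_p(observed_positions, transcripts, n_perm=2000, seed=0):
    """One-sided permutation P value for 'as few or fewer transcripts affected'.

    Assumed null: variants fall uniformly on the union of exonic bases.
    """
    rng = random.Random(seed)
    exonic_bases = sorted(
        {base
         for exons in transcripts.values()
         for start, end in exons
         for base in range(start, end + 1)}
    )
    observed = affected_transcripts(observed_positions, transcripts)
    n_variants = len(observed_positions)
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = rng.sample(exonic_bases, n_variants)
        if affected_transcripts(shuffled, transcripts) <= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate away from an impossible P = 0.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical gene: three transcripts share most exons, one has a private exon.
toy_transcripts = {
    "T1": [(1, 100), (201, 300)],
    "T2": [(1, 100), (201, 300)],
    "T3": [(1, 100), (401, 410)],
}
toy_observed, toy_p = permutation_p([405, 407], toy_transcripts)
```

In the toy gene both variants sit in the exon private to `T3`, so only one transcript is affected even though most exonic bases are shared, which is the pattern the text describes for PTPRD. A realistic null would probably also need to account for mutation rates and exon-specific coverage, which this sketch ignores.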
Post by Admin on May 1, 2024 3:39:19 GMT
Sequences introgressed from Neanderthals and Denisovans
EAs carry introgressed sequences from Denisovans and Neanderthals (61–63). However, the surveys of introgression have so far been restricted to a small number of samples in East Asia. To detect sequences likely introgressed from Neanderthals or Denisovans, we applied a recently developed probabilistic method, IBDmix, which does not use a modern reference population (see Materials and Methods). On a per-individual basis, a JEWEL participant carries ~49 Mb of Neanderthal-derived sequences and 1.47 Mb of Denisovan-derived sequences (table S19). In total, we identified 3079 segments likely introgressed from Neanderthals and 210 segments likely introgressed from Denisovans, covering 772 and 31.46 Mb of the genome, respectively (Fig. 3A). Our results replicated 85% (2414 of 2843) of previously reported Neanderthal-introgressed segments based on the analysis of 104 Japanese in the 1000 Genomes project (1KGP) (fig. S10) (63). Notably, 47% (1439 of 3079) of Neanderthal-introgressed regions were not identified in the 1KGP Japanese in Tokyo, Japan (JPT) dataset, and 77% (1113 of 1439) of them were rare, with frequencies less than 5%. PCA of introgressed Neanderthal segments in JEWEL revealed no subregional differences (fig. S11). We compared Denisovan introgression in JEWEL to that in populations from the 1KGP dataset, as well as in Papuans and Philippine Ayta, both of which have a high proportion of Denisovan ancestry (62, 64). The analysis revealed that the Denisovan-like segments in JEWEL significantly overlap with those in EA populations, while no statistically significant overlap was found with those in Papuans and Philippine Ayta, indicating that Denisovan introgression in Japanese might be less related to that in Papuans and Philippine Ayta (table S20 and note S6).

Fig. 3. Introgressed sequences from archaic Neanderthals or Denisovans in the Japanese population.
(A) Density plot illustrating the distribution of introgressed sequences across each chromosome. The upper track, shown in blue, represents sequences likely introgressed from Neanderthals, while the lower track displays sequences originating from Denisovans. (B) Variants likely introgressed from Denisovans in the NKX6-1 locus are associated with T2D in the Japanese population. Triangles indicate the introgressed variants, and gray dots indicate the nonintrogressed variants. (C) Introgressed variants from Neanderthals in the F5 gene are associated with PT.
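The replication and coverage figures quoted in this post (for example 2414 of 2843 segments, or 772 Mb covered) come down to interval arithmetic on segment coordinates. Below is a minimal Python sketch of that bookkeeping. It assumes half-open (chrom, start, end) coordinates and counts any overlap of at least 1 bp as a replication, which may differ from the criterion the authors used. All function names and toy segments are hypothetical.

```python
def merge_intervals(segments):
    """Merge overlapping or touching (chrom, start, end) segments."""
    merged = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(segment) for segment in merged]


def covered_bp(segments):
    """Total number of bases covered, counting overlapping regions once."""
    return sum(end - start for _, start, end in merge_intervals(segments))


def replicated_fraction(reference, query):
    """Share of reference segments overlapped by at least one query segment."""
    def overlaps(a, b):
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

    hits = sum(any(overlaps(ref, q) for q in query) for ref in reference)
    return hits / len(reference)


# Toy segments only; these are not coordinates from the study.
toy_query = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
toy_reference = [
    ("chr1", 180, 220),
    ("chr1", 400, 500),
    ("chr2", 40, 60),
    ("chr3", 0, 10),
]
toy_covered = covered_bp(toy_query)
toy_replicated = replicated_fraction(toy_reference, toy_query)
```

The nested loop in `replicated_fraction` is quadratic, which is fine for a few thousand segments. For larger sets a sorted sweep or an interval tree would likely be a better choice.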
Post by Admin on May 2, 2024 20:25:06 GMT
Subsequently, we examined the phenotypic effects of the identified introgressed sequences on 106 traits based on GWAS summary statistics generated from BBJ (see Materials and Methods). We identified 44 archaic segments associated with 49 phenotypes (2 from Denisovans and 42 from Neanderthals). Among these, 43 associations were not reported in a previous study (65). We validated 39 of the 44 archaic segments with an alternative method, SPrime, and confirmed that the 5 segments not detected by SPrime showed a high matching rate with the Neanderthal genome (see Materials and Methods) (62). The Denisovan-inherited segment at POLR3E was associated with height. The segment at NKX6-1 was associated with type 2 diabetes (T2D) (Fig. 3B and Table 1). The NKX6-1 segment has also been identified in other populations, including Papuans, Chinese [Han Chinese in Beijing (CHB) and Han Chinese South (CHS)], and Finnish (62). Moreover, archaic variants in this segment were found to be associated with T2D using GWAS data obtained from the FinnGen project (Pmin = 8.65 × 10^−10 at rs75560957) (14). For Neanderthal-derived segments, we observed 11 segments associated with seven diseases: T2D, coronary artery disease (CAD), stable angina pectoris (SAP), atopic dermatitis (AD), Graves’ disease (GD), prostate cancer (PrCa), and rheumatoid arthritis (RA) (Table 1). A pathway analysis identified “regulation of insulin secretion” as the top associated pathway (P = 1.9 × 10^−4). At the ADAMTS7 locus, the lead introgressed single-nucleotide polymorphism (SNP), rs11639375, was reported to be protective against CAD and SAP. While this SNP is observed in all major populations with high frequencies, upon further examination, it appears that rs11639375 in Japanese resides within a haplotype that is likely to have been introgressed from Neanderthals. The haplotype comprises 39 potentially archaic variants that exhibit strong linkage disequilibrium (LD) with rs11639375 (r^2 > 0.7).
These variants are exclusive to EA and Latino Americans and are either absent or present at extremely low frequencies in other population groups (table S21). These data may suggest that this protective variant rs11639375 was once lost in EAs and later restored through introgression. However, further analysis is needed to substantiate this hypothesis (note S7). We observed that a causal variant for AD, rs12637953, located in the CCDC80 locus, is likely to have been inherited from Neanderthals. This variant was implicated as potentially functional via decreasing expression levels of an enhancer in CD1a+ Langerhans cells and skin epidermis cells by machine learning in silico prediction and was further experimentally validated (66, 67). The introgressed segment at the GLP1R locus deserves attention. Variants at this locus were shown to be associated with T2D in a large-scale Japanese GWAS (n = 191,764), but not in European GWAS (n = 159,208), as previously reported (68). Through our analysis, we identified that the lead variants likely have archaic origins, specifically from Neanderthals. Further analyses using 1KGP data showed that this introgressed segment is present in Asians but absent in Europeans, which could account for the discrepancies in GWAS signals. In addition to archaic segments associated with diseases, we identified 37 distinct segments associated with 35 quantitative traits (table S22). As an example, archaic variants of the coagulation factor V (F5) gene showed positive associations with the bleeding trait (PT) (Fig. 3C). Notably, the same segment is associated with PT in the Icelandic population (69). We also confirmed that the Neanderthal-derived segment reported to be associated with severe COVID-19 (chr3: 45,859,651 to 45,909,024) was not detected in JEWEL (70). Last, the significant introgressed variants exhibited distinct population specificity in EAs compared to Europeans (fig. S12).
The AFs were significantly higher in JEWEL than in Europeans (P = 4.66 × 10^−8, paired t test), and the median AF in the Japanese population is 21.5 times that in the European population.
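The allele-frequency comparison above can be sketched in Python as follows. The sketch computes only the paired t statistic and its degrees of freedom, because a P value needs a t-distribution CDF (for instance via `scipy.stats.ttest_rel`). The text does not say whether "21.5 times" is a ratio of medians or a median of per-variant ratios, so both are shown. The function names and toy frequencies are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


def ratio_of_medians(x, y):
    """Median of x divided by median of y."""
    return median(x) / median(y)


def median_of_ratios(x, y):
    """Median of per-variant ratios; pairs with a zero denominator are skipped."""
    return median([a / b for a, b in zip(x, y) if b > 0])


# Toy allele frequencies for the same variants in two populations (not real data).
toy_jpn = [0.20, 0.35, 0.10, 0.42]
toy_eur = [0.01, 0.02, 0.01, 0.02]
toy_t, toy_df = paired_t(toy_jpn, toy_eur)
toy_rom = ratio_of_medians(toy_jpn, toy_eur)
toy_mor = median_of_ratios(toy_jpn, toy_eur)
```

The two summary ratios differ even on this small toy set, so it helps to state which one is reported. Variants that are absent in one population make the per-variant ratio undefined, which is why they are skipped here.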
Table 1. Introgressed segments associated with disease phenotypes in the Japanese population.

Introgressed segment | Lead archaic SNP | Reported P | Disease | Beta | Origin | Gene
chr4: 85200961–85426528 | 4:85301870:T:C | 4.91 × 10^−11 | T2D | −0.134 | Denisovan | NKX6-1
chr1: 39932346–40124123 | 1:39981740:G:A | 3.16 × 10^−8 | T2D | 0.062 | Neanderthal | BMP8A
chr1: 160151058–160608637 | 1:160419940:A:G | 3.29 × 10^−13 | GD | 0.470 | Neanderthal | VANGL2
chr2: 164906091–165538059 | 2:165381518:A:G | 6.69 × 10^−10 | T2D | −0.172 | Neanderthal | GRB14
chr2: 173140874–173598206 | 2:173321791:T:G | 5.07 × 10^−12 | PrCa | −0.175 | Neanderthal | ITGA6
chr3: 23163800–23502216 | 3:23210938:C:G | 3.33 × 10^−15 | T2D | 0.100 | Neanderthal | UBE2E2
chr3: 111531421–113933832 | 3:112394029:T:C | 2.88 × 10^−14 | AD | 1.248 | Neanderthal | CCDC80
chr6: 38249704–39053462 | 6:39037662:G:C | 1.09 × 10^−17 | T2D | −0.092 | Neanderthal | GLP1R
chr10: 63625277–64526183 | 10:64063077:T:C | 1.26 × 10^−8 | RA | 0.212 | Neanderthal | ZNF365
chr12: 31070734–32216996 | 12:31441179:A:C | 4.14 × 10^−25 | T2D | 0.112 | Neanderthal | FAM60A
chr15: 78635757–79216385 | 15:79019990:C:T | 3.79 × 10^−10 | SAP | −0.078 | Neanderthal | ADAMTS7
chr15: 78635757–79216385 | 15:79026723:G:A | 2.90 × 10^−15 | CAD | −0.079 | Neanderthal | ADAMTS7