Post by Admin on Jul 3, 2021 4:15:22 GMT
Results
Genome wide scans for introgression
We carried out two genome-wide searches for introgressed loci in a European population for which we also had available eQTL data using Sprime (Browning et al. 2018) and U and Q95 (Racimo et al. 2014) methods. We used 423 Estonian whole genome samples (Pankratov et al. 2020) that constitute a well-studied representative sample of the broader Estonian population as sampled by the Estonian Biobank (EGCUT) (Leitsalu et al 2015). These samples also have available whole blood RNA-sequence data which contributed to eQTLGen, a broad whole blood eQTL analysis study (Vosa et al. 2018). By utilizing genomes that were part of the eQTL study population, we can be assured that the associations between alleles and gene expression is accurate, as differential linkage disequilibrium (LD) between alleles in different populations can decrease the efficacy of using eQTL data from one population on another.
We initially conducted the Sprime scan (Browning et al. 2018) using the 423 Estonians as the ingroup population along with 36 African samples from the Simons Genome Diversity Project (SGDP) with no evidence of European admixture (Mallick et al. 2016) as an outgroup (Table S1). From this scan, we identified 175,550 likely archaically introgressed alleles across 1,678 segments (Table S2). Following Browning et al. (2018), we then identified segments as confidently introgressed from Neanderthals if they had at least 30 putatively archaically introgressed alleles with a match rate to the Vindija Neanderthal genome (Prüfer et al. 2017) greater than 0.6 and a match rate with the Denisovan genome (Meyer et al. 2012) less than 0.4 (Browning et al. 2018). In total, we identified 693 such segments (Table S3), including the segment containing the COVID-19 severity haplotype on chromosome 3 (see above).
We next used the U and Q95 scan, which specifically identifies regions of introgression showing evidence of positive selection (Racimo et al. 2017). Using Africans from the 1000 genomes project as an outgroup (1000 Genomes Project Consortium, 2015), we found 493 such regions (Table S4). We did not detect the introgressed COVID-19 severity haplotype in our population via this method. This suggests that the COVID-19 severity associated segment, while likely introgressed from Neanderthals based on its detection in our Sprime scan and via the work of others (Zeberg and Paabo 2020), was not under positive selection in the Estonian population. This is consistent with the previously reported lower frequency (8%) of the haplotype in Europeans relative to South Asian populations in which the haplotype is at higher frequency (30%) (Zeberg and Paabo 2020). However, U and Q95 scans do detect this region in South Asian populations (Racimo et al. 2017; Jagoda et al. 2018), supporting positive selection on this haplotype in South Asian, but not European populations.
We next examined which alleles in these putatively Neanderthal introgressed regions detected using these two genome wide scans also are cis- and trans-eQTLs in the eQTLGen whole blood dataset (Vosa et al. 2018). From the U and Q95 data, we identified 684 cis-eQTLs across 250 40kb windows (Table S5). There were no trans-eQTLs detected in this set. From the Sprime data, we found 27,342 cis-eQTLs from 318 segments along with four trans-eQTLs from three segments (Table S6 and S7).
Refinement of the severe COVID-19 associated introgressed segment
In our Sprime scan, we identified an introgressed region containing the haplotype defined by Zeberg and Paabo (2020) as both introgressed and associated with increased risk of COVID-19-severity. The overall introgressed region as detected in our Estonian population spans ∼811kb from chr3:45843242-46654616, encompasses 16 genes (Figure 1A), and ranks in the top 2% (ranked 21/1677) of Sprime detected segments and in the top 5% (58/1677) of Sprime segments based on length. Its extreme length provides support to the fact that it is introgressed and not likely a product of incomplete lineage sorting, which is detected as seemingly introgressed tracts of significantly shorter length (Huerta-Sanchez et al. 2014).
Figure 1.
Computational intersections between MPRA emVars and functional genomics datasets across the severe COVIDB19 risk locus.
(a) Gene locations across the locus along with boundaries of the four LD blocks (ACD). (b) Severe COVIDC19 GWAS effect sizes from the release 5 of the COVIDC19 Host Initiative dataset (2021), with strongest genomeCwide pCvalues in yellow spanning the A and B blocks. See key for other color definitions. Dots and diamonds across the panels indicate respectively SNPs identified by Sprime and SNPs in LD with them. (c) eQTL effect sizes across the locus (blue for CCR1, green for CCR5) in whole blood from eQTLGen (Vosa et al. 2018) across the locus. Note the strong downC versus upCregulation of CCR1 for variants in the A versus B blocks, respectively. Grey SNPs are not eQTLs for any of the two genes or were absent from the eQTL study. Asterisks denote transCeQTLs.(d) ChromatinCbased functional annotations across the locus consisting of HiCC contacts with CCR1 and CCR5 in Spleen, Thymus or LCL (Jung et al. 2019) and candidate cisCregulatory elements from ENCODE (ENCODE Project Consortium et al. 2020). (e) CLog pCvalues for emVars identified using MPRA across the locus. Grey SNPs failed the test for activity in either the archaic or nonCarchaic form. Vertical lines connect the four putative causal emVars and the original tag SNP as described by Zeberg and Paabo (2020) rs35044562 to functional genomics and genetics data. The four putatively causal variants are unique in having significant hits across all functional genomics and genetics tests.
To examine how the introgressed segment may be affecting COVID-19 severity, we began by examining the LD structure within the segment and identified four major blocks defined as minimum pairwise LD between Sprime-identified variants within a block (min r2 = 0.34) (SFig 1). We labeled these blocks as “A” from rs13071258 to rs13068572 (chr3:45843242-46177096), “B” from rs17282391 to rs149588566 (chr3:46179481-46289403), “C” rs71327065 to rs79556692 (chr3:46483630-46585769), and “D” from rs73069984 to rs73075571 (chr3:46593568-46649711) (Figure 1A; SFig 1). All Sprime alleles in the A block are significantly (p < 5*10-8) associated with increased risk for COVID-19 severity (The COVID-19 Host Initiative et al. 2021), with the median p-value being 2.32 * 10-26 and median effect size being 0.42 (Figure 1B). The B block also harbors many alleles (81.2%) significantly associated with COVID-19 severity, with the median p-value being 1.94*10-9 and median effect size being 0.28 (Figure 1B). In the “C” and “D” block no alleles are significantly associated with COVID-19 severity, suggesting that the most likely causal variants for the COVID-19 severity association are found within the A or B blocks (Figure 1B).
All the 361 Sprime-identified introgressed variants act as eQTLs in the whole blood (Vosa et al. 2018) including for many genes that are relevant to COVID-19 infection. Strikingly, of the four trans-eQTLs identified genome-wide in our Sprime scan regions in Estonians, two were located on the introgressed COVID-19 severity haplotype. These two variants, rs13063635 and rs13098911, have 11 and 33 response genes, respectively (Table S7). We examined whether these response genes have any relevance to COVID-19 infection and found that 3 (27%) and 13 (39%) of the response genes for rs13063635 and rs13098911, respectively, are differentially expressed in at least one experiment in which a lung related cell-line or tissue was infected with COVID-19 or other related infections (Table S8). These results suggest that these two trans-eQTLs may affect the lung response to COVID-19 in a way that could contribute to differential severity in host response. Furthermore, all 361 variants, including the two trans-eQTLs, act as cis-eQTLs in whole blood, altering the expression of 14 response genes: CCR1, CCR2, CCR3, CCR5, CCR9, CCRL2, CXCR6, FLT1P1, LRRC2, LZTFL1, RP11-24F11.2, SACM1L, SCAP, TMIE. Of these genes, 7 are chemokine receptor genes (CCR1, CCR2, CCR3, CCR5, CCR9, CCRL2, CXCR6), which are likely linked to the segment’s association with COVID-19 severity. There is support in particular for an association between CCR1 and CCR5 expression and COVID-19 phenotypes. For example, elevated CCR1 expression has been demonstrated in neutrophils and macrophages from patients with critical COVID-19 illness (Chua et al. 2020), in biopsied lung tissues from COVID-19 infected patients (Table S8), as well as in Calu3 cells directly infected with COVID-19 (Table S8). CCR5 expression is also elevated, though to a lesser degree than CCR1, in macrophages of patients with critical COVID-19 illness (Chua et al.). Notably, some ligands for CCR1 and CCR5 (CCL15, CCL2 and CCL3) also show over-expression in these patients (Chua et al.). CCRL2, LZTFL1, SCAP, and SACM1L are also differentially expressed in at least one experiment that measures differential expression of genes in lung tissues and related cell lines infected with COVID-19 or other viruses that stimulate similar immune responses (Table S8). These results in general support a role for the introgressed variants along this haplotype contributing to the COVID-19 severe phenotype by affecting the expression of genes that play a role in host COVID-19 response.
Intriguingly, when considering the effect of the cis-eQTLs for CCR1 across the entire segment, we find that the majority of alleles along the introgressed haplotype within the A block are associated with its down-regulation (average Z score = −12.3) (Figure 1C). On the other hand, the majority of alleles within the B and C blocks are associated with CCR1 up-regulation (average Z scores = 7.1 and 10.2, respectively) (Figure 1C). It is important to note that these eQTL effects are determined based on whole blood from non-infected, healthy patients (Vosa et al. 2018). When considered in the context that severe COVID-19 phenotype is characterized by increased expression of CCR1 (Chua et al. 2020), these risk-associated alleles having different directions of effect suggest that a complex change to the CCR1 regulatory landscape driven by alleles across the introgressed segment may be contributing to the disease phenotype. When we consider CCR5 expression, it shows a more consistent pattern in which the majority of alleles within the A-C blocks are associated with its down-regulation. This result is interesting as CCR5 expression in patients with severe COVID-19 illness is higher than those with more moderate cases (Chua et al.). However, given the strong LD within each of these segments, discerning the direct connection between one or more alleles driving these regulatory changes and the molecular and phenotypic signatures of severe COVID-19 remains difficult.
Genome wide scans for introgression
We carried out two genome-wide searches for introgressed loci in a European population for which we also had available eQTL data using Sprime (Browning et al. 2018) and U and Q95 (Racimo et al. 2014) methods. We used 423 Estonian whole genome samples (Pankratov et al. 2020) that constitute a well-studied representative sample of the broader Estonian population as sampled by the Estonian Biobank (EGCUT) (Leitsalu et al 2015). These samples also have available whole blood RNA-sequence data which contributed to eQTLGen, a broad whole blood eQTL analysis study (Vosa et al. 2018). By utilizing genomes that were part of the eQTL study population, we can be assured that the associations between alleles and gene expression is accurate, as differential linkage disequilibrium (LD) between alleles in different populations can decrease the efficacy of using eQTL data from one population on another.
We initially conducted the Sprime scan (Browning et al. 2018) using the 423 Estonians as the ingroup population along with 36 African samples from the Simons Genome Diversity Project (SGDP) with no evidence of European admixture (Mallick et al. 2016) as an outgroup (Table S1). From this scan, we identified 175,550 likely archaically introgressed alleles across 1,678 segments (Table S2). Following Browning et al. (2018), we then identified segments as confidently introgressed from Neanderthals if they had at least 30 putatively archaically introgressed alleles with a match rate to the Vindija Neanderthal genome (Prüfer et al. 2017) greater than 0.6 and a match rate with the Denisovan genome (Meyer et al. 2012) less than 0.4 (Browning et al. 2018). In total, we identified 693 such segments (Table S3), including the segment containing the COVID-19 severity haplotype on chromosome 3 (see above).
We next used the U and Q95 scan, which specifically identifies regions of introgression showing evidence of positive selection (Racimo et al. 2017). Using Africans from the 1000 genomes project as an outgroup (1000 Genomes Project Consortium, 2015), we found 493 such regions (Table S4). We did not detect the introgressed COVID-19 severity haplotype in our population via this method. This suggests that the COVID-19 severity associated segment, while likely introgressed from Neanderthals based on its detection in our Sprime scan and via the work of others (Zeberg and Paabo 2020), was not under positive selection in the Estonian population. This is consistent with the previously reported lower frequency (8%) of the haplotype in Europeans relative to South Asian populations in which the haplotype is at higher frequency (30%) (Zeberg and Paabo 2020). However, U and Q95 scans do detect this region in South Asian populations (Racimo et al. 2017; Jagoda et al. 2018), supporting positive selection on this haplotype in South Asian, but not European populations.
We next examined which alleles in these putatively Neanderthal introgressed regions detected using these two genome wide scans also are cis- and trans-eQTLs in the eQTLGen whole blood dataset (Vosa et al. 2018). From the U and Q95 data, we identified 684 cis-eQTLs across 250 40kb windows (Table S5). There were no trans-eQTLs detected in this set. From the Sprime data, we found 27,342 cis-eQTLs from 318 segments along with four trans-eQTLs from three segments (Table S6 and S7).
Refinement of the severe COVID-19 associated introgressed segment
In our Sprime scan, we identified an introgressed region containing the haplotype defined by Zeberg and Paabo (2020) as both introgressed and associated with increased risk of COVID-19-severity. The overall introgressed region as detected in our Estonian population spans ∼811kb from chr3:45843242-46654616, encompasses 16 genes (Figure 1A), and ranks in the top 2% (ranked 21/1677) of Sprime detected segments and in the top 5% (58/1677) of Sprime segments based on length. Its extreme length provides support to the fact that it is introgressed and not likely a product of incomplete lineage sorting, which is detected as seemingly introgressed tracts of significantly shorter length (Huerta-Sanchez et al. 2014).
Figure 1.
Computational intersections between MPRA emVars and functional genomics datasets across the severe COVIDB19 risk locus.
(a) Gene locations across the locus along with boundaries of the four LD blocks (ACD). (b) Severe COVIDC19 GWAS effect sizes from the release 5 of the COVIDC19 Host Initiative dataset (2021), with strongest genomeCwide pCvalues in yellow spanning the A and B blocks. See key for other color definitions. Dots and diamonds across the panels indicate respectively SNPs identified by Sprime and SNPs in LD with them. (c) eQTL effect sizes across the locus (blue for CCR1, green for CCR5) in whole blood from eQTLGen (Vosa et al. 2018) across the locus. Note the strong downC versus upCregulation of CCR1 for variants in the A versus B blocks, respectively. Grey SNPs are not eQTLs for any of the two genes or were absent from the eQTL study. Asterisks denote transCeQTLs.(d) ChromatinCbased functional annotations across the locus consisting of HiCC contacts with CCR1 and CCR5 in Spleen, Thymus or LCL (Jung et al. 2019) and candidate cisCregulatory elements from ENCODE (ENCODE Project Consortium et al. 2020). (e) CLog pCvalues for emVars identified using MPRA across the locus. Grey SNPs failed the test for activity in either the archaic or nonCarchaic form. Vertical lines connect the four putative causal emVars and the original tag SNP as described by Zeberg and Paabo (2020) rs35044562 to functional genomics and genetics data. The four putatively causal variants are unique in having significant hits across all functional genomics and genetics tests.
To examine how the introgressed segment may be affecting COVID-19 severity, we began by examining the LD structure within the segment and identified four major blocks defined as minimum pairwise LD between Sprime-identified variants within a block (min r2 = 0.34) (SFig 1). We labeled these blocks as “A” from rs13071258 to rs13068572 (chr3:45843242-46177096), “B” from rs17282391 to rs149588566 (chr3:46179481-46289403), “C” rs71327065 to rs79556692 (chr3:46483630-46585769), and “D” from rs73069984 to rs73075571 (chr3:46593568-46649711) (Figure 1A; SFig 1). All Sprime alleles in the A block are significantly (p < 5*10-8) associated with increased risk for COVID-19 severity (The COVID-19 Host Initiative et al. 2021), with the median p-value being 2.32 * 10-26 and median effect size being 0.42 (Figure 1B). The B block also harbors many alleles (81.2%) significantly associated with COVID-19 severity, with the median p-value being 1.94*10-9 and median effect size being 0.28 (Figure 1B). In the “C” and “D” block no alleles are significantly associated with COVID-19 severity, suggesting that the most likely causal variants for the COVID-19 severity association are found within the A or B blocks (Figure 1B).
All the 361 Sprime-identified introgressed variants act as eQTLs in the whole blood (Vosa et al. 2018) including for many genes that are relevant to COVID-19 infection. Strikingly, of the four trans-eQTLs identified genome-wide in our Sprime scan regions in Estonians, two were located on the introgressed COVID-19 severity haplotype. These two variants, rs13063635 and rs13098911, have 11 and 33 response genes, respectively (Table S7). We examined whether these response genes have any relevance to COVID-19 infection and found that 3 (27%) and 13 (39%) of the response genes for rs13063635 and rs13098911, respectively, are differentially expressed in at least one experiment in which a lung related cell-line or tissue was infected with COVID-19 or other related infections (Table S8). These results suggest that these two trans-eQTLs may affect the lung response to COVID-19 in a way that could contribute to differential severity in host response. Furthermore, all 361 variants, including the two trans-eQTLs, act as cis-eQTLs in whole blood, altering the expression of 14 response genes: CCR1, CCR2, CCR3, CCR5, CCR9, CCRL2, CXCR6, FLT1P1, LRRC2, LZTFL1, RP11-24F11.2, SACM1L, SCAP, TMIE. Of these genes, 7 are chemokine receptor genes (CCR1, CCR2, CCR3, CCR5, CCR9, CCRL2, CXCR6), which are likely linked to the segment’s association with COVID-19 severity. There is support in particular for an association between CCR1 and CCR5 expression and COVID-19 phenotypes. For example, elevated CCR1 expression has been demonstrated in neutrophils and macrophages from patients with critical COVID-19 illness (Chua et al. 2020), in biopsied lung tissues from COVID-19 infected patients (Table S8), as well as in Calu3 cells directly infected with COVID-19 (Table S8). CCR5 expression is also elevated, though to a lesser degree than CCR1, in macrophages of patients with critical COVID-19 illness (Chua et al.). Notably, some ligands for CCR1 and CCR5 (CCL15, CCL2 and CCL3) also show over-expression in these patients (Chua et al.). CCRL2, LZTFL1, SCAP, and SACM1L are also differentially expressed in at least one experiment that measures differential expression of genes in lung tissues and related cell lines infected with COVID-19 or other viruses that stimulate similar immune responses (Table S8). These results in general support a role for the introgressed variants along this haplotype contributing to the COVID-19 severe phenotype by affecting the expression of genes that play a role in host COVID-19 response.
Intriguingly, when considering the effect of the cis-eQTLs for CCR1 across the entire segment, we find that the majority of alleles along the introgressed haplotype within the A block are associated with its down-regulation (average Z score = −12.3) (Figure 1C). On the other hand, the majority of alleles within the B and C blocks are associated with CCR1 up-regulation (average Z scores = 7.1 and 10.2, respectively) (Figure 1C). It is important to note that these eQTL effects are determined based on whole blood from non-infected, healthy patients (Vosa et al. 2018). When considered in the context that severe COVID-19 phenotype is characterized by increased expression of CCR1 (Chua et al. 2020), these risk-associated alleles having different directions of effect suggest that a complex change to the CCR1 regulatory landscape driven by alleles across the introgressed segment may be contributing to the disease phenotype. When we consider CCR5 expression, it shows a more consistent pattern in which the majority of alleles within the A-C blocks are associated with its down-regulation. This result is interesting as CCR5 expression in patients with severe COVID-19 illness is higher than those with more moderate cases (Chua et al.). However, given the strong LD within each of these segments, discerning the direct connection between one or more alleles driving these regulatory changes and the molecular and phenotypic signatures of severe COVID-19 remains difficult.