Post by Admin on Jul 5, 2021 2:43:17 GMT
An international team of researchers co-led by the University of Adelaide and the University of Arizona has analyzed the genomes of more than 2,500 modern humans from 26 worldwide populations to better understand how humans have adapted to historical coronavirus outbreaks.

In a paper published in Current Biology, the researchers used cutting-edge computational methods to uncover genetic traces of adaptation to coronaviruses, the family of viruses responsible for three major outbreaks in the last 20 years, including the ongoing pandemic.

“Modern human genomes contain evolutionary information tracing back hundreds of thousands of years, however, it’s only in the past few decades geneticists have learned how to decode the extensive information captured within our genomes,” said lead author Dr. Yassine Souilmi, with the University of Adelaide’s School of Biological Sciences.

“This includes physiological and immunological ‘adaptions’ that have enabled humans to survive new threats, including viruses.

“Viruses are very simple creatures with the sole objective to make more copies of themselves. Their simple biological structure renders them incapable of reproducing by themselves so they must invade the cells of other organisms and hijack their molecular machinery to exist.”

Viral invasions involve attaching and interacting with specific proteins produced by the host cell known as viral interacting proteins (VIPs). In the study, researchers found signs of adaptation in 42 different human genes encoding VIPs.

“We found VIP signals in five populations from East Asia and suggest the ancestors of modern East Asians were first exposed to coronaviruses over 20,000 years ago,” said Dr. Souilmi.

“We found the 42 VIPs are primarily active in the lungs, the tissue most affected by coronaviruses, and confirmed that they interact directly with the virus underlying the current pandemic.”

Other independent studies have shown that mutations in VIP genes may mediate coronavirus susceptibility and also the severity of COVID-19 symptoms. Several VIPs are also either currently being used in drugs for COVID-19 treatments or are part of clinical trials for further drug development.

“Our past interactions with viruses have left telltale genetic signals that we can leverage to identify genes influencing infection and disease in modern populations, and can inform drug repurposing efforts and the development of new treatments,” said co-author Dr. Ray Tobler, from the University of Adelaide’s School of Biological Sciences.

“By uncovering the genes previously impacted by historical viral outbreaks, our study points to the promise of evolutionary genetic analyses as a new tool in fighting the outbreaks of the future,” said Dr. Souilmi.

The researchers also note that their results in no way supersede pre-existing public health policies and protections, such as mask-wearing, social distancing, and vaccinations.

Reference: “An ancient viral epidemic involving host coronavirus interacting genes more than 20,000 years ago in East Asia” by Yassine Souilmi, M. Elise Lauterbur, Ray Tobler, Christian D. Huber, Angad S. Johar, Shayli Varasteh Moradi, Wayne A. Johnston, Nevan J. Krogan, Kirill Alexandrov and David Enard, 24 June 2021, Current Biology. DOI: 10.1016/j.cub.2021.05.067
Post by Admin on Jul 5, 2021 20:46:33 GMT
An ancient viral epidemic involving host coronavirus interacting genes more than 20,000 years ago in East Asia
Open Access. Published: June 24, 2021. DOI: https://doi.org/10.1016/j.cub.2021.05.067

Summary

The current severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has emphasized the vulnerability of human populations to novel viral pressures, despite the vast array of epidemiological and biomedical tools now available. Notably, modern human genomes contain evolutionary information tracing back tens of thousands of years, which may help identify the viruses that have impacted our ancestors, pointing to which viruses have future pandemic potential. Here, we apply evolutionary analyses to human genomic datasets to recover selection events involving tens of human genes that interact with coronaviruses, including SARS-CoV-2, that likely started more than 20,000 years ago. These adaptive events were limited to the population ancestral to East Asian populations. Multiple lines of functional evidence support an ancient viral selective pressure, and East Asia is the geographical origin of several modern coronavirus epidemics. An arms race with an ancient coronavirus, or with a different virus that happened to use similar interactions as coronaviruses with human hosts, may thus have taken place in ancestral East Asian populations. By learning more about our ancient viral foes, our study highlights the promise of evolutionary information to better predict the pandemics of the future. Importantly, adaptation to ancient viral epidemics in specific human populations does not necessarily imply any difference in genetic susceptibility between different human populations, and the current evidence points toward an overwhelming impact of socioeconomic factors in the case of coronavirus disease 2019 (COVID-19).
Introduction

Coronaviruses have been behind three major zoonotic outbreaks [1]. The first outbreak, known as SARS-CoV (severe acute respiratory syndrome coronavirus), originated in China in 2002 and infected more than 8,000 and killed more than 800 people [2]. Ten years later, MERS-CoV (Middle East respiratory syndrome coronavirus) affected >2,400 and killed over 850 people (https://www.who.int). The most recent outbreak began in late 2019 when SARS-CoV-2 emerged in China, triggering an ongoing pandemic (coronavirus disease 2019 [COVID-19]) [3]. The research on SARS-CoV-2 epidemiology has revealed that socioeconomic (e.g., access to healthcare, testing, and exposure at work), demographic, and personal health factors all play a major role in SARS-CoV-2 epidemiology [4, 5, 6]. Additionally, several genetic loci that mediate SARS-CoV-2 susceptibility and severity have been found in contemporary European populations [7, 8, 9, 10], one of which contains a genetic variant that increases SARS-CoV-2 susceptibility and that likely increased in frequency in the ancestors of modern Europeans after interbreeding with Neanderthals [11].

Throughout the evolutionary history of our species, positive natural selection has frequently targeted proteins that physically interact with viruses, e.g., those involved in immunity or used by viruses to hijack the host cellular machinery [12, 13, 14]. In the millions of years of human evolution, selection has led to the fixation of gene variants encoding virus-interacting proteins (VIPs) (Data S1A) at three times the rate observed for other classes of genes [13, 15]. Strong selection on VIPs has continued in human populations during the past 50,000 years, as evidenced by VIP genes being enriched for adaptive introgressed Neanderthal variants and also selective sweep signals (i.e., selection that drives a beneficial variant to substantial frequencies in a population), particularly around VIPs that interact with RNA viruses (Data S1B), a viral class that includes the coronaviruses [16, 17].

The accumulated evidence suggests that ancient RNA virus epidemics have occurred frequently during human evolution; however, we currently do not know whether selection has made a substantial contribution to the evolution of human genes that interact more specifically with coronaviruses. Accordingly, here, we investigate whether ancient coronavirus epidemics have driven past adaptation in modern human populations, by examining whether selection signals are enriched within a set of 420 VIPs that interact with coronaviruses (denoted CoV-VIPs; Data S1C) across 26 human populations from the 1000 Genomes Project [18]. These CoV-VIPs comprise 332 SARS-CoV-2 VIPs identified by high-throughput mass spectrometry (Data S1D) [19], and an additional 88 proteins that were manually curated from the coronavirus literature (e.g., SARS-CoV-1, MERS, HCoV-NL63, etc.; Data S1C) [16] and are part of a larger set of 5,291 VIPs (STAR Methods; Data S1A) from multiple viruses [16]. Our focus on VIPs is motivated by evidence indicating that these protein interactions are the central mechanism that viruses use to hijack the host cellular machinery [16, 19]. Accordingly, VIPs are much more likely to have functional impacts on viruses than other proteins (STAR Methods). An alternative that we cannot exclude, however, is that a different type of virus that happens to use similar VIPs as coronaviruses might have driven adaptation signals at CoV-VIPs.

Our analyses find a strong enrichment in sweep signals at CoV-VIPs across multiple East Asian populations, which is absent from other populations. This suggests that an ancient coronavirus epidemic (or another virus using similar VIPs) drove an adaptive response in the ancestors of East Asians. Further, by leveraging ancestral recombination graph approaches [20, 21], we find that 42 CoV-VIPs may have come under selection around 900 generations (∼25,000 years) ago and exhibit a coordinated adaptive response. We further show that the CoV-VIP genes are enriched for anti- and proviral effects and variants that affect COVID-19 etiology in the modern British population (https://grasp.nhlbi.nih.gov/Covid19GWASResults.aspx) [22, 23]. We further show that the inferred underlying causal mutations are situated near regulatory variants active in lungs and other tissues impacted by COVID-19. These independent lines of evidence support an ancient coronavirus (or a similarly interacting virus) epidemic that emerged in the ancestors of contemporary East Asian populations.
Post by Admin on Jul 6, 2021 5:41:24 GMT
Results

Signatures of adaptation to an ancient epidemic

Viruses have exerted strong selective pressures on modern humans [15, 17]. Accordingly, we use two statistical tests that are sensitive to such genetic signatures (i.e., selective sweeps), nSL [24] and iHS [25], while being insensitive to background selection [26, 27]. After scanning each of the 26 populations for selection signals, we apply an enrichment test that was previously used to detect enriched selection signals in RNA VIPs in human populations [17]. Briefly, for each population and selection statistic, we rank all genes based on the average selection statistic score observed in genomic windows ranging from 50 kb to 2 Mb (STAR Methods). Different window sizes are used because smaller windows tend to be more sensitive to weaker sweeps, whereas larger windows tend to be more sensitive to stronger sweeps (STAR Methods) [17]. After ranking the gene scores, we estimate an enrichment curve (Figure 1) for gene sets ranging from the top 10 to 10,000 ranked loci (STAR Methods). The significance of the whole enrichment curve is then calculated using a genome block-randomization approach that accounts for the genomic clustering of neighboring CoV-VIPs and provides an unbiased false-positive risk (FPR) for the whole enrichment curve [28] by re-running the entire enrichment analysis pipeline on block-randomized genomes (STAR Methods) [17].

For our control gene set, we use protein-coding genes situated at least 500 kb from CoV-VIPs to avoid overlapping the same sweep signals. Additionally, genes in the control sets are chosen to have similar characteristics as the CoV-VIPs (e.g., similar recombination, density of coding sequences, etc.; see STAR Methods for the complete list of factors) to ensure that any detected enrichment is virus specific rather than due to a confounding factor [17]. Finally, we also exclude the possibility that functions other than viral interactions might explain our results by running a Gene Ontology analysis (STAR Methods; Data S1E and S1F; Figures S1A and S1B) [29].

Figure 1. Coronavirus VIPs nSL ranks enrichment

Applying this approach to each of the 26 populations from the 1000 Genomes Project dataset, we find a strong enrichment of sweep signals in CoV-VIPs that is specific to the five East Asian populations (whole enrichment curve for nSL and iHS combined FPR = 2 × 10^-4; Figures 1 and S2A–S2N; STAR Methods). No enrichment is observed for populations from other continents, including in neighboring South Asia (whole enrichment curve for nSL and iHS combined FPR > 0.05 in all cases; Figures 1 and S2F–S2I). Further, no enrichment is detected for VIP sets for 17 other viruses in East Asian populations (whole enrichment curve for nSL and iHS separately or combined; p > 0.05 in all cases; Figures S3 and S4). Taken together, these results suggest that coronaviruses (or a virus interacting similarly with hosts) have driven ancient epidemics in East Asia. This enrichment is unlikely to have been caused by any other virus represented in our set of 5,291 VIPs (Data S1A), but we still cannot exclude that a currently unknown type of virus that happened to use similar VIPs as coronaviruses could have been involved instead.

The enrichment is most substantial for the top-ranked gene sets ranging between the top 10 and top 1,000 loci (Figure 1; whole enrichment curve FPR = 3 × 10^-6 for nSL, FPR = 4 × 10^-3 for iHS, and FPR = 6 × 10^-5 for iHS and nSL combined) and is particularly strong for the top 200 loci in large windows (1 Mb) where a 4-fold enrichment is observed for both nSL and iHS statistics (pertaining to between 10 and 13 selected CoV-VIPs among the top 200 ranked genes; Data S1G). This suggests strong selection at multiple CoV-VIPs. That the selected haplotype structures are detected by both the iHS and nSL statistics suggests that they are unlikely to have occurred prior to 30,000 years ago, as both statistics have little power before this time point [30].
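For readers who want a feel for what an enrichment curve measures, here is a minimal Python sketch. It is not the authors' pipeline: the function name `enrichment_curve`, the toy gene names, and the simple ratio of fractions are all assumptions made for illustration, and the sketch leaves out the matched control sets and the block-randomization step that the paper uses to obtain an FPR.

```python
import math

def enrichment_curve(ranked_genes, vips, controls, thresholds):
    """Toy enrichment curve (hypothetical helper, not the paper's code).

    For each rank threshold n, compare the fraction of VIP genes found in
    the top n ranked genes with the fraction of control genes found there.
    """
    vips = set(vips)
    controls = set(controls)
    curve = {}
    for n in thresholds:
        top = set(ranked_genes[:n])
        vip_frac = len(top & vips) / len(vips)
        ctrl_frac = len(top & controls) / len(controls)
        # A ratio above 1 means VIPs are over-represented among top-ranked genes.
        curve[n] = vip_frac / ctrl_frac if ctrl_frac > 0 else math.inf
    return curve

# Made-up data: 20 genes ranked by a sweep statistic, 4 of them "VIPs".
ranked = [f"g{i}" for i in range(1, 21)]
toy_vips = {"g1", "g2", "g3", "g15"}
toy_controls = set(ranked) - toy_vips

for n, ratio in enrichment_curve(ranked, toy_vips, toy_controls, [5, 10, 20]).items():
    print(f"top {n}: enrichment ratio {ratio:.2f}")
```

In this toy ranking the ratio is highest for the smallest threshold and falls to 1 once every gene is included, which is the general shape one would expect when a few strongly selected genes sit at the top of the ranking.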
Post by Admin on Jul 6, 2021 20:15:16 GMT
An ancient epidemic in the ancestors of East Asians starting more than 20,000 years ago

To further test the existence of an ancient viral epidemic in East Asia, we use a recent ancestral recombination graph (ARG)-based method, Relate [20], to infer the timing and trajectories of selected loci for the CoV-VIPs. If the selective pressure responsible for the multiple independent selection events at CoV-VIPs was sudden, as expected from a new epidemic, these selection events should have started independently around the same time. By estimating ARGs at variants distributed across the entire genome, Relate can reconstruct coalescent events across time and detect genomic regions impacted by positive selection. To approximate the start time of selection, Relate estimates the first historical time point that a putatively selected variant had an observable frequency unlikely to be equal to zero (STAR Methods). We use this approximation as the likely starting time of selection (STAR Methods).

Additionally, we use the iSAFE software [31], which enables the localization of selected variants, along with a curated set of regulatory variants (expression quantitative trait loci [eQTLs]) from the GTEx Project [32] to help identify the likely causal mutations in the selected CoV-VIP genes. There is good evidence that most adaptive mutations in the human genome are regulatory mutations [26, 33, 34, 35]. Accordingly, we find that iSAFE peaks are significantly closer to GTEx v8 eQTLs proximal to CoV-VIP genes than expected by chance (iSAFE proximity test; p < 10^-9; STAR Methods). Therefore, for each CoV-VIP gene, we choose a variant with the lowest Relate p value (<10^-3; STAR Methods) that is situated at or close to a GTEx eQTL associated with the focal gene to estimate the likely starting time of selection for that gene (STAR Methods; Figure S5A).

Using this approach, we observe 42 CoV-VIPs (Data S1H; Figure S5A) with selection starting times clustered around 870 generations ago (∼200 generations wide, potentially due to noise in our estimates; Figure 2). While this amounts to about four times more selected CoV-VIP genes than were detected using either nSL or iHS (both detected around ten CoV-VIPs among the top 200 ranked genes; Data S1G), this is not unexpected, as Relate has more power to detect selection events than nSL and iHS when the beneficial allele is at intermediate frequencies (typically <60%; Figure 3; see Enard and Petrov [17], Ferrer-Admetlla et al. [24], and Voight et al. [25]). The tight clustering of starting times forms a highly significant peak (peak significance test p = 2.3 × 10^-4; Figure 2) when comparing the observed clustering of CoV-VIPs start times with the distribution of inferred start times for randomly sampled sets of genes (STAR Methods). Further, this significance test is not biased by the fact that CoV-VIPs are enriched for sweeps, as the test remains highly significant (p = 1 × 10^-4) when using random control sets with comparable high-scoring nSL statistics (STAR Methods). Thus, the tight temporal clustering of selection events is a specific feature of the CoV-VIPs, rather than a confounding aspect of any gene set similarly enriched for sweeps.

Figure 2. Timing of selection at CoV-VIPs

Consequently, our results are consistent with the emergence of a viral epidemic ∼900 generations, or ∼25,000 years (28 years per generation) [36], ago that drove a burst of strong positive selection in East Asia. Selection events starting 900 generations ago clearly predate the estimated split of different East Asian populations included in the 1000 Genomes Project from their shared ancestral population [18].

Figure 3. Selected CoV-VIPs allele frequency trajectories over time estimated by CLUES in East Asia

Although selective pressures other than a coronavirus or another unknown type of virus with similar host interactions might also contribute to these patterns, we note that the signal is restricted specifically to CoV-VIPs, and none of the 17 other viruses that we tested exhibits the same temporal clustering (peak significance test p > 0.05 in all cases; STAR Methods). Further, this test remained highly significant when retesting the clustering of CoV-VIPs using only RNA VIPs as the control set (p = 4 × 10^-4; Data S1B). Importantly, the estimate of an ancient viral epidemic starting ∼25,000 years ago in East Asia is remarkably congruent with the 23,000-year estimate for the emergence of sarbecoviruses (the viral family of SARS-CoV-2) [37].
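Two pieces of the reasoning above are easy to sketch in Python: the conversion from generations to years (900 generations at 28 years per generation is 25,200 years, i.e. roughly 25,000) and the idea of testing whether start times cluster more tightly than in randomly sampled gene sets. The code below is only an illustrative permutation test under simplifying assumptions. The names `max_window_count` and `peak_p_value`, the 200-generation window, and the synthetic start times are all hypothetical, and the paper's actual peak significance test is described in its STAR Methods and may differ.

```python
import random

YEARS_PER_GENERATION = 28  # generation time quoted in the text

def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations to years."""
    return generations * years_per_generation

def max_window_count(times, width):
    """Largest number of start times that fit in any closed window of this width."""
    ordered = sorted(times)
    best = 0
    left = 0
    for right, t in enumerate(ordered):
        # Shrink the window from the left until it spans at most `width`.
        while t - ordered[left] > width:
            left += 1
        best = max(best, right - left + 1)
    return best

def peak_p_value(observed_times, background_times, width=200, n_perm=999, seed=0):
    """Permutation p value: how often do random gene sets cluster as tightly?"""
    rng = random.Random(seed)
    observed = max_window_count(observed_times, width)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background_times, len(observed_times))
        if max_window_count(sample, width) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

# Synthetic example (not real data): 42 start times packed into 800-1,000
# generations, against a background spread evenly over 0-3,000 generations.
toy_observed = [800 + (i * 200) // 41 for i in range(42)]
toy_background = list(range(0, 3000, 10))

print(generations_to_years(900))
print(max_window_count(toy_observed, 200))
print(peak_p_value(toy_observed, toy_background))
```

With this synthetic background no random draw can match the observed clustering, so the p value sits at its floor of 1 / (1 + n_perm). Real start-time estimates are noisy and unevenly distributed, which is why the choice of control sets matters so much in the paper.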
Post by Admin on Jul 6, 2021 22:40:56 GMT
Strong selection drove coordinated changes in multiple CoV-VIP genes over 20,000 years

To learn more about the start and duration of selection acting in East Asia, we use CLUES [21] to infer allele frequency trajectories and selection coefficients for the inferred beneficial mutations proximal to the 42 CoV-VIP genes with selection starting 900 generations ago according to Relate (Figure 3). We anticipate that selection was probably strongest when the naive host population was first infected, before gradually waning as the host population adapted to the viral pressure [38]. Similarly, a decrease in the virulence of the virus over time, a phenomenon that has been reported during long-term bouts of host-virus coevolution [39], would also result in the gradual decrement of selection over time. Hence, for each of the 42 CoV-VIPs predicted to have come under selection ∼900 generations ago, we use CLUES to estimate the selection coefficient in two successive time intervals (between 1,000 and 500 generations ago and from 500 generations ago to the present), predicting that selection would be stronger in the oldest interval. We note that a 500-generation interval was reported as the approximate time span for which CLUES provides reliable estimates for humans [21]. Following the protocol of Stern et al. [40], we base our estimates on two of the five East Asian populations (i.e., Dai and Beijing Han Chinese; Figures 3A and 3B and 3C and 3D, respectively).

CLUES infers more complex frequency trajectories than an abrupt jump in frequency 900 generations ago. Instead, the estimated trajectories (Figures 3A–3D) suggest that 900 generations ago is the approximate time when the bulk of the selected variants reached a frequency of a few percent or more and when there is an acceleration in the frequency increase (Figures 3B and 3D). Note that this does not contradict the strong peak of selection times starting around 900 generations ago found by Relate, as this is the time when Relate estimates frequencies clearly distinguishable from zero (STAR Methods). This might correspond to the transition between the establishment and exponential phases of the sweeps and might imply that the selective pressure could be older than 900 generations. Although the slow starts of frequency increases make it hard to pinpoint when selection started exactly, the vast majority of the selected alleles appear to have reached 5% or higher frequencies by 600 generations, thus making it highly unlikely that the selection would have started later. Frequency trajectories estimated in the Yoruba African population (Figure 4A) or the British European population (Figure 4B) also show very low initial frequencies. The selected variants in East Asia are found nowadays at very low frequencies, especially in Africa (Data S1I).

Figure 4. Selected CoV-VIPs allele frequency trajectories over time estimated by CLUES in Africa (Yoruba) and Europe (British)

The selected mutations are estimated to have continually increased in frequency in East Asia until ∼200 generations (∼5,000 years) ago (Figures 3A and 3C). Accordingly, CLUES estimates high selection coefficients between 1,000 and 500 generations ago (Dai average s = 0.034; Beijing Han average s = 0.042; Figures 5A and 5B) but much weaker selection coefficients from 500 generations ago to the present (Dai average s = 0.002; Beijing Han average s = 0.003; Figures 5A and 5B). These patterns are consistent with the appearance of a strong selective pressure that triggered a coordinated adaptive response across multiple independent loci, which waned through time as the host population adapted to the viral pressure and/or as the virus became less virulent.

Figure 5. Coronavirus selected VIPs selection coefficients estimated by CLUES
This figure shows classic R boxplots of selection coefficients at the 42 Relate selected mutations within the peak around 900 generations ago (STAR Methods).
(A) Selection coefficients in the Chinese Dai CDX 1000 Genomes population.
(B) Selection coefficients in the Han Chinese from Beijing CHB 1000 Genomes population.
Left: average selection coefficients between 0 and 500 generations ago are shown. Right: average selection coefficients between 500 and 1,000 generations ago are shown.
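To get an intuition for how different the two regimes are (s around 0.03 to 0.04 versus s around 0.002 to 0.003), here is a small Python sketch of a textbook deterministic model. It is an assumption made purely for illustration and is not what CLUES does: it treats selection as constant and haploid (genic), so the odds p / (1 - p) of the beneficial allele are multiplied by (1 + s) every generation, and it ignores drift, dominance, and changes in s over time. The function names are hypothetical.

```python
import math

def next_frequency(p, s):
    """One generation of deterministic haploid selection with coefficient s."""
    return p * (1 + s) / (1 + p * s)

def generations_to_reach(p_start, p_end, s):
    """Generations needed for the allele frequency to go from p_start to p_end.

    Under this model the odds grow by a factor (1 + s) per generation,
    so the answer is a ratio of logarithms.
    """
    odds_start = p_start / (1 - p_start)
    odds_end = p_end / (1 - p_end)
    return math.log(odds_end / odds_start) / math.log(1 + s)

# Time to rise from 5% to 50% under the Dai average coefficients quoted above.
for label, s in [("1,000-500 generations ago", 0.034), ("500-0 generations ago", 0.002)]:
    t = generations_to_reach(0.05, 0.5, s)
    print(f"s = {s} ({label}): about {t:.0f} generations")
```

Under these simplified assumptions the stronger coefficient moves an allele from 5% to 50% in well under a hundred generations, while the weaker one would need well over a thousand. The trajectories inferred in the paper are slower and more complex than this idealized curve, so the sketch should be read only as a way to compare the scale of the two coefficients.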