Post by Admin on Mar 16, 2024 23:54:49 GMT
Founder events increase homozygosity in India
Previous studies have shown that many Indian groups have a history of strong founder events, due to endogamous and consanguineous marriages7,26,27. Founder events reduce genetic variation and increase sharing of genomic regions that are inherited identical-by-descent (IBD) from a few common ancestors28. Descendants of consanguineous marriages (between close relatives) may inherit IBD segments from both parents, resulting in segments that are homozygous-by-descent (HBD). A founder event results in many small HBD segments, while recent consanguinity results in fewer but longer HBD segments.
We identified IBD and HBD segments in the LASI-DAD and 1000G datasets using a haplotype-based IBD detection method, hap-IBD29. To differentiate between the relative effects of founder events and recent consanguineous marriages, we stratified the HBD segments by length: long (> 8 cM), indicative of consanguinity, and short (< 8 cM), mostly reflecting founder events. Indians, on average, have a larger amount of their genome in HBD segments (∼29 cM) compared to 1000G EAS (∼6 cM), EUR (∼6 cM), and AFR (∼4 cM) (Fig 2A). Within India, individuals from the South have significantly higher homozygosity, both in terms of the total amount of their genome in HBD segments (on average, ∼56 cM in the South compared to ∼19 cM in other regions, p-value < 10^-16) and the fraction of long HBD segments (8.4% vs. 4.3%, p-value < 10^-6), reflecting the higher prevalence of consanguineous marriages in the South of India30 (Fig 2A, Fig S5.1-2). A majority (>90%) of the homozygosity stems from short HBD segments (rather than long HBD segments), suggesting a primary role of historical founder events rather than recent consanguinity as the source of homozygosity (Fig 2A, Fig S5.2). Similar results are obtained when we use a threshold of 20 cM to define long HBD segments (Fig S5.1, Fig S5.2B).
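The length stratification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is a hypothetical list of (sample, segment length in cM) pairs rather than real hap-IBD output, the function name `summarize_hbd` is made up, the "fraction of long segments" is computed by total length (the text does not say whether it is by length or by count), and a segment of exactly 8 cM is counted as short.

```python
from collections import defaultdict

LONG_CM = 8.0  # threshold used in the text; 20 cM is the alternative it mentions


def summarize_hbd(segments, long_cm=LONG_CM):
    """Sum HBD per sample, split into short and long segments.

    segments: iterable of (sample_id, length_cM) pairs (hypothetical format).
    Returns {sample_id: {"short_cm", "long_cm", "total_cm", "frac_long"}}.
    """
    totals = defaultdict(lambda: {"short_cm": 0.0, "long_cm": 0.0})
    for sample, length in segments:
        # Assumption: segments strictly longer than the threshold are "long".
        key = "long_cm" if length > long_cm else "short_cm"
        totals[sample][key] += length

    summary = {}
    for sample, t in totals.items():
        total = t["short_cm"] + t["long_cm"]
        summary[sample] = {
            "short_cm": t["short_cm"],
            "long_cm": t["long_cm"],
            "total_cm": total,
            # Fraction of HBD length carried by long segments (by length, not count).
            "frac_long": t["long_cm"] / total if total else 0.0,
        }
    return summary


# Made-up toy segments, not data from the study.
toy_segments = [("A", 2.5), ("A", 3.0), ("A", 12.0), ("B", 2.0), ("B", 4.0)]
result = summarize_hbd(toy_segments)
```

Re-running with `long_cm=20.0` gives the alternative stratification mentioned at the end of the paragraph.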
Figure 2
Founder events and consanguinity lead to high rates of homozygosity and relatedness in Indians.
(A) We applied hap-IBD to infer genome-wide homozygosity in LASI-DAD samples grouped per region and compared with other worldwide groups: East Asian, European, and South Asian populations from 1000G. Black lines show the amount of homozygous segments longer than 8 cM per individual, and colored lines the total amount of homozygous segments shorter than 8 cM. (B) For each of the 2,620 Indian samples and AFR, EAS, EUR and SAS individuals in 1000G, we detected the individual sharing the largest total amount (in cM) of genome IBD, referred to as the ‘closest individual’. For each value x of total shared genome (in cM) on the X-axis, we report the percentage of samples (Y-axis) that share x or more with their closest related individual. For LASI-DAD individuals, we also detect closest individuals while bootstrapping to 500 individuals (dashed lines representing mean and 95% CI). The horizontal dashed lines indicate the expected value of the total IBD sharing for kth degree cousins. This figure was […]d from 32.
Next, we investigated genome-wide IBD sharing across individuals. We computed the fraction of individuals who find at least one close genetic relative within LASI-DAD and compared this proportion across worldwide populations in 1000G (see Methods, Fig S5.3). We infer that ∼51.0% (38.4–59.2% across regions) of individuals in LASI-DAD find at least one genetic relative in the dataset with expected IBD sharing equivalent to a 3rd degree cousin or closer relationship (∼53 cM), which is markedly higher than 14.2% in SAS, 8.8% in EAS, 8.8% in EUR and 17.2% in AFR from 1000G (Fig 2B, Table S5.1). (Note that a previous study identified ∼5–10% of individuals as first- and second-degree relatives in Gambians from Mandinka (GWD) and Esan in Nigeria (ESN), contributing to higher relatedness in AFR31.) The higher IBD sharing in LASI-DAD, especially compared to 1000G SAS, may stem from: (a) the larger sample size of LASI-DAD, or (b) ascertainment bias in selecting individuals in either study. We examined each of these hypotheses in turn. We performed bootstrap resampling of LASI-DAD down to the same number of individuals as 1000G SAS (n=500) and inferred that the fraction of individuals with a 3rd degree cousin or closer relative decreased to 24.2% (95% CI: 19.4%–28.6%), yet remained significantly higher than in 1000G SAS (Fig 2B, Table S5.1). In LASI-DAD, individuals were recruited using a stratified random sampling approach. First, Secondary Sampling Units (SSUs) (villages/urban census blocks) were chosen in each state, and then within each SSU, individuals were selected randomly. To control for the impact of this ascertainment scheme, we considered pairwise cross-SSU comparisons among individuals (Supplementary Note S5). Using this approach and accounting for the sample size, we continue to find a significant shift in LASI-DAD compared to 1000G SAS, with ∼16.4–35.0% of individuals sharing IBD equivalent to 3rd degree cousins (Fig S5.4).
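A rough sketch of the "closest relative" statistic and the down-sampling check described above, in Python. Everything here is illustrative: the pairwise totals are a hypothetical dictionary rather than hap-IBD output, the function names are invented, and the resampling draws individuals without replacement, which is an assumption since the text only says "bootstrap resampling" to n=500.

```python
import random


def closest_sharing(pair_ibd, samples):
    """Largest total IBD (cM) each sample shares with any other kept sample.

    pair_ibd: {(sample_a, sample_b): total_cM} (hypothetical format).
    """
    keep = set(samples)
    best = {s: 0.0 for s in keep}
    for (a, b), cm in pair_ibd.items():
        if a in keep and b in keep:
            best[a] = max(best[a], cm)
            best[b] = max(best[b], cm)
    return best


def fraction_with_relative(pair_ibd, samples, threshold_cm=53.0):
    """Fraction of samples whose closest individual shares >= threshold_cm.

    53 cM is the 3rd degree cousin value quoted in the text.
    """
    best = closest_sharing(pair_ibd, samples)
    return sum(cm >= threshold_cm for cm in best.values()) / len(best)


def subsample_fractions(pair_ibd, samples, n, reps, threshold_cm=53.0, seed=0):
    """Recompute the fraction on `reps` random subsets of size n.

    Assumption: subsets are drawn without replacement.
    """
    rng = random.Random(seed)
    pool = sorted(set(samples))
    return [
        fraction_with_relative(pair_ibd, rng.sample(pool, n), threshold_cm)
        for _ in range(reps)
    ]


# Made-up toy data, not values from the study.
pairs = {("a", "b"): 60.0, ("b", "c"): 20.0, ("c", "d"): 55.0, ("a", "e"): 5.0}
samples = ["a", "b", "c", "d", "e"]
frac_all = fraction_with_relative(pairs, samples)
sub = subsample_fractions(pairs, samples, n=3, reps=20)
```

A cross-SSU version of the same statistic could be obtained by dropping pairs whose two members come from the same sampling unit before calling `closest_sharing`; the SSU labels needed for that are not part of this toy example.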
This comparison highlights the limitations of the sampling of 1000G groups for representing genetic variation of India (with mainly a few groups from the subcontinent). Overall, we find that all individuals in LASI-DAD have at least one putative 4th degree cousin or closer relative (with IBD > 10 cM) in the dataset. The high level of relatedness in India is notable, as a similar level of IBD sharing is seen in Europeans with approximately 480,000 individuals (almost 200-fold higher sample size) in UK Biobank32.
The history of founder events predicts a high burden of deleterious variants and increased risk of recessive diseases, as seen in Finns and Ashkenazi Jews28,33. To assess the potential functional effects of founder events in India, we identified 385,985 missense and 20,319 putative loss of function (pLoF) variants (see Methods) (Table S5.2). Each individual carries ∼10,344 (range: 9,911–10,761) derived missense variants and ∼67 (46–96) pLoF variants on autosomes. Most (>90%) of these variants are rare (frequency below 1%) or singletons (62%). As expected, we observe a strong correlation between the homozygous deleterious mutation burden (measured as the sum of homozygous missense and pLoF variants carried by an individual) and the total sum of HBD per individual in India (Extended Data Fig 2). Among 18,451 protein-coding autosomal genes in the human genome (RefSeq database34), we find missense and pLoF variants in 89.5% of the genes, ranging between 1 and 1,265 variants per gene. The three genes with the highest number of pLoF variants are mucin genes: MUC3A, MUC16 and MUC17, with 52, 42 and 41 pLoFs respectively, including homozygous pLoFs in MUC17. As there is partial redundancy in the function of mucin genes, there may be greater tolerance for loss of function variants35.
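To make the burden-versus-HBD comparison concrete, here is a minimal Python sketch. It assumes hypothetical inputs (per-sample lists of alternate-allele counts at deleterious sites, and per-sample HBD totals in cM), treats the alternate allele as the derived allele, and uses a plain Pearson correlation; the text does not state which correlation measure was used.

```python
import math


def homozygous_burden(alt_counts):
    """Count sites where the sample carries two copies of the variant.

    alt_counts: list of 0/1/2 alternate-allele counts at missense/pLoF sites
    (hypothetical format; assumes alternate == derived allele).
    """
    return sum(1 for g in alt_counts if g == 2)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up toy data, not values from the study.
genotypes = {
    "s1": [2, 0, 1, 0, 0, 1],
    "s2": [2, 2, 0, 1, 0, 0],
    "s3": [2, 2, 2, 0, 1, 0],
    "s4": [2, 2, 2, 2, 0, 1],
}
hbd_cm = {"s1": 10.0, "s2": 25.0, "s3": 30.0, "s4": 60.0}

order = sorted(genotypes)
burdens = [homozygous_burden(genotypes[s]) for s in order]
r = pearson_r([hbd_cm[s] for s in order], burdens)
```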
Among the 406,304 SNVs, we find that about half are South Asian-specific and a large fraction (40%) are absent in gnomAD or 1000G (Table S5.2). We find that ∼4% of South Asian-specific, non-ultra-rare (frequency above 0.1%) missense/pLoF variants are present in the ClinVar database36, including 10 classified as ‘pathogenic’ variants (using a ClinVar threshold of two stars, Table S5.2). Among these, we find a South Asian-specific pathogenic variant in the BCHE gene that is present in 15 individuals (0.28%) in LASI-DAD (and not seen outside India). Patients with butyrylcholinesterase deficiency may experience prolonged neuromuscular blockade and muscle paralysis in response to some muscle relaxants used during anesthesia. Previous studies have identified this variant in the Vysya founder community from Andhra Pradesh, where it has drifted to high frequency due to the history of founder events27,37. In LASI-DAD, 8 of the 15 individuals are from Telangana, the neighboring state of Andhra Pradesh. Local community doctors treat Vysya ancestry as a contraindication before administering anesthetic drugs, highlighting the potential of reducing disease burden by understanding and documenting the effects of founder events in India.