Genome diversity in Ukraine


Post by Admin on Feb 28, 2024 21:36:28 GMT

Abstract
Background
The main goal of this collaborative effort is to provide genome-wide data for a previously underrepresented population in Eastern Europe, and to cross-validate genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes from an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that was resequenced on the Illumina NovaSeq 6000 S4 at high coverage.

Results
The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource that aims to provide data for medical research in a large understudied population.

Conclusions
Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data contribute a wealth of new information, bringing forth many novel, endemic, and medically related alleles.

Data Description
Context
Ukraine is the largest country located fully in Europe, with a population that was formed as a result of several millennia of migration and admixture. It occupies the intersection between the westernmost reach of the great steppe and the easternmost extent of the great forests that spread across Europe, at the crossroads of the great trade routes from “Varangians to the Greeks” along the river Dnipro, which the ancient Greeks referred to as Borysthenes, and the Silk Road linking civilizations of Europe and Asia [1]. This land has seen the great human migrations of the Middle Ages sweeping from across the great plains, and even before that in the more distant past, of the early farmers [2] and the nomads who first domesticated the horse [3–6]. Here, at the dawn of the modern human expansion, our ancestors met the Neanderthals who used to hunt the great game along the glacier of the Ice Age [7, 8].

This rich history shaped the genetic diversity of the population living in Ukraine today. As people have moved and settled across this land, they have contributed unique genetic variation that differs across the country. While ethnic Ukrainians constitute more than three-quarters of the total population, this majority is not uniform. A large Russian minority composes approximately one-fifth of the total population, with a higher concentration in the southeast of the country. Smaller minority groups are historically present in different parts of the country: Belarusians, Bulgarians, Crimean Tatars, Greeks, Gagauz, Hungarians, Jews, Moldovans, Poles, Romanians, Roma (Gypsies), and others [9].

This study offers genome data from 97 individuals from Ukraine (Ukrainians from Ukraine [UAU]) to the scientific community to help fill the gaps in the current knowledge about genomic variation in Eastern Europe, a part of the world that has been largely and consistently overlooked in global genomic surveys [10]. To our knowledge, this was the first effort to describe and evaluate the genome-wide diversity in Ukraine. Samples were successfully sequenced using BGI's DNA Nanoball (DNBSEQ™) sequencing technology and cross-validated by Illumina sequencing and genotyping. The major objectives of this study were to demonstrate the importance of studying local variation in the region and to demonstrate the distinct and unique genetic components of this population. Of particular interest were the medically related variants, especially those with allele frequencies that differed from those in neighboring populations. As a result, we present and describe an annotated dataset of genome-wide variation in genomes of healthy adults sampled across the country.

Dataset
The new dataset includes 97 whole genomes of self-reported UAU at 30× coverage sequenced using BGISEQ-500 (one of the range of DNBSEQ™ sequencers; BGI Inc., Shenzhen, China) and annotated for genomic variants: single-nucleotide polymorphisms (SNPs), indels, structural variants, and mobile elements. The samples were collected across the entire territory of Ukraine, after obtaining institutional review board (IRB) approval (Protocol 1 from 09/18/2018, Supplementary File S1) for the entire study design and informed consent from each participating volunteer (Supplementary File S2). Each participant in this study had an opportunity to review the informed consent, received an explanation of the nature of the genome data, and made a personal decision about making it public.

The majority of samples in this study (86 of 97) were additionally genotyped using Illumina Global Screening Array (Illumina Inc., San Diego, CA, USA) to confirm the accuracy of base calling between the 2 platforms. In addition, 1 sample (EG600036) was also sequenced on the Illumina NovaSeq 6000 S4 (2 × 150 bp; ∼60× coverage) and used for validation of the variant calls (see summary in Supplementary Table S1 and full sequencing statistics for individual samples in Supplementary Table S1.2). The list of the cross-validated samples and the source technology of the data is presented in Supplementary File S3.
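As a rough illustration of what cross-validating two platforms involves, the sketch below computes genotype concordance over the sites called by both a sequencing run and a genotyping array. This is not the authors' pipeline: the function names, the dictionary layout, and the toy calls are all hypothetical, and a real comparison would also need to handle strand flips, missing calls, and multi-allelic sites.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]       # (chromosome, position) - hypothetical key layout
Genotype = Tuple[str, str]   # unphased pair of alleles, e.g. ("A", "G")


def normalize(genotype: Genotype) -> Genotype:
    """Sort alleles so that unphased A/G and G/A compare as equal."""
    a, b = sorted(genotype)
    return (a, b)


def concordance(seq_calls: Dict[Site, Genotype],
                array_calls: Dict[Site, Genotype]) -> float:
    """Fraction of sites called on both platforms with identical genotypes."""
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        raise ValueError("no sites were called on both platforms")
    matches = sum(
        1 for site in shared
        if normalize(seq_calls[site]) == normalize(array_calls[site])
    )
    return matches / len(shared)


# Toy calls for one sample (made up for illustration, not real data).
toy_seq = {
    ("chr1", 1000): ("A", "G"),
    ("chr1", 2000): ("C", "C"),
    ("chr2", 500): ("T", "T"),
    ("chr3", 42): ("G", "G"),    # not on the array, so it is ignored
}
toy_array = {
    ("chr1", 1000): ("G", "A"),  # same genotype, alleles in a different order
    ("chr1", 2000): ("C", "T"),  # discordant call
    ("chr2", 500): ("T", "T"),
}

print(f"concordance: {concordance(toy_seq, toy_array):.3f}")
```

Restricting the comparison to shared sites matters because an array only probes a fixed set of positions, whereas whole-genome sequencing calls variants genome-wide.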

The present dataset contains locations and frequencies of >13 million unique variants in UAU that are further interrogated for functional impact and relevance to medically related phenotypes (Table 1 and data in GigaDB [11]). As many as 3.7% of these alleles, or about 478,000, are novel genomic SNPs that have never been registered in the Genome Aggregation Database (gnomAD) [12] (Table 1). This number is similar in magnitude to what was reported earlier in 2 populations from European Russia (3–4% [13]). Many of the discovered variants (12.6%) are also currently missing from the global survey of genomic diversity in the 1000 Genomes Project (1KG) [14]. The majority of the described variants are rare or very rare (<5%; Supplementary Fig. S2).
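To put the <5% rarity threshold in context for a cohort of this size, here is a small sketch of the allele-frequency arithmetic. It assumes an autosomal site at which all 97 diploid samples were genotyped, so there are 194 chromosomes; the function names are illustrative and are not taken from the paper.

```python
N_SAMPLES = 97                 # cohort size stated in the text
N_CHROMOSOMES = 2 * N_SAMPLES  # assumption: diploid autosomal site, no missing calls


def allele_frequency(alt_count: int, n_chromosomes: int = N_CHROMOSOMES) -> float:
    """Frequency of an allele observed alt_count times among n_chromosomes."""
    if not 0 <= alt_count <= n_chromosomes:
        raise ValueError("alt_count must lie between 0 and n_chromosomes")
    return alt_count / n_chromosomes


def is_rare(alt_count: int, threshold: float = 0.05) -> bool:
    """True if the allele frequency falls below the rarity threshold (<5%)."""
    return allele_frequency(alt_count) < threshold


# Smallest non-zero frequency that can be observed: a single copy.
min_af = allele_frequency(1)

# Largest number of copies that still counts as rare under the <5% cutoff.
max_rare_count = max(c for c in range(1, N_CHROMOSOMES + 1) if is_rare(c))

print(f"chromosomes: {N_CHROMOSOMES}")
print(f"minimum observable frequency: {min_af:.4%}")
print(f"max copies still below 5%: {max_rare_count}")
```

Under these assumptions a variant seen on a single chromosome already has a frequency of about 0.5%, so frequencies below that level cannot be resolved in a sample of 97 individuals.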

academic.oup.com/gigascience/article/10/1/giaa159/6079618

Post by Admin on Mar 2, 2024 20:29:44 GMT


Table 1: Summary of variation in the 97 whole-genome sequences from Ukraine

| | All samples: no. of total unique variants | All samples: novel gnomAD count | All samples: novel % gnomAD (1000 Genomes) (a) | Mean per sample: all | Mean per sample: novel % |
|---|---|---|---|---|---|
| Sequencing results | | | | | |
| Total sequence reads | 99.8 Bn | | | 1.03 Bn | |
| Mean coverage | 97 samples at 30× | | | 30× | |
| Variation | | | | | |
| SNPs | 13,010,979 | 477,564 | 3.7 (12.6) | 3,488,083 | 0.1 (0.7) |
| Bi-allelic | 12,667,283 | 470,667 | 3.7 (12.7) | 3,340,557 | 0.3 (0.6) |
| Multi-allelic | 343,696 | 6,897 | 2.0 (7.4) | 146,340 | 0.8 (4.7) |
| Small indels (b) | 2,727,604 | 76,484 | 2.8 (7.4) | 917,731 | 0.3 (1.0) |
| Deletions | 1,805,739 | 55,599 | 3.1 (9.0) | 624,919 | 0.3 (2.4) |
| Insertions | 1,445,987 | 30,453 | 2.1 (6.7) | 571,461 | 0.2 (2.1) |
| Structural variants (c) | | | | | |
| Large deletions | 16,078 | 10,914 | 67.9 (48.3) | 3,524 | 52.6 (19.1) |
| Large duplications | 1,845 | 1,356 | 73.5 (42.3) | 562 | 89.4 (35.2) |
| Inversions | 337 | 314 | 93.2 (47.8) | 185 | 94.1 (48.6) |
| Mobile element insertions | | | | | |
| Alu | 2,316 | 1,805 | 77.9 (38.1) | 473 | 68.1 (18.0) |
| L1 | 451 | 289 | 64 (50.1) | 79 | 60.8 (27.8) |
| SVA | 100 | 75 | 75 (52.0) | 20 | 70 (50) |
| NUMT | 714 | | | 16 | |

(a) Defined as percent not reported in gnomAD (1000 Genomes).
(b) Small indels are insertions and deletions <50 bp called by GATK [16].
(c) Large deletions and duplications are those called by lumpy [17], which are >50 bp.
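
The "novel % gnomAD" figures for all samples in Table 1 can be recomputed from the counts in the same rows. The sketch below does this for a few rows as a sanity check; the counts are copied from the table, while the function name, the dictionary layout, and the 0.1 percentage-point tolerance are illustrative choices rather than anything specified in the paper.

```python
def novel_percent(novel_count: int, total_count: int) -> float:
    """Percentage of variants in a category that are absent from gnomAD."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * novel_count / total_count


# category: (total unique variants, novel gnomAD count, reported novel %)
TABLE_1_ROWS = {
    "SNPs": (13_010_979, 477_564, 3.7),
    "Bi-allelic SNPs": (12_667_283, 470_667, 3.7),
    "Multi-allelic SNPs": (343_696, 6_897, 2.0),
    "Small indels": (2_727_604, 76_484, 2.8),
    "Large deletions": (16_078, 10_914, 67.9),
    "Inversions": (337, 314, 93.2),
    "Alu insertions": (2_316, 1_805, 77.9),
}

TOLERANCE = 0.1  # percentage points; allows for rounding in the published table

for category, (total, novel, reported) in TABLE_1_ROWS.items():
    computed = novel_percent(novel, total)
    status = "ok" if abs(computed - reported) <= TOLERANCE else "CHECK"
    print(f"{category:20s} computed {computed:6.2f}%  reported {reported:5.1f}%  {status}")

# The bi-allelic and multi-allelic counts add up to the SNP total.
assert 12_667_283 + 343_696 == 13_010_979
```

The contrast between the rows is the point of the check: only a few percent of SNPs and small indels are absent from gnomAD, whereas most structural variants and mobile element insertions are.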