Post by Admin on Jul 3, 2023 23:34:14 GMT
During the last two decades, the variation of uniparentally inherited markers such as mitochondrial DNA (mtDNA) and the non-recombining part of the Y chromosome (NRY) has been exploited in population genetic studies to address questions about the diversity and dispersal of humans in both global and local contexts [26]–[28]. Recently, Western Balkan populations have been studied intensively from the uniparental perspective [17], [29]–[34]. Genetic analysis based on the variation of Y chromosome haplogroups (hgs) has revealed that the populations of Western Balkan countries share a large fraction of the ancient gene pool of Southeastern Europe, where 70% of the paternal lineages belong to five European-specific hgs: E3b1, I-P37(xM26), J2, R1a and R1b [31]. Marjanovic et al. [32] suggested that the frequency of NRY hg I-P37 observed in Bosnia and Herzegovina is particularly high and could be partially attributed to genetic drift. High frequencies of hg I-P37 are observed both in Bosniacs (Bosnian Muslims) (43.5%) and in Bosnian Serbs (30.9%). This shows that different ethnic groups in Bosnia and Herzegovina share a large subset of their paternal lineages, which were affected by a major demographic event: the expansion after the Last Glacial Maximum (LGM). A population with a high frequency of I-P37, originating from one of the glacial refugia, possibly located in the Balkans, played a major role in the peopling of Bosnia and Herzegovina and the surrounding areas. Similar results were observed for Croatian populations [35].
The study of mtDNA variation in the population of Bosnia and Herzegovina has shown, as in the case of NRY variation, that the majority of mtDNA hgs detected among Bosnians belong to the common West Eurasian gene pool [29]. It also revealed that a minor part (2%) of Bosnian mtDNA lineages originate from East Eurasia and Africa. The same study inferred that the differences between the Slovenian and Bosnian mtDNA pools were likely due to two different migration waves of different groups of Slavs to the Balkan Peninsula in the Middle Ages [36], [37]. However, the Bosnian individuals analyzed in that study were of Serbian and Croatian origin. Cvjetan et al. [30] reported that the frequencies of mtDNA hgs in populations from several countries of the former Yugoslav Federation, namely Croatia (coast and mainland), Bosnia and Herzegovina, Serbia and Macedonia (including Macedonian Romani), were in concordance with Western Eurasian data. Unusual frequencies of some mtDNA lineages that are otherwise rare in Europe have been reported only for the populations of small Adriatic island isolates [38]–[40]. The study of Bosch et al. [33], which included Macedonians of the former Yugoslav Republic of Macedonia, Greeks, Romanians and Albanians, as well as five Aromun populations from different parts of the Balkans, suggested that the diversity of both mtDNA and NRY hgs was similar across the Balkans, except for some Aromun populations. According to these studies, the populations of the Balkan Peninsula are genetically homogeneous, and their uniparentally inherited variation is in concordance with the European genetic continuum.
However, it has been noted that, for a better understanding of the genetic history and of the differing intensity of mobility and directions of migration of various populations of Southeastern Europe, the variation of maternal lineages in the population cluster consisting of Macedonians of the former Yugoslav Republic of Macedonia, Serbians, Croatians, Herzegovinians and Bosnians should be further resolved through higher mtDNA resolution and deeper statistical analysis of sub-groups [30].
The aim of this study was to characterize, in a larger geographical context, the autosomal gene pool of eight Western Balkan populations from six countries: Bosnia and Herzegovina, Croatia, Serbia, the former Yugoslav Republic of Macedonia, Montenegro and Kosovo. All studied samples were also characterized for mtDNA and NRY diversity. One of the main questions we address here is whether the whole-genome approach, with its emphasis on the variation of autosomal SNPs, is in concordance with the information on the genetic affinities of Western Balkan populations revealed by studies of uniparental markers.
Post by Admin on Jul 6, 2023 16:44:30 GMT
Material and Methods
Samples
Genome-wide autosomal markers of 70 Western Balkan individuals from Bosnia and Herzegovina, Serbia, Montenegro, Kosovo and the former Yugoslav Republic of Macedonia (see map in Figure 1), together with the published autosomal data of 20 Croatians, were analyzed in the context of 695 samples of global range (see details in Table S1). The sample from Bosnia and Herzegovina (Bosnians) consisted of subsamples of the three main ethnic groups: Bosnian Muslims (referred to as Bosniacs), Bosnian Croats and Bosnian Serbs. To distinguish the Serbian and Croatian individuals of the ethnic groups of Bosnia and Herzegovina from those originating from Serbia and Croatia, we refer to individuals sampled in Bosnia and Herzegovina as Serbs and Croats, and to those sampled in Serbia and Croatia as Serbians and Croatians. The cultural background of the studied populations is presented in Table S2. DNA samples were collected from unrelated, healthy adult individuals of both sexes. Written informed consent was obtained from the volunteers, and their ethnicity, as well as their ancestry over the last three generations, was established. The Ethical Committee of the Institute for Genetic Engineering and Biotechnology, University of Sarajevo, Bosnia and Herzegovina, approved this population genetic research. DNA was extracted following the optimized procedures of Miller et al. [41]. All individuals were also genotyped for mtDNA, and all male samples for NRY variation. Details of the larger total sample from which the sub-sample for autosomal analysis was drawn, together with the methods used to analyze uniparental markers, are given in Text S1.
Analysis of autosomal variation
To apply the whole-genome approach, the 70 samples from the Western Balkan populations were genotyped with the 660 000 SNP array (Human 660W-Quad v1.0 DNA Analysis BeadChip Kit, Illumina, Inc.).
The genome-wide SNP data generated for this study can be accessed through the data repository of the National Center for Biotechnology Information – Gene Expression Omnibus (NCBI-GEO): dataset nr. GSE59032, www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59032
Genetic clustering analysis
To investigate the genetic structure of the studied populations, we used ADMIXTURE [42], a STRUCTURE-like, model-based maximum-likelihood algorithm. PLINK software v. 1.05 [43] was used to filter the combined data set so as to include only SNPs on the 22 autosomes with minor allele frequency >1% and genotyping success >97%. SNPs in strong linkage disequilibrium (LD; pairwise genotypic correlation r² > 0.4) were excluded within a window of 200 SNPs, sliding the window by 25 SNPs at a time. The final dataset consisted of 220 727 SNPs and 785 individuals from African, Middle Eastern, Caucasus, European, Central, South and East Asian populations (for details, see Table S1). To monitor convergence between individual runs, we ran ADMIXTURE 100 times at K = 3 to K = 15; the results are presented in Figures 2 and S1.
Figure 2 ADMIXTURE analysis of autosomal SNPs of the Western Balkan region in a global context at the resolution level of 7 assumed ancestral populations (see Table S1 for population data).
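The SNP filtering and LD pruning were done in PLINK; window 200, step 25 and r² 0.4 match the parameters of PLINK's `--indep-pairwise` option, although the exact command is not given in the text. To show what these filters do, here is a minimal pure-Python sketch, not PLINK's implementation. It assumes genotypes coded as 0/1/2 counts of one allele, with `None` for a missing call, and r² taken as the squared Pearson correlation of genotype counts. Within each window it simply drops the later SNP of any correlated pair, which is a simplification of PLINK's own rule. The function names are hypothetical.

```python
from statistics import mean

# Genotypes: list of 0/1/2 counts of one allele per individual; None = missing call.

def passes_qc(genotypes, min_maf=0.01, min_call_rate=0.97):
    """Keep a SNP only if call rate > min_call_rate and minor allele frequency > min_maf."""
    called = [g for g in genotypes if g is not None]
    if len(called) / len(genotypes) <= min_call_rate:
        return False
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p) > min_maf

def genotype_r2(x, y):
    """Squared Pearson correlation of two genotype vectors over individuals
    called at both SNPs; returns 0.0 when the correlation is undefined."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def ld_prune(snps, window=200, step=25, r2_max=0.4):
    """snps: genotype vectors in chromosomal order (one chromosome).
    Slides a window of `window` SNPs by `step` SNPs; within each window, whenever
    two retained SNPs have r2 > r2_max, the later one is removed (simplification).
    Returns the indices of retained SNPs."""
    n = len(snps)
    removed = set()
    for start in range(0, n, step):
        end = min(start + window, n)
        kept = [i for i in range(start, end) if i not in removed]
        for pos, i in enumerate(kept):
            if i in removed:
                continue
            for j in kept[pos + 1:]:
                if j not in removed and genotype_r2(snps[i], snps[j]) > r2_max:
                    removed.add(j)
        if end == n:
            break
    return [i for i in range(n) if i not in removed]
```

For example, with three SNPs where the second duplicates the first, `ld_prune(snps, window=3, step=1)` keeps SNPs 0 and 2. Pruning is meant to be run per chromosome, after the QC filter.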
Post by Admin on Jul 7, 2023 17:58:38 GMT
Principal Component Analysis and FST
The dataset for principal component analysis (PCA) was reduced by excluding East Asians, South Asians and Africans in order to increase the resolution for the populations of the region of interest (see details in Table S1 and Figure 3). PCA was carried out with the software package SMARTPCA [44]; after outlier removal, the final dataset consisted of 540 individuals and 200 410 SNPs. All pairwise combinations of the first five principal components were plotted (Figures S2–S11).
Figure 3 Principal component (PC) analysis of the variation of autosomal SNPs in Western Balkan populations in a Eurasian context (PC1 versus PC2; see Table S1 for population data).
Pairwise genetic differentiation indices (FST values) were estimated between populations and between regional groups for all autosomal SNPs of the same dataset used for PCA, using the approach of Weir and Cockerham [45] as in [46]; the total number of populations was 32 and the total number of samples after quality control was 541 (Table S1; Figure 4A,B). A distance matrix of FST values for the populations specified in Table S1 was used to perform a phylogenetic network analysis (Figure 5) with the Neighbor-net approach [47], visualized with the EqualAngle method implemented in SplitsTree v4.13.1 [48].
Figure 4 FST distances based on the variation of autosomal SNPs. A: FST distances of Western Balkan populations in a global context; B: region-wise FST distances of the studied populations. FST values range from 0.03 (dark blue) to 0.00005 (dark brown).
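The Weir and Cockerham [45] estimator mentioned above can be made concrete with a short sketch. It is not the implementation of [46]. It is the standard variance-component form of the estimator for one biallelic SNP, summed over SNPs as a ratio of sums (Σa / Σ(a + b + c)). The input format is an assumption: for each SNP, a list of (sample size in individuals, allele frequency, observed heterozygote proportion), one tuple per population. The function names are hypothetical.

```python
def wc84_components(pops):
    """pops: list of (n_i, p_i, h_i) for one biallelic SNP, where n_i is the number
    of sampled individuals, p_i the allele frequency and h_i the observed
    heterozygote proportion in population i. Returns the components (a, b, c)."""
    r = len(pops)
    n_tot = sum(n for n, _, _ in pops)
    n_bar = n_tot / r
    n_c = (n_tot - sum(n * n for n, _, _ in pops) / n_tot) / (r - 1)
    p_bar = sum(n * p for n, p, _ in pops) / n_tot
    s2 = sum(n * (p - p_bar) ** 2 for n, p, _ in pops) / ((r - 1) * n_bar)
    h_bar = sum(n * h for n, _, h in pops) / n_tot
    pq = p_bar * (1 - p_bar)
    a = n_bar / n_c * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = n_bar / (n_bar - 1) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c

def wc84_fst(loci):
    """Multi-locus FST as the ratio of summed components: sum(a) / sum(a + b + c).
    Monomorphic SNPs contribute zero to both sums."""
    num = den = 0.0
    for pops in loci:
        a, b, c = wc84_components(pops)
        num += a
        den += a + b + c
    return num / den
```

Two samples fixed for different alleles give FST = 1. Two identical samples give a value close to zero, which can be slightly negative because the estimator corrects for sample size. Pairwise FST between two populations is the special case r = 2.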
Post by Admin on Jul 10, 2023 6:28:10 GMT
Figure 5 Network of 29 populations constructed with the Neighbor-net approach from FST distances based on the variation of autosomal SNPs. Western Balkan populations are indicated in violet.
TreeMix
To analyze population splits and migration events, the software TreeMix [49] was used. The dataset (Table S1) consisted of Western and Eastern Balkan populations against the background of a set of South, West and East European populations, with Ethiopians as the outgroup. The same filters described above were applied, resulting in a dataset of 351 individuals and 202 936 SNPs. Following the TreeMix manual, we used the -k 200 setting to further account for LD. For each model with 0 to 10 migration events, 100 TreeMix runs were performed; the graphs and residual plots were constructed in R [50] according to the manual. For each migration model, at least the six best runs, which reached similar log-likelihood (LL) scores, were examined, and all of them yielded very similar LLs and tree topologies. We discuss the results using the TreeMix model with the best LL (1371.95), which assumes 10 migrations (Figure 6). We also ran the three-population test to calculate the f3 statistic [51], [52] for all possible triplets of the same set of 21 populations used in the TreeMix analysis, using the program Threepop from the TreeMix package [49]. The total number of SNPs was 202 936, and f3 for the LD-pruned dataset was estimated in 1014 blocks. Significantly negative values of f3(C; A,B) (Z-score ≤ −2) indicate that population C arose from admixture between groups related to populations A and B. The results are presented in Table S3.
Figure 6 TreeMix analysis of Western Balkan and surrounding populations (see Table S1 for population data). The TreeMix graph represents the model of 10 gene-flow events within the sample. A. The population tree with gene-flow (migration) events.
The scale bar indicates the weight of a migration event, whose precise value is shown on the migration edges; B. Residual plot; C. Ultrametric tree.
Analysis of segments identical by descent
The analysis was designed to compare patterns of shared tracts identical by descent (ibd) between the different ethno-religious groups of the Western Balkan region and Middle Eastern populations. Ottoman rule over the Balkans during the 15th–19th centuries AD led, inter alia, to the conversion of local people to Islam, the largest number of whose assumed descendants live in contemporary Bosnia and Kosovo [53]. We asked whether this cultural transformation was associated with gene flow between Middle Eastern and Balkan populations. To do so, we considered separately the Muslim (Bosniacs, Kosovars) and non-Muslim (Bosnian Croats and Serbs, Croatians, Serbians, Slovenians, Macedonians and Montenegrins) populations of the Western Balkan region and calculated pairwise ibd sharing between each of these populations and the Middle Eastern populations (Turks, Saudis, Palestinians, Iranians, Syrians). Details of the dataset are given in Table S1. We used the fastIBD (fIBD) algorithm implemented in the BEAGLE software package (http://faculty.washington.edu/browning/beagle/beagle.html) [54] to detect chromosomal segments ibd between pairs of individuals. The fIBD algorithm was applied to the 22 autosomes in 10 iterations, and the IBD threshold was set to 1e-10. Since the power of the fIBD algorithm to detect segments shorter than 1 centiMorgan (cM) is low, we considered only ibd segments longer than 1 cM. We summarized ibd sharing for the length classes 1–2 cM, 2–3 cM, 3–4 cM and 4–5 cM, and estimated the average number of ibd segments per pair of individuals for Muslim and non-Muslim Western Balkan populations vs Middle Eastern populations (Figure 7, Table S4).
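For the three-population test described in the TreeMix section, here is a minimal sketch of how f3(C; A,B) and its Z-score can be computed. It is not the code of TreeMix's Threepop and works under stated assumptions. Allele frequencies of A, B and C are given per SNP in genomic order. A standard small-sample correction, c(1 − c)/(n_C − 1), is applied for population C, where n_C is the number of sampled chromosomes in C. The standard error comes from a delete-one-block jackknife over blocks of 200 consecutive SNPs, mirroring -k 200, and SNPs left after the last full block are dropped. With 202 936 SNPs this gives 202 936 // 200 = 1014 full blocks, consistent with the block count reported for the f3 estimates. The function names are hypothetical.

```python
import math

def f3_per_snp(a, b, c, n_c):
    """Per-SNP f3(C; A, B) term with a small-sample correction for C.
    a, b, c: allele frequencies; n_c: number of sampled chromosomes in C."""
    return (c - a) * (c - b) - c * (1 - c) / (n_c - 1)

def f3_with_jackknife(freqs, n_c, block_size=200):
    """freqs: list of (a, b, c) per SNP in genomic order.
    Returns (f3, standard_error, z_score) from a delete-one-block jackknife.
    Needs at least two full blocks; leftover SNPs are ignored (simplification)."""
    terms = [f3_per_snp(a, b, c, n_c) for a, b, c in freqs]
    n_blocks = len(terms) // block_size
    terms = terms[: n_blocks * block_size]
    block_sums = [sum(terms[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]
    total = sum(block_sums)
    estimate = total / len(terms)
    # Leave-one-block-out estimates of the genome-wide mean.
    loo = [(total - s) / (len(terms) - block_size) for s in block_sums]
    loo_mean = sum(loo) / n_blocks
    se = math.sqrt((n_blocks - 1) / n_blocks * sum((x - loo_mean) ** 2 for x in loo))
    return estimate, se, estimate / se
```

A population whose allele frequencies lie between those of A and B yields a negative f3. Under the criterion used in the text, it is flagged as admixed when Z ≤ −2. A population outside the range of A and B yields a positive value.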
Furthermore, for each length class (1–2, 2–3, 3–4, 4–5 and 5–6 cM), we calculated the average total length of genome (in cM) shared identical by descent between Muslim Western Balkan populations and each Middle Eastern population. To test whether the observed level of ibd sharing between Muslim Western Balkan populations and Middle Eastern populations could be expected by chance, we performed a permutation test. For this, we used the pooled non-Muslim Western Balkan populations as a background and applied the statistical approach described in Yunusbayev et al. [55]. We compared ibd sharing in the permuted samples with that of the Muslim Western Balkan populations and recorded the number of tests yielding equal or higher values. This number was divided by the total number of permutations to obtain the p-value (Figure S12).
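The ibd summaries and the significance test described above can be sketched as follows. The first function bins segments into length classes and reports, per class, the average number of segments and the average total length per pair of individuals. Pairs with no detected segment still count in the denominator. The input format is an assumption: a dict of segment lengths in cM per compared pair, which real fastIBD output would first have to be converted into. The second function is a generic resampling test and does not reproduce the exact procedure of Yunusbayev et al. [55]. It repeatedly draws random background samples of the same size as the tested group, for example from pooled non-Muslim individuals. It then reports the fraction whose mean sharing is equal to or higher than the observed one, matching the p-value definition in the text. All names are hypothetical.

```python
import random

BINS = ((1, 2), (2, 3), (3, 4), (4, 5))  # length classes in cM, half-open [lo, hi)

def ibd_summary(pair_segments, n_pairs, bins=BINS):
    """pair_segments: dict mapping (ind1, ind2) -> list of ibd segment lengths (cM).
    n_pairs: number of pairs compared, including pairs with no detected segment.
    Returns {bin: (avg number of segments per pair, avg total cM per pair)}.
    Segments outside all bins (e.g. shorter than 1 cM) are ignored."""
    counts = {b: 0 for b in bins}
    lengths = {b: 0.0 for b in bins}
    for segs in pair_segments.values():
        for seg in segs:
            for lo, hi in bins:
                if lo <= seg < hi:
                    counts[(lo, hi)] += 1
                    lengths[(lo, hi)] += seg
                    break
    return {b: (counts[b] / n_pairs, lengths[b] / n_pairs) for b in bins}

def resampling_p_value(target_values, background_values, n_perm=10000, seed=1):
    """target_values: per-individual sharing statistic for the tested group;
    background_values: the same statistic for the pooled background group.
    Returns (observed mean, fraction of same-size background draws whose mean
    is >= the observed mean)."""
    rng = random.Random(seed)
    k = len(target_values)
    observed = sum(target_values) / k
    hits = sum(
        1 for _ in range(n_perm)
        if sum(rng.sample(background_values, k)) / k >= observed
    )
    return observed, hits / n_perm
```

Half-open intervals are a design choice here, so that a segment of exactly 2 cM is counted once, in the 2–3 cM class.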
Post by Admin on Jul 11, 2023 18:59:27 GMT
Figure 7 Average number of ibd segments per pair shared between Muslim Western Balkan populations (A: Bosniacs; B: Kosovars) and Middle Eastern (Saudis, Iranians, Syrians, Turks, Palestinians) and other non-Muslim Western Balkan populations (Bosnian Croats and Serbs, Croatians, Macedonians, Serbians, Montenegrins).
Mantel test
The Mantel test (Table 1), with 10 000 permutations, was conducted in Arlequin software v3.5 [56] to analyze the correlation between linguistic, geographical and genetic variation.
Table 1 Correlation analysis between genetic, geographical and linguistic variation. Results of the Mantel test (r: correlation coefficient; P: P-value).

| Comparison | Autosomes r | Autosomes P | Autosomes %² | mtDNA r | mtDNA P | mtDNA %² | Y chromosome r | Y chromosome P | Y chromosome %² |
|---|---|---|---|---|---|---|---|---|---|
| Genetics and geography | 0.05 | 0.32 | 0.00 | −0.21 | 0.69 | 0.05 | 0.35 | 0.08 | 0.12 |
| Genetics and geography (linguistics held constant) | −0.44 | 0.92 | 0.00 | −0.34 | 0.88 | 0.04 | 0.31 | 0.14 | 0.11 |
| Genetics and linguistics¹ | 0.87 | 0.05 | 0.75 | 0.85 | 0.10 | 0.72 | 0.20 | 0.24 | 0.04 |
| Genetics and linguistics (geography held constant) | 0.89 | 0.07 | 0.80 | 0.86 | 0.10 | 0.71 | 0.10 | 0.33 | 0.02 |
| Unexplained genetic variance | | | 0.20 | | | 0.25 | | | 0.87 |

¹ Linguistic affiliations are as in [65]. ² Proportion of genetic variation explained by the given parameter.
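The Mantel tests above were run in Arlequin v3.5 [56], including partial tests with a third matrix held constant. For readers unfamiliar with the procedure, the following sketch covers only the simple two-matrix Mantel test. It computes the Pearson correlation between the off-diagonal entries of two distance matrices. Significance is assessed by permuting rows and columns of one matrix together, which keeps its internal structure while breaking any link to the other matrix. The one-sided alternative (positive association) and the (hits + 1)/(n_perm + 1) p-value convention are assumptions, not necessarily Arlequin's settings, and partial Mantel tests are not covered. The function names are hypothetical.

```python
import math
import random

def _upper(m):
    """Upper-triangle (off-diagonal) entries of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def _pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(dist_a, dist_b, n_perm=10000, seed=1):
    """Simple (non-partial) Mantel test between two symmetric distance matrices.
    Returns (r, one-sided p-value for a positive association)."""
    rng = random.Random(seed)
    x = _upper(dist_a)
    r_obs = _pearson(x, _upper(dist_b))
    n = len(dist_b)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel the objects of dist_b
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(x, _upper(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Comparing a distance matrix with itself gives r = 1 and a small p-value. With only a handful of populations, though, the number of distinct permutations is limited, so the smallest attainable p-value is bounded. This is worth keeping in mind when reading P-values such as those in Table 1.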