Post by Admin on Feb 19, 2021 0:53:22 GMT
Discussion
The origins of the Greco-speaking communities settled today in the Aspromonte mountain area of Reggio Calabria (Southern Italy) have been widely debated from a linguistic point of view. The first hypotheses, which defended either the continuity of the language from Magna Graecia or its Byzantine origin, have more recently been reconciled into composite scenarios in which the importance of longstanding contacts and multiple contributions from different periods has been reevaluated. As a matter of fact, the territory corresponding to present-day Calabria has been inhabited since prehistoric times, and its centrality in the Mediterranean Sea is attested by the presence of artefacts from the most important Neolithic cultures of Southern Italy. Furthermore, its richness in mineral deposits testifies to exchanges with the Aegean and Asia Minor civilizations that must have been intense during the Metal Ages47,48. Accordingly, the extent of population and cultural interactions between Southern Italy and the southern part of the Balkan Peninsula, including Greece, Crete and the Aegean islands, has been confirmed by the presence of Mediterranean genetic links between these regions tracing back to Neolithic and post-Neolithic times2,3,27.
In this study, we analyzed the genetic variability of Calabrian Greco-speaking groups in the context of the local Southern Italian genetic landscape and with respect to the temporal and spatial structuring of Euro-Mediterranean genetic variation, with the aim of inferring the main demographic processes that shaped the genetic heritage of these populations. To this end, we collected a large set of samples representative of the communities settled in the Aspromonte mountain area, including both those that have conserved Greco up to the present and those that lost the use of this language earlier in time. We then compared them with the genomic patterns observed for other non-isolated populations from Southern Italy, as well as with a wider reference panel composed of both modern and ancient samples.
Overall, population structure analyses agree with previous studies and generally confirm the presence of strong genetic links between Southern Italy and the Caucasus/Middle-East3,27. Inferences of ADMIXTURE proportions indeed revealed the ancestry of present-day Southern Italian populations, regardless of their linguistic affiliation, to be composed mainly of Sardinian-like and South-Eastern Mediterranean genetic components, with a negligible contribution from a continental Eastern European ancestry that is instead higher in Northern Italy (Suppl. Figure S1). Accordingly, f3-tests (Suppl. Table S2) fitted a scenario involving mixtures between Sardinia and the Caucasus, or between Near Eastern and continental European-related ancestries, to account for the genetic composition of present-day Southern Italian groups, which also showed ancestral genetic connections with a Caucasus or a Caucasus/Near-Eastern branch in the Treemix phylogeny (Suppl. Figure S2).
In this context, both global PCA and ADMIXTURE analyses revealed the genetic proximity of the Aspromonte communities to the other populations of Southern Italy (Fig. 2, Suppl. Figure S1), while at the same time showing traces of differentiation. Overall, the analyses of intra-population diversity, measuring both the number and the total length of homozygous genotypes (Fig. 3, Suppl. Table S3) as well as the extent of genome-wide IBD-sharing (Suppl. Figure S6), indeed confirmed higher levels of genetic isolation commonly experienced by the Aspromonte populations when compared with the other neighboring Southern Italian groups. Furthermore, at a local level the Aspromonte communities departed from the South Italian genetic background, with those more significantly isolated both geographically and culturally occupying the most peripheral positions in the PCA plot and also exhibiting a private genetic component, which indeed reaches the highest frequencies in the Aspromonte groups still speaking Greco (Suppl. Figure S5). Accordingly, within the Aspromonte (ASPR) specific cluster identified by FineSTRUCTURE (Fig. 2, Suppl. Figure S3), the “chunk-length” matrix of haplotypes shared between pairs of individuals specifically pinpointed the currently Greco-speaking communities as the ones signaling higher levels of drift (i.e. the lowest proportion of haplotype “copying” with other groups, Suppl. Figure S4), thus reflecting patterns of geographic isolation in the Aspromonte area further amplified by cultural differences in the groups that conserved the Greco language.
On the whole, the observed patterns of variation therefore seem to confirm the presence of ancient genetic links between Southern Italy and the South-Eastern Mediterranean populations of the Caucasus and the Near East, with the groups from the Aspromonte mountain area (particularly those that still preserve the Greco language today) having departed from this shared genetic background as a consequence of isolation phenomena.
Previous surveys on the ancient genetic legacy of Southern Italy pointed to genetic contributions linking Southern Italy and the Mediterranean Greek islands with Anatolia and the Caucasus, tracing back to migratory events that occurred during the Neolithic and the Bronze Age, in which the Mediterranean served as a preferential crossroad3,13,27. In particular, while the expansion of Anatolian Neolithic farmers significantly impacted the whole Peninsula, differential Bronze-Age contributions were observed for Southern Italy with respect to Northern Italian populations. Bronze Age influences in the gene pool of Southern Italians have in fact been associated with a non-steppe Caucasian-related ancestry carried along the Mediterranean shores at the same time as, but independently from, the Pontic-Caspian Steppe migrations that occurred through Continental Europe. Consistent with this viewpoint, genetic analyses comparing our modern populations with the main ancient ancestral sources displayed the clustering of the analysed Southern Italian groups with Neolithic and Bronze Age samples from Anatolian, Aegean Minoan and Mycenaean populations, as opposed to the affinity of Northern Italy with Late-Neolithic and Bronze-Age samples from continental Europe (Suppl. Figure S8). Accordingly, f3-outgroup, qpGraph and qpAdmixture analyses (Fig. 4, Suppl. Figure S9, Suppl. Figure S10) revealed influences related to a Steppe ancestry in the Northern Italian groups, paralleled in Southern Italy by an analogous Caucasian-related contribution from a non-Steppe CHG/Iran_N source. Importantly, the same ancestral sources are equally shared by the present-day “open” (i.e. non-isolated) Southern Italian populations of Benevento, Castrovillari and Catanzaro and by the geographically and linguistically isolated communities of the Aspromonte mountain area (Fig. 4, Suppl. Table S8), thus signaling a common genetic background that possibly predates the linguistic hypotheses originally suggested about the times of formation of the Greco language in Southern Italy. Accordingly, we hypothesize that the genetic continuity between Southern Italian populations and the other Mediterranean groups may date back to these Neolithic and post-Neolithic events and may have been subsequently maintained, and in some cases reinforced, by continuous and overlapping gene flows following similar paths of diffusion and interaction between populations, among which are the migrations of Greek-speaking people during the classical era (Magna Graecia) and/or in Byzantine and subsequent times. Therefore, the observed patterns could be linked to a tendency to mobility that has always characterized these populations, resulting in continuous cultural and genetic exchanges over time. That being so, the Calabrian Greek ethno-linguistic minorities of Southern Italy may be interpreted as the remnants of a wider area of Greek influence that, by virtue of their geographic isolation, have preserved and evolved a unique variety of Greek which has survived through the centuries in the mountains of the Aspromonte area. In this respect, the communities showing higher signatures of genetic isolation (Roghudi, Gallicianò, Condofuri and Roccaforte del Greco; Suppl. Figure S4, Suppl. Figure S5) are also the ones located in the more impervious areas of the Aspromonte, and at the same time they still conserve a certain number of Greco speakers (Suppl. Table S1)40,41.
Incorporating the information provided by whole genome sequence data in future studies will add value in comprehensively understanding the interplay between complex demographic history and evolutionary processes. Recent studies (e.g.49) have made efforts to identify loci or regions of the genome evolving in a truly neutral vs. non-neutral manner in order to perform demographic inferences based on whole-sequencing data, also stressing how a-priori assumptions on the neutrality of a large part of the genome may bias some of the resulting inferences (see also50,51). Therefore, even if the limited temporal depth and relatively micro-geographical setting of the present study should to some extent prevent relevant biases, future research in these directions may integrate and be compared with the present work in order to obtain more accurate demographic inferences.
Besides their importance for population history, ethnogenesis and linguistic variation, demographic processes of isolation might also have affected the genetic composition of the present-day groups inhabiting these areas of Southern Italy. In fact, the GO analysis showed that genes with a higher level of differentiation in the Calabrian area are related to neurological pathways (Suppl. Table S6). Recent studies on hereditary neurodegenerative disorders in Southern Italy, such as Alzheimer’s disease, Frontotemporal Dementia and Parkinson’s disease, highlighted that certain areas of the Calabrian region are characterized by low genetic heterogeneity and high levels of consanguinity due to geographic isolation over the centuries52,53,54,55,56,57,58. The observation of recurrent mutations and haplotypes in isolated populations with high rates of consanguinity might be informative for the study of hereditary diseases. Overall, these data more generally underline the importance of population isolates in genetic studies. In fact, due to isolation and drift, coupled with the effects of smaller Ne and higher levels of consanguinity, isolated populations may have modified their genetic architecture through the random amplification or loss of certain genetic variants, thus allowing the study of the role of loci found at higher frequency in these groups. In this sense, future studies that also include phenotypic data could be of extreme value for understanding the role of trait-associated variants in health status, as recently demonstrated by research efforts that have linked population genetics and medical genetics (e.g.59).
Materials and methods
Population samples
In this study, we collected and analyzed a total of 149 Southern Italian individuals from 11 villages in the Aspromonte mountain area of Reggio Calabria (Southern Calabria), 4 villages in the province of Catanzaro (Central Calabria), and population samples from the provinces of Cosenza (Northern Calabria) and Benevento (Campania) (Fig. 1, Suppl. Table S1).
Saliva samples were collected with the Oragene-DNA Self Collection Kit OG-500 (DNA Genotek, Ottawa, Ontario, Canada) from unrelated volunteers, focusing on subjects with local genetic ancestry over at least three generations in their respective communities of origin, who were also surveyed for language affiliation.
Ethics statement
All donors provided written informed consent to data treatment and project objectives, and all the procedures concerning this population genetics study were approved by the Bioethic Committee of the University of Bologna on 08/04/2013. The study was designed and conducted in agreement with relevant guidelines and regulations according to the ethical principles for research involving human subjects stated by the WMA Declaration of Helsinki.
Genotyping and quality filtering
Genomic DNA was purified from Oragene-DNA collection kits following the manufacturer’s recommendations and quantified with the Qubit dsDNA BR Assay Kit (Life Technologies, Carlsbad, CA, USA). DNA samples were then genotyped for the 713,014 SNPs implemented in the HumanOmniExpress BeadChip (Illumina, San Diego, CA, USA), using the facilities available at the Center for Biomedical Research & Technologies of the Italian Auxologic Institute (Milan, Italy).
Genotyping results were filtered using the PLINK software, version 1.9 (ref. 60), after having excluded SNPs on the sex chromosomes. We removed all individuals with a genotyping success rate lower than 92%, variants with missing call rates exceeding 2%, SNPs with a minor allele frequency (MAF) lower than 1%, and markers showing significant deviations from the Hardy–Weinberg equilibrium. In addition, we estimated the degree of identity-by-descent (IBD) sharing and excluded one individual for each pair of samples with a kinship coefficient (PiHat) higher than 12.5%.
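To make the thresholds above concrete, here is a minimal Python sketch of the sample call-rate, variant missingness and MAF filters applied to a toy genotype matrix. It is an illustration only, not the PLINK implementation or the authors' pipeline: the function names (`call_rate`, `minor_allele_freq`, `qc_filter`) and the 0/1/2/None genotype coding are assumptions made for the example, and the Hardy–Weinberg and PiHat steps are left out because the text does not give enough detail (e.g. the significance threshold) to reproduce them.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing genotype calls (0.0 for an empty sequence)."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_freq(calls: Sequence[Genotype]) -> float:
    """Minor allele frequency computed over the non-missing calls only."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def qc_filter(
    matrix: List[List[Genotype]],
    min_sample_call_rate: float = 0.92,  # "success rate lower than 92%" removed
    max_variant_missing: float = 0.02,   # "missing call rates exceeding 2%" removed
    min_maf: float = 0.01,               # "MAF lower than 1%" removed
) -> Tuple[List[int], List[int]]:
    """Return (kept sample indices, kept SNP indices).

    matrix[i][j] is the genotype of sample i at SNP j. Samples are filtered
    first; variant statistics are then computed on the retained samples.
    """
    kept_samples = [
        i for i, row in enumerate(matrix)
        if call_rate(row) >= min_sample_call_rate
    ]
    n_snps = len(matrix[0]) if matrix else 0
    kept_snps = []
    for j in range(n_snps):
        column = [matrix[i][j] for i in kept_samples]
        if 1.0 - call_rate(column) > max_variant_missing:
            continue  # too many missing calls at this SNP
        if minor_allele_freq(column) < min_maf:
            continue  # (nearly) monomorphic SNP
        kept_snps.append(j)
    return kept_samples, kept_snps


toy = [
    [0, 1, 0],
    [1, 2, 0],
    [None, None, 0],  # sample with a call rate of 1/3, removed
    [2, 1, 0],
]
print(qc_filter(toy))  # ([0, 1, 3], [0, 1]) - the third SNP is monomorphic
```

Filtering samples before variants is a choice made for this sketch; a different order can change which variants pass the missingness threshold.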
After the filtering procedures, we obtained a final “local” dataset composed of 141 individuals typed for 621,755 autosomal SNP markers. The dataset was thinned for genotype-based analyses by removing SNPs in LD (r² > 0.1) within a sliding window of 50 SNPs advanced by 10 SNPs at a time (PLINK option --indep-pairwise 50 10 0.1), obtaining a “pruned local” dataset consisting of 64,147 SNPs.
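The idea behind the sliding-window LD pruning described above can be sketched in a few lines of Python. This is a simplified illustration and not PLINK's actual algorithm: r² is taken as the squared Pearson correlation of genotype dosages, genotypes are assumed to be complete (no missing calls), and when a pair exceeds the threshold the later SNP of the pair is always dropped. The function names `r_squared` and `prune` are hypothetical.

```python
from typing import List, Sequence


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as 0
    return (cov * cov) / (var_x * var_y)


def prune(
    snps: List[Sequence[int]],
    window: int = 50,        # window size in SNPs
    step: int = 10,          # SNPs by which the window advances
    threshold: float = 0.1,  # r^2 above which a pair counts as "in LD"
) -> List[int]:
    """Return the indices of SNPs retained after greedy pairwise pruning.

    snps[k] holds the dosages (0/1/2) of SNP k across all samples, with
    SNPs ordered by genomic position.
    """
    n = len(snps)
    removed = set()
    start = 0
    while True:
        end = min(start + window, n)
        for i in range(start, end):
            if i in removed:
                continue
            for j in range(i + 1, end):
                if j not in removed and r_squared(snps[i], snps[j]) > threshold:
                    removed.add(j)  # simplification: always drop the later SNP
        if end == n:
            break
        start += step
    return [k for k in range(n) if k not in removed]


toy = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to SNP 0 (r^2 = 1)
    [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with SNP 0 (r^2 = 1)
    [1, 0, 1, 1, 2, 1],  # uncorrelated with SNP 0 (r^2 = 0)
]
print(prune(toy))  # [0, 3]
```

With the parameters quoted in the text (50, 10, 0.1), any pair of SNPs within the same window whose r² exceeds 0.1 loses one member, which is why the pruned dataset is roughly a tenth the size of the original.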