Post by Admin on Apr 7, 2024 20:07:41 GMT
Fig. 3. Genome-wide analysis: Admixture plots based on a worldwide population panel (Dataset S1: low density) for k = 9. (A) Boxplot of the distribution of shared IBD segments between any Malagasy and any individual from the reference population in high-density panel 2 from Eurasia in blue (B) or Africa in red (C).
Ancestral contributions were also assessed by computing the chromosome fragments shared between Malagasy and other populations (Fig. 3 and Fig. S3). Using reference populations sampled from across the world, we found that Bantu and Indonesian populations share most of the large fragments identical by descent (IBD >2 cM) with Malagasy individuals. These analyses exclude other (e.g., Indian, Ethiopian, or Somali) populations as putative major contributors (Fig. S3). Nevertheless, there are potential minor contributions; for example, one Malagasy individual shared on average one fragment with the French Basque population. Using a second set of reference populations more centered on the Indian Ocean, we confirmed the close link with Bantu and Indonesian populations. On average Malagasy individuals share 4.32 ± 0.04 fragments with Bantu populations in general and share even more (5.5 ± 0.05 fragments) with the Bantu populations from southeast Africa (Fig. S3). On the Asian side, populations from Indonesia, especially south Borneo, share the highest number of fragments (1.23 ± 0.01), suggesting that, among the populations sampled from outside Africa, they are the closest link with the Malagasy. Based on IBD sharing, demographic simulation of the split between the Malagasy and source populations from south Borneo led to two scenarios with similar likelihoods: (i) a hard split 2,500 y BP and (ii) a slow divergence with less and less migration between 3,000–2,000 y BP (Fig. S4). The split from south African Bantu groups seems to have occurred much more recently, around 1,500 y BP.
All analyses thus converge on two main ancestries for the entire Malagasy population, namely Bantu from southeast Africa and Austronesians from Indonesia (in particular, south Borneo), with a very limited contribution from Europe and the Middle East. The IBD-based model suggests that the split between Malagasy and south Borneo is older than the split between Malagasy and southeast African Bantu, indicating that Indonesian populations might have arrived before African populations.
Geography of Malagasy Genomic Diversity. We next investigated the geographical distribution of African and Asian ancestry across Madagascar and found significant differences (Fig. 1B). All three genomic components (mtDNA, Y chromosome, and genome-wide) are highly correlated with geography [Moran’s autocorrelation coefficient (I) for all analyses: P < 10^-5]. Maternal African lineages are present mostly in the north of the island and are even in the majority in the extreme north of Madagascar, whereas maternal lineages from Asia are in higher frequency in the center and the south of the island. In contrast, Asian paternal lineages are much lower in frequency, reaching only 30% in the center, and African paternal lineages are present mostly on the coast and in the north of Madagascar. The distribution of ancestral components based on genome-wide data indicates that people in the highlands in the center of the island have mostly Asian ancestry (>65%), whereas people from the coastal regions have higher African ancestry (>65%). To test the existence of further genetic structure linked to geography, we performed hierarchical clustering of the genome-wide data via fineSTRUCTURE (Fig. S5) (26). Because the fineSTRUCTURE algorithm does not use geographical information, the correlation between the geographical position and genetic cluster assignment of individuals suggests the existence of an effect of geography on Malagasy diversity (27).
Examining the different levels of clustering indicates how human genetic structure varies across the Madagascar landscape (27). At the lowest level (k = 2) (Fig. S5D), the distributions of the two clusters are significantly correlated with geography (Moran’s I, all P < 10^-5) and distinguish highland from coastal regions, similar to the geographical pattern observed in the admixture analysis. Furthermore, the ratio of African/Asian ancestry in these two clusters differs significantly: The cluster in the center of the island has a mean Asian ancestry of 68%, vs. 38% in the other cluster (Wilcoxon tests; P < 10^-16). This result means that the main genetic structure of the present Malagasy population reflects variation in the amount of African vs. Asian ancestry. However, increasing the level of clustering produces new clusters that are also correlated with geography: At k = 3 a new cluster appears in the center; at k = 4 and k = 5 new clusters appear in the north; and from k = 6 to k = 8 new clusters appear in the south (Fig. S5D). For all clusters at all levels of analysis there is a significant correlation with geography (Moran’s I, all P < 10^-5). The spatial distribution at each level of genetic clustering thus reveals a strong effect of geography on present genomic diversity. This result also suggests that the genome-wide diversity of Malagasy populations is not structured solely by a simple dichotomy of African vs. Asian ancestry. To study the role of ancestry in the present genetic structure in more detail, we analyzed a division of the fineSTRUCTURE tree into 10 genetic groups (g1–g10) (Fig. 4).
Although there is no optimal level of clustering, and all levels of clustering are informative (27), this level of clustering has the advantages of giving a fairly large number of clusters to investigate fine-scale differences among clusters along with enough individuals per cluster (between 50 and 100 individuals) that differences between clusters are likely to be real and not an artifact of small sample sizes. Fixation index (Fst) distances between these genetic groups are low (Fst = 0.0075) (Fig. S6), similar to Fst values between populations living in Great Britain (27), suggesting the same level of differentiation.
Post by Admin on Apr 9, 2024 21:17:42 GMT
Fig. 4. Genetic groups. (A) Geographic distribution of genetic groups in Madagascar. Each dot represents a village, and the intensity of the color corresponds to the relative presence of individuals of each group. (B) Kriging model of the spatial distribution of genetic groups based on the frequency of each group in each sampled village. (C) Superposition of all genetic group distributions (based on kriging model). Colors were assigned according to the dominant genetic cluster present in a given area. Plain colors were used for locations where the majority of people (>50%) belong to this cluster.
As seen with lower levels of clustering, the amount of African vs. Asian ancestry varies significantly across the genetic groups (F value 551.6, P < 2 × 10^-16; ANOVA) (Fig. S6). For example, the Asian component is dominant in the highland cluster g01 (65 ± 0.5%), whereas it is present at only 22.6 ± 0.7% in the northern cluster g03. However, differences in African vs. Asian ancestry cannot explain all the observed structure, because several genetic groups do not differ significantly in terms of African vs. Asian ancestry. For example, although there is a marked difference in geographic distribution between genetic groups 7 and 8, their percentages of African and Asian ancestry are nearly identical (Fig. S6C).
Because it has been suggested that different source populations were involved in the settlement of different regions of Madagascar (28), we tested whether different populations in Indonesia or Africa might be closer to different Malagasy genetic groups by performing IBD analysis group by group (Fig. S3 C and D). We obtained very similar results for all populations; the highest number of shared IBD is always with south African Bantu, and on the Asian side the highest is always with south Borneo. Moreover, based on PCAdmix it is possible to deconvolute ancestry at each locus in the genome and then analyze the African and Asian ancestry in each Malagasy genetic group separately. Based on this deconvolution, TreeMix analyses run separately with African and Asian populations confirm that all Malagasy genetic groups share the same origin; all diverge from both south Borneo and Bantu populations (Fig. S6). Both analyses thus indicate that differential origin does not explain the observed differences among the genetic groups and thus suggest that the detectable genomic structure of the Malagasy population that is correlated with geography is not solely the result of the admixture and settlement process but might also reflect the later history of Madagascar.
Demographic History. To study the tempo of the settlement of Madagascar, we computed the admixture time and inferred the demographic history of the genetic groups defined by fineSTRUCTURE. We computed the time since admixture using two different methods, namely GLOBEtrotter and ALDER. The GLOBEtrotter analysis indicates that all genetic groups are from an admixture with two sources: south African Bantus and a south Borneo population, i.e., Benjar for g01 and south Dayak for the others (for all analyses, r2 > 0.99, P < 0.01). For all populations but one, the GLOBEtrotter results suggest a single admixture event rather than multiple admixture events. GLOBEtrotter suggests two admixture events only for genetic group g07, with the first admixture occurring 28 generations ago and the second occurring four generations ago. Both GLOBEtrotter and ALDER analysis date the single admixture event to between 500 and 900 y BP. The oldest admixture occurred 800 ± 25 y BP in the eastern populations (g10, g08, and g04 based on GLOBEtrotter) (Fig. 5B), whereas the most recent admixture events (665 ± 19 y BP) involve genetic group g03, which is the most northern genetic group and also has the most African ancestry (Fig. S7 and Dataset S2). The significant differences in admixture dates and in the percentage of African/Asian ancestry between the genetic groups suggest independent admixture events across Madagascar rather than settlement by an already admixed population.
Post by Admin on Apr 12, 2024 3:17:52 GMT
Fig. 5. Demographic model of the settlement of Madagascar. The plots in the three panels share the same x axis. (A) Timeline of the point and smoothed estimation of the average number of common ancestors shared between Malagasy and Borneo (in blue) and Malagasy and Bantu (in red) populations, as estimated by shared IBD segments from genome-wide data. (B) Estimation of percentage and date of admixture for each genetic group of Madagascar. The straight line represents the uncertainty (±1 SD) of admixture dates estimated from GLOBEtrotter. Some populations overlap. (More detailed results are provided in Fig. S7B). (C) Estimation of changes in the effective population size across time for selected genetic groups and for the whole Malagasy population, estimated by shared IBD genome-wide.
Demographic inference for the entire Malagasy population based on IBD sharing (Materials and Methods and Fig. 5C) suggests a population expansion beginning between 1,250 and 1,000 y BP. Separate analyses for each genetic group present similar patterns, with expansions between 1,250 and 750 y BP. The earliest expansion is g03, from the north of the island; g01 (in the center) underwent a strong bottleneck, with a reduction in population size to a few hundred people between 1,000 and 800 y BP. We also observed a decrease in the size of g05 (from the south) between 500 and 250 y BP (Fig. S7).
DISCUSSION
This study presents an extensive overview of the genetic diversity across Madagascar, providing comprehensive insights into the settlement of the island (Fig. 6). The present Malagasy population shares recent common ancestors with Bantu and Austronesian populations now living 8,000 km apart (Fig. 6A). The distribution of African and Asian ancestry across the island reveals that the admixture was sex biased and happened heterogeneously across Madagascar, suggesting independent colonization of Madagascar from African and Asian populations (Fig. 6B).
After the admixture, further events led to a finer-scale genetic structure (Fig. 6C), despite the recent internal migration reported by historians (Fig. 6D).
Post by Admin on Apr 13, 2024 20:04:01 GMT
Fig. 6. Overview of the inferred history of Madagascar. Descriptions and dates are given in A–D.
Our results indicate that across the entire country all Malagasy individuals share recent Austronesian and Bantu ancestry (Fig. 6). We identified a recent split of the proto-Malagasy population from southern African Bantus around 1,500 y BP and an older split from south Borneo between 3,000 and 2,000 y BP. This result suggests that Indonesian populations may have arrived on Madagascar before African populations. However, these dates reflect the age of the oldest possible common ancestors between Malagasy and the African/Indonesian sampled populations, meaning that the departure to Madagascar is not later but could be earlier than these dates. Our large sampling across Madagascar indicates a link to the south Borneo region (confirming a link to Ma'anyan-related populations) and does not support specific genetic connections with Sulawesi or Malays (29, 30). However, it is possible that more closely related populations exist in regions for which we lack data (e.g., Java or Mozambique), and it is also possible that the present-day locations of the putative source populations are not where they were in the past. We also detected a small contribution from Middle East and European populations (Fig. 6A), probably connected to Swahili populations and the Arab world. However, we did not detect substantial genetic ancestry from ancient pre-Bantu/pre-Austronesian populations, as previously proposed based on the mtDNA M23 haplogroup (13), or any genetic contribution from Indian populations. However, minor contributions might have happened, and further work based on genome-wide sequencing and/or additional reference populations might be able to identify such small contributions.
It is also possible that there were pre-Austronesian/Bantu people in Madagascar but that they did not contribute any genetic ancestry to the present Malagasy population; hence future work involving ancient DNA will be crucial to test such possibilities. The dates for the split from the source populations, along with the admixture dates and demographic analyses, all indicate that the settlement of Madagascar by the ancestors of the present Malagasy populations was a recent and rapid process, with the admixture happening within the last 1,500 y (Fig. 6). That the African ancestry on Madagascar falls within the present Bantu genetic diversity and the high number of African mtDNA haplogroups also suggest recent settlement. The significant differences in admixture dates for the genetic groups suggest independent admixture events across Madagascar (Fig. 6). This assumption is reinforced by the variation in amounts of African and Asian ancestry across the island and by the heterogeneity in sex bias across Madagascar. These observations strongly support the separate arrival of Asians and Africans across Madagascar, with subsequent genetic mixing occurring independently across the island. North Madagascar appears to be the principal landing zone for African populations, with demographic expansion beginning before 1,000 y BP. Northern populations have the highest amount of paternal, maternal, and autosomal African ancestry, and their identification as the major substrate for the African ancestry across the rest of the island is supported by the following evidence: the TreeMix analysis of African ancestry places the northern population as the root of the tree (Fig. S6D); the earliest demographic expansion occurred in the north (Fig. S7A), which is the location with the highest mtDNA diversity (Fig. 
S1D); and ChromoPainter analysis showed that all other genetic groups located in the coastal area of Madagascar share the most genetic ancestry with the northern genetic group g03 (Fig. S5A). One potential complication with this scenario is that genetic group g03 (located in the north and with the most African ancestry) has the most recent date for admixture, suggesting a later and limited diffusion of Asian gene flow in this population. Another complication with this interpretation of the data would be the existence of a possible recurrent genetic exchange between Africa and Madagascar, even if it is not detected by ChromoPainter. More genomic data from east Africa would be needed to explore this possibility. Nevertheless, the scenario of an initial colonization in the north is also in agreement with archaeological evidence indicating permanent occupation in northern Madagascar at the site of Mahilaka (31). Other archaeological sites in the north, such as Lakaton’I Anja, Irodo, and Iharana, demonstrate trade connections with East Africa (6, 32, 33, 34). However, in arid southern Madagascar there are trade centers that are contemporary with but much smaller than Mahilaka (such as the archaeological sites of Andranosoa, Mahirane, and Andaro), and these also indicate permanent occupation and trade connections with East Africa (11, 35, 36). Although northern populations present a majority of both maternal and paternal lineages from Africa, across the other parts of Madagascar the maternal lineages are predominantly from Island Southeast Asia, whereas paternal lineages are mainly from Africa (Figs. 3 and 6). This difference suggests a strong sex bias involving contributions from Bantu males diffusing from the north to the south. The earliest admixture dates are around 800 y BP on the Madagascar eastern coast, suggesting that the southeast was already settled by Austronesians (men and women) before the arrival of Africans (primarily men).
The TreeMix analysis of Asian ancestry is compatible with this scenario because the root of the Asian ancestry is on the northeast coast, with a rapid diversification of other populations to the south. The hypothesis that Austronesians were the first to settle Madagascar before an African paternal wave is supported by the earlier split of Malagasy from Indonesian source populations and explains the predominance of both Austronesian maternal lineages and the Austronesian linguistic background. The known archeological sites of fishermen and rice cultivators in the south, such as Maliovola, Mokala, and Ambinanibe, dated to the ninth century AD onwards (37), might be related to the Austronesian colonists; future paleogenetic studies might be informative. Our analyses also reveal a singular history for the Central Highlands: Contemporaneous with the admixture progressing across Madagascar, there was a drastic decrease in the effective size (down to a few hundred persons) of the population now located in the Central Highlands (Fig. S7A). Further, the Bantu contribution to this population was limited (∼32% based on genome-wide SNPs, 8.8% for maternal lineages, and 55% for paternal lineages). Because the effective population size reflects the number of breeding individuals in a population, this decrease is not necessarily representative of a dramatic event such as disease or famine but instead likely reflects a demographic event such as the migration of a small founding population. It appears that there was a late founder effect in the settlement of the Central Highlands by a small number of individuals (mainly with genetic ancestry from Borneo) while admixture was happening across the rest of the country. Our study shows a strong correlation between geography and genomic diversity across Madagascar. To be sure, the genetic groups we identified are based on arbitrary criteria, and there is no method for identifying the “true” number of genetic groups. 
However, the distributions of the 10 genetic groups we analyzed are strongly influenced by geography, suggesting that they reflect in some sense the past demographic history of the Malagasy. Interestingly, many of these genetic groups overlap populations presented in the various controversial ethnographic descriptions made by explorers and other scholars in the 20th century (38, 39), and those descriptions may, in turn, reflect the influence of ancient kingdoms across Madagascar (40). In agreement, our study attests that the genetic structure is young and not necessarily the result of different source populations. Undoubtedly other factors have influenced the genetic structure of Madagascar; nevertheless, our study shows that in the few centuries since admixture, these factors have produced a subtle but nonetheless detectable structure in the Malagasy that is independent of the African/Asian admixture, even despite the higher levels of internal migration reported during the last century (Fig. 6) (38).
Post by Admin on Apr 16, 2024 2:41:45 GMT
MATERIALS AND METHODS
Sampling. The samples analyzed in this study were collected during 2007–2014 with ethical approval by the Human Subjects’ Ethics Committees of the Health Ministry of Madagascar and by French committees (Ministry of Research, National Commission for Data Protection and Liberties and Persons Protection Committee). Individuals were given detailed information about the study, and all gave written consent before the study. DNA was purified from saliva using the Oragene Kit (DNA Genotek Inc.). The extensive sampling was based on a grid sampling approach, in which 82 “spots” 50 km in diameter were placed all over Madagascar (taking into account population density data), and three to four villages were sampled in each spot (Fig. 1). Sampled villages were founded before 1900, and sampled individuals were 61 ± 15 y old, with the maternal grandmother and paternal grandfather born within a 50-km radius of the sampling location. Subjects were surveyed for current residence, familial birthplaces, and a genealogy of three generations to establish lineage ancestry and to select unrelated individuals. A total of 2,704 individuals from 257 villages were sampled (10.5 ± 3.5 individuals per village). Global Positioning System locations were obtained during sampling.
Uniparental Markers. Whole mtDNA genome sequences were obtained from 2,691 individuals from 256 villages (10.5 ± 3.5 individuals per village). Multiplex sequencing libraries were constructed and enriched for mtDNA sequences as described previously (41). A double-indexed Illumina sequencing library, with barcodes specific for each sample, was prepared from each extract. Up to 250 libraries were pooled in an equimolar ratio, and mtDNA sequences were enriched via in-solution capture (42). The capture-enriched library pools then were pooled in an equimolar ratio into a single pool, which then was sequenced on eight lanes on an Illumina HiSeq2000 platform with 95-bp paired-end reads.
Base-calling was performed using freeIbis (43). Reads then were mapped to the revised Cambridge reference sequence (rCRS) (44) and assembled as described previously (41). Duplicate reads were removed along with reads with a mapping quality score lower than 20 and a base quality score lower than 20. The average coverage per position per individual was 543.2 ± 9.2 reads. The dataset is available from GenBank (accession nos. MF055747-MF058597). Samples then were aligned with Clustal to the rCRS. Haplogroups were assigned to consensus sequences for each sample with the HaploGrep webtool (45) and PhyloTree Build 15 (46). For subsequent analysis only sequences lacking gaps and with a minimum coverage of 15× per position were retained (i.e., 2,409 genomes). For all analyses except haplogroup assignment, the poly-C regions (positions 303–315 and 16,182–16,193) were removed from all sequences. To reconstruct the M23 phylogeny, other sequences belonging to haplogroup M23 reported in PhyloTree Build 15 (46) were also aligned, and a maximum parsimony tree was constructed based on all positions and the MJ algorithm (47); the root age of the haplogroup was computed using the ρ statistic and a mutation rate of one synonymous mutation per 7,884 y (48). Y chromosome haplogroups were determined for 1,554 male individuals (6.7 ± 2.6 individuals per village) by following a previously described method (30). Briefly, 96 binary markers (all located on the nonrecombining region of the Y chromosome) (Dataset S3) were analyzed with a high-throughput genotyping system (nanofluidic Dynamic Array; Fluidigm), and the results were analyzed using the BioMark HD system (Fluidigm), which integrated the real-time PCR Analysis software. Each haplogroup was assigned according to the updated Y-PhyloTree (49).
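The ρ-based dating of the M23 root described above can be illustrated with a minimal sketch. It assumes ρ is taken as the average number of synonymous mutations separating each sampled sequence from the root haplotype, and it uses the rate quoted in the text (one synonymous mutation per 7,884 y). The function name and the mutation counts are hypothetical and are not data from the study.

```python
def rho_age(mutation_counts, years_per_mutation=7884):
    """Return (rho, age_in_years) for a haplogroup.

    mutation_counts: for each sampled sequence, the number of synonymous
    mutations separating it from the root haplotype (hypothetical input).
    years_per_mutation: rate quoted in the text (one mutation per 7,884 y).
    """
    rho = sum(mutation_counts) / len(mutation_counts)
    return rho, rho * years_per_mutation


# Hypothetical counts for eight sequences (illustration only).
counts = [0, 1, 1, 2, 0, 1, 0, 1]
rho, age = rho_age(counts)
print(f"rho = {rho:.2f}, estimated root age = {age:.0f} y")
```

The sketch leaves out the standard error of ρ, which a real analysis would report alongside the point estimate.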
Estimating Population Sources from the Genome-Wide Dataset. To identify the ancestral source populations and to estimate the admixture fractions of the Malagasy population, we performed a STRUCTURE-like analysis using the Admixture software (25) after thinning the marker sets for linkage disequilibrium. We used the Plink software to remove each SNP with an r² value greater than 0.1 with any other SNP within a 50-SNP sliding window (advanced by 10 SNPs each time) (50). Admixture was run using the projection mode, i.e., by projecting the Malagasy individuals onto the reference dataset. We performed these analyses using three reference datasets (Dataset S1), keeping only overlapping sets of compatible SNPs after correcting for strand consistencies: (i) a worldwide analysis based on the Centre d’Étude du Polymorphisme Humain Human Genome Diversity Panel (CEPH-HGDP) (51) and the 1,000 Genomes Project (52) populations, to which we added several African populations (6,637 SNPs after pruning; populations are listed in Dataset S1); (ii) an analysis focused on the Asian ancestry using the Pan-Asian dataset (53) (12,689 SNPs after pruning) along with one east African population from the 1,000 Genomes Project; and (iii) a high-density panel with more SNPs (high-density 1) with the 1,000 Genomes Project populations, Khoisan-speaking African populations, and a high-density Indonesian dataset (54) (184,658 SNPs before pruning and 75,410 after pruning; populations are listed in Dataset S1). Admixture results at k = 3 from the high-density panel produced a clear separation between African, East Asian, and West Eurasian populations (Fig. S2); we therefore used the results for k = 3 to estimate each ancestry level across Madagascar.
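To make the linkage-disequilibrium thinning step concrete, here is a simplified sketch of sliding-window pruning with the parameters quoted above (50-SNP window, step of 10 SNPs, r² threshold of 0.1). It is not Plink's implementation: it computes r² as the squared Pearson correlation of genotype counts and always drops the later SNP of a correlated pair, which is an assumption made for brevity. The function names and the toy genotypes are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:  # monomorphic SNP: correlation is undefined
        return 0.0
    return cov * cov / (vx * vy)


def ld_prune(genotypes, window=50, step=10, threshold=0.1):
    """Return indices of SNPs kept after greedy sliding-window pruning.

    genotypes: list of per-SNP lists, one allele count per individual.
    Simplifying assumption: the later SNP of each correlated pair is dropped.
    """
    removed = set()
    for start in range(0, len(genotypes), step):
        idx = range(start, min(start + window, len(genotypes)))
        for i in idx:
            if i in removed:
                continue
            for j in idx:
                if j <= i or j in removed:
                    continue
                if r_squared(genotypes[i], genotypes[j]) > threshold:
                    removed.add(j)
    return [i for i in range(len(genotypes)) if i not in removed]


# Toy data: 4 SNPs typed in 6 individuals (hypothetical).
snps = [
    [0, 1, 2, 1, 0, 2],  # SNP 0
    [0, 1, 2, 1, 0, 2],  # SNP 1: identical to SNP 0
    [2, 1, 0, 1, 2, 0],  # SNP 2: mirror image of SNP 0
    [0, 0, 1, 2, 2, 1],  # SNP 3: uncorrelated with SNP 0
]
kept = ld_prune(snps)
print(kept)
```

In this toy example SNPs 1 and 2 are perfectly correlated with SNP 0 and are removed, while SNP 3 is retained.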
The geographic distribution of African and Asian ancestry was analyzed by computing Moran’s I using the Analysis of Phylogenetics and Evolution (ape) package from R (55) and gradient plots computed using the exponential kriging model in the package geoR (56).
Population Structure. The genome-wide dataset was generated for 700 individuals from 253 villages (2.8 ± 0.7 individuals per village) using the Illumina Human Omni 2.5-8 (Omni 2.5) BeadChip array. Analyses were performed using Plink 1.9 (57). All genotyped individuals passed the quality filters, i.e., had genotype call rates higher than 95%, and were not close relatives (identity by descent estimation under the threshold of 0.25). Analyses were performed on 2,268,323 SNPs. The dataset is available from the European Genome-Phenome Archive (ega-box-658). Population structure was analyzed using the fineSTRUCTURE approach (26). The first step of this method (ChromoPainter) examines each segment of the autosomal genome of one individual and determines which specific individual in the rest of the population shares the most homologous fragment. By assuming that the number and the size of shared fragments between two individuals depend on the ancestors shared by these two individuals, this step provides a coancestry matrix between all pairs of individuals. For this purpose the autosomal haplotypes were inferred, and IBD was searched for all individuals by phasing using Beagle version 4.1 (58) (ibdlod = 3; ibdtrim = 40, GRCh37 genetic maps) followed by analysis with the ChromoPainter program (26). Following the authors’ instructions, we first ran ChromoPainter on chromosomes 3, 7, 8 and 10, weighting each chromosome by its relative size, on a subset of individuals and using 10 iterations of the expectation-maximization algorithm to infer the genome-wide average switch and global emission rates.
Then, using these inferred values, we ran ChromoPainter on all individuals and chromosomes to produce the counts and lengths of fragments shared between individuals.
In the second step we ran the fineSTRUCTURE program on the coancestry matrix based on counts of shared fragments. fineSTRUCTURE is a model-based statistical algorithm that uses a Markov chain Monte Carlo (MCMC) approach (26). Initially all individuals were set into a single cluster at iteration 0. Following 10 million burn-in iterations, we sampled values every 10,000 iterations for 10 million MCMC iterations. At the end, fineSTRUCTURE provided 61 clusters of individuals and the cluster membership of each individual. Then similar clusters were merged hierarchically to give a tree, which can be used to describe population structure at different levels. Finally, we improved individual clustering, as described elsewhere (27). The tree should not be seen as a phylogenetic tree, and all levels of the tree are informative (27). We defined clusters for further analysis as the highest-level monophyletic groups with fewer than 100 individuals, thus leading to 10 genetic groups with sample sizes of at least 50. The geographic distribution of each of these genetic groups was analyzed by computing Moran’s I statistic, which measures the spatial autocorrelation, using the R package ape (55). The geographic distribution of genetic groups can be considered as post hoc evidence of true clustering (27).
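As an illustration of what Moran's I measures, the following sketch computes the statistic for a value observed at several locations, using inverse-distance weights. The weighting scheme, the planar coordinates, and the example values are assumptions made for illustration; the study used the R package ape, and the sketch omits the significance test that produces the reported P values.

```python
import math


def morans_i(values, coords):
    """Moran's I with inverse-distance weights (w_ij = 1 / d_ij, w_ii = 0).

    values: one observation per location (e.g., a group frequency).
    coords: (x, y) planar coordinates; locations must be distinct.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = 0.0
    cross = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(coords[i], coords[j])
            weight_sum += w
            cross += w * dev[i] * dev[j]
    return (n / weight_sum) * cross / sum(d * d for d in dev)


# Hypothetical example: two tight clusters of villages with contrasting
# Asian-ancestry fractions, which should give positive autocorrelation.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
asian_fraction = [0.70, 0.68, 0.72, 0.30, 0.35, 0.32]
i_stat = morans_i(asian_fraction, coords)
print(f"Moran's I = {i_stat:.3f}")
```

Values of I above the null expectation of -1/(n - 1) indicate that nearby locations tend to have similar values, which is the pattern reported for the genetic groups.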
Gradients of the distribution of each genetic group were computed in R using the exponential kriging model in the package geoR (56); all gradients were merged into a single figure. For this purpose, each location on the final map was colored by the color of the main cluster, and the color was attenuated if the principal cluster represented less than 50% of the individuals. All graphs were produced with the ggplot2 package (59). Fst values were computed between each genetic group using Plink (57).
We tested if the genetic admixture was significantly different across the genetic groups using ANOVA and the Tukey HSD statistic from the package stats in R.
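The F value reported for the ANOVA is the ratio of between-group to within-group variance in ancestry. The sketch below computes a one-way ANOVA F statistic in plain Python under that standard definition; the group values are hypothetical, and the P value and the Tukey HSD post hoc comparisons used in the study are not reproduced here.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares (k - 1 degrees of freedom).
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares (n_total - k degrees of freedom).
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical Asian-ancestry fractions for three genetic groups.
ancestry = [
    [0.1, 0.2, 0.3],
    [0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7],
]
f_value = one_way_anova_f(ancestry)
print(f"F = {f_value:.2f}")
```

A large F, as in the study, means that the differences in mean ancestry between groups are large relative to the variation among individuals within each group.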