Genomic Data Reveal a Complex Making of Humans

Admin
Administrator

Posts: 81,448

Post by Admin on Dec 6, 2014 15:39:39 GMT

A 2012 study of the mummified remains of Ramesses III and his son determined that both Y chromosomes belonged to haplogroup E1b1a (Y-DNA). The pharaoh’s Y chromosome therefore belongs to the most frequent haplogroup among contemporary Sub-Saharan Y chromosomes. Anyone who denies that Ramesses III carried E1b1a should also note that E1b1a is found in Sudan at 20%. In addition, the severe sickle cell disease found in the mummies strongly suggests that its (that is, E1b1a’s) presence is not the result of more recent events.

They found Benin sickle cell in Ancient Egyptian mummies, after all: “We conducted a molecular investigation of the presence of sicklemia in six predynastic Egyptian mummies (about 3200 BC) from the Anthropological and Ethnographic Museum of Turin. Previous studies of these remains showed the presence of severe anemia, while histological preparations of mummified tissues revealed hemolytic disorders.” (Marin et al. 1999, “Use of the Amplification Refractory Mutation System (ARMS) in the Study of HbS in Predynastic Egyptian Remains.”)

Abstract
Objective To investigate the true character of the harem conspiracy described in the Judicial Papyrus of Turin and determine whether Ramesses III was indeed killed.

Design Anthropological, forensic, radiological, and genetic study of the mummies of Ramesses III and unknown man E, found together and taken from the 20th dynasty of ancient Egypt (circa 1190-1070 BC).

Results Computed tomography scans revealed a deep cut in Ramesses III’s throat, probably made by a sharp knife. During the mummification process, a Horus eye amulet was inserted in the wound for healing purposes, and the neck was covered by a collar of thick linen layers. Forensic examination of unknown man E showed compressed skin folds around his neck and a thoracic inflation. Unknown man E also had an unusual mummification procedure. According to genetic analyses, both mummies had identical haplotypes of the Y chromosome and a common male lineage.

Conclusions This study suggests that Ramesses III was murdered during the harem conspiracy by the cutting of his throat. Unknown man E is a possible candidate as Ramesses III’s son Pentawere.

www.bmj.com/content/345/bmj.e8268

Post by Admin on Dec 20, 2014 11:50:20 GMT

Geographic distribution of M1

Figure 2 shows the reduced median network obtained from the 261 M1 haplotypes found in a global search comprising more than 38,713 HVSI sequences. In Africa, haplogroup M1 has a supra-equatorial distribution (see additional files 1 and 2). As previously reported, its highest frequencies and diversities (Table 2) are found in Ethiopia in particular and in East Africa in general. Two appreciable gradients exist: frequencies diminish significantly from East to West and also going South to sub-Saharan areas. M1 is not uncommon in the Mediterranean basin, showing a peak in the Iberian Peninsula. However, it is rare in continental Europe. Although at low frequencies, its presence in the Middle East has been well established from the South of the Arabian Peninsula to Anatolia and from the Levant to Iran. The central HVSI haplotype (16129–16189–16223–16249–16311) has been found only once in northwestern India [27]. Another possible Indian M1 candidate is the derived sequence 16086–16129–16223–16249–16259–16311 [28]. However, in two recent studies in which 24 [24] and 56 [25] Indian M complete sequences were analyzed, no ancestral M1 lineages were found. M1 haplotypes have also been occasionally spotted in the Caucasus and the Trans Caucasus [23,29] and in Central Asia [30]. It seems that, going east, M1 even reached Tibet, as the HVSI diagnostic motif was sampled there [31]. However, although haplotypes sharing four of the five HVSI transitions defining M1 (16129–16223–16249–16278–16311–16362; 16129–16223–16234–16249–16311–16362) have been sampled in Thailand and in Han Chinese [32,33], complete sequencing has unequivocally allocated them to the D4a branch of D, the most abundant haplogroup representing M in East Asia. As commented previously, this is a clear example of the danger of establishing affinities between geographically distant areas only on the basis of HVSI homologies because, often, they are the product of geographic isolation and molecular convergence [18].
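The motif comparison described in this paragraph can be sketched in a few lines of Python. This is only an illustration: the function name `shared_positions` and the haplotype labels are made up here, and positions are treated as plain integers without modelling the type of substitution. The motif and the two East Asian haplotypes are the ones quoted in the text.

```python
# Illustrative sketch: count how many M1-defining HVSI positions a haplotype carries.
# The labels "haplotype_1" and "haplotype_2" are hypothetical; the position lists
# are copied from the paragraph above.

M1_MOTIF = {16129, 16189, 16223, 16249, 16311}

EAST_ASIAN_HAPLOTYPES = {
    "haplotype_1": {16129, 16223, 16249, 16278, 16311, 16362},
    "haplotype_2": {16129, 16223, 16234, 16249, 16311, 16362},
}


def shared_positions(haplotype, motif=M1_MOTIF):
    """Return the sorted motif positions that are also present in the haplotype."""
    return sorted(haplotype & motif)


for name, positions in EAST_ASIAN_HAPLOTYPES.items():
    shared = shared_positions(positions)
    missing = sorted(M1_MOTIF - positions)
    print(f"{name}: shares {len(shared)} of {len(M1_MOTIF)} positions, missing {missing}")
```

Both haplotypes carry four of the five motif positions and lack only 16189, yet complete sequencing places them in D4a. A match count on HVSI alone is therefore not a safe basis for haplogroup assignment.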
Within this sparse but geographically wide range of M1 distribution, its three identified branches also had uneven radiations. Although M1a (identified in HVSI by the 16359 transition) is present throughout the M1 range, its greatest frequencies and diversities are found in Ethiopia and eastern Africa (Table 2), pointing to this area as the most probable origin of the M1a expansion in all directions, with particular incidence in western Asia and sub-Saharan Africa. Not all the M1b lineages can be identified by HVSI; however, several specific subclades have different locations. Those characterized by transitions 16260–16320 [21], and by the presence of the 16182 transition and the 16265C transversion [22], are restricted to Ethiopia with occasional spreads to eastern Africa. In addition, there is an M1b branch, identified by the 16185 transition and the 16190 deletion, that has a northwestern distribution, excepting a Jordanian haplotype (Fig. 2). Although M1c cannot be unequivocally defined by transition 16185, it can be stated that M1c is an overwhelmingly Northwest African clade that spreads to the Mediterranean and West sub-Saharan African areas. Finally, other unclassified M1 branches also have different geographic ranges. Those identified by the presence of the 16357 transition and by the reversion of the diagnostic position 16129 are assigned to Ethiopia and eastern Africa, while clusters characterized by loss of the diagnostic position 16223 and by the 16399 transition have a northwestern distribution (Fig. 2). However, assigning haplotypes that lack any of the basic positions to M1 based only on HVSI information is risky when they share other diagnostic positions with different haplogroups. For instance, the Russian haplotype 16183C–16189–16249–16311, classified as M1 on the basis of its HVSI sequence [34], also matches haplotypes assigned to the U1a clade [35].

The presence in the Mediterranean basin and in West sub-Saharan Africa of M1a and M1c lineages can be taken as proof that these areas received influences from both the West and East North African centers of M1 radiation. Quantitative confirmation of the patterns described above is provided by AMOVA and by pairwise FST distances, using the groups and populations described in Material and Methods and taking into account haplotypic molecular differences. As usual, the bulk of the variation, 90%, is within populations; 6% is due to differences among groups and 4% to differences among populations within groups. Pairwise differences between populations (Table 3) offer a more detailed view. There is homogeneity between populations within eastern Africa, small differences (p < 0.05) within western Africa and strong heterogeneity between these two main areas (p < 0.001). In contrast, the Iberian Peninsula differs significantly from the rest of Europe. In turn, West Asia forms a homogeneous continuum with East Africa and Europe, excepting the Iberian Peninsula, and the latter is not significantly different from western Africa. All these results can be explained by the differential radiation of M1a from East Africa and M1c from Northwest Africa, the Iberian Peninsula being mostly influenced by Northwest Africa and the rest of Europe and western Asia by East Africa.
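As a worked illustration of the variance partition quoted above, the three percentages can be converted into the conventional AMOVA fixation indices. The formulas below are the standard AMOVA definitions, not values reported in the excerpt, and the variable names are chosen here for illustration only.

```python
# Sketch: derive AMOVA fixation indices from the quoted variance percentages.
# The Phi values computed here are NOT reported in the excerpt; they follow
# from the standard definitions applied to the 6% / 4% / 90% partition.

among_groups = 6.0               # % of variance among groups
among_pops_within_groups = 4.0   # % among populations within groups
within_pops = 90.0               # % within populations

total = among_groups + among_pops_within_groups + within_pops

phi_ct = among_groups / total
phi_sc = among_pops_within_groups / (among_pops_within_groups + within_pops)
phi_st = (among_groups + among_pops_within_groups) / total

print(f"Phi_CT = {phi_ct:.3f}")
print(f"Phi_SC = {phi_sc:.3f}")
print(f"Phi_ST = {phi_st:.3f}")
```

The three indices are linked by (1 − Phi_ST) = (1 − Phi_SC)(1 − Phi_CT), which gives a quick consistency check on any reported partition.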

M1 haplotypes in Jews

Several M1 haplotypes have been detected in Jewish communities, albeit at low frequencies [36,37]. However, when compared with non-Jewish populations they show significantly higher frequencies for the whole M1 haplogroup (p = 33.54***) and for M1a in particular (p = 24.90***). The only striking exception is that of the Moroccan Jews, for whom no M1 lineages have been detected at all [36]. Interestingly, all M1 lineages found in Jews, except two, belong to the eastern clade M1a (Fig. 2). Therefore, as for the bulk of the M1 Near East haplotypes, the most probable origin of these Jewish M1 lineages is an eastern African expansion around 5000 years ago. Another peculiarity of M1 in Jewish communities is its reduced haplotypic diversity (Table 2), which has already been detected for other mtDNA lineages [36,38]. In addition, there is a strong M1 geographic differentiation among Jewish communities. For example, all European Ashkenazi Jews have only one M1a lineage, characterized by a transition at position 16289, that has not been detected in other Jewish or non-Jewish populations. Similarly, all West Asian Jews share an identical M1a motif, characterized by a transition at position 16209, that has been detected only once in Ethiopia. These results are congruent with the proposition that, in the majority of cases, Jewish migrations implied strong maternal founder effects [36-38]. Nevertheless, as M1a Jewish lineages are unique and different in different groups, we think that their source Near East population should not have suffered strong genetic bottlenecks. Finally, it is worth mentioning that M1 frequencies of Jewish groups and their host populations are significantly correlated (r = 0.942**), which suggests that some genetic interchange must have happened between them, as already proposed by other authors [36,37].

Radiation ages and evolution of lineages

Radiation ages for M1 and its subhaplogroups have been estimated on the basis of complete coding and HVSI sequences using different mutation rate estimations (Table 4). The ages obtained for M1 and M1a from HVSI data are more coherent with those calculated for the coding region using the Ingman et al. [6] mutation rate than with those using the rate proposed by Mishmar et al. [8]. Our coalescence age estimations for the whole M1 clade (20,000–30,000 years) are younger than those previously published [22]; however, the approximate expansion ages for the eastern Africa M1a subclade (10,000–20,000 years) are in the same range. Although standard errors overlap, it seems that the northwestern Africa expansion represented by the M1c subclade (19,040 ± 4916 years) preceded the M1a eastern Africa expansion (16,756 ± 5997), M1b being the youngest branch (10,155 ± 3590). It must be stated that coalescence ages are only rough estimations biased by mutation rate estimations, small sample size, demographic history and, possibly, selection. There are recent examples of clock-like evolution violations in several mtDNA lineages that have been explained by selective or demographic effects [39-41]. Here, subclade M1a2 (Fig. 1) represents a new example of constant mutation rate violation. The mean number of substitutions accumulated in M1a2 lineages (12.5 ± 0.7) is significantly higher (p = 0.008) than that in the rest of M1 lineages (8.4 ± 1.3). This result is not compatible with a uniform rate of evolution. The small standard errors show that there is high lineage homogeneity within groups, which weakens the possibility that stochastic processes have played a main role. Different patterns of synonymous and nonsynonymous changes among different lineages have been taken as hints of a role for selection in other studies [8,39]. In our case, differences between synonymous and nonsynonymous changes within groups do not reach statistical significance (p = 0.75).
However, the mean number of coding region substitutions accumulated in M1a2 lineages (11 ± 0.0) is significantly higher (p < 0.001) than in the rest of M1 (5.6 ± 0.7). Conversely, the mean number of regulatory region substitutions accumulated in M1a2 lineages (1.5 ± 0.7) is smaller than in the rest (2.8 ± 0.9), although the difference does not reach statistical significance (p = 0.175). If the mutation rate were constant along the whole mtDNA molecule, for each mutation in the regulatory region roughly fourteen mutations should accumulate in the coding region. However, selection pressure is higher in the coding than in the regulatory region, so that the substitution rate is ten times faster in the latter. The mean coding/regulatory ratio is 8.3 for M1a2 lineages and only 2.4 for the rest of M1. We interpret these results as due to different ages of expansion between clades. M1a2 would be the youngest clade, with a more recent expansion than the others, so that purifying selection has not had enough time to eliminate mutations with small deleterious effects in the coding region. We think that differences in the rate of evolution among subgroups of the North African U6 haplogroup [40] could be better explained by the same pattern, assuming that the U6a subclade, with the highest coding/regulatory ratio, had a more recent radiation than the U6b subclade. In spite of its anomalous behavior, M1a2 has only a minor effect on the estimation of the whole M1 coalescence age, although its omission significantly diminishes that of the M1a subgroup (Table 4).
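The "roughly fourteen" figure and the substitution counts quoted above can be checked with simple arithmetic. The sketch below assumes the usual reference-sequence figures for human mtDNA (16,569 bp in total, with a control region of about 1,122 bp); these lengths are not given in the excerpt and are an assumption of this example, as are all variable names.

```python
# Sketch: sanity-check the coding/regulatory arithmetic in the paragraph above.
# ASSUMPTION: reference mtDNA length of 16,569 bp and a control (regulatory)
# region of about 1,122 bp; neither number appears in the excerpt.

MTDNA_LENGTH = 16569
CONTROL_REGION_LENGTH = 1122
coding_length = MTDNA_LENGTH - CONTROL_REGION_LENGTH

# Expected coding : regulatory mutation ratio under a uniform per-site rate.
uniform_ratio = coding_length / CONTROL_REGION_LENGTH

# If the regulatory region evolves ten times faster (as the text states),
# the expected ratio drops by the same factor.
tenfold_ratio = uniform_ratio / 10

# Mean substitution counts quoted in the text: (coding, regulatory, total).
m1a2 = (11.0, 1.5, 12.5)
rest_of_m1 = (5.6, 2.8, 8.4)

print(f"uniform-rate expectation: {uniform_ratio:.1f}")
print(f"with tenfold faster regulatory region: {tenfold_ratio:.2f}")
for label, (coding, regulatory, total_subs) in (("M1a2", m1a2), ("rest of M1", rest_of_m1)):
    # Coding and regulatory means should add up to the quoted overall mean.
    print(label, "sum check:", round(coding + regulatory, 1), "vs quoted", total_subs)
    # Ratio of the two means; the text reports the mean of per-lineage ratios
    # (8.3 and 2.4), which need not equal this quantity.
    print(label, "ratio of means:", round(coding / regulatory, 2))
```

The coding and regulatory means add up to the overall means quoted earlier (12.5 and 8.4). The ratio of means differs from the reported mean ratios because averaging per-lineage ratios is not the same operation.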

Phylogeographic parallelism between M1 and U6 haplogroups

There are striking similarities between the geographical dispersals and radiation ages observed here for M1 lineages and those previously published for the North African U6 haplogroup [40]. It was proposed that the first spread of U6a was in Northwest Africa around 30,000 ya. Coalescence ages for M1 also fit into this period, and the oldest clade, M1c, has an evident northwestern Africa distribution; however, it must have had a wide geographic range, as some M1c lineages are still present today in Jordanians (Figs. 1 and 2). It is curious that this prehistoric Near Eastern colonization was also pointed out by the uniqueness of the U6a haplotypes detected in that area. A later East to West African expansion around 17,000 ya was indicated by the U6a1 relative diversity and distribution. Again, age, relative East to West diversities and geographic range accurately correspond with the M1a1 expansion detected here. More recent local spreads of lineages U6b and U6c also parallel the M1b and M1c1 distributions. Furthermore, these similarities also hold outside Africa. U6 lineages in the Iberian Peninsula have been considered traces of northward expansions from Africa. Based on the uneven distribution of U6a and U6b lineages in Iberia, with the former predominating in southern and the latter in northern areas, it was proposed that U6b in Iberia represents a signal of a prehistoric North African immigration, whereas the presence of U6a could be better attributed to the long-lasting historic Arab/Berber occupation [40]. Again, this pattern is accurately repeated by the M1c and M1a distribution in the Iberian Peninsula, the northwest African M1 being more abundant in northern areas (56%) and the East African M1a in southern areas (85%), although, due to the small sample size, the difference does not reach a significant level (p = 0.07).
Additional support for the hypothesis of a prehistoric introduction comes from the recently detected presence of a Northwest African M1c lineage in a Basque cemetery dated to the 6th–7th centuries AD, prior to the Moorish occupation [42], and from the ancestral phylogenetic position of another Basque M1d sequence (Fig. 1) that does not match any African sequence. Finally, two autochthonous U6 lineages (U6b1 and U6c1) traced the origin of the Canary Islands prehispanic aborigines to Northwest Africa [43]. Although exclusive M1 lineages have not been detected in the Canary Islands, it is worth mentioning that those sampled belong to the Northwest African area [44]. Outside Africa and the Iberian Peninsula, as with U6, M1 has been mainly detected in other Mediterranean areas, with main incidences in islands such as Sicily. It is customary to attribute these incidences to the above-mentioned Arab/Berber historic occupations. However, taking into account the predominantly Jewish assignment of the M1a haplotypes detected in Europe, the possibility of a Jewish maternal ascendance for at least some of these lineages should not be rejected.

Note that the two M1 lineages sampled in the Balearic Isles were of Jewish ascription [45]. Also, there were well-documented Jewish settlements in Sicily since early Roman times [46] and, coincidentally, half of the M1 lineages sampled on that island [47,48] belong to the M1a cluster. Finally, the Atlantic archipelagos of the Canaries and Madeira, where the rigor of the Spanish Inquisition was stronger, only have M1c representatives. In contrast, in the Azores Islands, which were used as a refuge by Sephardic Jews expelled from the Iberian Peninsula, half of the M1 sequences detected are assigned to M1a [49,50]. These possible Jewish contributions might also be extended to the U6 lineages of eastern origin, because all U6 haplotypes detected in Ashkenazim and other Jewish groups, excepting one that is a basal U6a (16172–16219–16278), belong to the eastern Africa clade U6a1 [36,26]. A further proof of the striking parallelism between M1 and U6 lineages is the fact that, as for M1, no U6 representatives were sampled in Moroccan Jews in spite of the high frequency of this clade in the Moroccan and Berber host populations [36].

Most probable origin of M1 ancestors

Mitochondrial M lineages in Ethiopia were first detected by RFLP analyses [51]. To explain their presence in that area the authors suggested two possibilities: 1) the marker was acquired by Ethiopians through interchanges with Asians, or 2) it was present in the ancient Ethiopian population and was carried to Asia by groups who migrated out of Africa. Later, the second hypothesis was favored and a single origin of haplogroup M in Africa was suggested, dating the split between Asian and African M branches to more than 50,000 ya [22]. Although not completely discarding this last scenario, other authors considered that the question was unsettled. The vast diversity of haplogroup M in Asia compared to Africa pointed to the possibility that M1 is a branch that traces a backflow from Asia to Africa [7,23]. Due to the scarcity of M lineages in the Near East and their richness in India, the latter region was proposed as the most probable origin of the M1 ancestor [7,52]. However, recent studies based on Indian mtDNA sequences [24,25] have not found any positive evidence that M1 originated in India. Nevertheless, the inclusion of M1 complete mtDNA lineages in the construction of the macrohaplogroup M phylogeny clearly established that the antiquity of Indian lineages, such as M2, compared to Ethiopian M1 lineages supports an Asian origin of macrohaplogroup M [24]. Furthermore, the comparison within Africa of eastern and western M1 sequences left the origin of M1 in Africa uncertain [21]. In the light of our results and those of other authors, it seems clear that, by their respective coalescence ages and diversities, M1 is younger than other Asiatic M lineages. Although it is beyond doubt that the L3 ancestor of M had an African origin, macrohaplogroup M radiated outside Africa and M1 should be considered an evolved branch that signals its return to this continent.
Moreover, as the coalescence age of the northwestern M1c clade is older than that of the eastern M1a clade, we think that the most ancient dispersals of M1 occurred in northwestern Africa, reaching also the Iberian Peninsula, instead of Ethiopia. The detection of an ancestral M1c sequence in Jordanians could be explained by two alternative hypotheses: 1) that the Near East was the most probable origin of the primitive M1 dispersals, West into Africa and East to Central Asia. This supposition would explain the presence of basic M1 lineages, instead of the most common M1a derivatives, as far as Tibet. The present scarcity of these types in eastern areas could be explained by later migrations that erased these primitive lineages. The absence of these ancestral M1c lineages in Ethiopia would point to the Sinai Peninsula as the most probable gate of entrance of this backflow to Africa. 2) That M1 is an autochthonous North African clade that had its earliest spread in northwestern areas, marginally reaching the Near East and beyond. This would explain the shortage of basic M1 lineages in the Near East but would leave the Asiatic origin of the M1 ancestor undetermined. In any case, both alternatives envisage M in Africa as an offshoot of the Asiatic M trunk. The striking phylogeographic parallelism between U6 and M1 haplogroups adds support to these hypotheses. It is possible to correlate the dispersion ages of the different M1 clades with their contemporary climatic, archaeological, paleoanthropological and linguistic information. For instance, the first M1 backflow to Africa, dated around 30,000 ya, coincides with a harsh glacial period, which suggests that this human retreat to Africa could have been forced by climatic conditions. The low sea level in the Gibraltar Strait at that time could also have facilitated the colonization of the Iberian Peninsula.
The northwestern African M1c and the probable north central M1b expansions coincide with the Iberomaurusian and Capsian industries. The anomalous evolution of M1a2 lineages left the coalescence ages of the eastern Africa M1a expansion uncertain but, as suggested for the sister U6a1 radiation, these movements could be correlated in time with an African origin and expansion of Afroasiatic languages [40]. Finally, from a maternal genetic perspective it seems that the Neolithic occupation of the Sahara had both eastern and western influences. Most probably other mtDNA lineages participated in this human backflow to Africa. It has been suggested that the North African X1 branch of the Eurasiatic haplogroup X could be one of them [63].

Whilst this paper was under review, a new paper also dealing with U6 and M1 haplogroups was published [53]. Haplogroup topologies and phylogeographic conclusions proposed by Olivieri et al. [53] largely coincide with those proposed by us in our previous paper on U6 [40] and in the present paper, dealing with M1. Regrettably, there are differences in nomenclature for M1. Whereas our M1 phylogeny adhered to that proposed previously by other authors [21], Olivieri et al. [53] chose to apply their own. Nevertheless, the diagnostic positions for the different M1 subhaplogroups allowed us to establish subhaplogroup homologies between the two works. Clearly, their M1b subgroup (defined by transition 13111) corresponds to our M1c subgroup; their M1a2 subgroup (defined by transition 15884) corresponds to our M1b subgroup. Finally, their M1a1 subgroup (defined by transitions at 3705, 12346 and 16359) corresponds to our M1a subgroup. In addition to the reinforcing overlap of ideas, it is worth mentioning the close agreement between the coalescence ages of M1 and the majority of its subhaplogroups when the same substitution rate [8] is used. Olivieri et al. [53] calculated a coalescence time estimate of 36.8 ± 7.1 ky for the entire haplogroup M1, which matches our estimate of 35.2 ± 7.1 ky. Our coalescence time for M1c (25.7 ± 6.6 ky) also overlaps with that of the Olivieri et al. [53] haplogroup M1b (23.4 ± 5.6 ky). Likewise, the coalescence age calculated for our M1a subhaplogroup (22.6 ± 8.1 ky) is in the range of the Olivieri et al. [53] estimation for their M1a1 subhaplogroup (20.6 ± 3.4 ky). The only discrepancy concerns the coalescence time estimate for our M1b subhaplogroup (13.7 ± 4.8 ky), which is younger than that calculated by Olivieri et al. [53] for their homologous M1a2 (24.0 ± 5.7 ky). As our calculations are based on only three lineages and those of Olivieri et al. [53] on six, we think that their coalescence time estimation should be more accurate than ours.
In fact, when time estimation is based on the eight different lineages (AFR-KI43 is common to both sets) a coalescence age of 20.6 ± 5.0 ky is obtained. Although with overlapping errors, these results, together with the relative ancestral positions of each subgroup in the phylogenetic tree (Fig. 1), would suggest that the northwestern M1c clade radiation was older than those for the ubiquitous M1b and the eastern M1a clades, as also proposed by Olivieri et al. [53].
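The comparison of the two sets of coalescence estimates can be made explicit with a rough calculation. The sketch below expresses each pair's difference in units of the combined standard error. It assumes the two estimates in each pair are independent, which is only approximately true here since at least one lineage is common to both data sets. The dictionary layout and function name are illustrative, not part of either paper.

```python
import math

# Sketch: compare paired coalescence ages (ky) quoted in the text.
# Keys use this paper's clade names; each value is
# ((age, SE) from this paper, (age, SE) for the homologous Olivieri et al. clade).
ESTIMATES = {
    "M1":  ((35.2, 7.1), (36.8, 7.1)),
    "M1c": ((25.7, 6.6), (23.4, 5.6)),
    "M1a": ((22.6, 8.1), (20.6, 3.4)),
    "M1b": ((13.7, 4.8), (24.0, 5.7)),
}


def standardized_difference(ours, theirs):
    """Absolute age difference divided by the combined standard error.

    ASSUMPTION: the two estimates are treated as independent.
    """
    (age_a, se_a), (age_b, se_b) = ours, theirs
    return abs(age_a - age_b) / math.hypot(se_a, se_b)


differences = {clade: standardized_difference(*pair) for clade, pair in ESTIMATES.items()}

for clade, value in sorted(differences.items(), key=lambda item: item[1]):
    print(f"{clade}: {value:.2f} combined standard errors apart")
```

Under these assumptions M1b is the only clade whose two estimates differ by more than one combined standard error, which is consistent with the text singling it out as the sole discrepancy.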

González, Ana M., et al. "Mitochondrial lineage M1 traces an early human backflow to Africa." BMC genomics 8.1 (2007): 223.

Post by Admin on Dec 22, 2014 14:16:21 GMT

Within Europe, haplogroup N is frequent among Finnic-speaking groups (40% on average), whereas its frequency in northern Russians is 35%. Haplogroup N was introduced to Japan by the ancient Jomon settlers from Siberia, who were closely related to the Ainu, and its frequency is higher in northern regions (7.7% in Aomori), where the Ainu historically lived. Haplogroup D is a very archaic Asian lineage (50,000-60,000 years BP) that is present in native Siberian populations; it was also detected at moderate frequencies in Southern Altaians living in the Siberian Altai (10-14%) and in the Ainu (17.6%).

Fig. 1 Phylogenetic tree of 22 Y chromosome binary polymorphisms analyzed in this study. Marker names are indicated above the lines. SNPs are indicated by red letters in System 1, blue letters in System 2, and green letters in System 3.

We selected 22 SNPs from the non-coding regions of the Y chromosome using the phylogenetic tree of Y chromosome haplogroups, focusing on Japanese groups (Fig. 1). Each primer set was designed using Primer3Plus software (http://primer3plus.com/) to generate amplicons (including each SNP) of ⩽150 bp by placing each primer binding site close to the SNP. Each primer was checked for potential self-dimer structures using AutoDimer software (http://www.cstl.nist.gov/biotech/strbase/AutoDimerHomepage/AutoDimerProgramHomepage.htm). We checked each PCR primer set by agarose gel electrophoresis to confirm that each product was specific to male DNA, and confirmed the allele typing of single base extension products by DNA sequencing with a 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).

We ran three mini multiplex PCR systems. System 1 (undecaplex M15, RPS4Y711, M231, P31, P191, M119, IMS-JST021355, M242, P99, M179, and M122) roughly subdivided the Japanese population into haplogroups C, D, D1, D2, D3, O, O1a, O2, O3, N, and Q. System 2 (octaplex IMS-JST022457, M116.1, M125, P151, P120, P42, M179, and P12) further subdivided haplogroup D2, while System 3 (pentaplex SRY465, M95, P31, M88, and PK4) further subdivided haplogroup O2. Each assay was performed using a GeneAmp PCR System 9700 (Applied Biosystems) in 9600 emulation mode with a final volume of 15 μl. We used a Qiagen Multiplex PCR Plus Kit (Qiagen) and 1 ng of genomic DNA for the assay. Supplementary data 1 shows each primer sequence and its concentration. The cycling programs consisted of pre-denaturation at 95 °C for 5 min, followed by 35 cycles of denaturation at 95 °C for 30 s, annealing at 60 °C for 90 s, extension at 72 °C for 30 s, and a final extension at 68 °C for 15 min.
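Two details of this paragraph lend themselves to a quick check: the three marker lists should add up to the 22 distinct SNPs mentioned earlier, and the cycling program implies a minimum run time. The sketch below uses only the marker names and step durations quoted above. The variable names are illustrative, and the time estimate ignores ramping between temperatures, so it is a lower bound.

```python
# Sketch: check the marker count and estimate the PCR program's hold time.
# Marker names and step durations are copied from the paragraph above.

SYSTEM_1 = ["M15", "RPS4Y711", "M231", "P31", "P191", "M119",
            "IMS-JST021355", "M242", "P99", "M179", "M122"]
SYSTEM_2 = ["IMS-JST022457", "M116.1", "M125", "P151", "P120", "P42", "M179", "P12"]
SYSTEM_3 = ["SRY465", "M95", "P31", "M88", "PK4"]

systems = [set(SYSTEM_1), set(SYSTEM_2), set(SYSTEM_3)]
unique_markers = set().union(*systems)

# Markers that appear in more than one system.
shared_markers = {
    marker for marker in unique_markers
    if sum(marker in system for system in systems) > 1
}

# Cycling program, in seconds (ramp times between steps are ignored).
PRE_DENATURATION_S = 5 * 60
CYCLES = 35
DENATURATION_S, ANNEALING_S, EXTENSION_S = 30, 90, 30
FINAL_EXTENSION_S = 15 * 60

total_seconds = (PRE_DENATURATION_S
                 + CYCLES * (DENATURATION_S + ANNEALING_S + EXTENSION_S)
                 + FINAL_EXTENSION_S)

print("distinct markers:", len(unique_markers))
print("markers used in more than one system:", sorted(shared_markers))
print("minimum program time (min):", total_seconds / 60)
```

The undecaplex, octaplex and pentaplex together list 24 primer targets, but M179 and P31 each appear in two systems, leaving 22 distinct markers.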

Fig. 2 Gel electrophoresis of degraded DNA series by DNase I digestion. M, 100 bp marker; lane 1, control (no digestion with DNase I); 2, digestion with DNase I for 2 min; 3, 5 min; 4, 10 min; 5, 30 min; 6, 60 min; 7, 90 min; 8, 120 min.

To assess the effectiveness of our three mini Y chromosome SNP multiplex systems in genotyping degraded DNA, we used artificially degraded DNA (digested with DNase) and forensic samples characterized by poor DNA quality. We prepared the artificially degraded DNA sample series as described previously [17]. An aliquot of 11.2 μg of male genomic DNA was mixed with 10× DNase I Reaction Buffer (Invitrogen, Carlsbad, CA, USA) and sterile water to create a total volume of 110 μl. From this reaction mixture, we removed 10 μl as control DNA undigested by DNase and added 250 mU of DNase I (Invitrogen) to the remaining 100 μl volume. We then removed 10 μl aliquots from the 100 μl mixture at 2, 5, 10, 30, 60, 90, and 120 min. The 10 μl aliquots removed were mixed with 2 μl of 25 mM EDTA at 75 °C for 10 min. The control DNA was mixed with EDTA in the same manner. From the final volume of 12 μl, we used 2 μl for 10% polyacrylamide gel electrophoresis to check DNA fragmentation (Fig. 2). The DNA samples were analyzed with the three mini Y chromosome SNP multiplex systems and the AmpFLSTR Yfiler Kit (Applied Biosystems). An aliquot (2 μl) of DNA solution from each degraded sample was used in the reaction mixture for PCR. Based on the analysis results, we identified 30 samples of degraded DNA for which allele typing was unsuccessful for >7 of the 16 loci. We extracted DNA from skeletal remains samples by SDS-proteinase K treatment followed by phenol/chloroform extraction. We performed analyses using the three mini Y chromosome SNP multiplex systems and the AmpFLSTR Yfiler Kit as described above to determine whether the systems were suitable for effective analysis of degraded DNA.
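The volumes and amounts in the degradation series can be followed with a short calculation. The sketch below derives concentrations that the excerpt does not state explicitly. It assumes the DNA is evenly distributed in the 110 μl mixture and that the volume of added DNase I is negligible. Variable names are illustrative.

```python
# Sketch: follow the volumes and amounts through the DNase I time series.
# Input quantities come from the paragraph above; derived values are
# illustrative and assume even mixing and a negligible enzyme volume.

dna_ug = 11.2
reaction_ul = 110.0
aliquot_ul = 10.0
edta_ul = 2.0
edta_stock_mm = 25.0
time_points_min = [2, 5, 10, 30, 60, 90, 120]

dna_ng_per_ul = dna_ug * 1000 / reaction_ul            # starting DNA concentration
final_ul = aliquot_ul + edta_ul                         # aliquot after EDTA addition
dna_ng_per_ul_after_edta = dna_ng_per_ul * aliquot_ul / final_ul
edta_final_mm = edta_stock_mm * edta_ul / final_ul

# Volume left after removing the control and one aliquot per time point.
remaining_ul = reaction_ul - aliquot_ul * (1 + len(time_points_min))

# DNA amount in the 2 ul taken for the gel or for PCR.
dna_ng_in_2ul = dna_ng_per_ul_after_edta * 2

print(f"starting DNA: {dna_ng_per_ul:.1f} ng/ul")
print(f"after EDTA:   {dna_ng_per_ul_after_edta:.1f} ng/ul, EDTA {edta_final_mm:.2f} mM")
print(f"DNA in a 2 ul sample: {dna_ng_in_2ul:.0f} ng")
print(f"mixture left after sampling: {remaining_ul:.0f} ul")
```

Under these assumptions each 2 μl sample carries on the order of 170 ng of (possibly fragmented) DNA, so the amount of input DNA is not the limiting factor in the degraded series.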

We established three mini Y chromosome SNP multiplex systems using 22 Y chromosome binary markers to identify 23 haplogroups in the Japanese population. Sensitivity studies detected allele peaks at >150 relative fluorescence units. In investigating template DNA concentrations with System 1, we observed several additional peaks with 50 pg of template DNA. Interpretation of analyses with 50 pg of template DNA in Systems 2 and 3 proved difficult due to low peaks. To avoid mistyping attributable to extra peaks, we set the low template level between 50 and 100 pg. While allele typing was successful in the group with ⩾5 ng of template DNA, the target peaks were too high, and extra peaks were observed. Thus, we set the maximum template level at <2 ng. Typing proved possible for all samples with template DNA amounts between 100 pg and 2 ng, and no significant extra peaks were observed. Within these limits established by DNA detection range analysis, allele typing for all selected SNPs proved successful with each system. Fig. 3 shows the results for DNA from 9948 DNA (Promega, Madison, WI) obtained using our systems. When SNP analysis was performed using female DNA as a template or negative control, no PCR bands were detected. Non-expected peaks were occasionally visible, but these peaks did not affect SNP evaluations.

Fig. 3 Electropherograms for 9948 DNA obtained using the present SNP systems.

Most of the Japanese population can be classified using these three mini Y chromosome SNP multiplex systems. Table 1 shows the haplogroup frequencies for the Japanese population. The frequencies of mutations RPS4Y711 (haplogroup C), IMS-JST021355 (haplogroup D), and P191 (haplogroup O) were 8.3%, 30.3%, and 59.0%, respectively, similar to the haplogroup frequencies found in past studies [18], [19], [20], [21]. Mutations M231 (haplogroup N) and M242 (haplogroup Q) were rarely found in this study. Using Systems 2 and 3, we subdivided populations of haplogroups D2 and O2. In this survey, haplogroup D2a1b (16.2%) was the most frequent in the Japanese haplogroup D population and haplogroup O2b (32.2%) the most frequent in the haplogroup O population. The haplogroup frequencies observed in haplogroups D2 and O2 were similar to those reported in previous studies [19], [20].

We also investigated the effectiveness of our systems in analyzing degraded samples. We re-analyzed a set of 30 hard tissue samples unsuccessfully examined using the protocol for a commercially available AmpFLSTR Yfiler Kit. This protocol had produced unsatisfactory results for at least 7 of the 16 loci. Fig. 4 shows the results of our analysis of the degraded DNA samples. Only 8 alleles were successfully typed using the AmpFLSTR Yfiler Kit; in contrast, the present systems proved able to detect all alleles and define the haplogroup. Table 2 presents the results of our analysis. The present systems proved capable of classifying 29 of 30 degraded DNA samples previously examined unsuccessfully using the AmpFLSTR Yfiler Kit. We also used the three systems to analyze an artificially degraded DNA sample (Table 3). In tests of degraded DNA digested with DNase, typing had failed for more than half the loci. In contrast, the present systems also proved effective with these degraded samples (Supplementary data 3).

Fig. 4 (a) Electropherograms for degraded DNA sample from a male, extracted from a hard tissue sample and obtained using the AmpFLSTR Yfiler Kit (Applied Biosystems). (b) Electropherograms for degraded DNA samples from a male, extracted from hard tissue sample and obtained using the present SNP systems.

STR and SNP analyses have become essential tools for determining personal identity based on biological samples. Current research is especially active in the area of autosomal and Y chromosome STRs and SNPs [[22], [23], [24], [25], [26], [27]]. Commercially available STR multiplex kits are not specifically manufactured for the analysis of forensic samples; forensic scientists often encounter problems during analysis of degraded DNA. To date, multiplex SNP analysis of degraded DNA samples has not been investigated extensively. We configured three systems to perform simultaneous analysis of biallelic markers on the Y chromosome that classify haplogroups in the Japanese population and began by evaluating the performance of our systems with Japanese haplogroup classification. We applied the newly devised mini Y chromosome SNP multiplex PCR systems to the analysis of samples from 432 Japanese men. The results indicated frequencies of major haplogroups consistent with those found in previous studies [[18], [19], [20], [21]]. For 0.9% of the Japanese population, we failed to discover any mutations using our three Y chromosome SNP analysis systems. These samples appear to belong to haplogroups I and R (1%) [11]. The haplogroup D lineage occurs most frequently in Central Asia and in Japan; the haplogroup D2 lineage is rarely found outside Japan [11]. In this survey, all haplogroup D instances belonged to haplogroup D2, while the frequencies of subhaplogroups D2∗, D2a1∗, and D2a1b showed no significant differences from previous reports and fine classifications, suggesting that System 2 may be very useful in subdividing the Japanese haplogroup D2 population. Where further classification is required, IMS-JST022456 may help define the subclades of haplogroup D2 [20].

Haplogroup O, the most prevalent haplogroup in Japan, was divided by System 1 and further divided by System 3. In System 1, 21.3% of samples branched into haplogroup O3. Using System 3, we demonstrated that haplogroup O2 branched into haplogroup O2b (32.2%). Haplogroups O2b and O3 accounted for more than half the Japanese population. Introducing still another system to subdivide haplogroups O2b and O3 should make it still more useful for personal identification. Reports indicate many individuals in the Japanese haplogroup O2b have the 47z mutation (haplogroup O2b1) [[11], [25]]. Additionally, the Japanese haplogroup O3 can be divided into further subgroups [[27], [28]].

We found that haplogroup O accounted for 59.0% of the samples; haplogroup D for 30.3% of the samples; and haplogroup C for 8.3%. Several studies indicate haplogroups C, D, and O are found in more than 95% of the East Asian population [[18], [28]], but at differing proportions from country to country. Japan features high proportions of haplogroup D, while South Korea features high proportions of haplogroup C [28]. Genetic differences between East Asians are also evident in mitochondrial DNA haplogroups. Mitochondrial DNA is an excellent tool for forensic genetics due to the high copy numbers per cell and maternal inheritance. Certain mitochondrial haplogroups, such as M7a and N9b, occur frequently in the Japanese population but are rarely encountered in other East Asian populations [29]. Using mitochondrial and Y chromosome SNPs, we can exploit these differences to categorize East Asian populations into the appropriate haplogroups.

Personal identification requires further classification; forensic scientists often encounter major difficulties in analyzing degraded DNA samples. Quite often, degraded DNA samples cannot be successfully analyzed using commercially available kits subject to sample volume limitations. In forensic examinations, an additional system capable of fine sub-classification may help. The objective of the present study is to apply these methods to analyze degraded samples for forensic purposes. Allele typing by Y chromosome SNP analysis is easier than with autosomal or X chromosome SNPs because the Y chromosome shows no heterozygosity, and systems that detect several Y chromosome SNPs simultaneously can often predict haplogroups, even with incomplete allele typing. STRs are known to produce stutter artifacts differing from true alleles that may complicate analysis; on the other hand, SNP analysis is very simple.

Our new systems containing 22 Y chromosome SNPs promise effective and efficient analysis of highly degraded DNA samples in the Japanese population. The short amplicons used in this study offer the potential to become the tool of choice for analyzing degraded DNA samples [[17], [30]]. To test these hypotheses, we used amplification product lengths between 77 bp (M122) and 150 bp (M231 and M95) for all Y chromosome SNPs. On this basis, our systems proved capable of generating favorable results with highly degraded DNA samples. With samples for which most STRs could not be analyzed with the AmpFLSTR Yfiler Kit, the systems we created were able to type only a few SNPs, suggesting that amplification is inadequate even with ⩽150 bp amplicons with extensive fragmentation of DNA samples. However, these systems proved effective with samples in which STRs could be detected in >2 loci. Analytical results for artificially degraded samples substituting for highly degraded forensic DNA sources were also superior to those obtained using commercial STR kits. For degraded DNA samples for which alleles were not completely detected, this means these systems can easily determine haplogroups and that even if haplogroups are not determined to precise subgroups, the detected SNPs can help achieve personal identification.

Harayama, Yuta, et al. "Analysis of Y chromosome haplogroups in Japanese population using short amplicons and its application in forensic analysis." Legal Medicine 16.1 (2014): 20-25.

Post by Admin on Dec 28, 2014 14:43:55 GMT

Phylogenetic tree based on complete M1 sequences. Numbers along links refer to nucleotide positions. C, G indicate transversions; "d" deletions and "i" insertions. Recurrent mutations are underlined. Star differs from rCRS [62, 63] at positions: 73, 263, 311i, 750, 1438, 2706, 4769, 7028, 8701, 8860, 9540, 10398, 10873, 11719, 12705, 14766, 15301, 15326, 16223 and 16519. Subject origins are: Asian (ASI HER; [54]) and 2 Ethiopians (AFR-KI43 and AFR-KI15; [55]) only analyzed for coding region; Georgian (GEO 2463); Indian (IND-B156; [25]); 2 Jordanians (JOR 771; [7] and JOR 841); 2 Moroccans (MOR 252; [7] and BER 957 = Berber); Saudi Arab (SAU ARA); Serere from Senegal (SER 558); 3 Spanish (Basque = BAS V82, Castilian = CAS 2490, and Valencian = VAL 1881). Dotted branches include subjects only analyzed for RFLP and HVI region [22]. Roman numerals refer to the Quintana-Murci et al. [22] nomenclature.

As an outgroup of the M1 genomic phylogenetic tree (Fig. 1) we used a published Indian M30 complete sequence [25]. When this M30 lineage is compared to the rare M sequence previously detected in two Palestinians [26], it is evident that it belongs to the Indian super-clade M4'30, as it shares the basal mutation 12007. More specifically it belongs to the M30 branch because it also has transition 15431. M30 has a broad geographic, ethnic and linguistic range in India. It has been detected in northern and southern India, in Australoid and Caucasoids, and in Dravidian and Indo-European speakers [24,25]. So, instead of an autochthonous Near East M lineage, its presence in Palestine is probably due to a recent gene flow from India. After careful re-reading and partial re-sequencing of two previously published M1 sequences [7], we have detected in them the following errors: both have the 12950C transversion, and, in addition M1,1 has the 6671 transition and M1,2 the 13111 transition. Taking these modifications into account, from the M basal type, haplogroup M1 is characterized by one transversion (12950C) and four transitions (6446, 6680, 12403, and 14110) in the coding region and by a five-transition motif (195, 16129, 16189, 16249, and 16311) in the non-coding region (Fig. 1). This haplogroup can be RFLP diagnosed by a MnlI site loss at position 12402. Two main branches, M1c and M1abde, respectively defined by transitions 13111 and 6671, sprout from the root. Based on partial sequences M1c was defined by transition 16185 [21]. However, not all M1c lineages present this mutation, which, in addition, recurrently appears in an M1b1 lineage. It seems that for population studies M1c could be better diagnosed by a DdeI site loss using a modified reverse primer (Table 1). 
It is surprising that none of the three M1c complete sequences have an eastern Africa ancestry: one (Jor771) has a Levantine origin and the other two belong to West sub-Saharan Africa (SER558) and West Mediterranean (VAL1881) areas. The latter two sequences form a new M1c1 subclade defined by transitions 10895 and 16399 that can be RFLP diagnosed at position 10895 (Table 1). In relation to the M1abde cluster, it is also surprising that one lineage that directly branched out from the root (BER957) has a northwestern, not eastern, Berber ancestry. All the remaining lineages shared the 813 transition forming the M1abd cluster. Again, an isolated offshoot of Basque ancestry (BASV82) sprouts from its root. Subclade M1b was characterized by an RFLP site gain (+15882 AvaII) and loss of -15883 HaeIII [22]. Later an M1b subclade defined by the non-coding motif 16260–16320 and restricted to East Africans was identified [21]. Consistently, none of our M1b sequences from western areas has that motif. The last cluster, M1a, was first distinguished by RFLP +12345 RsaI [22] and, after that, further characterized by transition 16359 [21]. In addition it also has transition 3705 at its root (Fig. 1). M1a is the most prominent clade in eastern Africa. However, its expansion occurred later than the other M1 branches (Fig. 1). An M1a subclade, M1a2, defined by transition 9053, that can be RFLP diagnosed (Table 1), attests to a later spread of M1a to western Asia.

Geographic distribution of M1

Figure 2 shows the reduced median network obtained from the 261 M1 haplotypes found in a global search comprising more than 38,713 HVSI sequences. In Africa, haplogroup M1 has supra-equatorial distribution (see additional files 1 and 2). As previously reported its highest frequencies and diversities (Table 2) are found in Ethiopia in particular and in East Africa in general. Two appreciable gradients exist. Frequencies significantly diminished from East to West and also going South to sub-Saharan areas. M1 is not uncommon in the Mediterranean basin showing a peak in the Iberian Peninsula. However, it is rare in continental Europe. Although in low frequencies, its presence in the Middle East has been well established from the South of the Arabian Peninsula to Anatolia and from the Levant to Iran. The central HVSI haplotype (16129–16189–16223–16249–16311) has been found only once in northwestern India [27]. Another possible Indian M1 candidate is the derived sequence: 16086–16129–16223–16249–16259–16311 [28]. However, in two recent studies in which 24 [24] and 56 [25] Indian M complete sequences were analyzed no ancestral M1 lineages have been found. M1 haplotypes have also been occasionally spotted in the Caucasus and the Trans Caucasus [23,29] and in Central Asia [30]. It seems that, going east, M1 even reached Tibet as the HVSI diagnostic motif was sampled there [31]. However, although haplotypes sharing four of the five HVSI transitions defining M1 (16129–16223–16249–16278–16311–16362; 16129–16223–16234–16249–16311–16362) have been sampled in Thailand and Han Chinese [32,33], complete sequencing has unequivocally allocated them in the D4a branch of D, the most abundant haplogroup representing M in East Asia. 
As commented previously, this is a clear example of the danger of establishing affinities between geographically distant areas only on the basis of HVSI homologies as, often, they are the product of geographic isolation and molecular convergence [18]. Within this sparse but geographically wide range of M1 distribution its three identified branches also had uneven radiations. Although M1a (HVSI identified by the 16359 transition) is present in all the M1 range, its greatest frequencies and diversities are found in Ethiopia and eastern Africa (Table 2), pointing to this area as the most probable origin of the M1a expansion in all directions, with particular incidence in western Asia and sub-Saharan Africa. Not all the M1b lineages can be HVSI identified; however, several specific subclades have different locations. Those characterized by transitions 16260–16320 [21], and by presence of 16182 transition and 16265C transversion [22] are restricted to Ethiopia with occasional spreads to eastern Africa. In addition, there is an M1b branch, identified by 16185 transition and 16190 deletion that has a northwestern distribution excepting a Jordan haplotype (Fig. 2). Although M1c cannot be unequivocally defined by transition 16185, it can be stated that M1c is an overwhelmingly Northwest African clade which spreads to the Mediterranean and West sub-Saharan Africa areas. Finally, other unclassified M1 branches have also different geographic ranges. Those identified by the presence of 16357 transition and by the reversion of the diagnostic position 16129 are of Ethiopian eastern Africa adscription, while clusters characterized by loss of the diagnostic position 16223 and by the 16399 transition have a northwestern distribution (Fig. 2). However, M1 assignation of haplotypes, which lack any of the basic positions, based only on HVSI information is risky when they share other diagnostic positions with different haplogroups. 
For instance, the Russian haplotype 16183C–16189–16249–16311, classified as M1 on the basis of its HVSI sequence [34] also matches with haplotypes assigned to the U1a clade [35].

Reduced median network relating M1 HVSI sequences. The central motif (star) differs from rCRS at positions: 16129 16189 16223 16249 16311 for HVI control region. Numbers along links refer to nucleotide positions minus 16000: homoplasic mutations are underlined, and positions not used in diversity estimations are in italics. The broken lines are less probable links in accordance with completed sequences (Fig. 1) and/or mutation recurrence. Size of boxes is proportional to the number of individuals included. Codes are: NWA = Northwest Africa (ALB = Algerian Berber; ALG = Algerian; MBE = Moroccan Berber; MOR = Moroccan; SAH = Saharan; TNA = Tunisia Arab; TNB = Tunisia Berber); CWA = Central West Africa (GUB = Guinea Bissau; IVC = Ivory Coast; MAL = Mali; SEN = Senegalese); NEA = Northeast Africa (EGY = Egyptian; NUB = Nubian; SUD = Sudanese); CEA = Central East Africa (ETH = Ethiopian; KEN = Kenyan; SOM = Somali); WAS = West Asia (ARA = Arab; ARB = Arab Bedouin; CAU = Caucasian; GEO = Georgian; JOR = Jordanian; IDR = Israel Druze; IND = Indian; IRN = Iranian; KGZ = Kirghiz; NOG = Nogay; PAL = Palestinian; TIB = Tibetan; TUR = Turkish; YEM = Yemeni); IPE = Iberian Peninsula and islands (AZO = Azores; CAI = Canary Islander; CVE = Cape Verde; MAD = Madeira islander; POR = Portuguese; SPA = Spanish); MEU = Mediterranean Europe (CRO = Croatian; CMD = Central Mediterranean; GRE = Greek; ITA = Italian; SAR = Sardinian; SIC = Sicilian); REU = Rest of Europe (GBA = English); JEW = Jews (JBA = Baltic Jew; JCE = Central Europe Jew; JET = Ethiopian Jew; JIQ = Iraqi Jew; JIN = Iranian Jew; JIP = Spanish Jew; JWE = Western Europe Jew). Individuals with complete sequences are in boldface and underlined.

The presence in the Mediterranean basin and in West sub-Saharan Africa of M1a and M1c lineages can be taken as proof that these areas received influences both from the West and East North African centers of M1 radiation. Quantitative confirmation of the above described patterns is provided by AMOVA and pairwise distances based on FST analyses using the groups and populations described in Material and Methods and taking into account haplotypic molecular differences. As usual the bulk of the variation, 90%, is within populations, 6% is due to differences among groups and 4% to differences among populations within groups. Pairwise differences between populations (Table 3) offer a more detailed view. There is homogeneity between populations within eastern Africa, small differences (p < 0.05) within western Africa and strong heterogeneity between these main areas (p < 0.001). On the contrary, the Iberian Peninsula has significant differences with the rest of Europe. In turn, West Asia forms a homogeneous continuum with East Africa and Europe excepting the Iberian Peninsula, and the latter is not significantly different from western Africa. All these results can be explained as due to the differential radiation of M1a from East Africa and M1c from Northwest Africa, the Iberian Peninsula being mostly influenced by Northwest Africa and the rest of Europe and western Asia by East Africa.

M1 haplotypes in Jews

Several M1 haplotypes have been detected in Jewish communities albeit in low frequencies [36,37]. However, when compared with non-Jew populations they show significantly higher frequencies for the whole M1 haplogroup (p = 33.54***) and for M1a in particular (p = 24.90***). The only striking exception is that of the Moroccan Jews for which no M1 lineages have been detected at all [36]. Interestingly, all M1 lineages found in Jews, except two, belong to the eastern clade M1a (Fig. 2). Therefore, as for the bulk of the M1 Near East haplotypes, the most probable origin of these Jewish M1 lineages is an eastern African expansion around 5000 years ago. Another peculiarity of M1 in Jewish communities is its reduced haplotypic diversity (Table 2) which has been already detected for other mtDNA lineages [36,38]. In addition, there is a strong M1 geographic differentiation among Jewish communities. For example, all European Ashkenazi Jews have only one M1a lineage characterized by a transition in the 16289 position that has not been detected in other Jew or non-Jew populations. Similarly, all West Asian Jews shared an identical M1a motif characterized by a transition in the 16209 position that has been detected only once in Ethiopia. These results are congruent with the proposition that, in the majority of the cases, Jewish migrations implied strong maternal founder effects [36-38]. Nevertheless, as M1a Jewish lineages are unique and different in different groups, we think that their Near East source population should not have suffered strong genetic bottlenecks. Finally, it is worth mentioning that M1 frequencies of Jewish groups and their host populations are significantly correlated (r = 0.942**), which suggests that some genetic interchange must have happened between them as already proposed by other authors [36,37].

Radiation ages and evolution of lineages

Radiation ages for M1 and its subhaplogroups have been estimated on the basis of complete coding and HVSI sequences using different mutation rate estimations (Table 4). The ages obtained for M1 and M1a from HVSI data are more coherent with those calculated for the coding region using the Ingman et al. [6] mutation rate than that proposed by Mishmar et al. [8]. Our coalescence age estimations for the whole M1 clade (20,000–30,000 years) are younger than those previously published [22]; however, the approximate expansion ages for the eastern Africa M1a subclade (10,000–20,000 years) are in the same range. Although standard errors overlap, it seems that the northwestern Africa expansion represented by the M1c subclade (19,040 ± 4916 years) preceded the M1a eastern Africa expansion (16,756 ± 5997), M1b being the youngest branch (10,155 ± 3590). It must be stated that coalescence ages are only rough estimations biased by mutation rate estimations, small sample size, demographic history and, possibly, selection. There are recent examples of clock-like evolution violations in several mtDNA lineages that have been explained by selective or demographic effects [39-41]. Here, subclade M1a2 (Fig. 1) represents a new example of constant mutation rate violation. The mean number of substitutions accumulated in M1a2 lineages (12.5 ± 0.7) is significantly higher (p = 0.008) than that in the rest of M1 lineages (8.4 ± 1.3). This result is not compatible with a uniform rate of evolution. The small standard errors show that there is high lineage homogeneity within groups, which weakens the possibility that stochastic processes have played a main role. Different patterns of synonymous and nonsynonymous changes among different lineages have been taken as hints of a role for selection in other studies [8,39]. In our case differences between synonymous vs. nonsynonymous changes within groups do not reach statistical significance (p = 0.75). 
However, the mean number of coding region substitutions accumulated in M1a2 lineages (11 ± 0.0) is significantly higher (p < 0.001) than in the rest of M1 (5.6 ± 0.7). Conversely, the mean number of regulatory region substitutions accumulated in M1a2 lineages (1.5 ± 0.7) is smaller than in the rest (2.8 ± 0.9) although not reaching statistical significance (p = 0.175). If the mutation rate was constant along the whole mtDNA molecule, for each mutation in the regulatory region roughly fourteen mutations should accumulate in the coding region. However, selection pressure is higher in the coding than in the regulatory region so that the substitution rate is ten times faster in the latter. The mean coding/regulatory ratio is 8.3 for M1a2 lineages and only 2.4 for the rest of M1. We interpret these results as due to different ages of expansion between clades. M1a2 would be the youngest clade with a more recent expansion than the others so that purifying selection has not had enough time to eliminate mutations with small deleterious effects in the coding region. We think that differences in the rate of evolution among subgroups of the North African U6 haplogroup [40] could be better explained by the same pattern assuming that the U6a subclade, with the highest coding/regulatory ratio, had a more recent radiation than the U6b subclade. In spite of its anomalous behavior, M1a2 has only a minor effect on the estimation of the whole M1 coalescence age although its omission significantly diminishes that of the M1a subgroup (Table 4).
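The "roughly fourteen" figure above follows from the relative lengths of the two regions. A minimal sketch of that arithmetic, assuming the standard rCRS length of 16,569 bp and a control region spanning positions 16024–16569 plus 1–576 (the excerpt does not state which boundaries the authors used, so these are assumptions):

```python
# Assumed rCRS coordinates; the excerpt does not give the exact boundaries.
MTDNA_LENGTH = 16569                                 # total length in bp
CONTROL_REGION_LENGTH = (16569 - 16024 + 1) + 576    # 546 + 576 = 1122 bp

# Everything outside the control region is treated here as "coding".
coding_length = MTDNA_LENGTH - CONTROL_REGION_LENGTH  # 15447 bp

# Under a uniform per-site rate, expected mutations scale with length.
expected_ratio = coding_length / CONTROL_REGION_LENGTH

if __name__ == "__main__":
    print(f"coding bp: {coding_length}, control bp: {CONTROL_REGION_LENGTH}")
    print(f"expected coding/regulatory ratio: {expected_ratio:.1f}")
```

Under these assumptions the expected ratio is about 13.8, which is the baseline against which the observed ratios of 8.3 and 2.4 are being compared.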

Phylogeographic parallelism between M1 and U6 haplogroups

There are striking similarities between the geographical dispersals and radiation ages observed here for M1 lineages and those previously published for the North African U6 haplogroup [40]. It was proposed that U6a first spread was in Northwest Africa around 30,000 ya. Coalescence ages for M1 also fit into this period and the oldest clade M1c has an evident northwestern Africa distribution; however it had to have a wide geographic range as some M1c lineages are today still present in Jordanians (Figs. 1 and 2). It is curious that this prehistoric Near Eastern colonization was also pointed out by the uniqueness of the U6a haplotypes detected in that area. A posterior East to West African expansion around 17,000 ya was indicated by the U6a1 relative diversity and distribution. Again, age, relative East to West diversities and geographic range accurately correspond with the M1a1 expansion detected here. More recent local spread of lineages U6b and U6c also parallel the M1b and M1c1 distributions. Furthermore, these similarities also hold outside Africa. U6 lineages in the Iberian Peninsula have been considered traces of northward expansions from Africa. Based on the uneven distribution of U6a and U6b lineages in Iberia, with the former predominating in southern and the latter in northern areas, it was proposed that U6b in Iberia represents a signal of a prehistoric North African immigration whereas the presence of U6a could be better attributed to the long lasting historic Arab/Berber occupation [40]. Again, this pattern is accurately repeated by the M1c and M1a distribution in the Iberian Peninsula, the northwest African M1 being more abundant in northern areas (56%) and the East African M1a in southern areas (85%) although, due to the small sample size, difference does not reach a significant level (p = 0.07). 
Additional support for the hypothesis of a prehistoric introduction comes from the recently detected presence of a Northwest African M1c lineage in a Basque cemetery dated to the 6th–7th centuries AD, prior to the Moorish occupation [42], and from the ancestral phylogenetic position of another Basque M1d sequence (Fig. 1) that does not match any African sequence. Finally, two autochthonous U6 lineages (U6b1 and U6c1) traced the origin of the Canary Islands prehispanic aborigines to Northwest Africa [43]. Although exclusive M1 lineages have not been detected in the Canary Islands, it is worth mentioning that those sampled belong to the Northwest African area [44]. Outside Africa and the Iberian Peninsula, as with U6, M1 has been mainly detected in other Mediterranean areas with main incidences in islands such as Sicily. It is customary to attribute these incidences to the above mentioned Arab/Berber historic occupations. However, taking into account the major Jewish assignation for all the M1a haplotypes detected in Europe, the possibility of a Jewish maternal ascendance for at least some of these lineages should not be rejected.

Note that the two M1 lineages sampled in the Balearic isles were of Jewish adscription [45]. Also, there were well documented Jewish settlements in Sicily since early Roman times [46] and, coincidentally, half of the M1 lineages sampled in that island [47,48] belong to the M1a cluster. Finally, the Atlantic archipelagos of Canaries and Madeira, where the rigor of the Spanish Inquisition was stronger, only have M1c representatives. In contrast, in the Azores Islands, that were used as a refuge by Sephardim Jews expelled from the Iberian Peninsula, half of the M1 sequences detected are of M1a assignation [49,50]. These possible Jewish contributions might be also extended to the U6 lineages of eastern origin because all U6 haplotypes detected in Ashkenazim and other Jewish groups, excepting one that is a basal U6a (16172–16219–16278), belong to the eastern Africa clade U6a1 [36,26]. An additional proof of the striking parallelism between M1 and U6 lineages is the fact that, as for M1, no U6 representatives were sampled in Moroccan Jews in spite of the high frequency of this clade in the Moroccan and Berber host populations [36].

Most probable origin of M1 ancestors

Mitochondrial M lineages in Ethiopia were first detected by RFLP analyses [51]. To explain its presence in that area the authors suggested two possibilities: 1) the marker was acquired by Ethiopians through interchanges with Asians or 2) it was present in the ancient Ethiopian population and was carried to Asia by groups who migrated out of Africa. Later, the second hypothesis was favored and a single origin of haplogroup M in Africa was suggested, dating the split between Asian and African M branches older than 50,000 ya [22]. Although not completely discarding this last scenario, other authors considered that the question remained unsettled. The vast diversity of haplogroup M in Asia compared to Africa pointed to the possibility that M1 is a branch that traces a backflow from Asia to Africa [7,23]. Due to the scarcity of M lineages in the Near East and its richness in India, this region was proposed as the most probable origin of the M1 ancestor [7,52]. However, recent studies based on Indian mtDNA sequences [24,25] have not found any positive evidence that M1 originated in India. Nevertheless, the inclusion of M1 complete mtDNA lineages in the construction of the macrohaplogroup M phylogeny clearly established that the antiquity of Indian lineages, such as M2, as compared to Ethiopian M1 lineages supports an Asian origin of macrohaplogroup M [24]. Furthermore, the comparison within Africa of eastern and western M1 sequences left the origin of M1 in Africa uncertain [21]. In light of our results and those of other authors, it seems clear that by their respective coalescence ages and diversities, M1 is younger than other Asiatic M lineages. Although it is beyond doubt that the L3 ancestor of M had an African origin, macrohaplogroup M radiated outside Africa and M1 should be considered an evolved branch that signals its return to this continent. 
Moreover, as the coalescence age of the northwestern M1c clade is older than that of the eastern M1a clade, we think that the most ancient dispersals of M1 occurred in northwestern Africa, reaching also the Iberian Peninsula, instead of Ethiopia. The detection of an ancestral M1c sequence in Jordanians could be explained by two alternative hypotheses: 1) that the Near East was the most probable origin of the primitive M1 dispersals, West into Africa and East to Central Asia. This supposition would explain the presence of basic M1 lineages, instead of the most common M1a derivates, as far as Tibet. The actual scarcity of these types in eastern areas could be explained by posterior migrations that erased these primitive lineages. The absence of these ancestral M1c lineages in Ethiopia would point to the Sinai Peninsula as the most probable gate of entrance of this backflow to Africa. 2) That M1 is an autochthonous North African clade that had its earliest spread in northwestern areas marginally reaching the Near East and beyond. This would explain the shortage of basic M1 lineages in the Near East but would leave the Asiatic origin of the M1 ancestor undetermined. In any case, both alternatives envisaged M in Africa as an offshoot of the Asiatic M trunk. The striking phylogeographic parallelism between U6 and M1 haplogroups adds additional support to these hypotheses. It is possible to correlate the dispersion ages of the different M1 clades with their contemporary climatic, archaeological, paleoanthropological and linguistic information. For instance, the first M1 backflow to Africa, dated around 30,000 ya, is coincidental with a harsh glacial period which suggests that this human retreat to Africa could be forced by climatic conditions. The low sea level in the Gibraltar Strait at that time could also facilitate the Iberian Peninsula colonization. 
The northwestern African M1c and the probable north central M1b expansions are coincidental with the Iberomaurusian and Capsian industries. The anomalous evolution of M1a2 lineages left the coalescence ages of the eastern Africa M1a expansion uncertain, but as suggested for the sister U6a1 radiation; these movements could be correlated in time with an African origin and expansion of Afroasiatic languages [40]. Finally, from a maternal genetic perspective it seems that Neolithic occupation of the Sahara had both eastern and western influences. Most probably other mtDNA lineages participated in this human back flow to Africa. It has been suggested that the North African X1 branch of the Euroasiatic haplogroup X could be one of them [63].

Whilst this paper was under review, a new paper also dealing with U6 and M1 haplogroups was published [53]. Haplogroup topologies and phylogeographic conclusions proposed by Olivieri et al. [53] are highly coincidental with those proposed by us in our previous paper on U6 [40] and in the present paper, dealing with M1. Regrettably, there are differences in nomenclature for M1. Whereas our M1 phylogeny adhered to that proposed previously by other authors [21], Olivieri et al. [53] chose to apply their own. Nevertheless, the diagnostic positions for the different M1 subhaplogroups allowed us to establish subhaplogroup homologies between the two works. Clearly their M1b subgroup (defined by transition 13111) corresponds to our M1c subgroup; their M1a2 subgroup (defined by transition 15884) corresponds to our M1b subgroup. Finally, their M1a1 subgroup (defined by transitions at 3705, 12346 and 16359) corresponds to our M1a subgroup. In addition to the reinforcing overlap of ideas, it is worthwhile mentioning the high coincidence for the coalescence ages of M1 and the majority of its subhaplogroups, when the same substitution rate [8] is used. Olivieri et al. [53] calculated a coalescence time estimate of 36.8 ± 7.1 ky for the entire haplogroup M1 that matches our estimate of 35.2 ± 7.1 ky. Our coalescence time for M1c (25.7 ± 6.6 ky) also overlaps with Olivieri et al. [53] haplogroup M1b (23.4 ± 5.6 ky). Likewise, the coalescence age calculated for our M1a subhaplogroup (22.6 ± 8.1 ky) is in the range of the Olivieri et al. [53] estimation for their M1a1 subhaplogroup (20.6 ± 3.4 ky). The only discrepancy concerns our M1b subhaplogroup, whose coalescence time estimate (13.7 ± 4.8 ky) is younger than that calculated by Olivieri et al. [53] for their homologous M1a2 (24.0 ± 5.7 ky). As our calculations are based only on three lineages and that of Olivieri et al. [53] on six, we think that their coalescence time estimation should be more accurate than ours. 
In fact, when time estimation is based on the eight different lineages (AFR-KI43 is common to both sets) a coalescence age of 20.6 ± 5.0 ky is obtained. Although with overlapping errors, these results, together with the relative ancestral positions of each subgroup in the phylogenetic tree (Fig. 1), would suggest that the northwestern M1c clade radiation was older than those for the ubiquitous M1b and the eastern M1a clades, as also proposed by Olivieri et al. [53].
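The comparison of coalescence ages above can be laid out as a small worked check. The sketch below is only illustrative: the values are copied from the quoted paragraph, while the two criteria (ranges overlapping, and each mean falling inside the other estimate's ± range) are assumptions chosen here for illustration, not a test the authors describe.

```python
# Coalescence ages in ky as (mean, error), copied from the paragraph above.
# Each entry pairs "our" subgroup with the homologous Olivieri et al. subgroup.
ESTIMATES = {
    "M1 (whole haplogroup)": ((35.2, 7.1), (36.8, 7.1)),
    "our M1c / their M1b": ((25.7, 6.6), (23.4, 5.6)),
    "our M1a / their M1a1": ((22.6, 8.1), (20.6, 3.4)),
    "our M1b / their M1a2": ((13.7, 4.8), (24.0, 5.7)),
}


def interval(estimate):
    """Return the (low, high) range implied by mean ± error."""
    mean, err = estimate
    return mean - err, mean + err


def ranges_overlap(a, b):
    """True if the two ± ranges share at least one value."""
    lo_a, hi_a = interval(a)
    lo_b, hi_b = interval(b)
    return lo_a <= hi_b and lo_b <= hi_a


def means_agree(a, b):
    """Illustrative criterion (an assumption): each mean lies in the other's range."""
    lo_a, hi_a = interval(a)
    lo_b, hi_b = interval(b)
    return lo_b <= a[0] <= hi_b and lo_a <= b[0] <= hi_a


if __name__ == "__main__":
    for label, (ours, theirs) in ESTIMATES.items():
        print(label, "| overlap:", ranges_overlap(ours, theirs),
              "| means agree:", means_agree(ours, theirs))
```

Under these assumed criteria, all four pairs have overlapping ranges, but only the M1b / M1a2 pair fails the stricter check, which is consistent with the text singling it out as the one discrepancy.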

González, Ana M., et al. "Mitochondrial lineage M1 traces an early human backflow to Africa." BMC Genomics 8.1 (2007): 223.

Post by Admin on Jan 1, 2015 14:46:51 GMT

Study of the compiled dataset
The pooled percentage distribution of Y-haplogroups in the overall dataset of 2809 Y-chromosomes (767 Brahmins, 674 scheduled castes and 1368 tribals) is summarized in Supplementary Table 2. Altogether (Brahmins, scheduled castes and tribals), 22 Y-haplogroups were observed. Seven of these haplogroups (each with a percentage >5%) accounted for 85.5% of the total number of Y-chromosomes (n=2809). The haplogroups with their percentages in descending order were: R1a1* (21.1%), H1 (19.1%), R2 (10.5%), O (10.1%), L (9.5%), J*/J2 (8.3%) and F* (6.9%). These haplogroups remained the most frequent even after the Y-chromosomes were distributed into the respective groups of Brahmins, scheduled castes and tribals, but with significant percentage differences (Supplementary Table 2). Five haplogroups out of 18 were found to be most frequent (>5%) in Brahmins (R1a1* (35.7%), J*/J2 (12.4%), L (11.3%), R2 (10.8%) and H1 (8.0%)) and represented 78.2% of the total number of samples (n=767), whereas haplogroup O was found to be very infrequent (0.7%) in Brahmin Y-chromosomes. Seven out of 14 haplogroups (with percentage >5%) (H1 (24.2%), R1a1* (17.2%), R2 (14.2%), L (12.2%), F* (9.8%), J*/J2 (6.4%) and K* (5.3%)) represented 89.3% of the total number of scheduled caste (Dalit) Y-chromosomes (n=674). Among tribal Y-chromosomes, seven out of 20 haplogroups displayed percentages >5%: O (25.5%), H1 (25.3%), R1a1* (10.2%), F* (7.5%), R2 (6.4%), J*/J2 (6.1%) and L (5%) (86% of the total number of samples (n=1368)). All other observed haplogroups had percentages <5% (Supplementary Table 2). The study was further extended by dividing the samples into the four main linguistic categories present in India (Indo-European (IE), Dravidian (DR), Tibeto-Burman (TB) and Austro-Asiatic (AA)) as well as five regional categories (Central, East, North, South and West India). Y-haplogroup distributions as per these categories are presented in Supplementary Figures 3a and b.
AMOVA was also performed on the compiled dataset, with the populations grouped by social, geographical and linguistic category (Table 2). Geographical regions showed very little variation (0.79%) among the groups but higher variation between populations within groups (16.94%). In contrast, linguistic groups showed higher variation among the groups (15.56%) but lower variation between populations within linguistic groups (6.15%). Interestingly, when the TB linguistic group was removed from the analysis, the percentage variation among the groups reduced (9.43%) but variation between populations remained almost the same (Table 2). Under either grouping, most of the variation was found within the populations.
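The last point of the paragraph above follows from simple arithmetic if one assumes the usual three-level AMOVA partition, in which the among-group, among-populations-within-groups and within-population percentages sum to 100%. The sketch below uses only the two percentages quoted in the text for each grouping; the within-population remainder is derived here, not quoted from Table 2, and the dictionary keys are hypothetical names.

```python
# Percentages of variation quoted in the text (Table 2 itself is not shown here).
AMOVA_QUOTED = {
    "geographical": {"among_groups": 0.79, "among_populations_within_groups": 16.94},
    "linguistic": {"among_groups": 15.56, "among_populations_within_groups": 6.15},
}


def within_population_share(components):
    """Remainder of the variation, assuming the three AMOVA levels sum to 100%."""
    return round(100.0 - sum(components.values()), 2)


if __name__ == "__main__":
    for grouping, components in AMOVA_QUOTED.items():
        print(grouping, within_population_share(components))
```

Under that assumption, roughly 82% (geographical grouping) and 78% (linguistic grouping) of the variation would sit within populations.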

Figure 1. The spatial distribution maps of Y-haplogroup R1a1 generated by the Kriging procedure using SURFER version 8.0. (a) Spatial frequency distribution of Y-haplogroup R1a1* across Eurasia, Central Asia and the Indian subcontinent. (b) Spatial distribution of Y-haplogroup R1a1*-associated diversity based on microsatellite markers.

Origin of Y-haplogroup R1a1*
However, a peculiar trend in distribution of the highest frequency of Y-haplogroup R1a1* (Table 1) in Brahmins, H1 in tribals and scheduled castes, and O in tribals was also observed. Whereas a consensus has developed in the literature among all schools of thought in assigning Indian origin to haplogroup H1 and in the association of haplogroup O with either Austro-Asiatic or Tibeto-Burman tribals, the widespread geographic distribution of R1a1* and reasonably high frequency across Eurasia (Figure 1a), with scanty representation of its ancestral (R*, R1* and R1a*) and derived lineages (R1a1a, R1a1b and R1a1c) across the region, leaves obscure the question of origin of R1a1*. This becomes more complex with the claims [7, 9, 12, 23] proposing a scenario of the recent major gene flow from Central Asia to India and the antagonistic observations [9, 12] of its highest variance in India, suggesting the gene flow in opposite direction. Further, the observation of a very high frequency (up to 72.22%) in this study (Table 1) and in the literature (Supplementary Figures 3a and b) of this haplogroup in all of the Brahmins may indicate its presence as a founder lineage for this caste group (irrespective of the geographical and linguistic affiliation of Brahmins), thus making this haplogroup of extreme importance and a key haplogroup in answering the question of origin of caste systems in India.

Figure 2. Admixture proportions were estimated using ADMIX2 software under different models. All populations (Europeans (EU), Central Asians (CA) and Indian Brahmins (IB)) were considered alternatively as source populations and the respective proportions of contributions were estimated. mY1 and mY2 are the estimated admixture coefficients, corresponding to the relative contribution to the hybrid population (Hyb) from the parental populations (P1 and P2, respectively).

Admixture and diversity analysis
Considering the very high frequency of R1a1* (up to 72.22% as in WB) in Brahmins, irrespective of their geographical and linguistic affiliations, admixture analysis [41] based on pooled data was performed. Three models of potential parental contributions of R1a1* (Figure 2) were tested, to evaluate the concepts of Central Asian introduction of the Indian caste system [7] by Indo-Aryans (appointing themselves to the castes of higher ranks) [14], as well as of rank-related West Eurasian admixture [11, 21]. The observed proportions of contributions, taking all populations (Europeans (EU), Central Asians (CA) and Indian Brahmins (IB)) alternatively as source populations under different models (Figure 2), suggested model 3 (CA+IB → EU) as the best-fit model (tested by 1000 bootstraps) and model 2 also as a possibility, for contributions of R1a1*, based on both proportion of frequency distribution as well as molecular divergence. Admixture analysis in light of other genetic evidence from this study did not seem to favor either Central Asian origin of the haplogroup or rank-related Eurasian admixture; instead it supported the Indian origin of this haplogroup and its contributions to other regions.

Figure 3. Median joining network based on Y-STR haplotypes within Y-haplogroup R1a1*, showing the relationship between Indian, Central Asian and Eurasian population groups. *Biallelic marker M17 was included with the highest weight. The root of the network represents an individual with SRY10831b-R1a* (x M17-R1a1).

Molecular evidences for the origin of R1a1* in the Indian subcontinent
The median joining network [40] was also constructed. This algorithm provides the best results when applied on datasets of multi-state markers but within closely related haplotypes [54], as is the case using pooled data of the R1a1* haplogroup. The inferences from the analysis (Figure 3) were again in favor of our earlier observations. The Indian haplotypes were observed to be the most diverse, and haplotypes spanning Central Asia and Eurasia, along with some Indian regional haplotypes, seemed to be derived as a subset of this diversity. The extremely high level of sharing of haplotypes across the regions as well as reticulations, mostly with a one-step difference, in this subset suggests parallel evolution of different haplotypes, which appears more plausible after their geographical distribution and expansions. However, the diversity within the Indian populations, represented by the long branches and links connecting many haplotypes, is also an indicator of their ancestry, geographical differentiation and severe bottlenecks within India, suggesting loss of many of the intermediate haplotypes, thus reducing the reticulation and increasing the branches’ length. The observed genetic distances FST [38] and 1−PSA [44] within the R1a1* haplogroup, between Central Asians (CA), Europeans (EU), as well as pooled populations of the Indian subcontinent (IS) showed overlapping trends of distribution. FST is based on the total variance in allele frequencies among populations and 1−PSA considers shared allele frequencies. IS populations showed less sharing with the CA (FST=0.095, 1−PSA=0.61) as compared with the EU (FST=0.021, 1−PSA=0.73) populations. AMOVA for these three pooled population groups (EU, CA, IS) showed that 94.07% of the total variation is present within the population, whereas only 5.93% of the differences are observed among population groups.
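To make the 1−PSA distance mentioned above more concrete, here is a minimal sketch of one common population-level definition of the proportion of shared alleles: at each locus, sum the smaller of the two populations' frequencies for every allele, then average over loci. Whether the authors' reference [44] uses exactly this formula is an assumption; the locus data below are hypothetical and are not taken from the paper.

```python
def proportion_shared_alleles(pop_a, pop_b):
    """Average over loci of sum(min(p_a, p_b)) across alleles.

    pop_a, pop_b: lists with one dict per locus, mapping allele -> frequency.
    Both lists must describe the same loci in the same order.
    """
    if not pop_a or len(pop_a) != len(pop_b):
        raise ValueError("both populations need the same non-empty set of loci")
    total = 0.0
    for freqs_a, freqs_b in zip(pop_a, pop_b):
        alleles = set(freqs_a) | set(freqs_b)
        total += sum(min(freqs_a.get(x, 0.0), freqs_b.get(x, 0.0)) for x in alleles)
    return total / len(pop_a)


def psa_distance(pop_a, pop_b):
    """Distance form used in the text: 1 - PSA."""
    return 1.0 - proportion_shared_alleles(pop_a, pop_b)


# Hypothetical allele frequencies at two Y-STR loci (repeat counts as alleles).
POP_X = [{14: 0.5, 15: 0.5}, {10: 1.0}]
POP_Y = [{14: 0.25, 15: 0.25, 16: 0.5}, {10: 0.5, 11: 0.5}]

if __name__ == "__main__":
    print(psa_distance(POP_X, POP_Y))
```

With these made-up frequencies, each locus shares half of its allele mass, so both PSA and 1−PSA come out at 0.5; identical populations would give a distance of 0.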

Figure 4. Median joining network based on Y-STR haplotypes showing the relationship between Kashmiri and Saharia Y-chromosomes bearing Y-haplogroups R1a* and R1a1*. Biallelic markers M17 and SRY10831b were also included and given the highest weight. The root of the network represents an individual with M173-R1* (x SRY10831b-R1a).

Age estimates for Y-haplogroup R1a1*
The age of microsatellite variations was re-calculated using Y-STR data and by applying mutation rates and generation times (discussed in Materials and methods) within the R1a1* lineage in Central Asia, Eurasia, Pakistan, as well as Indian populations (Table 3), and compared with the already published ages. The ages of the haplogroup, within the various population groups of India as well as after distributing them to social groups, were also calculated (Table 3). It was observed that the age of R1a1* was the highest in the Indian subcontinent. Interestingly, among different groups, the age of Y-haplogroup R1a1* was highest in scheduled castes/tribes when compared with Central Asians and Eurasians. These observations weaken the hypothesis of introduction of this haplogroup and the origin of the highest Indian castes from Central Asian and Eurasian regions, supporting their origin within the Indian subcontinent. Further, a particular population group of northern India, the Kashmiri Pandits (KPs), showed the highest variance (0.52) and thus the highest respective age (Table 3). Another north Indian population group, Himachal Brahmins, also showed higher variance (0.43) than that of the average Indian population.

High frequency of Y-haplogroup R1a1* in tribal populations and ancestral Y-haplogroup R1a* in the Indian subcontinent
Y-haplogroup R1a1* has been reported to be present in the tribal population in many of the earlier studies, but at very low frequency. In this study, a tribe named Saharia from Madhya Pradesh (Central India) showed the presence of R1a1* with high diversity in 19/71 males (26.76%), negating the idea of later admixture or some founder effect. Similar observations were made in the Chenchu tribe of Andhra Pradesh [24], with a high percentage (26.82%) of R1a1*.

Apart from this observation, the simultaneous presence of R1a*, the ancestral haplogroup of R1a1*, was also observed in this study, at the highest frequency known so far, in two population groups, KPs and Saharia. Incidentally, KPs are Brahmins, whereas Saharia is a tribal population group. Scanty representation of the R1a* haplogroup and its ancestral lineages (R*, R1*) in any of the geographical regions and the presence of the R1a1* haplogroup at high frequency across Central Asia and Eurasia had kept alive the question of the origin of R1a1* and associated conflicts. With the high-resolution analyses of the haplogroup (R1) in some population groups that were absent in the earlier studies and with the addition of published datasets, we were able to provide a clearer picture of the origin of R1a1* haplogroup and solve the existing conflict in literature. The calculated age for the haplogroup R1a* in both the population groups showed fascinating results. It was observed that the variance (0.43) of R1a*, and hence the respective age of this ancestral haplogroup, was far less in Kashmiris than the observed variance (0.52) and age of the derived R1a1*. However, a variance of 0.6 was observed in the Saharia tribe for R1a*, giving this haplogroup an age of 21,739.13 years (95% CI 15,789.47–34,883.72 years). The haplogroup R1a1* was found to have an age of 13,043.48 years (95% CI 9,473.68–20,930.23 years). To resolve the contradiction in these observations, we tried to explore the whole of the R1a lineage in these two population groups. By providing higher weight to the SNP (M17 that defines R1a1*) in the median joining network of Y-STR haplotypes within the R1a lineage among KPs and Saharia (Figure 4), we were able to draw some important inferences based on the clustering of the haplotypes. Two main clusters differentiating R1a* and R1a1* haplogroups were observed at the first instance.
Further, subclustering based on population groups could be seen within these major clusters. However, a few individuals belonging to KPs were seen in the Saharia population group clusters and vice versa, representing both R1a* and R1a1* haplogroups. It was particularly interesting to observe close overlaps in the R1a1* cluster. Further, the long branches and less networking in both of the clusters (R1a* and R1a1*) again indicated bottlenecks and expansions, eliminating many of the haplotypes and resulting in long branches in the median joining tree. The exclusive high presence of the ancestral R1a* lineage in KPs and Saharias, their level of sharing, observed by way of a PSA of 0.51 (based on the average of Y-STRs within R1a*) and clustering in the network, suggested their deep common ancestry, a probable source population for the origin of R1a1* and for Brahmins, which later on differed in the two population groups. This observation of a close relationship was reflected in the MDS plot based on FST values obtained from a haplotypic analysis of 6 Y-STRs within R1a1* (Supplementary Figure 5). Some other evidence hinting at this closeness is reflected in the cultural practices as well as the folklore of these population groups.
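The ages quoted above for the Saharia tribe can be reproduced with a variance-based dating formula, age = variance / mutation rate × generation time. The excerpt does not state the parameters (they are deferred to Materials and methods), so the values in this sketch are assumptions: a rate of 6.9 × 10^-4 per locus per generation with bounds 4.3 × 10^-4 and 9.5 × 10^-4, and 25-year generations. They are used here only because they happen to reproduce the quoted figures.

```python
# ASSUMED parameters (not stated in the excerpt); chosen because they
# reproduce the ages and intervals quoted in the text.
GENERATION_YEARS = 25.0
MU = 6.9e-4        # assumed mutation rate per locus per generation
MU_LOWER = 4.3e-4  # assumed lower bound of the rate
MU_UPPER = 9.5e-4  # assumed upper bound of the rate


def age_from_variance(variance, mu=MU, generation_years=GENERATION_YEARS):
    """Age in years = (STR repeat variance / mutation rate) * generation time."""
    if variance < 0 or mu <= 0:
        raise ValueError("variance must be >= 0 and mu must be > 0")
    return variance / mu * generation_years


def age_with_interval(variance):
    """Return (point estimate, lower bound, upper bound) in years, rounded to 2 dp.

    A faster rate gives a younger age, so the upper rate bound yields the lower
    age bound and vice versa.
    """
    point = age_from_variance(variance)
    lower = age_from_variance(variance, mu=MU_UPPER)
    upper = age_from_variance(variance, mu=MU_LOWER)
    return tuple(round(x, 2) for x in (point, lower, upper))


if __name__ == "__main__":
    # Variance of 0.6 is the value quoted for R1a* in the Saharia tribe.
    print(age_with_interval(0.6))  # (21739.13, 15789.47, 34883.72)
```

Under the same assumed parameters, the quoted R1a1* age of 13,043.48 years would correspond to a variance of about 0.36; that variance is back-calculated here and is not given in the text.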

Conclusions
The observation of R1a* in high frequency for the first time in the literature, as well as analyses using different phylogenetic methods, resolved the controversy of the origin of R1a1*, supporting its origin in the Indian subcontinent. Simultaneously, the presence of R1a1* in very high frequency in Brahmins, irrespective of linguistic and geographic affiliations, suggested it as the founder haplogroup for the population. The co-presence of this haplogroup in many of the tribal populations of India, its existence in high frequency in Saharia (present study) and Chenchu tribes, the high frequency of R1a* in Kashmiri Pandits (KPs—Brahmins) as well as Saharia (tribe) and associated phylogenetic ages supported the autochthonous origin and tribal links of Indian Brahmins, confronting the concepts of recent Central Asian introduction and rank-related Eurasian contribution of the Indian caste system.

However, there is a scanty representation of Y-haplogroup R1a1 subgroups in the literature as well as in this study. The known subgroups (R1a1a, R1a1b and R1a1c), which are defined by binary markers M56, M157 and M87, respectively (Supplementary Figure 1), were not observed. In such a situation, it is likely that this haplogroup (R1a1*) is a polyphyletic (or paraphyletic) group of Y-lineages. It is, therefore, very important to discover novel Y chromosomal binary marker(s) for defining monophyletic subhaplogroup(s) belonging to Y-R1a1* with a higher resolution to confirm the present conclusion. Further, the under-representation of phylogenetic data of the population groups of North India in the literature and our observations hint at the immense need for phylogenetic explorations in the northernmost Himalayan regions of India, which might have acted as an incubator of many ancient lineages, to obtain a clearer picture of the peopling of India and Eurasia.

Sharma, Swarkar, et al. "The Indian origin of paternal haplogroup R1a1* substantiates the autochthonous origin of Brahmins and the caste system." Journal of Human Genetics 54.1 (2009): 47-55.