Post by Admin on Dec 1, 2021 2:14:29 GMT
Clades and closest neighbour analysis
The phylogenetic tree was used to define the clades of interest,
calculate their age, and perform the closest
neighbour analysis. A clade was defined when all the
sequences in a subtree came from individuals from the
same (super) population (Fig. 2). In our case, at least two
sequences are required to form a clade. Super-populations
were defined as three different categories: North Indian
[Indo-European Speakers] non-tribal, South Indian [Dravidian
Speakers] non-tribal and Indian in general. For
the Indian super-population, we included both North
and South Indian non-tribal individuals with Dravidian
(Irula) and Austro-Asiatic (Birhor) individuals, but not
the Andamanese and Tibeto-Burman individuals as they
have a different ancestry from other Indian populations
(Mondal et al. 2016). We searched for all the largest
clades in the phylogenetic tree for each super-population,
regardless of the haplogroup classification. The algorithm
stops extending a clade as soon as it reaches an individual that does not belong to
that super-population. Then we calculated the Time to
the Most Recent Common Ancestor (TMRCA) of such
clades (Fig. 2), which were used to calculate the divergence
time of internal clusters in Fig. 4a.
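The clade search described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code (the paper used the "ape" R package): the `Node` class, the functions `maximal_clades` and `mean_height`, the toy tree and its branch lengths are all hypothetical, and the TMRCA of a clade is approximated here as the average distance from its root node to its tips.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: Optional[str] = None      # tip label; None for internal nodes
    length: float = 0.0             # branch length to the parent (years)
    children: List["Node"] = field(default_factory=list)

    def tips(self) -> List["Node"]:
        """All tips (leaves) below this node, or the node itself if it is a tip."""
        if not self.children:
            return [self]
        return [t for c in self.children for t in c.tips()]


def mean_height(node: Node) -> float:
    """Average distance from `node` down to each of its tips."""
    def depths(n: Node, d: float) -> List[float]:
        if not n.children:
            return [d]
        return [x for c in n.children for x in depths(c, d + c.length)]
    ds = depths(node, 0.0)
    return sum(ds) / len(ds)


def maximal_clades(root: Node, pop_of: Dict[str, str],
                   target: Set[str], min_size: int = 2) -> List[Node]:
    """Largest subtrees whose tips all belong to the populations in `target`."""
    found: List[Node] = []

    def visit(n: Node) -> None:
        tips = n.tips()
        if all(pop_of[t.name] in target for t in tips):
            if len(tips) >= min_size:
                found.append(n)
            return                  # sub-clades of a pure subtree are not maximal
        for c in n.children:        # subtree is mixed: look further down
            visit(c)

    visit(root)
    return found


# Hypothetical toy tree: every tip is 5000 years from the root.
A = Node(length=3000, children=[Node("n1", 2000), Node("n2", 2000)])
C = Node(length=1000, children=[Node("s1", 3000), Node("s2", 3000)])
B = Node(length=1000, children=[C, Node("e1", 4000)])
root = Node(children=[A, B])
pop_of = {"n1": "North", "n2": "North", "s1": "South", "s2": "South", "e1": "Europe"}

north_ages = [mean_height(c) for c in maximal_clades(root, pop_of, {"North"})]
indian_ages = [mean_height(c) for c in maximal_clades(root, pop_of, {"North", "South"})]
```

In this toy tree the North Indian search returns one clade (n1, n2), while the combined Indian search returns two, because the European tip e1 prevents the South Indian clade from being extended upwards.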
The closest neighbours were those sequences which
were closest to a specific clade of a super-population (just
outside of the clade) containing at least one Y-chromosome from
a different super-population. In this case, we
also identified the specific population to which the closest
neighbour belongs, and the time depth of the joint cluster
(TMRCA of the clade and the closest neighbour together).
Depending on the tree structure, the closest neighbour of
a single clade can consist of a single or multiple individuals
(Fig. 2). The divergence time of such neighbours was
calculated as the average distance from the most recent common ancestor
of the joint cluster to every individual of that cluster (essentially the average
height of the joint cluster). The analysis of closest neighbours
provides information about the time and location
of the most recent migrations between the target populations and
other populations represented in the tree. In
Fig. 4a, the blue distribution shows the divergence time of
all such neighbours from specific super-population clades.
Figure 4b shows the time depth of the closest neighbour
for each sequence, separated by population of origin (horizontal
axis), and differentiating the three super-populations
where the closest neighbour is found (North India, South
India and India). In Fig. 4c, we concentrated only on
Europeans who are the closest neighbours of the Indian
super-population (essentially a subset of Fig. 4b).
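As a rough illustration of the closest neighbour step, the sketch below takes a clade and its parent node, lists the tips that sit just outside the clade and come from a different super-population, and computes the average height of the joint cluster. It is a minimal sketch, not the authors' implementation: `Node`, `tip_depths`, `closest_neighbours` and the toy tree are hypothetical, tip names are assumed to be unique, and restricting the neighbours to tips outside the target super-population is my reading of the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    name: Optional[str] = None      # tip label; None for internal nodes
    length: float = 0.0             # branch length to the parent (years)
    children: List["Node"] = field(default_factory=list)


def tip_depths(node: Node, depth: float = 0.0) -> Dict[str, float]:
    """Map each tip below `node` to its distance from `node`."""
    if not node.children:
        return {node.name: depth}
    out: Dict[str, float] = {}
    for c in node.children:
        out.update(tip_depths(c, depth + c.length))
    return out


def closest_neighbours(parent: Node, clade: Node, pop_of: Dict[str, str],
                       target: Set[str]) -> Tuple[List[str], float]:
    """Neighbours of `clade` inside the joint cluster rooted at `parent`.

    Returns the tips outside the clade that are not from `target`, and the
    average height of the joint cluster (mean distance from `parent` to
    every tip below it, clade members included).
    """
    depths = tip_depths(parent)
    inside = set(tip_depths(clade))
    neighbours = sorted(t for t in depths
                        if t not in inside and pop_of[t] not in target)
    joint_height = sum(depths.values()) / len(depths)
    return neighbours, joint_height


# Hypothetical toy example: a South Indian clade whose sister tip is European.
south_clade = Node(length=1000, children=[Node("s1", 3000), Node("s2", 3000)])
joint = Node(length=1000, children=[south_clade, Node("e1", 4000)])
pop_of = {"s1": "South", "s2": "South", "e1": "Europe"}

neighbours, joint_height = closest_neighbours(joint, south_clade, pop_of, {"South"})
```

Here the single neighbour is e1 and the joint cluster is 4000 years deep. The population label of each neighbour (`pop_of[t]`) is what would be used to separate the results by population of origin.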
All the phylogenetic and clade analyses were done with
the “ape” R package (Paradis et al. 2004). As clade-specific
analysis can be biased because of sampling effects, we also
looked for the closest European for every Indian individual
regardless of their clade or haplogroup, with similar results
(not shown).
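The per-individual check mentioned above (closest European for every Indian individual, ignoring clades) could look something like the following. Again this is a hedged sketch rather than the published method: `Node`, `paths_from_root`, `tmrca`, `closest_from` and the toy tree are hypothetical, and the TMRCA of a pair is taken as the mean distance from the two tips to their most recent common ancestor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Path = Tuple[Tuple[int, float], ...]   # (node id, distance from root) per ancestor


@dataclass
class Node:
    name: Optional[str] = None      # tip label; None for internal nodes
    length: float = 0.0             # branch length to the parent (years)
    children: List["Node"] = field(default_factory=list)


def paths_from_root(node: Node, path: Path = (), dist: float = 0.0) -> Dict[str, Path]:
    """Map each tip name to its root-to-tip path with cumulative distances."""
    here = path + ((id(node), dist),)
    if not node.children:
        return {node.name: here}
    out: Dict[str, Path] = {}
    for c in node.children:
        out.update(paths_from_root(c, here, dist + c.length))
    return out


def tmrca(path_a: Path, path_b: Path) -> float:
    """Mean distance from two tips to their most recent common ancestor."""
    shared = 0
    for (id_a, _), (id_b, _) in zip(path_a, path_b):
        if id_a != id_b:
            break
        shared += 1
    mrca_dist = path_a[shared - 1][1]          # last node both paths share
    return ((path_a[-1][1] - mrca_dist) + (path_b[-1][1] - mrca_dist)) / 2


def closest_from(paths: Dict[str, Path], queries: List[str],
                 candidates: List[str]) -> Dict[str, Tuple[str, float]]:
    """For each query tip, the candidate tip with the smallest TMRCA."""
    result = {}
    for q in queries:
        best = min(candidates, key=lambda c: (tmrca(paths[q], paths[c]), c))
        result[q] = (best, tmrca(paths[q], paths[best]))
    return result


# Hypothetical toy tree with three Indian tips and two European tips.
A = Node(length=3000, children=[Node("n1", 2000), Node("e2", 2000)])
C = Node(length=1000, children=[Node("s1", 3000), Node("s2", 3000)])
B = Node(length=1000, children=[C, Node("e1", 4000)])
root = Node(children=[A, B])

paths = paths_from_root(root)
closest_european = closest_from(paths, ["n1", "s1", "s2"], ["e1", "e2"])
```

Because every Indian tip is compared with every European tip directly, the result does not depend on how clades were delimited, which is the point of the robustness check.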