
Monday, May 1, 2017

Lactase persistence gene: MCM6 of Denisovans found in East Asia

Today, I obtained genotype data comprising MCM6 SNPs from HapMap for 2436 individuals from 26 HapMap populations (ESN, GWD, LWK, MSL, YRI, ACB, ASW, CLM, MXL, PEL, PUR, CDX, CHB, CHS, FIN, GBR, IBS, TSI, CEU, BEB, ITU, JPT, KHV, PJL, STU, GIH). Haplotype annotation of each individual can be downloaded here. I focused on the SNPs that show real variation (alternative base present at a frequency of more than 1%). I excluded all SNPs that have varying numbers of bases (insertions/deletions), and I excluded rs4988274 and rs55809728 because they did not show a clear pattern (hypervariability?). I also excluded all haplotypes that were present in only 2 or fewer individuals. Using the remaining 203 SNPs, I was able to identify 76 haplogroups. Please note that there are many more haplotypes (especially in Africa); I focused only on the most common ones, as stated above. Please also note that I changed the nomenclature at this stage for the following three reasons:
1. Some of the populations identified in Enattah et al., 2008 are not present in the HapMap dataset. Thus, some of the identified haplotypes in Enattah et al., 2008 are missing here.
2. Because I used more SNPs this time, most of the haplotypes split into several haplotypes:
    ht1=> ht17-30 (blue)
    ht2=> ht31 (turquoise)
    ht3=> ht32 (turquoise)
    ht4=> ht59-73 (orange)
    ht5=> ht74-76 (red)
    ht6=> ht49-57 (green)
    ht13=> ht46-48 (brown)
    ht16=> ht45 (cayenne)

Additionally, I observed some other groups:

ht6-ht116 (violet): mostly present in Africa
ht35-ht38 (salmon): worldwide in low frequencies
ht40-ht44 (yellow): mostly present in Africa

3. I wanted to sort the haplotypes based on the phylogenetic tree.
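
The filtering steps described at the top of this post (alternative base above 1%, no insertions/deletions, two excluded rsIDs, no haplotypes carried by 2 or fewer individuals) can be sketched roughly as follows. This is only a minimal illustration, not the procedure actually used: the input format (a dict per SNP with 0/1 calls per phased chromosome, two consecutive chromosomes per individual) and the function names are assumptions made for the example.

```python
from collections import Counter

# The two SNPs the post excludes for lack of a clear pattern.
EXCLUDED_IDS = {"rs4988274", "rs55809728"}

def keep_snp(snp, min_alt_freq=0.01):
    """Return True if a SNP passes the filters described in the post.

    Assumed (hypothetical) format: {"id", "ref", "alt", "calls"}, where
    "calls" holds one 0/1 value per phased chromosome (1 = alternative base).
    """
    if snp["id"] in EXCLUDED_IDS:
        return False
    # Insertions/deletions have a ref or alt allele longer than one base.
    if len(snp["ref"]) != 1 or len(snp["alt"]) != 1:
        return False
    alt_freq = sum(snp["calls"]) / len(snp["calls"])
    return alt_freq > min_alt_freq

def common_haplotypes(snps, max_rare_carriers=2):
    """Count carriers per haplotype and drop those with too few carriers.

    Assumes chromosomes 2*i and 2*i+1 belong to individual i.
    """
    kept = [s for s in snps if keep_snp(s)]
    if not kept:
        return {}
    n_chrom = len(kept[0]["calls"])
    carriers = Counter()
    for person in range(n_chrom // 2):
        pair = set()
        for chrom in (2 * person, 2 * person + 1):
            pair.add("".join(str(s["calls"][chrom]) for s in kept))
        carriers.update(pair)  # each individual counts once per haplotype
    return {h: n for h, n in carriers.items() if n > max_rare_carriers}
```

Counting carriers per individual rather than per chromosome follows the wording "2 or fewer individuals"; a homozygous carrier is therefore counted once.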

I also checked the data of Denisova and Neanderthals for these 203 SNPs, and predicted the missing SNPs.
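
The post does not say how the missing archaic SNPs were predicted. One simple possibility, shown here purely as an assumption, is to copy the missing calls from the modern haplotype that agrees best with the archaic genome at the positions that were actually observed. The function names and the list-of-alleles format (with `None` for a missing call) are hypothetical.

```python
def mismatches_on_observed(archaic, modern):
    """Count mismatches at positions where the archaic call is known."""
    return sum(1 for a, m in zip(archaic, modern) if a is not None and a != m)

def predict_missing(archaic, reference_haplotypes):
    """Fill None entries in `archaic` from the closest reference haplotype.

    reference_haplotypes: {name: list of alleles}, same length as `archaic`.
    Returns (name of the donor haplotype, completed allele list).
    Ties are resolved in favour of the first haplotype in dict order.
    """
    best_name = min(
        reference_haplotypes,
        key=lambda name: mismatches_on_observed(archaic, reference_haplotypes[name]),
    )
    donor = reference_haplotypes[best_name]
    filled = [d if a is None else a for a, d in zip(archaic, donor)]
    return best_name, filled
```

A nearest-haplotype fill like this biases the archaic sequence toward its closest modern relative, which is one reason to keep predicted positions visually separate from observed ones (as done with the grey marking in the download).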

Find below the phylogenetic tree of the 76 MCM6 haplotypes including Denisova and Neanderthals.

SNPs of 76 haplotypes including the Denisova and Neanderthal SNPs (predicted ones in grey) and frequencies of all 76 haplogroups can be downloaded here (with summary at bottom).
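
The frequency ranges quoted throughout this post (for example "12-21% in East Asia") are the lowest and highest per-population frequencies within a region. A minimal sketch of that calculation, assuming a hypothetical input of one (population, haplotype) pair per chromosome:

```python
from collections import Counter, defaultdict

def haplotype_frequencies(assignments):
    """Return {population: {haplotype: percent}}.

    assignments: iterable of (population, haplotype) pairs,
    one pair per chromosome (assumed format).
    """
    counts = defaultdict(Counter)
    for population, haplotype in assignments:
        counts[population][haplotype] += 1
    freqs = {}
    for population, counter in counts.items():
        total = sum(counter.values())
        freqs[population] = {h: 100.0 * n / total for h, n in counter.items()}
    return freqs

def frequency_range(freqs, haplotype, populations):
    """Return (min, max) percent of a haplotype across the given populations."""
    values = [freqs[p].get(haplotype, 0.0) for p in populations]
    return min(values), max(values)
```

Calling `frequency_range(freqs, "ht52", ["CDX", "CHB", "CHS", "JPT", "KHV"])` on the real assignments would give the East Asian range for ht52; a haplotype missing from a population counts as 0%.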


Distribution of the main branches of MCM6:





Every branch of the phylogenetic tree of MCM6 tells its migration story. Based on the generated data I will try to postulate the steps of these migrations.


Grey branch (Denisova/Neanderthals/previous and current ht5):
ht5 is derived from the Denisova genome. The frequency of ht5 suggests that the admixture between Denisovans and our ancestors occurred in Asia, which is in accordance with the current state of scientific knowledge.


Green branch (ht49-57; previously ht6):



Africa: ht50 is the ancestor of this green branch, and is only found in individuals with African ancestry (7-16%). ht50 was not part of OOA (Out of Africa). The alternative hypothesis would be that ht50 derived from ht52.
ht51 is derived from ht52 and is only present in Africans and Afro-Americans (2% in Gambians, 0.8% African Ancestry in Southwest US).
OOA: ht52 is derived from ht50 and can be found in Africa (1-4%) and all parts of Eurasia. It is especially common in East Asia (12-21%). ht52 was part of OOA. ht52 split into two groups, one that stayed in Eastern Eurasia and one that moved to Western Eurasia.
Eastern Eurasia: ht53 is derived from ht52 and cannot be found in Africa (no back migration to Africa), but it can be found in Eastern Eurasia, especially among South Asians (2-8%) and East Asians (4-5%), and it made it into Native Americans (11% in Peruvians). Given its very low frequencies in Southern Europe and its absence in Africa, I assume that it is very rare in the Middle East.
Western Eurasia: ht56 is also derived from ht52 and can be found in Western Eurasia. ht56 is the most successful ht of the green branch (ht6), reaching 2-5% in Africa, 2-15% in Americans (probably through European ancestry, because the lowest level is found in Peruvians and the highest in Colombians and Puerto Ricans), 1-18% in South Asia, and 10-30% in Europe. It is basically not present in East Asia (0-1%).
ht54, ht55, and ht57 are derived from ht56 and show a similar worldwide distribution, with ht57 being the most successful. They show no migration to Africa.
ht54 can be found in Europe (0-1%) and South Asia (0-1%).
ht55 can be found in Europe (1-2%), South Asia (0-4%), East Asia (0-1%), and Americans (0-1%; probably through European ancestry, because it is found in Colombians and Puerto Ricans but not in Peruvians).
ht57 can be found in Europe (2-5%), South Asia (0-2%), and Americans (1-5%; probably through European ancestry, because the lowest level is found in Peruvians and the highest in Colombians and Puerto Ricans).
ht49 emerged after a crossing-over event between ht56 and ht18, ht20-ht32 (most likely ht29). Thus, its position in the phylogenetic tree is misleading.
For Kurds, I expect ht56 and ht52 to be the most common ones in this branch. If there is any unknown lactase persistence (LP) genotype among Kurds, then it is probably a subbranch of ht52 or ht56. Unfortunately, 23andMe does not help to distinguish between the hts of the green branch (ht6).

Blue branch (ht17-30; previously ht1):

ht21 is the ancestor of this blue branch. ht21 can only be found in Africans (0-3%) and Afro-Americans (0-1%).
ht22 is derived from ht21. ht22 can only be found in Africans (0-3%) and Afro-Americans (0-1%).
ht23 is derived from ht22. ht23 can only be found in Africans (2-5%), Afro-Americans (2-4%), and Americans (0-1%), probably through African ancestry.
OOA: ht29 is derived from ht21. ht29 can be found in all parts of the world: in Africans (2-6%), in native Americans (3-10%; not through European ancestry because highest levels found in Peruvians and Mexicans), in East Asians (24-42%), in South Asians (4-15%), and in Europeans (1-4%).
Five subbranches are derived from ht29: a) ht24, b) ht25, c) ht26, d) ht28, and e) ht30.
a) ht24 is present in individuals from Southern Europe (0-1%) and Afro-Americans (0-1%).
b) ht25 is only present in South Asia (0-1%).
c) ht26 is present in Europe (0-1%) and South Asia (0-1%).
d) ht28 is only present in East Asia (3-12%).
e) ht30 is present in Native Americans (6-12%; probably not through European ancestry because the highest levels are found in Peruvians and Mexicans), in Europeans (1-13%; North/South gradient: British 1% and Tuscans 13%), and in South Asians (3-9%).
ht18 emerged after a crossing-over event between ht29 and ht49-53, ht56, or ht57. Similar to ht49 in the green branch (ht6), its position in the phylogenetic tree is misleading.
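
A crossing-over origin, as proposed for ht49 and ht18, means the haplotype looks like one parent on one side of a breakpoint and like the other parent on the other side. A minimal sketch of such a check for a single crossover, assuming each haplotype is written as a string with one character per SNP in chromosomal order (the function names are made up for this example):

```python
def recombination_breakpoints(candidate, left_parent, right_parent):
    """Return every k (0 < k < length) for which
    candidate == left_parent[:k] + right_parent[k:]."""
    n = len(candidate)
    if not (len(left_parent) == len(right_parent) == n):
        raise ValueError("haplotypes must have the same length")
    return [
        k for k in range(1, n)
        if candidate[:k] == left_parent[:k] and candidate[k:] == right_parent[k:]
    ]

def is_recombinant(candidate, parent_a, parent_b):
    """True if candidate is a single-crossover mosaic of the two parents
    (in either orientation) and identical to neither of them."""
    if candidate in (parent_a, parent_b):
        return False
    return bool(
        recombination_breakpoints(candidate, parent_a, parent_b)
        or recombination_breakpoints(candidate, parent_b, parent_a)
    )
```

Several breakpoints can be returned when the parents are identical around the crossover site, so the list gives an interval rather than an exact position. Because a tree-building method has to attach such a mosaic to a single branch, its placement in the tree is misleading.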

Turquoise branch (ht31-ht32; previously ht2-ht3):
ht31 (ht2) is derived from ht30. ht31 is very rare but plays a key role in European lactase persistence (ht3). As mentioned above, I showed that it is present in Kurds (2.5%), Iraqi Jews (4.7%), Pakistanis (2-6%), Kalash (14%), and Arabs from the Middle East (2.5%). Now, I could also confirm its presence in Americans (0-2%; through Lebanese ancestry?), Tuscany (1%), and South Asians (1%).
ht32 (ht3) is derived from ht31, and is the most frequent MCM6 genotype, reaching 71% in Northern Europe but only 8% in Tuscany. ht32 is responsible for European lactase persistence, but it can be found elsewhere, too: frequencies of 11-31% in Americans (through European ancestry because the lowest levels are found in Peruvians), and 5-25% in South Asians (Northwest/Southeast gradient, with Punjabis being the highest and Tamils being the lowest). As mentioned above, I showed that it is present in Kurds (5%), Iranians (5%), Ob-Ugrics (5%), Arabs from Iraq, Syria, Lebanon and Palestine (13%), Morocco (17%), Saharawi (23%), and Fulani Sudanese (33%).

Orange branch (ht59-73; previously ht4):

ht61 is the ancestor of the orange branch. ht61 can only be found in South Asia.
ht62 is derived from ht61. ht62 can be found in Africa (1-3%), in Europe (0-2%), and in South Asia (0-3%). In South Asia there is a North/South gradient; in Europe the distribution is less clear.
ht63 is derived from ht62 and is solely African (0-1%).
ht64 is derived from ht63 and is also solely African (1-6%).
ht69 is derived from ht62 and is basically African, too (1-4%). Three subbranches emerged from ht69: a) ht70-71, b) ht65-68, and c) ht74-76 (red branch ht5).
a) ht70 is rare in Africa (0-1%) but common among Native Americans (5-22%) and South Asians (8-16%). 1-5% of Europeans (North/South gradient) and 8-13% of East Asians have ht70.
ht71 is derived from ht70, and is only present in South Asia (3-5%).
b) ht65-68 are all solely African (together 6-12%).
c) ht74-76 is the red branch (ht5; see below)
ht60 emerged after a crossing-over event between ht18, ht28-30, or ht31 and ht62, hg36-40 or ht69. Similar to ht49 in the green branch (ht6) and ht18 in the blue branch (ht1), its position in the phylogenetic tree is misleading.

Red branch (ht74-76; previously ht5):
ht75 is the ancestor of the red branch and can be found in all parts of the world: 1-4% in Africans, 6-10% in Americans, 7-10% in East Asians, 3-14% in Europeans (North/South gradient), and 7-9% in South Asians. Two subbranches emerged from ht75: a) ht76 and b) ht74 and hg40.
a) ht76 is derived from ht75 and is present especially in Native Americans (3-13%; not through European ancestry because the highest levels are found in Peruvians and Mexicans). It is also present in Europeans (0-2%) and South Asians (0-3%), but not in Africans or East Asians.
b) ht74 and hg40 are derived from ht75 and are basically only present in South Asians (1-6%), with a clear North/South gradient. ht72 was somewhat more successful and made it to South Asians (1-3%), East Asians (0-3%), Europeans (1-3%), and especially Native Americans (1-7%; not through European ancestry because the highest levels are found in Peruvians and Mexicans).

Brown branch (ht46-48; previously ht13):
hg28 is the ancestor of the brown branch and is solely African (0-2%), as are all of its derived subbranches ht47, hg30, and ht48.
hg27 emerged after a crossing-over event between hg28 and hg20-21 or hg23-25. Similar to ht49 in the green branch (ht6), ht18 in the blue branch (ht1), and ht60 in the orange branch (ht4), its position in the phylogenetic tree is misleading.

The remaining branches (hg1-25) are mostly African. Exceptions are hg5 (see grey branch), hg18-19 (0-6% in East Asians), and hg13 (1-2% in South Asians, 0-1% in Europeans and East Asians).

Yellow branch (ht40-44):

ht43 is the ancestor of the yellow branch. The yellow branch is solely African.

Update May 7th, 2017:
Kurd from anthrogenica.com helped me collect some more data. Thanks!

Scythian Sarmatian: blue branch ht17-31 excluding ht24; however, rs4988186=T (usually in orange/yellow branch ht33-44)
Scythian Pazyr: blue branch ht24
Scythian Aldy: ht30-31 (blue/turquoise branch)
Scythian Volga: blue branch ht18, ht20-ht31, ht34
