WHY YSNPs ARE IMPORTANT
Like most genealogists, I originally thought that YSNPs played a very minor role in genealogy. Ten years ago, this may have been the case. The haplogroups (YSNPs) that were first discovered were several thousand years old and only divided all of mankind into around 100 deep ancestral branches. But even these ancient haplogroups created the first widely known use of haplogroups: a quick method for separating genealogical clusters from each other. After all, if you do not share an ancestor 4,000 or 5,000 years ago, you obviously will not share a common genealogical ancestor in the 300 to 900 year range. But most sponsors of YSTR submissions probably believed that surname admins were just too lazy to manually separate submissions into valid genealogically related surname clusters, and most sponsors were not very supportive of ordering "deep clade" tests (the first pack/panel tests) or individual SNP tests for the older YSNP branches. This remains an important use of haplogroups even today, as some surname projects have dozens of genetic surname clusters which are difficult to separate properly with YSTRs alone. For around 10 % of testers, their YSTRs either never "diverged" from very old YSTR values or mutated back to older YSTR values by "converging" on their original values. For these submissions, even the older branches of mankind were used to rule out false matches.
However, many surname admins (and some sponsors) have learned over time that many genealogical clusters appeared to be genetically isolated from others, while other groupings of submissions just seemed too genetically diverse to be true genealogical clusters. Some groupings appeared to be multiple genealogical clusters that overlapped in some fashion. Many surname admins are not really aware of the root cause of this overlap, but many have learned that it is a result of having common YSTR marker values that have changed little over the last several thousand years. These common marker values may have changed over this time frame, but due to many parallel and backwards mutations, many submissions arrived back at the same set of marker values they started out with several thousand years ago. Submissions with closely matching YSTR values are sometimes not related; they are false matches produced by simplistic methodologies that use only genetic differences to measure relatedness. Only around ten percent of submissions fall into this category - but if it is your genealogical cluster, then it is 90 to 100 % of your cluster. So the second major use of YSNPs was to rule out false hits due to common YSTR marker values.
With the growing understanding of common YSTR marker values and the number of useful YSNP haplogroups passing 500, surname admins began encouraging testers to order Nat Geo 2.0 tests or CROMO2 tests, which tested thousands of known YSNPs that had largely not been tested in depth (these replaced the deep clade tests). Also, since the discovery of new branches was proceeding at a very slow pace (due to the very low coverage of the "Walk the Y" YDNA chromosome tests), ordering YSNP tests individually was very common in order to determine how and where these new SNPs belonged in the constantly growing haplotree of mankind. These tests are still used to separate those pesky overlapping clusters with common YSTR marker values as well. Due to these two important factors (a sorting tool to separate clusters and clarifying where new branches belonged on the haplotree), most FTDNA admins routinely requested testing of individual YSNPs (and still do even today).
Having common marker values can result in false matches for relatedness, but having very rare marker values can be used to distinguish your genealogical cluster from others. The discovery of more rare YSTR marker values led to another very useful application of YSNP testing for genealogical use - systematically determining how rare haplotypes are when compared to others. For genealogical clusters under the very deep ancestry R1b haplogroup, you can compare the MRCA haplotype of R1b to the MRCA haplotype of your genealogical cluster. If you find only a few mutations, your genealogical cluster probably has very common YSTR marker values that overlap with many haplogroups over 1,000 years old, which makes the testing of YSNPs much more important. Some combinations of marker values are so uncommon that all YSTR related submissions will be genealogically related regardless of surname. For other YSTR related submissions that have extremely common combinations of YSTR marker values, even close YSTR matches with the same surname may not share haplogroups and therefore could not share common genealogical ancestors 300 to 900 years ago. Evaluating the rarity of your YSTR haplotypes is critical to ruling out false YSTR matches and is much more important to analyze than most researchers realize.
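The comparison of two MRCA haplotypes described above can be sketched in a few lines of Python. This is only a minimal illustration, not a tool used by any project: the function names are my own, the modal haplotype is assumed to be simply the most common value at each marker, each differing marker is counted as one mutation, and the repeat counts are made-up values rather than real R1b or cluster data.

```python
from collections import Counter

def modal_haplotype(submissions):
    """Return the most common repeat count for each marker across the submissions."""
    markers = submissions[0].keys()
    return {
        marker: Counter(s[marker] for s in submissions).most_common(1)[0][0]
        for marker in markers
    }

def mutation_count(haplotype_a, haplotype_b):
    """Count the markers on which two haplotypes differ (each marker counts once)."""
    return sum(
        1 for marker in haplotype_a
        if marker in haplotype_b and haplotype_a[marker] != haplotype_b[marker]
    )

# Hypothetical repeat counts, for illustration only (not real haplogroup data).
deep_modal = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
cluster_submissions = [
    {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10},
    {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10},
    {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10},
]

cluster_modal = modal_haplotype(cluster_submissions)
mutations_from_deep_modal = mutation_count(deep_modal, cluster_modal)
print(cluster_modal)
print(mutations_from_deep_modal)
```

A low count suggests the cluster sits close to the deep ancestral values and is more exposed to false YSTR matches; a high count suggests a rarer haplotype.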
YSNPs STARTED GETTING CLOSER TO THE GENEALOGICAL TIME FRAME
Many YSNP haplogroups were now being discovered in the 1,000 to 2,000 year time frame, and a handful of these YSNPs were getting interestingly close to the genealogical time frame. When haplogroups are this close to the genealogical time frame, other major uses of YSNPs become powerful analytic tools for genealogists. If you compare the MRCA haplotype of your more recent haplogroup to the MRCA haplotype of your genetic surname cluster, you can determine the YSTR "signature" of your genetic surname cluster. These are the mutations that occurred between your ancestor at the time the recent haplogroup originated and your ancestor at the time your genetic surname cluster originated. When attempting to find out whether more remotely related submissions are truly related, matching the YSTR signature (or coming close to it) is a very strong factor in determining the possibility of being related.
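As a rough sketch of how a YSTR signature could be extracted, the Python below lists every marker where the cluster's MRCA haplotype differs from the haplogroup's MRCA haplotype. The function name and the marker values are hypothetical and chosen only for illustration; real signatures come from the actual modal values of your haplogroup and cluster.

```python
def ystr_signature(haplogroup_modal, cluster_modal):
    """Map each differing marker to (haplogroup value, cluster value)."""
    return {
        marker: (haplogroup_modal[marker], cluster_modal[marker])
        for marker in haplogroup_modal
        if marker in cluster_modal
        and haplogroup_modal[marker] != cluster_modal[marker]
    }

# Hypothetical modal values, for illustration only.
haplogroup_modal = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS426": 12}
cluster_modal = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10, "DYS426": 12}

signature = ystr_signature(haplogroup_modal, cluster_modal)
print(signature)  # {'DYS390': (24, 23), 'DYS391': (11, 10)}
```

Markers that match between the two haplotypes are left out, so the result holds only the mutations that separate the surname cluster from the rest of the haplogroup.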
Sharing common mutations from your haplogroup MRCA is a much more important criterion for determining a possible connection than genetic difference (the number of mutations between submissions that FTDNA uses to determine relatedness). If you find other distantly related genealogical clusters that share significant parts of this YSTR signature, this provides additional evidence that these remotely related genealogical clusters could share a common ancestor. If you have possible NPE candidates that have strong geographical ties and are also genetically close, discovering that they share common mutations from the MRCA of the haplogroup is additional genetic evidence supporting the possibility of an NPE connection.
Very few surname admins or sponsors of genetic tests are aware that the YSTR signatures of genealogical clusters can greatly enhance the genetic source documentation that supports genetic analysis. YSTR signatures also provide far superior searches for possible relatives. Searching by the number of mutations alone can miss genetically related submissions that mutated a little more than normal. Also, if your genealogical cluster has common marker values, where mutational difference is less reliable due to major overlap with unrelated submissions, having common "off modal" mutations from the MRCA of the haplogroup can be a far more accurate test of relatedness. Having shared mutations and close genetic matches is a powerful combination. Having shared mutations, close genetic distance and a common surname is an even more powerful combination. If you only analyze the mutations below the MRCA of your genealogical cluster, you are leaving out important, useful mutations that occurred between the creation of your haplogroup and the creation of your genealogical cluster.
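To make the contrast between genetic difference and shared "off modal" mutations concrete, here is a small Python sketch. It is an assumption-laden illustration, not FTDNA's matching method: the function names are invented, genetic difference is simplified to one step per differing marker, and the two candidates and all marker values are hypothetical.

```python
def genetic_difference(haplotype_a, haplotype_b):
    """Simplified genetic difference: number of markers with different values."""
    return sum(
        1 for marker in haplotype_a
        if marker in haplotype_b and haplotype_a[marker] != haplotype_b[marker]
    )

def shared_signature_mutations(candidate, signature):
    """Signature markers where the candidate carries the cluster's off modal value."""
    return [m for m, value in signature.items() if candidate.get(m) == value]

# Hypothetical cluster modal and its off modal signature values.
cluster_modal = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10,
                 "DYS385a": 11, "DYS385b": 14, "DYS426": 12}
signature = {"DYS390": 23, "DYS391": 10}

# Candidate A has drifted on three markers but keeps both signature mutations.
candidate_a = {"DYS393": 14, "DYS390": 23, "DYS19": 15, "DYS391": 10,
               "DYS385a": 11, "DYS385b": 15, "DYS426": 12}
# Candidate B is numerically closer but carries neither signature mutation.
candidate_b = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11,
               "DYS385a": 11, "DYS385b": 14, "DYS426": 12}

for name, candidate in (("A", candidate_a), ("B", candidate_b)):
    print(name,
          genetic_difference(cluster_modal, candidate),
          shared_signature_mutations(candidate, signature))
```

In this made-up case a search by genetic difference alone would rank candidate B ahead of candidate A, even though only candidate A shares the cluster's signature.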
When the age of the haplogroup gets very close to the genealogical time frame (around 1,000 years ago for Irish and Scottish surnames), the YSNP mutations that define the haplogroup can reveal even more information. These YSNP mutations are called "near private" YSNPs by deep ancestry researchers. These mutations are always dominated by one or two surnames. If 80 % of the testers have one surname, 10 % have a second surname and the last 10 % are spread across 20 surnames, then you have probably discovered a very distant NPE connection between the two most common surnames. The surnames in the last 10 % also become excellent NPE candidates, since the number of NPEs over the last 1,000 years should range between twenty and forty percent.
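The surname breakdown in the 80 % / 10 % / 10 % example can be tallied with a short Python sketch. The surname labels and the tester counts are placeholders I made up to reproduce those proportions; they are not taken from any real haplogroup project.

```python
from collections import Counter

def surname_shares(surnames):
    """Return (surname, share of testers) pairs, most common surname first."""
    counts = Counter(surnames)
    total = len(surnames)
    return [(name, count / total) for name, count in counts.most_common()]

# Placeholder data: 200 testers positive for one "near private" YSNP.
testers = (
    ["SurnameA"] * 160                         # 80 % share one surname
    + ["SurnameB"] * 20                        # 10 % share a second surname
    + [f"Other{i}" for i in range(1, 21)]      # last 10 % spread over 20 surnames
)

shares = surname_shares(testers)
dominant = shares[:2]      # the two most common surnames
minor = shares[2:]         # remaining surnames, each a possible NPE candidate
print(dominant)
print(len(minor))
```

The two leading entries are the surnames that may be linked by a very distant NPE, and the long tail of single testers is the list of further NPE candidates.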
YSNPs ARE NOW ROUTINELY BEING DISCOVERED VERY CLOSE TO OR WITHIN GENEALOGICAL TIME FRAMES
Just like YSTRs, YSNPs can mutate at any time, anywhere from 20,000 years ago to only 100 years ago. Any YSNP that mutated within the genealogical time frame is called a "private" YSNP and is extremely important to genealogical research. YSTRs really only form clusters of possibly related submissions. The vast majority of YSTR mutations only prove that the submissions sharing those mutations must be more closely related. However, the connections between these clusters and the ages of these clusters are difficult to determine from YSTR information alone. It is similar to a tree trunk and several big branches lying on the ground: you can have many branches, but you have little information on where to put the branches on the tree in proper chronological order. In addition, you have a lot of submissions that do not have any cluster defining mutations. These submissions with no common mutations cannot even be assigned to a branch, and there is no information on how they are connected together or where they belong on the tree. Here is a typical YDNA descendancy chart of a well established surname cluster (the most common scenario):
Scenario 1 - Many YSTR branches - but no early branch that splits the cluster
If you are very lucky, you may discover a YSTR mutation that happened just after the formation of your genealogical cluster. Such a YSTR mutation can form an early branch that divides the genealogical cluster into two large branches and provides genetic evidence of an early branch within your surname cluster. This kind of branch has major genealogical implications, as you can set aside around half of the submissions as being less closely related and focus your genealogical research on the half of the submissions that belong to your branch. This very special scenario also indicates that the mutation happened immediately after the creation of the genealogical cluster. These kinds of cluster dividing branches are rare (five to ten percent of genealogical clusters at most):
Scenario 2 - Several YSTR branches - with early branch that splits the cluster
During the last year, with the availability of numerous NGS test results, "private" YSNPs have become available for genealogical analysis. Very few genealogists are even aware of these extremely powerful "private" YSNP mutations or how to test them. Unlike YSTR branches, "private" YSNPs reveal new branches with more clarity. "Private" YSNPs provide connection information between all branches and provide the relative time frame of each branch. Under L226, we now have over 300 private YSNPs that are not being tested to discover new branches. Only a handful of these "private" YSNPs are probably being analyzed for genealogical purposes. There are currently around 20 to 30 new YSNPs being discovered every month just under L226, and more than half of these are "private" YSNPs (the others are equivalents of newly discovered branches or are declared unstable by testing companies). Many scientists believe that hundreds of thousands of "private" YSNPs could be discovered over the next few years. These kinds of branches will become common and will help create a DNA descendancy chart that starts to resemble a traditional genealogical descendancy chart:
Scenario 3 - Many YSTR branches - with one private SNP
So how do genealogists gain knowledge about the existing L226 "private" YSNPs associated with their genealogical cluster, and how do you test for these YSNP mutations? Finding existing "private" YSNPs that match your surname project is pretty tedious work and requires some research in the haplogroup projects. You are in luck, since the primary purpose of this web site is to first help you confirm your terminal YSNP by ordering only one individual YSNP test from YSEQ. After confirming your terminal YSNP, you can then test for private YSNPs that could reveal additional recent branches under L226.
New L226 private YSNPs are first analyzed by Dennis Wright as he receives BAM files from recent NGS testing. Be sure to request a download link to your BAM files (not the summary spreadsheet files) and forward this link to Dennis Wright. New L226 YSNP mutations are now being discovered several times per month. At this high rate of NGS testing, not only will new branches be discovered, but a large quantity of private YSNPs will be revealed as well. For those testing positive for a terminal YSNP in the L226 haplotree (this is your most recent branch in the haplotree), testing private YSNPs individually could also reveal new branches at a much lower cost per branch than NGS tests. This web site hopes to document many of these "private" YSNPs and make recommendations for testing them. It is hoped that this web site provides enough analysis to help L226 researchers reveal as many genealogical YSNP mutations as possible.
THE FUTURE OF YSNP TESTING IS VERY EXCITING
NGS tests reveal the most about your ancestry but are quite expensive and may not be economical for many researchers. Currently, there are two offerings available: 1) FTDNA's Big Y, which costs $575; 2) the Full Genomes Corporation test, which costs roughly 35 % more at $775 but also covers 30 % more of the Y chromosome, which should discover 30 % more branches and 30 % more private YSNPs than the Big Y test. The Nat Geo 2.0 test and the CROMO2 test from IrishDNA are no longer recommended, as SNP packs from FTDNA and SNP panels from YSEQ are much better products for testing recently discovered L226 branches. Testing of private YSNPs discovered by NGS tests is also highly recommended, and this should be done at YSEQ since FTDNA is no longer robustly adding private YSNPs as requested by L226 researchers or other haplogroup researchers. When analysis of YSTR and YSNP testing has been done, testing individual YSNPs for the known L226 private YSNPs discovers new L226 branches at 10 % of the cost of additional NGS tests. However, NGS tests will continue to reveal more and more private YSNPs and so always remain the superior test.
Discovery of new L226 branches should draw on a variety of testing offerings. The Big Y NGS test will remain the primary test of choice since it is the lowest cost NGS test, and each test can reveal new L226 branches as well as many more private YSNPs to test. However, L226 researchers should also order at least ten percent of their NGS tests as the higher resolution Full Genomes Corporation test. There are two primary scenarios where higher resolution testing is the best option: 1) for those wishing to discover more genealogical branches under L226, the higher resolution NGS test is preferred if there has been a previous NGS test that is a close YSTR match; 2) L226 has some major genetic bottlenecks along the trunk of the L226 haplotree. Around 30 % of all L226 private SNPs belong to only three of the twenty-six branches under L226. If your terminal YSNP is discovered to be FGC5660, FGC5628 or FGC5659, the higher resolution NGS test is preferred only if there has been a previous NGS test that is a reasonably close YSTR match.
But NGS tests are not the only offering that can discover new L226 branches. Once NGS tests are completed, any YSTR testers that match the signatures of NGS testers can test the private YSNPs associated with those NGS testers. The L226 branches DC69, FGC5647, FGC5639, A6097 and DC19 were discovered by testing individual YSNPs at YSEQ. One of the primary focuses of this web site is to provide testing recommendations to discover new branches via testing of private YSNPs discovered by NGS tests. Another variation of individual YSNP testing, testing for known branches selected via YSTR signatures, can sometimes also confirm your terminal YSNP at 20 % of the cost of YSEQ SNP panel tests and 15 % of the cost of FTDNA's SNP pack tests. This web site will initially focus on these testing recommendations but will later expand its focus to include testing private YSNPs to discover new branches.
Other offerings are critical to L226 research as well. These tests will not discover new branches but can economically determine which L226 branch you belong to. YSEQ offers the Z253 SNP panel test (which currently includes most of the known L226 branches) at $88 per test. FTDNA also offers the Z253 SNP pack test (which includes many of the L226 branches) at $119 per test. At this point in time, YSEQ has just upgraded its Z253 SNP panel, and it includes more L226 branches at a lower cost. However, each company updates its offerings at different times, and FTDNA will respond to YSEQ offerings with even better offerings in the future. Each company will alternate back and forth with improved versions as we discover more L226 branches. Even when one company offers more complete testing, the other company's offering may be better suited for your part of the L226 haplotree.
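The prices quoted in this section can be put side by side with a few lines of Python. This is only a back-of-the-envelope sketch: the dollar amounts are the ones given on this page at the time of writing and will change, and the helper name percent_of is my own.

```python
def percent_of(price, reference_price):
    """Price expressed as a percentage of a reference price, to one decimal."""
    return round(100 * price / reference_price, 1)

# Prices as quoted on this page at the time of writing (subject to change).
ngs_prices = {"FTDNA Big Y": 575, "Full Genomes Corporation": 775}
panel_prices = {"YSEQ Z253 SNP panel": 88, "FTDNA Z253 SNP pack": 119}

comparison = {
    (panel, ngs): percent_of(panel_price, ngs_price)
    for panel, panel_price in panel_prices.items()
    for ngs, ngs_price in ngs_prices.items()
}

for (panel, ngs), percent in comparison.items():
    print(f"{panel} costs {percent} % of {ngs}")
```

With these figures the panel and pack tests come out at roughly 11 % to 21 % of the price of an NGS test, depending on which pair is compared.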
The SNP pack and SNP panel tests do not really discover new branches under L226. They only economically test known existing branches. These tests cost between 10 and 20 % of the cost of NGS tests but will never discover new private YSNPs or new L226 branches. But these tests produce very useful genetic information that makes better testing recommendations possible. These more economical tests add more information to our genetic database, which is used to determine which test is most appropriate and helps reduce the overall cost of discovering new L226 branches. It is impossible to generate descendant charts of L226 based solely on YSTR patterns. There are just too many parallel and backwards mutations under L226 to yield any degree of accuracy. However, as more YSNP data is added to the mix, parts of the L226 haplotree can be charted manually as well as automatically with the new SAPP tree generation software.
As more and more L226 branches are discovered, the accuracy of these descendant charts will continue to improve. Additionally, upgrades to 67 markers (or net new 67 marker submissions) will also improve the accuracy of these charts. SNP pack and SNP panel tests also increase the size of our YSNP database, which greatly increases the accuracy of these charts. Even individual YSNP tests increase the size of our YSNP database and the accuracy of our descendant charts. As the accuracy of our charts increases, L226 researchers will be able to make better recommendations on where NGS tests are most needed, where higher resolution NGS tests are needed, who needs to order SNP pack and SNP panel tests, and who would benefit from individual YSNP testing.
In this first iteration of the analysis of L226, only around 25 % of the L226 haplotree can be charted with reasonably high accuracy. But if you are included in this 25 % of L226, then more economical testing options are probably available to you. Currently, the SAPP tool has a limit of around 100 submissions before time-outs from ISPs break the internet connection. Dave Vance is making major improvements to his SAPP tool, an extremely exciting development which will greatly increase the accuracy and scope of L226 charts. Although the L226 YSNP can be predicted at 100 % accuracy due to its genetic isolation from other haplogroups, branches under L226 do not have the same genetic diversity. L226 has a very robust nine marker signature that distinguishes L226 from all other L21 haplogroups. However, branches under L226 usually only have signatures of two or three mutations off the L226 modal. Plus there are obviously widespread parallel and backwards mutations under L226, since there has just not been enough time to diverge from the original L226 modal values. Only FGC5647 has a strong signature under L226, with six mutations from the L226 modal.
Between L21 and L226, our ancestors rolled the dice and got eights and nines, which created our unique L226 signature under L21. However, there has just not been enough time since the creation of L226 to generate strong signatures that make charting possible by YSTRs alone. Still, with the ever increasing number of 67 marker YSTR submissions, the ever increasing number of L226 YSNPs being discovered and the increasing number of YSNPs being tested through numerous genetic offerings, we will eventually be able to produce highly accurate charts of the evolution of L226. There will always be small pockets of submissions where genetic bottlenecks produced too few descendants or too few descendants have tested. But 90 % of L226 should be charted within the next year or two.