SUMMARY OF SNPs (October, 2014)
There are now three classes of YSNPs that require different analysis techniques:
1) Older L21 YSNPs that have numerous YSTR fingerprints. These YSNPs are generally more than 1,500 to 2,000 years old. Mike W's L21 spreadsheet in the L21 Yahoo forum is the best place to track the numerous signatures associated with these older YSNPs. These signatures could be predicted, but the younger YSNPs with only one or two fingerprints will gradually replace these older YSNPs as terminal YSNPs, except for the unlucky last 5 to 10 %, where manual analysis will continue to be required.
2) The primary focus of this web site will be to continue to analyze the YSNPs that can be represented by one or two YSTR fingerprints. These YSNPs are believed to be between 750 and 2,000 years old, and binary logistic regression predicts them with over 95 % accuracy on average. Currently just over 50 % of L21 submissions can be predicted for the roughly 50 YSNPs that fall into this time range. The percentage of predictable YSNPs will continue to grow, but analyzing the large deluge of newly discovered YSNPs is very labor intensive. In the near future, 90 to 95 % of L21 submissions will have predictable YSNPs.
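The binary logistic regression used on this site is not spelled out here, so the following is only a minimal sketch of the technique in plain Python. It assumes each YSTR marker has been reduced to a 0/1 indicator of an off modal value, and it fits an intercept plus one weight per marker by gradient descent on the log-loss. The marker indicators and test results are hypothetical, not real submissions.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """Probability of testing positive; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit intercept + weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical training data: 1 = off modal value at each of three fingerprint
# markers; label 1 = tested positive for the YSNP, 0 = tested negative.
X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
     [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(X, y)
print(round(predict_proba(weights, [1, 1, 0]), 2))
```

A new submission's fingerprint is scored with `predict_proba(weights, ...)`, and a value above 0.5 reads as a predicted positive. A real model would be fitted on many more submissions and checked against held-out test results before any accuracy figure is quoted.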
3) A third class of YSNPs is now emerging: YSNPs generally 250 to 1,000 years old that are too young to be predicted via binary logistic regression. These younger YSNPs do not have enough off modal mutations from their parent YSNP (in this case L226) to be reliably predicted with binary logistic regression, so new analysis techniques will be required for this time frame. These techniques will be a hybrid of YSNP analysis and traditional surname cluster YSTR analysis. Even though these YSNPs cannot be reliably predicted using binary logistic regression (as L226 can), most of them will qualify for the ISOGG haplotree of mankind.
Three YSNPs descending from L226 have now been identified, and each should currently be ordered as an individual YSNP test to determine its scope. Here are the recommended manual testing steps (some of these results can be partially predicted, but with much lower accuracy than L226 prediction via binary logistic regression):
1) Order FGC5628 by itself first. This is a very large branch under L226. To date, 17 out of 26 completed tests have been positive, so there is currently a 65 % chance of testing positive for FGC5628. Since the other two YSNPs are descendants of FGC5628, the 35 % who test negative would also predictably test negative for them, and additional testing would waste testing funds. Your submission is more likely to test negative if it has any of these marker values: 449 <= 28, 464d <= 16, 576 <= 17 or 576 >= 19. It is more likely to test positive with 481 >= 23 (a fairly strong connection), 458 <= 16 or 534 <= 14. If you test negative for FGC5628, no further testing is recommended.
2) Only if you test positive for FGC5628 should you test DC1 next. To date, 3 of 19 FGC5628 positive submissions have tested positive for DC1, so there is only a 16 % chance of testing positive for DC1. Your submission is more likely to test positive for DC1 if it has 481 >= 23. If you test negative for DC1, no additional testing is recommended.
3) Only if you test positive for DC1 should you test YFS231286 next. To date, 2 of 3 DC1 positive submissions have tested positive, so there is a 67 % chance of testing positive for YFS231286. Since only two submissions have tested positive, there are too few results for YSTR values to guide this test.
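Because each YSNP is only ordered after a positive parent result, the odds multiply down the chain. The sketch below reproduces the percentages from the counts above and estimates the expected spend per person, assuming the $39 single-YSNP FTDNA price quoted elsewhere in this article. With samples this small, every figure will shift as results come in.

```python
from fractions import Fraction

# Results to date, as quoted in the testing steps above.
P_FGC5628 = Fraction(17, 26)            # FGC5628 positive among L226 submissions tested
P_DC1_GIVEN_FGC5628 = Fraction(3, 19)   # DC1 positive among FGC5628 positive submissions
P_YFS231286_GIVEN_DC1 = Fraction(2, 3)  # YFS231286 positive among DC1 positive submissions

PRICE_PER_YSNP = 39  # assumption: the $39 FTDNA single-YSNP price quoted in this article

def expected_cost_sequential(price=PRICE_PER_YSNP):
    """Expected spend if each YSNP is ordered only after a positive parent result."""
    p_order_dc1 = P_FGC5628
    p_order_yfs = P_FGC5628 * P_DC1_GIVEN_FGC5628
    return price * (1 + p_order_dc1 + p_order_yfs)

def chance_all_three_positive():
    """Chance that a new L226 submission is positive for all three YSNPs."""
    return P_FGC5628 * P_DC1_GIVEN_FGC5628 * P_YFS231286_GIVEN_DC1

print(f"FGC5628 {float(P_FGC5628):.0%}, DC1 {float(P_DC1_GIVEN_FGC5628):.0%}, "
      f"YFS231286 {float(P_YFS231286_GIVEN_DC1):.0%}")
print(f"expected sequential cost ${float(expected_cost_sequential()):.2f} "
      f"vs ${3 * PRICE_PER_YSNP} for all three at once")
```

At these rates, sequential ordering costs about $68.53 per person on average versus $117 for all three at once, and only about 7 % of L226 submissions would be expected to test positive for all three.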
This analysis employs fairly speculative methodology that is much less reliable than L226 YSNP prediction. The methodology will evolve and improve over time as we gather more testing data and fine-tune our analysis for this new class of YSNPs. Here are the assumptions and criteria used for the analysis:
1) Only 67 markers were used, for a consistent and more reliable analysis. Only actual test results were included; no predicted results were used, so they could not pollute the analysis. Results come from four sources: a) individual YSNP testing at FTDNA (posted in FTDNA YSNP reports); b) results from several FTDNA Big Y tests (not posted in FTDNA YSNP reports); c) results from one Full Genomes Corporation test (this test has many unique L226 mutations); d) individual YSNP testing from YSEQ (only four tested to date, with no new discoveries).
2) Submissions were isolated into groups based on testing results to date: a) FGC5628 negative; b) FGC5628 positive and DC1 negative or unknown; c) DC1 positive. These groups were isolated to determine whether any YSTR patterns emerge within them.
3) Within each group, locate the markers with the most off modal L226 mutations. A minimum of three submissions is required to validate a true trend. If mutations overlap, ignore the grouping until three isolated mutations are discovered. Once YSTR patterns are established, identify conflicting data where parallel or backwards mutations probably exist. When submissions belong to a well-defined surname cluster, add all submissions of the cluster as predicted to have the same YSNP results.
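Criteria 2 and 3 can be sketched as a grouping step followed by a count of off modal values within each group. This is a minimal sketch under assumptions: the submissions are hypothetical dictionaries holding actual (never predicted) YSNP results and a set of off modal value labels, and the overlap rule and the surname cluster step are left out.

```python
from collections import Counter, defaultdict

MIN_SUBMISSIONS = 3  # criterion 3: at least three submissions to call a trend

def assign_group(sub):
    """Criterion 2: place a submission in a group using actual YSNP results only."""
    if sub.get("DC1") is True:
        return "c) DC1 positive"
    if sub.get("FGC5628") is True:
        return "b) FGC5628 positive, DC1 negative or unknown"
    if sub.get("FGC5628") is False:
        return "a) FGC5628 negative"
    return None  # not tested for FGC5628: left out of the analysis

def candidate_trends(submissions):
    """Off modal values seen in at least MIN_SUBMISSIONS submissions of a group."""
    counts = defaultdict(Counter)
    for sub in submissions:
        group = assign_group(sub)
        if group is not None:
            counts[group].update(sub["off_modal"])
    return {g: sorted(v for v, n in c.items() if n >= MIN_SUBMISSIONS)
            for g, c in counts.items()}

# Hypothetical submissions; "off_modal" holds L226 off modal values as labels.
submissions = [
    {"FGC5628": True, "DC1": True, "off_modal": {"481>=23", "458<=16"}},
    {"FGC5628": True, "DC1": True, "off_modal": {"481>=23"}},
    {"FGC5628": True, "DC1": True, "off_modal": {"481>=23", "460<=10"}},
    {"FGC5628": False, "off_modal": {"460<=10"}},
    {"FGC5628": False, "off_modal": {"460<=10", "449<=28"}},
    {"FGC5628": None, "off_modal": {"576>=19"}},
]
trends = candidate_trends(submissions)
print(trends)
```

Here 460 <= 10 appears only twice in the FGC5628 negative group, so it stays below the three-submission threshold.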
Reliable YSNP prediction under L226
Reliable prediction under L226 is possible for any surname cluster or YSNP cluster that has at least six L226 off modal mutations. None of the current groupings of tested YSNPs reveals a fingerprint of more than two L226 off modal mutations. There is a strong possibility that further testing of the YSNPs being analyzed, and the discovery of additional YSNPs, will eventually reveal several YSNPs under L226 that could be predicted via binary logistic regression.
The South Carolina Casey surname cluster has seven L226 off modal mutations, and this cluster has tested FGC5628 positive. Binary logistic regression and the genetic isolation of this cluster suggest that 95 % or more of the South Carolina Casey cluster would also test positive for FGC5628. Non-Casey submissions under L226 match at most 3 of these 7 mutations, except for one that matches 4 of 7. Therefore, this accurate prediction applies only to the South Carolina Casey surname cluster and not to any other L226 submissions (the cluster is very genetically isolated).
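Comparing a submission against a cluster fingerprint amounts to counting how many of the fingerprint's off modal values it shares. As an assumption, the sketch takes the seven SC Casey values to be the seven YSTR conditions on the Casey (SC) path of the preliminary descendant chart at the end of this section (481 >= 23; 458 <= 16 and 464b >= 14; 393 <= 12, 449 >= 30, 460 >= 12 and 534 <= 14). The haplotype values are hypothetical.

```python
import operator

OPS = {"<=": operator.le, ">=": operator.ge}

# Assumption: the seven SC Casey off modal values are the seven YSTR conditions
# on the Casey (SC) path of the preliminary descendant chart in this article.
SC_CASEY = [
    ("481", ">=", 23), ("458", "<=", 16), ("464b", ">=", 14),
    ("393", "<=", 12), ("449", ">=", 30), ("460", ">=", 12), ("534", "<=", 14),
]
RELIABLE_MIN = 6  # off modal mutations needed for reliable prediction (see text)

def matches(haplotype, fingerprint):
    """Count fingerprint conditions met by a {marker: value} haplotype."""
    return sum(1 for marker, op, limit in fingerprint
               if marker in haplotype and OPS[op](haplotype[marker], limit))

# Hypothetical non-Casey haplotype that shares only three of the seven values.
other = {"481": 23, "458": 16, "464b": 14, "393": 13, "449": 29, "460": 11, "534": 15}
print(matches(other, SC_CASEY), "of", len(SC_CASEY))
```

A fingerprint of seven conditions clears the six-mutation threshold, while a match of only 3 of 7 leaves a submission well outside the cluster.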
The Kennedy surname cluster has five L226 off modal mutations. This cluster has not been tested for any of the L226 descendant YSNPs to date. Five mutations provide a fairly marginal fingerprint, but it could be more reliable if isolated from other L226 submissions.
Several surname clusters have three L226 off modal mutations. Although reliable YSNP prediction via binary logistic regression does not apply to these clusters, their genetic isolation will probably give higher odds that YSNP results track within each cluster. The most reliable of these three-marker fingerprint clusters are:
1) Butler (15 submissions) - tested FGC5628 negative.
2) Bryan (5 submissions) - no YSNPs tested to date. Highly recommend testing this cluster.
3) Carey (4 submissions) - tested FGC5628 positive and DC1 negative. Since the Carey cluster off modal mutations are shared with the South Carolina Casey cluster, they are probably related in some fashion.
4) O'Brien (3 submissions) - FGC5628 positive and DC1 positive.
Even surname clusters with only two off modal mutations in their fingerprints should be tested as well:
1) McCraw (12 submissions) - Not tested. Highly recommend testing.
2) Crow (9 submissions) - tested FGC5628 negative.
3) Casey - Munster (6 submissions) - tested FGC5628 negative.
4) Cannon (5 submissions) - tested FGC5628 positive and DC1 negative.
5) O'Mahony (4 submissions) - Not tested. Highly recommend testing.
6) Shannon (3 submissions) - Not tested. Highly recommend testing.
7) Noland (3 submissions) - Not tested. Highly recommend testing.
YSTR Upgrade recommendations
To conduct YSNP analysis, it is much more accurate to have a consistent resolution of YSTR testing. It is highly recommended that everyone upgrade to 67 markers before testing any newly discovered YSNPs under L226. If you have tested newly discovered YSNPs but have 37 or fewer YSTR markers tested, it is highly recommended that you next upgrade to 67 markers so that your submissions can be included in future analysis.
Upgrades to 111 markers could also help develop more reliable branches within the L226 descendant chart. However, future YSNP discoveries and testing of known significant YSNPs could eventually eliminate the need for higher resolution YSTR tests. Currently, upgrades to 111 markers would be useful, but the deluge of new YSNPs to be discovered under L226 will make higher resolution less useful over the coming years. I highly recommend testing YSNPs as a better strategy for discovery.
Parallel Mutations (and a few backwards mutations) - a major issue to consider
YSTR markers have fairly high mutation rates. For YSNPs that are 750 to 2,000 years old, this volatility is great for accurate prediction of YSNP results. Unfortunately, older YSNPs have accumulated many more parallel and backwards mutations that are no longer visible, which makes prediction much less reliable.
Even under L226, more than half of the mutations will be parallel and backwards mutations, so determining branch defining mutations will be challenging. We should constantly identify parallel mutations that are inconsistent with branch defining mutations.
The most dominant branch defining YSTR mutation revealed to date is 481 >= 23. The following order of mutations appears most likely:
1) FGC5628 positive mutated first.
2) 481 >= 23 mutated second.
3) DC1 positive mutated third.
However, there appears to be another independent mutation of 481 >= 23 that does not fit the above sequence of events:
Wright (25505) appears to be a parallel mutation of 481 >= 23, since this submission is FGC5628 negative.
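The assumed order (FGC5628, then 481 >= 23, then DC1) implies a few simple consistency rules that can flag submissions like Wright (25505). A minimal sketch; the rules are an illustrative reading of the sequence above, and the DYS481 value used for Wright is a placeholder, since only "481 >= 23" is stated.

```python
def check_order(fgc5628, dc1, dys481):
    """Flag results that conflict with the order FGC5628 -> 481 >= 23 -> DC1.

    fgc5628 / dc1: True (positive), False (negative) or None (untested).
    dys481: the submission's DYS481 value.
    """
    if dc1 is True and fgc5628 is False:
        return "DC1 without FGC5628: conflicting YSNP results"
    if dys481 >= 23 and fgc5628 is False:
        return "481 >= 23 without FGC5628: likely parallel mutation"
    if dc1 is True and dys481 < 23:
        return "DC1 without 481 >= 23: possible backwards mutation at 481"
    return None  # consistent with the assumed order

# Wright (25505): FGC5628 negative; 23 is a placeholder for "481 >= 23".
print(check_order(False, None, 23))
```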
Creating future L226 Descendant charts
An earlier RCC-generated L226 descendant chart can now be checked for accuracy against recent YSNP testing. Going down through the chart, the FGC5628 results should form only two blocks: either all FGC5628 negative results followed by all FGC5628 positive results, or all FGC5628 positive results followed by all FGC5628 negative results. There should therefore be only one change. There are currently nine changes, indicating eight errors in the earlier branches. This technique obviously has some serious accuracy issues.
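The check described above counts how often the FGC5628 result switches between adjacent tested rows of a chart. A minimal sketch, assuming the results are listed in chart order with untested rows marked `None` and skipped; the column is hypothetical. As in the text, each switch beyond the first is read as one error.

```python
def count_changes(results):
    """Switches between consecutive tested results going down the chart.

    results: FGC5628 results in chart order; True/False, None = untested (skipped).
    """
    tested = [r for r in results if r is not None]
    return sum(1 for above, below in zip(tested, tested[1:]) if above != below)

def extra_changes(results):
    """Changes beyond the single switch allowed in a correctly ordered chart."""
    return max(count_changes(results) - 1, 0)

# Hypothetical column of FGC5628 results read top to bottom.
column = [True, True, False, None, False, True]
print(count_changes(column), extra_changes(column))
```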
My current descendant chart is based only on submissions that have been tested for the newly discovered YSNPs, so it is fairly limited in scope. Since there are going to be more independent parallel mutations (and maybe a few backwards mutations as well) than branch defining mutations, any descendant chart will be speculative until more YSNP testing is completed, more YSNPs are discovered, and those new YSNPs are tested as well.
However, there are four areas where some analysis can be done in the future:
1) More and more testing of the current L226 YSNP descendants will provide the most accurate chart.
2) Surname clusters are a bottom-up approach that can reliably group large numbers of submissions where clusters have been YSNP tested. However, surname clusters based on only one or two mutations will be less reliable and may change over time.
3) You can filter out many YSTR mutations with fairly high accuracy. Mutations with very low counts under L226 will probably not define major early branches (although there could be a few exceptions). The mutations that define surname cluster fingerprints, as well as other recent mutations found within surname clusters, can also be filtered out.
4) You can also identify the YSTR mutations most likely to define major early branches, based on higher L226 off modal YSTR mutation counts. Some of these may be two or three parallel mutations that give the appearance of major branches when they are not real branches. The following marker values are the best candidates (counts out of the current 357 submissions):
a) 576 >= 19 (61)
b) 481 >= 23 (50) - appears to be two major parallel mutations
c) 439 >= 12 (48)
d) 534 >= 16 (44)
e) 458 >= 18 (41)
f) 446 >= 14 (40)
g) 464d >= 18 (39)
h) 607 <= 14 (39)
i) 390 >= 25 (35)
j) 460 <= 10 (33)
k) 442 <= 11 (32)
l) 576 <= 17 (31)
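The candidate counts above come from tallying how many submissions meet each marker condition. A minimal sketch with hypothetical haplotypes; as noted above, a high count can also come from two or three parallel mutations, which a simple tally cannot distinguish.

```python
import operator
from collections import Counter

OPS = {"<=": operator.le, ">=": operator.ge}

def rank_conditions(submissions, conditions):
    """Count submissions meeting each (marker, op, limit) condition, largest first.

    Conditions met by no submission are omitted from the result.
    """
    counts = Counter()
    for haplotype in submissions:
        for marker, op, limit in conditions:
            if marker in haplotype and OPS[op](haplotype[marker], limit):
                counts[f"{marker} {op} {limit}"] += 1
    return counts.most_common()

# Hypothetical marker values ({marker: value}) for four submissions.
submissions = [{"576": 19, "481": 23}, {"576": 20, "481": 22},
               {"576": 18, "481": 23}, {"576": 19}]
conditions = [("576", ">=", 19), ("481", ">=", 23), ("576", "<=", 17)]
for label, n in rank_conditions(submissions, conditions):
    print(f"{label} ({n})")
```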
The above YSTR marker values should be statistically more likely to define major branches under L226. Preliminary YSNP testing combined with surname cluster groupings reveals the following results to date:
a) 460 <= 10 (25)
b) 481 >= 23 (32) - first mutation (FGC5628 negative)
c) 458 <= 16 (17)
d) 464b >= 14 (17)
e) 481 >= 23 (15) - second mutation (FGC5628 positive)
f) 444 <= 11 (15)
g) 393 <= 12 (13)
h) 449 >= 30 (13)
i) 460 >= 12 (13)
j) 534 <= 14 (13)
k) 449 <= 28 (9)
l) 391 >= 12 (6)
m) 413a <= 21 (6)
n) 534 <= 14 (6)
Once more YSNP testing is completed, we may be able to speculate on the earlier branches of the L226 descendant tree. At this point in time, there has simply not been enough YSNP testing, or enough testing of well-defined surname clusters, to make a reliable chart yet.
There has been massive progress on M222 YSNP descendants. M222 is L21's largest single fingerprint YSNP, and it now has 36 branches defined under it, along with another 15 duplicate YSNPs associated with M222. M222 is probably five times larger than L226, and its progress is extremely encouraging.
The two testing options issue
We currently have only two major approaches for testing, which is not the most economical way of testing. We continue to order full Y chromosome tests that remain fairly expensive (both Big Y and Full Genomes), and we can order individual YSNPs from either FTDNA or YSEQ. FTDNA is now pushing back on adding many more individual YSNPs, since the technology it currently uses limits how many YSNPs it can offer for individual testing, so we may be forced to move to YSEQ in the near future. Also, many people are submitting Big Y discovered YSNPs for individual testing, but few orders are placed because the scope of those YSNPs is not known, which adds setup costs while generating minimal revenue for FTDNA.
A third option, haplogroup panel tests, was promised by FTDNA when the Big Y was announced but remains unavailable. FTDNA recently announced an M222 panel test (not yet available for order) in response to the popular M222 YSEQ panel test. Since the recently discovered YSNPs can no longer all be tested on one chip, static tests such as Nat Geo 2.0 and the BritainsDNA Chromo 2 test can no longer cover all discovered YSNPs. We will still need one global economical test that covers the ISOGG haplotree branches as well as many promising candidates. It is very doubtful that the extremely useful L226 private YSNPs will be included in any genome-wide test that Nat Geo 2.0 or Chromo 2 will provide. FTDNA has announced a Deep Clade 2.0 test, but it is likely to be only an updated version of Nat Geo 2.0.
We sorely need an L21 static test that tests 10,000 recently discovered private YSNPs for under $200 per test. This test would include all recently discovered L226 private YSNPs, so that we could economically test all of them and enjoy the volume discounts that come from sharing one test with all L21 researchers. Such a test would be far better than testing one YSNP at a time for $39 per YSNP. L226 is now approaching 500 private YSNPs that could reveal many very interesting, genealogically significant YSNPs if included in a static L21 haplogroup test for under $200.
Fortunately, YSEQ provides group discounts for multiple YSNPs ordered together via panel testing. This is only a partial solution, since each panel covers a limited set of YSNPs, but it is a very good first step towards haplogroup testing:
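The economics come down to simple arithmetic on the figures in this paragraph. A sketch using the $39 individual price, the $200 target and the rough count of 500 L226 private YSNPs; it does not model how many of those YSNPs any one person would actually need.

```python
INDIVIDUAL_PRICE = 39     # dollars per YSNP ordered one at a time (from the text)
STATIC_TEST_PRICE = 200   # hoped-for ceiling for an L21 static test (from the text)
L226_PRIVATE_YSNPS = 500  # "approaching 500" L226 private YSNPs (from the text)

def individual_cost(n_ysnps, price=INDIVIDUAL_PRICE):
    """Cost of ordering n YSNPs one at a time."""
    return n_ysnps * price

def break_even_ysnps(static_price=STATIC_TEST_PRICE, price=INDIVIDUAL_PRICE):
    """Smallest number of individual YSNP orders that costs more than one static test."""
    return static_price // price + 1

print(individual_cost(L226_PRIVATE_YSNPS), break_even_ysnps())
```

Testing all 500 individually would cost $19,500, and ordering as few as six individual YSNPs already costs more than the proposed $200 static test.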
1) M222 downstream test of 24 YSNPs for $88 = $3.67 per YSNP
2) L1335 downstream test of 16 YSNPs for $88 = $5.50 per YSNP
3) L21 global test, which is a two-part test for $88:
If DF13 negative (12 YSNPs) = $7.33 per YSNP
If DF13 positive (19 YSNPs) = $4.63 per YSNP
4) Z251 downstream test of 16 YSNPs for $239 = $14.94 per YSNP
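The per-YSNP figures are the panel price divided by the number of YSNPs in the panel. A quick Python check using the October 2014 prices listed above:

```python
# Panel prices and YSNP counts as listed above (October 2014).
PANELS = {
    "M222 downstream": (88, 24),
    "L1335 downstream": (88, 16),
    "L21 global, DF13 negative": (88, 12),
    "L21 global, DF13 positive": (88, 19),
    "Z251 downstream": (239, 16),
}

def per_ysnp(price, n_ysnps):
    """Panel price per YSNP, rounded to cents."""
    return round(price / n_ysnps, 2)

for name, (price, n) in PANELS.items():
    print(f"{name}: ${per_ysnp(price, n):.2f} per YSNP")
```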
I have worked with YSEQ, and they are now putting together an L226 panel test for $88. Here is the preliminary list of YSNPs to be tested:
Step 1) Test L226, FGC5628 and DC1.
If all three are negative, no more testing.
Step 2a) If FGC5628 negative (all Wright private YSNPs):
Test FGC12280, FGC12290, FGC12292, FGC12294, FGC12295 and FGC12296.
Step 2b) If FGC5628 positive and DC1 negative:
Test FGC5639 (Casey), DC15/7765005 (Anderson), DC16/9443821 (McMahon), DC19/8545140 (Cannon) and DC21/15167648 (Riel).
Step 2c) If DC1 positive:
Test YFS231286, DC11/9648265 (Dunn), DC12/15361402 (Dunn) and DC13/15974550 (Dunn).
In the first pass, this test will test three YSNPs: L226, FGC5628 and DC1. In the second pass, it will test 15 additional L226 private YSNPs, including YFS231286. That is only $88 (only $4.89 per YSNP across all 18 panel YSNPs) vs. $39.00 per YSNP when testing individually at FTDNA. Since the test has two steps, YSNPs that would predictably test negative are not run, saving YSEQ and its customers unnecessary testing. By testing fourteen new private YSNPs, we will be able to discover new L226 branches at a much faster pace than with Big Y and FGC testing alone. At $88 per test, this test is only $10.00 more than testing FGC5628 and DC1 at FTDNA one YSNP at a time ($78), plus the panel covers 15 more private YSNPs that should reveal several new branches, instead of waiting for expensive Big Y and FGC test results to slowly roll in.
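The two-step panel logic can be sketched as a small decision function. The YSNP lists are transcribed from the preliminary list above; the function name and the reading of step 1 (stop only when all three step 1 YSNPs are negative, otherwise route on DC1, then FGC5628) are assumptions for illustration, not YSEQ's actual ordering logic.

```python
# Panel layout transcribed from the preliminary YSEQ list above.
STEP1 = ["L226", "FGC5628", "DC1"]
STEP2 = {
    "FGC5628 negative": ["FGC12280", "FGC12290", "FGC12292",
                         "FGC12294", "FGC12295", "FGC12296"],
    "FGC5628 positive, DC1 negative": ["FGC5639", "DC15/7765005", "DC16/9443821",
                                       "DC19/8545140", "DC21/15167648"],
    "DC1 positive": ["YFS231286", "DC11/9648265", "DC12/15361402", "DC13/15974550"],
}
PANEL_PRICE = 88
TOTAL_YSNPS = len(STEP1) + sum(len(v) for v in STEP2.values())

def second_pass(l226, fgc5628, dc1):
    """Hypothetical helper: YSNPs run in step 2 for given step 1 results (True = positive)."""
    if not (l226 or fgc5628 or dc1):
        return []  # all three negative: no more testing
    if dc1:
        return list(STEP2["DC1 positive"])
    if fgc5628:
        return list(STEP2["FGC5628 positive, DC1 negative"])
    return list(STEP2["FGC5628 negative"])

print(TOTAL_YSNPS, round(PANEL_PRICE / TOTAL_YSNPS, 2))
print(second_pass(True, True, False))
```

Each customer's second pass runs only one of the three step 2 lists, which is where the savings over testing every YSNP come from.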
VERY PRELIMINARY L226 DESCENDANT CHART
This chart is speculative in nature and will change as more testing is completed:
1) L226 Positive.
  1a) FGC5628 Positive (65)
    1a1) 481 >= 23 (53)
      1a1a) DC1 Positive (21)
        1a1a1) Dunn (1)
        1a1a2) YFS231286 Positive (20) - O'Brien cluster
      1a1b) 458 <= 16 and 464b >= 14 (17)
        1a1b1) Carey cluster (4)
        1a1b2) 393 <= 12, 449 >= 30, 460 >= 12 and 534 <= 14 (13) - Casey (SC) cluster
      1a1c) Hogan cluster (9)
      1a1d) Several surnames (6)
    1a2) 439 >= 12, 458 >= 19 and 534 <= 16 (5) - Cannon cluster
    1a3) Several surnames (7)
  1b) 460 <= 10 (25)
    1b1) 444 <= 11 and 481 >= 23 (15) - Butler cluster
    1b2) 449 <= 29 (9) - Crow cluster
    1b3) O'Dea (1)
  1c) Several surnames (8)
  1d) 413a <= 21 and 534 <= 14 (6) - Casey (Munster) cluster
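The preliminary chart above is a tree in which each branch's submission count should equal the sum of its sub-branch counts. A minimal sketch of that sanity check; the `Branch` type and `branch` helper are hypothetical, labels are shortened, and the counts are copied from the chart. With the current figures every branch adds up.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

@dataclass
class Branch:
    label: str
    count: Optional[int] = None  # submissions on this branch, if given
    children: List["Branch"] = field(default_factory=list)

def branch(label, count=None, *children):
    """Shorthand constructor: branch(label, count, child, child, ...)."""
    return Branch(label, count, list(children))

def count_mismatches(node: Branch, path: str = "") -> Iterator[Tuple[str, int, int]]:
    """Yield (path, count, children total) where child counts do not sum to the parent."""
    here = f"{path}/{node.label}"
    if node.count is not None and node.children:
        total = sum(child.count or 0 for child in node.children)
        if total != node.count:
            yield here, node.count, total
    for child in node.children:
        yield from count_mismatches(child, here)

chart = branch("L226 positive", None,
    branch("FGC5628 positive", 65,
        branch("481 >= 23", 53,
            branch("DC1 positive", 21,
                branch("Dunn", 1), branch("YFS231286 positive (O'Brien)", 20)),
            branch("458 <= 16, 464b >= 14", 17,
                branch("Carey", 4), branch("Casey (SC)", 13)),
            branch("Hogan", 9), branch("Several surnames", 6)),
        branch("Cannon", 5), branch("Several surnames", 7)),
    branch("460 <= 10", 25,
        branch("Butler", 15), branch("Crow", 9), branch("O'Dea", 1)),
    branch("Several surnames", 8),
    branch("Casey (Munster)", 6))

print(list(count_mismatches(chart)))  # an empty list means every count adds up
```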