Once a true genealogical cluster has been determined, strong NPE candidates have been added to the cluster, and the DNA haplotype of the Most Recent Common Ancestor (MRCA) has been estimated, the next step is to determine branches within the genealogical cluster based on common mutations. It is very important that the genealogical cluster does not include any unrelated lines, as these would pollute the analysis of branches within the cluster. Only very strong NPE candidates should be included, because incorrect NPE submissions would pollute the branch analysis in the same way. The haplotype of the MRCA must also be estimated properly, since any error in that estimate carries over into the branch estimates. Each phase of the analysis is cumulative in nature, and errors in the earlier phases result in even more errors in later phases. Solid, well-proven documentation of all submissions is also extremely important, as this documentation is key to determining both the MRCA haplotype and the branches.
As the potential sources of pollution listed above make clear, errors in the earlier steps only compound in the later steps. It is wise to be somewhat conservative in the earlier steps in order to reduce errors in later steps. Analysis is also an iterative process, and findings in later steps may result in changes to earlier steps. During each iteration of the analysis, more information is usually available to analyze (more submissions, more complete traditional documentation, upgrades in the number of markers tested, new deep ancestry testing of existing submissions, new strong NPE candidates found, etc.). During each iteration, the knowledge level of the person performing the analysis also continues to improve as the importance of new factors is discovered (rarity of marker values, mutation rates associated with mutations that define possible branches, filtering out recent mutations from analysis, discovering distantly related genealogical clusters that share a common male ancestor over 600 years ago, etc.).
It must be remembered that DNA analysis is mostly based on the highest probability scenarios, and that lower probability scenarios should be expected to turn up as more data becomes available to analyze. Many low probability, weak branches will later be disproved and pushed down the descendancy chart as more recent mutations than originally believed. Unlike traditional documentation, which clearly reveals the relationships that genealogists seek, DNA evidence is based on most likely scenarios and can change radically over time. This level of uncertainty is very frustrating to most genealogists attempting to analyze their newly discovered DNA evidence.
The strength of possible branches is very difficult to estimate, and there is very little guidance on how to properly determine branches within a cluster. Most people who attempt to determine branches within a cluster should include any possible branch (those that are 90% certain down to those that are only 20% certain). Including weak branches is necessary in order to determine which submissions need to be tested next. Weak branches require additional submissions or upgrades in the number of markers tested to firm up. This multiple-step process is difficult for most sponsors of DNA submissions to understand, and it requires a continuous infusion of funds, which are not always available.
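To make the idea of a possible branch concrete, here is a minimal sketch that groups submissions by the mutations they share relative to the estimated MRCA haplotype. It is only an illustration under stated assumptions: the function name, the dictionary layout, the submission labels and the marker values are all hypothetical, and a shared mutation is treated simply as a candidate branch without any attempt to score how certain it is.

```python
from collections import defaultdict

def find_candidate_branches(mrca, submissions, min_members=2):
    """Group submissions by each marker value that differs from the MRCA.

    Returns {(marker, value): sorted submission ids} for every mutation
    shared by at least `min_members` submissions. Hypothetical helper,
    not a standard tool.
    """
    shared = defaultdict(list)
    for sub_id, haplotype in submissions.items():
        for marker, value in haplotype.items():
            # Only compare markers for which an MRCA value has been estimated.
            if marker in mrca and value != mrca[marker]:
                shared[(marker, value)].append(sub_id)
    return {
        mutation: sorted(members)
        for mutation, members in shared.items()
        if len(members) >= min_members
    }

# Hypothetical estimated MRCA haplotype and submissions (illustrative values).
mrca = {"DYS390": 24, "DYS391": 11, "DYS439": 12}
submissions = {
    "A": {"DYS390": 24, "DYS391": 10, "DYS439": 12},
    "B": {"DYS390": 24, "DYS391": 10, "DYS439": 13},
    "C": {"DYS390": 25, "DYS391": 11, "DYS439": 12},
    "D": {"DYS390": 24, "DYS391": 11, "DYS439": 12},
}

branches = find_candidate_branches(mrca, submissions)
for (marker, value), members in branches.items():
    print(f"{marker}={value}: {members}")
```

In this made-up data, submissions A and B share one mutation and so form a candidate branch, while the mutations seen only in B or only in C are private to a single submission. A grouping like this says nothing about strength: a candidate supported by only two submissions is exactly the kind of weak branch that needs further testing before it can be trusted.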
Determining branches within clusters is by far the best long-term use of DNA testing for genealogists. However, it takes many carefully selected submissions to turn possible branches into high probability branches (ones that pass the test of preponderance of evidence). Unfortunately, most submissions are somewhat random in nature and carry a lot of bias, which is bad for accurately determining branches. There is a bias toward testing residents of the United States and England, while other English-speaking countries are not well represented. More testing of residents of Ireland, Australia, Canada, South Africa and New Zealand is needed to eliminate this bias. If you are researching lines that do not have Western European ancestry, coverage of Asian, Arab and Balkan countries, Africa and other parts of the world is very limited to date.
There is also a natural bias toward testing your own line in depth, which produces fewer results. It is emotionally hard to fund other, unrelated lines that are discovered to be genetically related. For finding branches, however, it is more important to the overall project to cover every line evenly than to have your particular line over-tested. Some lines are simply fortunate enough to be documented two or three generations earlier than most in the cluster. Since these lines are proven back to an earlier time frame, testing them will take many more submissions, even though most of the lines are already proven via traditional documentation. Listen carefully to those who analyze your DNA submissions, as they are in a much better position to recommend testing that helps the overall project goals. Taking a well-planned, broader testing approach is far superior to thoroughly covering a limited number of lines. This breaks all the rules of traditional research, where you concentrate only on the lines that show the most promise and focus less on lines that appear to be lower probability. Data-mining DNA evidence is primarily a top-down approach to research, whereas traditional genealogical research encourages a bottom-up approach (working hard on proving the next ancestor vs. attempting to tie all lines in the state together).
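One simple way to check whether lines are being covered evenly is to count submissions per documented line and look for the lines with the fewest tests. The sketch below assumes each submission has already been assigned to a line through traditional documentation. The line labels and the function name are hypothetical, and a real project would weigh other factors as well.

```python
from collections import Counter

def least_tested_lines(submission_lines, all_lines):
    """Count submissions per line and return the least-tested lines.

    `submission_lines` holds one line label per submission; `all_lines`
    lists every documented line, including any with no submissions yet.
    Hypothetical helper for illustration only.
    """
    counts = Counter(submission_lines)
    per_line = {line: counts.get(line, 0) for line in all_lines}
    fewest = min(per_line.values())
    return per_line, sorted(l for l, n in per_line.items() if n == fewest)

# Hypothetical project: three documented lines, four submissions.
all_lines = ["Line 1", "Line 2", "Line 3"]
submission_lines = ["Line 1", "Line 1", "Line 1", "Line 2"]

per_line, next_to_test = least_tested_lines(submission_lines, all_lines)
print(per_line)
print(next_to_test)
```

Here "Line 1" is over-tested relative to the others, and "Line 3" has no submissions at all, so it would be the first place to look for a new test candidate.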