SUMMARY OF TESTS USED BY GENEALOGISTS
This section gives a very brief overview of how DNA testing is used to extract more source documentation about our ancestors. Y-DNA topics are primarily covered in this section, as these tests present the best alternative for using DNA to connect our ancestors together. Both Y-STR testing (recent mutations) and Y-SNP testing (deep ancestry) are used for the analysis in most DNA Surname Projects. There are a myriad of other DNA tests available, but they either address ancestry that is too deep (mtDNA) or too recent (Family Finder and X-STR testing). There are also tests for ethnicity and traditional paternity tests for very recent adoptions or out-of-wedlock scenarios.
The mutation rate (or net change of DNA per generation) is critical in deciding which tests suit different audiences. Below is a very high-level summary of how DNA tests are selected for different purposes:
Audience | Mutation Rate | Tests | Comments
Not used, since there is very little variation | Almost never | Full genome | 99.9 % of human DNA is the same for all of mankind
Medical community | Varies a lot | Full genome | Sections called genes affect health and biological functions
Deep ancestry research | Usually once | WTY (partial Y-chromosome) | Find new Y-SNP mutations for deep ancestry research
Deep ancestry research | Usually once | Y-SNP, mtDNA | Develop descendant charts of man
Many genealogists | Usually once | Y-SNP | Private Y-SNPs happen near or within the genealogical time frame
Most genealogists | Very frequent | Y-STR | Help genealogists break brick walls
Recent adoptions and recent pedigree charts | Extremely frequent | Family Finder, X-STR | Recent pedigree brick walls and adoptions (4 or 5 generations back)
The vast majority of human DNA (about 99.9 %) almost never mutates and is the same for all of mankind. These "almost never" mutating markers do not change often enough to assist genealogists. A significant part of our variable DNA is in sections labeled as genes. Genes control our biological functions, and their variations can be extremely important to our health or determine the color of our hair. Most genealogical DNA testing companies do not test these markers (or filter them out), as they are not useful for genealogical purposes and would raise the health privacy concerns shared by most of the genealogical community. Other DNA testing companies intentionally test and analyze genes for medical health indicators.
The next class of mutation rates is the "usually once" markers. These are excellent markers for separating all of mankind into hundreds of surviving branches. These tests also allow researchers to estimate ancient migration patterns of mankind over the last 1,000 to 100,000 years. Ongoing deep ancestry research is adding ten or twenty new Y-SNP deep ancestral branches every month. These tests can be very useful to genealogists for the initial separation of submissions into groupings: if two submissions do not share the same deep ancestry 1,500 years ago, they cannot be related within the last 300 to 900 years (the time frame of interest for genealogists). Many of these Y-SNP mutations are known as "private" SNPs, as they are too recent to define deep ancestral branches, and they are now being used by genealogists for more recent connections. Y-SNP (male line) testing and mtDNA (female line) testing are well suited to this ongoing deep ancestry research. Unfortunately, mtDNA is a very small DNA structure, only about 16,500 base pairs long. The Y chromosome, however, has around 58,000,000 base pairs, which will reveal many more recent branches because far more DNA is available for analysis.
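The initial separation by deep ancestry can be sketched as a simple compatibility check. This is a minimal sketch, assuming each donor's haplogroup is written as the path of Y-SNPs from an ancient root down to the deepest branch tested; the function name and the SNP labels are hypothetical placeholders, not a real tree. Two donors whose paths conflict at any depth sit on different branches and can be set aside; compatible paths only mean the donors *might* be related.

```python
from typing import Sequence

def could_share_recent_ancestor(path_a: Sequence[str], path_b: Sequence[str]) -> bool:
    """True if one Y-SNP path is a prefix of the other (no conflicting branch).

    A shorter path is treated as 'not tested as deeply', so it stays compatible.
    """
    shared_depth = min(len(path_a), len(path_b))
    return list(path_a[:shared_depth]) == list(path_b[:shared_depth])

# Placeholder SNP names, not a real haplogroup tree.
donor_1 = ["SNP_A", "SNP_B", "SNP_C"]
donor_2 = ["SNP_A", "SNP_B"]   # tested less deeply; still compatible
donor_3 = ["SNP_A", "SNP_D"]   # different branch below SNP_A

print(could_share_recent_ancestor(donor_1, donor_2))  # True
print(could_share_recent_ancestor(donor_1, donor_3))  # False
```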
The current sweet spot for genealogists is those markers that mutate every 500 years on average. This mutation rate is ideal for genealogists: when tested at 67-marker resolution, any related pair of submissions will usually show two to four mutations after 300 years. There are believed to be around 300 to 400 Y-STR markers that mutate in a manner useful to genealogists. The combination of Y-STR and Y-SNP testing will allow genealogists to extend their pedigree charts by connecting currently unrelated lines together. The combination of mtDNA and X-DNA testing, however, will never have the same power, since X-DNA is not preserved intact from mother to daughter over the generations. A large portion of Y-DNA does not recombine and is passed from father to son with little variation over time. By contrast, every daughter receives her father's single X chromosome intact, plus one X chromosome from her mother that is a random recombined mix of the mother's two X chromosomes. Therefore, with every mother-to-daughter transmission, half of the daughter's X-DNA comes from her father rather than her mother, diluting the maternal X-DNA by 50 %. As with autosomal recombination, X-DNA therefore has an effective 50 % "mutation" rate.
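The "two to four mutations after 300 years" figure can be checked with simple arithmetic. This sketch assumes an average Y-STR rate of 0.2 % per marker per generation (the figure given later in this section), 30 years per generation (an assumption), and two independent lines of descent from the common ancestor to the two donors. The function name is hypothetical.

```python
def expected_mutations(markers: int, rate_per_marker: float, years: float,
                       years_per_generation: float = 30.0) -> float:
    """Expected Y-STR differences between two donors whose common paternal
    ancestor lived `years` ago, counting mutations on both lines."""
    generations_per_line = years / years_per_generation
    # Two separate lines of descent lead back to the common ancestor.
    return markers * rate_per_marker * generations_per_line * 2

print(round(expected_mutations(67, 0.002, 300), 2))  # 2.68
```

An expected value of about 2.7 sits inside the "two to four" range; individual pairs scatter around it because mutations are random.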
Another new test for genealogists (autosomal) will help solve the more recent brick walls that some genealogical researchers face. The FTDNA "Family Finder" test employs DNA markers that are not tied to our sex and are passed to all children. Each child receives half of its autosomal DNA from the mother and half from the father, and each parent's half is itself a random recombined mix of that parent's own DNA. Therefore, every generation receives a 50 % change in DNA, and each child ends up with around 25 % of the DNA of each grandparent. Even though this recombination of DNA from father and mother is not really a mutation, it is equivalent to a 50 % mutation rate. Because this equivalent mutation rate is so high, several hundred thousand rungs of DNA are tested. After four or five generations, this test may reveal less information about any relationship, since so little common DNA remains. There are exceptions beyond five generations, but only around ten percent of your recombinational DNA will be intact at six or seven generations. As a result, about 90 % of ancestors are untraceable after six or seven generations, but a random 10 % of your DNA will remain intact enough to trace some of your ancestors.
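The halving described above can be tabulated directly. This is a minimal sketch of the *average* expected share of autosomal DNA inherited from a single ancestor; the function name is hypothetical, and real inheritance varies around these averages because recombination is random.

```python
def expected_share(generations_back: int) -> float:
    """Average fraction of autosomal DNA inherited from one ancestor
    `generations_back` generations ago (1 = parent, 2 = grandparent)."""
    return 0.5 ** generations_back

for n in range(1, 8):
    print(f"{n} generation(s) back: {expected_share(n) * 100:.2f} %")
```

Each step back halves the expected share, which is why so little common DNA remains after four or five generations.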
There are two characteristics of Family Finder tests that are useful to genealogists. The first is the raw amount of common DNA found between two submissions. Siblings share around 50 % of their DNA and first cousins around 12.5 %, so the percentage of shared DNA can approximate the degree of the relationship. Since the split is not exactly 50 / 50 every generation, this characteristic becomes less reliable after 4 or 5 generations. The second characteristic is that this DNA recombines in a way that leaves very long strings of DNA intact. Siblings may share hundreds (or thousands) of consecutive rungs of the ladder with each parent. These long strings become shorter with every generation, as the recombination process is random in nature. However, the longer strings of common DNA that remain intact over time are short-lived DNA fingerprints that genealogists can use to locate ancestors at 4 and 5 generations with high probability, and at 6 and 7 generations with low probability. These tests are also very useful to genealogists with very recent adoptions, where researchers are trying to locate possible close relatives. They are also useful for researchers who have recent missing ancestors in their pedigree chart due to poor availability of source documentation, or for those just starting out with their genealogical research.
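Using the percentage of shared DNA to approximate a relationship can be sketched as a nearest-match lookup. This assumes the standard average sharing figures listed in the table below and compares on a log2 scale, since expected sharing halves with each additional generation of separation; the names are hypothetical, and a real comparison would also weigh the length of shared segments.

```python
import math

# Average expected autosomal sharing; real values scatter around these.
EXPECTED_SHARE = {
    "parent/child or full sibling": 0.5,
    "grandparent, half sibling or aunt/uncle": 0.25,
    "first cousin": 0.125,
    "first cousin once removed": 0.0625,
    "second cousin": 0.03125,
    "third cousin": 0.0078125,
}

def closest_relationships(shared_fraction: float, top: int = 2) -> list[str]:
    """Rank relationships by how close their expected share is on a log2 scale."""
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    ranked = sorted(
        EXPECTED_SHARE,
        key=lambda name: abs(math.log2(shared_fraction / EXPECTED_SHARE[name])),
    )
    return ranked[:top]

print(closest_relationships(0.13))
# ['first cousin', 'grandparent, half sibling or aunt/uncle']
```

Returning more than one candidate reflects the point above: the further back the relationship, the more the ranges overlap.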
Most genealogists are attempting to connect unrelated lines together at eight to twelve generations and will primarily be testing Y-STR and Y-SNP markers. This could result in a much smaller database of Family Finder submissions. But many genealogists may also be missing more recent branches of their family history, so it is uncertain how popular this test will become. As prices fall, interest in Family Finder tests will increase, and these tests could assist in locating missing Y-DNA donors or simply in adding missing third and fourth cousins to your family history. The major advantage of autosomal tests is that they cover your entire pedigree chart rather than just the one all-male line. The major disadvantage is that their usefulness drops off sharply from four to eight generations, just where most of our pedigrees need genetic information to break major brick walls. Family Finder will not be extremely useful for extending our pedigree charts or connecting our oldest proven ancestors, as the vast majority of clusters being analyzed by genealogists are looking for connections from eight to twelve generations ago. Since Family Finder addresses a much more recent time frame but a much broader portion of our pedigree chart, it will be interesting to see how many DNA projects embrace these tests, which do not meet the objective of most DNA projects: adding more generations to well-established pedigree charts.
VARIATIONS IN TYPES OF Y-DNA MARKERS
Y-DNA strands are like extremely long ladders. The actual DNA strands are more complex than this, but the ladder analogy allows researchers to visualize how DNA changes from one generation to another. Each rung of the ladder has a position number assigned, and the marker number refers to the location of a rung on the ladder. The number of rungs on this ladder is very predictable, but extra or missing rungs occur regularly. Because some series of rungs never change over time, they are used as reference points (primers) to locate other marker positions. Y-STRs are sections of Y-DNA containing a short string of rungs that repeats multiple times. These repeated strings have the same chemical makeup, and their number can vary over time (strings are randomly added and deleted). Y-STRs are ideal for genealogists if the variation is predictable and the rate of change is not too fast.
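A Y-STR marker value is simply the count of back-to-back copies of a repeated string. This sketch counts the longest run of a repeat motif in a made-up sequence; the flanking letters stand in for the fixed reference rungs, and both the sequence and the `GATA` motif are hypothetical examples, not the layout of any particular marker.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        run, pos = 0, start
        while sequence.startswith(motif, pos):
            run += 1
            pos += step
        best = max(best, run)
    return best

# Hypothetical read: fixed flanking rungs around 12 copies of the repeat.
read = "TTCAGC" + "GATA" * 12 + "CCGTAA"
print(count_tandem_repeats(read, "GATA"))  # 12
```

Adding or deleting one copy of the repeat changes the reported marker value by one, which is the mutation genealogists count.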
Since Y-STRs both add and delete strings randomly over time, they can return to their original values (back mutations). Since Y-STRs mutate at a fairly high rate, it is not uncommon to discover multiple independent mutations to the same value within the genealogical time frame (parallel mutations). Many Y-STRs are just too short (under eight repeats) to be reliable for genealogical usage. Other Y-STRs mutate extremely fast and are too volatile for genealogical usage. Also, there are only around 400 Y-STRs that could be useful to genealogists. Many scientists believe that 111 Y-STR markers may be the upper limit that can be used safely and accurately, and that the accuracy of Y-STR analysis is limited to only 400 to 800 years before the present. Only certain Y-STRs mutate in a manner useful to genealogists, and these useful Y-STRs can be thousands or millions of rungs apart, which drives up the cost per marker of scanning them.
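Back mutations can be illustrated with a small simulation. This sketch assumes a simple step-wise model in which each generation a marker may gain or lose one repeat; the 2 % rate and 30 generations are illustrative values for a fast marker, not measured figures. Because steps can cancel, the visible difference from the starting value never exceeds the true number of mutation events.

```python
import random

def simulate_line(generations: int, rate: float, rng: random.Random) -> tuple[int, int]:
    """Return (net change in repeat count, number of mutation events) for one line."""
    net, events = 0, 0
    for _ in range(generations):
        if rng.random() < rate:
            net += rng.choice((-1, 1))   # gain or lose one repeat
            events += 1
    return net, events

rng = random.Random(2024)
trials, hidden = 10_000, 0
for _ in range(trials):
    net, events = simulate_line(generations=30, rate=0.02, rng=rng)
    hidden += events - abs(net)          # events erased by back mutations
print(f"average mutations hidden per line: {hidden / trials:.3f}")
```

The faster the marker, the more mutations disappear this way, which is one reason very fast markers are treated with caution.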
There are three major types of DNA variation that are of interest to genealogists, and they are mixed throughout the length of all DNA strands. Some areas of each DNA strand are "rich" in certain types, while other areas are very mixed. Y-SNPs and mtDNA SNPs are "usually once" mutations, Y-STRs are fast-mutating markers (around 0.2 % per marker per generation), and autosomal markers recombine at a 50 % net change in DNA every generation. Each of these three DNA types provides different kinds of information to genealogists. A fourth kind of DNA, known as genes, consists of other special strings of rungs that affect biological functions; genes are not tested for genealogical purposes. This section primarily discusses Y-STR markers, which are currently the most interesting marker type to test. However, Y-SNP markers are rapidly becoming more important to genealogists and in only a few years could become more important than Y-STR markers. Extensive Y-SNP testing, though, requires a tenfold reduction in DNA scanning charges, which is predicted within the next two or three years.
Y-STR markers have a variable number of repeated strings of rungs between known, fixed series of rungs. The fixed series of rungs are used as reference points (primers) to locate the variable portion of the ladder, and only the variable portion is reported. The variable portion consists of a repeated string of rungs, and the number of repeats can vary from 8 to 45 depending on the Y-STR marker position being tested. This DNA has no known biological function (it is known as junk DNA), and its random changes over time allow genealogists to determine possible relationships between the oldest proven ancestors of what were previously unrelated lines. Unfortunately, there are several kinds of Y-STR sequences that need to be understood, as biology is more complex than our simple genealogical needs. For Y-STR testing, there are five major variations of Y-STRs.
The first Y-STR variation is called a multi-copy marker, where the strings of rungs randomly switch positions within a certain defined area. These markers are listed in low-to-high order of the number of repeated strings, as it is not possible to identify which Y-STR copy is mutating. Multi-copy markers always have a lowercase letter appended at the end of the marker number (i.e., CDYa and CDYb, or 464a, 464b, 464c and 464d). Besides this variable switching of positions, multi-copy markers also have two other variations.
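The low-to-high reporting convention can be sketched as a small helper. This is a minimal illustration with a hypothetical function name: the copy values are sorted and then labeled a, b, c, and so on, so an extra copy simply receives the next letter. Because of the sorting, a mutation in one copy can shift which letter a value lands under, which is one reason these markers are harder to compare.

```python
from string import ascii_lowercase

def report_multicopy(marker: str, copy_values: list[int]) -> dict[str, int]:
    """Sort copy values low to high and label them a, b, c, ..."""
    if len(copy_values) > len(ascii_lowercase):
        raise ValueError("too many copies to label with single letters")
    return {f"{marker}{letter}": value
            for letter, value in zip(ascii_lowercase, sorted(copy_values))}

print(report_multicopy("464", [16, 15, 17, 15]))
# {'464a': 15, '464b': 15, '464c': 16, '464d': 17}
print(report_multicopy("CDY", [38, 36]))
# {'CDYa': 36, 'CDYb': 38}
```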
A second type of Y-STR variation also occurs only in multi-copy markers: extra Y-STR sequences (extra copies of the repeated strings of rungs) can sometimes be added. Markers 19 and 464 are the most common multi-copy markers where extra Y-STR sequences can appear for many generations. This is usually a temporary DNA fingerprint, as these extra Y-STR sequences tend to be deleted after several generations, although some last only a few generations and others persist for many.
A third type of Y-STR variation also occurs in multi-copy markers: the chemical makeup (the G, A, T and C bases) of the rungs can change as well. Revealing this variation requires a special FTDNA test. Due to the myriad of possible changes, multi-copy markers can provide a wealth of information but are much more complex to analyze. There is discussion that these complex multi-copy markers may be excluded in the future and replaced by less complicated Y-STR markers as they become available. However, genealogists are unlikely to throw out information any time soon, as it is not in our nature to ignore any information. Nature provided us with a lot of these markers, so we may have to deal with their more complex variations for some time.
A fourth type of Y-STR variation happens when the entire Y-STR sequence is deleted, meaning the entire variable portion of the Y-STR is missing. When a missing Y-STR sequence is discovered, the marker value is reported as null or zero. Missing Y-STRs are somewhat rare, occurring in less than one percent of submissions, and over several generations these missing Y-STR sequences will always reappear. Missing Y-STRs only happen to regular Y-STR markers that have no suffix attached. While a deletion persists from generation to generation, which may be for a few generations or for many, it provides a very distinctive temporary DNA fingerprint.
A fifth type of Y-STR has two variable portions within one fixed region. These Y-STRs have 1 and 2 appended to the marker number (DYS389 is the best-known example). The number of strings in the first part is reported as the first marker value, while the number of strings in the entire Y-STR (both parts together) is reported as the second marker value. This creates an analysis issue for genealogists who are counting mutations: if the first section shows one added string, the reported value for the entire region increases by one as well, even though there is only one mutation. Genealogists have to adjust the number of true mutations for this double counting. For analysis purposes, many genealogists (and especially deep ancestry researchers) modify the second marker value by subtracting the first marker value. This ensures that no mutation is counted twice and helps estimate time frames between submissions more accurately.
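The subtraction adjustment can be shown with two short helpers. This is a sketch assuming haplotypes are stored as marker-to-value dictionaries; the labels `389-1` and `389-2` and all values are hypothetical. The distance function uses a simple step-wise count (sum of absolute repeat differences), one common convention among several.

```python
def adjust_two_part(haplotype: dict[str, int], first: str, second: str) -> dict[str, int]:
    """Replace the whole-region count `second` with (second - first)."""
    adjusted = dict(haplotype)
    adjusted[second] = haplotype[second] - haplotype[first]
    return adjusted

def step_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Sum of absolute repeat differences over markers tested in both."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())

donor_a = {"389-1": 13, "389-2": 29, "393": 13}
donor_b = {"389-1": 14, "389-2": 30, "393": 13}

print(step_distance(donor_a, donor_b))  # 2 (one mutation counted twice)
print(step_distance(adjust_two_part(donor_a, "389-1", "389-2"),
                    adjust_two_part(donor_b, "389-1", "389-2")))  # 1
```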
UNDERSTANDING IMPORTANT FACTORS IN DNA ANALYSIS
The analysis of DNA submissions cannot be summarized in a few paragraphs and involves some very complex issues. Those just starting out should rely on others to get up to speed. There are many misconceptions and some very high expectations associated with DNA testing. DNA analysis is primarily an exercise in mathematics: probability theory, statistics, logic, pattern recognition, rules of thumb, etc. Having similar DNA values with only 37 or 67 markers does not always mean that two submissions are related (some groupings of DNA submissions have such common DNA values that they overlap with many other lines). The opposite is also true: five or six mutations do not always mean that two submissions are unrelated (both submissions may simply have beaten the odds and mutated more than average). Finding similar DNA values between two submissions is a positive sign that they may be related, but it is never proof of a close relationship.
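How often a truly related pair "beats the odds" can be estimated under a simple Poisson model of mutation counts. This sketch assumes the same illustrative figures as before (67 markers, 0.2 % per marker per generation, 300 years, 30 years per generation, two lines of descent) and treats mutations as independent events; both are simplifying assumptions, not a full analysis.

```python
import math

def poisson_tail(expected: float, at_least: int) -> float:
    """P(X >= at_least) for a Poisson variable with mean `expected`."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(at_least))
    return 1.0 - below

lam = 67 * 0.002 * 10 * 2   # 67 markers, 0.2 % rate, 10 generations per line, 2 lines
print(f"P(5 or more differences) = {poisson_tail(lam, 5):.2f}")  # 0.13
```

Under these assumptions, roughly one related pair in eight would show five or more differences, which is why a single pairwise count is a weak test.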
There are many factors in analyzing Y-STR DNA submissions, far more than how similar the submissions are. Finding similar DNA only affects the initial phase of the analysis, the grouping of related submissions. Once this grouping phase is completed, the actual mutations become the primary focus of the analysis. There is far too much focus on comparing the number of mutational differences between two submissions, which is not a reliable comparison. You really need at least five to ten submissions (usually of the same surname) to start any meaningful analysis. Comparing only two submissions is like rolling the dice once and expecting a seven every time. Similar DNA (few mutations) is often not enough to determine how closely related submissions are. This kind of comparison is only reliable if your submissions have relatively rare DNA marker values, which does not happen often. The combination of similar DNA (few mutations) and similar surnames (allowing for surname variations) is powerful, and the two must be used together: unless your marker values are rare, the surname must be used as a filter for similar DNA. For some common surnames with many genetic origins, even the surname / similar DNA combination may not be reliable.
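Using the surname as a filter together with similar DNA can be sketched as a two-condition match. The variant map, the distance threshold and the sample data are all hypothetical; a real project would maintain its own spelling variants and choose thresholds by marker count.

```python
# Hypothetical spelling-variant map; a real project would maintain its own list.
SURNAME_VARIANTS = {"smyth": "smith", "smythe": "smith", "smithe": "smith"}

def normalize_surname(name: str) -> str:
    key = name.strip().lower()
    return SURNAME_VARIANTS.get(key, key)

def genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Step-wise difference summed over markers tested in both haplotypes."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())

def candidate_matches(target: dict, submissions: list[dict], max_distance: int) -> list[str]:
    """IDs that pass BOTH the surname filter and the similar-DNA filter."""
    wanted = normalize_surname(target["surname"])
    return [
        s["id"] for s in submissions
        if s["id"] != target["id"]
        and normalize_surname(s["surname"]) == wanted
        and genetic_distance(target["haplotype"], s["haplotype"]) <= max_distance
    ]

target = {"id": "K1", "surname": "Smith", "haplotype": {"393": 13, "390": 24, "19": 14}}
pool = [
    {"id": "K2", "surname": "Smyth", "haplotype": {"393": 13, "390": 25, "19": 14}},
    {"id": "K3", "surname": "Jones", "haplotype": {"393": 13, "390": 24, "19": 14}},
    {"id": "K4", "surname": "Smith", "haplotype": {"393": 15, "390": 21, "19": 15}},
]
print(candidate_matches(target, pool, max_distance=2))  # ['K2']
```

Here K3 matches exactly on DNA but fails the surname filter, which is the common-haplotype overlap the text warns about.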
The rarity of the Y-STR values of the submissions is proving to be a very important parameter of DNA analysis. The more markers that have rare values, the higher the quality of the DNA fingerprint that can be used to separate your genealogical cluster from other clusters. There is also a huge variation in the mutation rate of each marker, and mutation rates affect how reliable connections may be. Very fast-mutating markers reveal a lot more information, as they mutate more often, but they can also introduce parallel mutations within the same genealogical cluster (two independent mutations to the same marker value in the same cluster). Even the origins of each surname now play a vital role in analysis: very rare surnames may have only one or two genetic origins, clan-based names may have only a handful, and common surnames can have dozens (or even hundreds).
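Rarity can be scored by looking up each marker value in a reference frequency table. This sketch assumes such a table is available; the frequencies and the 5 % cut-off below are illustrative numbers only, not real population data.

```python
# Hypothetical reference frequencies: the share of a comparison database
# carrying each value at each marker (illustrative numbers only).
REFERENCE_FREQ = {
    "393": {12: 0.10, 13: 0.80, 14: 0.10},
    "390": {23: 0.30, 24: 0.60, 25: 0.10},
    "19": {13: 0.05, 14: 0.85, 15: 0.10},
}

def rare_markers(haplotype: dict[str, int], reference: dict[str, dict[int, float]],
                 threshold: float = 0.05) -> list[str]:
    """Markers whose value is carried by at most `threshold` of the reference.
    Values never seen in the reference count as frequency 0 (rarest)."""
    return sorted(m for m, v in haplotype.items()
                  if m in reference and reference[m].get(v, 0.0) <= threshold)

cluster_modal = {"393": 13, "390": 25, "19": 13}
print(rare_markers(cluster_modal, REFERENCE_FREQ))                  # ['19']
print(rare_markers(cluster_modal, REFERENCE_FREQ, threshold=0.10))  # ['19', '390']
```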
The deep ancestry (Y-SNPs) of each donor can also reveal a lot about a grouping of submissions and has a major impact on genealogical analysis. Haplogroups (deep ancestry) are excellent resources for quickly separating submissions into related groupings. Obtaining the MRCA haplotype of your haplogroup is also very useful. Comparing the MRCA haplotype of your haplogroup to the MRCA haplotype of your genealogical cluster defines the "off modal" mutations from your haplogroup. These "off modal" mutations provide a DNA fingerprint of your surname cluster and can help define the MRCA haplotype of your surname cluster. Knowing the DNA fingerprint of your surname cluster is also an excellent filter for validating NPE connections (related lines with different surnames). Y-SNP testing is also rapidly approaching the genealogical time frame, and there are now over 150 "private" SNPs that can be used for genealogical research. The number of "private" SNPs is growing by over 10 % per month, and the rate is rising. These "private" SNPs may be only 200 to 800 years old and can be very useful to genealogists.
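Finding "off modal" markers can be sketched in two steps: estimate the cluster's ancestral haplotype, then compare it with the haplogroup's. This sketch uses the simplest estimate, the modal (most common) value at each marker, as a stand-in for the MRCA haplotype; that substitution, the function names and all values are assumptions for illustration.

```python
from collections import Counter

def modal_haplotype(haplotypes: list[dict[str, int]]) -> dict[str, int]:
    """Most common value at each marker tested in every haplotype.
    Ties go to the value seen first (Counter.most_common keeps insertion order)."""
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    markers = set.intersection(*(set(h) for h in haplotypes))
    return {m: Counter(h[m] for h in haplotypes).most_common(1)[0][0]
            for m in sorted(markers)}

def off_modal(cluster_modal: dict[str, int],
              haplogroup_modal: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Markers where the cluster differs: (haplogroup value, cluster value)."""
    return {m: (haplogroup_modal[m], v) for m, v in cluster_modal.items()
            if m in haplogroup_modal and haplogroup_modal[m] != v}

cluster = [
    {"393": 13, "390": 25, "19": 14},
    {"393": 13, "390": 25, "19": 15},
    {"393": 13, "390": 25, "19": 14},
]
haplogroup_modal = {"393": 13, "390": 24, "19": 14}  # hypothetical values
print(off_modal(modal_haplotype(cluster), haplogroup_modal))  # {'390': (24, 25)}
```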
DNA is only available from living donors, and DNA analysis attempts to estimate the DNA of our ancestors based on what was passed down to their descendants. Multiple submissions for every well-proven line are required in order to safely assign mutations to the time frame near our oldest proven ancestors. Only around 25 % of the mutations found within any surname cluster will be genealogically significant. Analysis must separate genealogically significant mutations (those very close to the oldest proven ancestors) from recent mutations (those that happened in recent times near the donor). It must be remembered that DNA submissions are not the DNA of the oldest proven ancestor; they are the DNA of a donor who is a distant descendant of that ancestor. Mutations occur randomly and can happen in any generation, anywhere from the donor of the DNA back to the grandfather of our oldest proven ancestor. Multiple submissions descending from the same oldest proven ancestor can reveal the time frame of these mutations.
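How multiple submissions place a mutation in time can be sketched with a simplified rule set, assuming each donor's documented branch (for example, the line of one son of the oldest proven ancestor) is known. The rules, names and data here are hypothetical: a deviation from the cluster modal carried by a single donor is treated as recent, one carried by every donor of a single branch is assigned to that branch's founder, and anything else is flagged for closer study. A branch tested by only one donor cannot be resolved this way, which is why multiple submissions per line matter.

```python
from collections import defaultdict

def place_mutations(donors: dict[str, tuple[str, dict[str, int]]],
                    modal: dict[str, int]) -> dict[tuple[str, int], str]:
    """Classify each deviation from the cluster modal by which donors carry it.

    donors maps donor id -> (documented branch, haplotype).
    """
    carriers = defaultdict(set)   # (marker, value) -> donor ids carrying it
    members = defaultdict(set)    # branch -> donor ids in that branch
    for donor_id, (branch, haplotype) in donors.items():
        members[branch].add(donor_id)
        for marker, value in haplotype.items():
            if marker in modal and value != modal[marker]:
                carriers[(marker, value)].add(donor_id)

    placements = {}
    for mutation, ids in carriers.items():
        branches = {donors[d][0] for d in ids}
        if len(ids) == 1:
            placements[mutation] = f"recent: near donor {next(iter(ids))}"
        elif len(branches) == 1 and ids == members[next(iter(branches))]:
            placements[mutation] = f"older: founder of branch {next(iter(branches))}"
        else:
            placements[mutation] = "unresolved: check for parallel or back mutations"
    return placements

modal = {"393": 13, "390": 24, "19": 14}
donors = {
    "D1": ("son_A", {"393": 13, "390": 25, "19": 14}),
    "D2": ("son_A", {"393": 13, "390": 25, "19": 15}),
    "D3": ("son_B", {"393": 13, "390": 24, "19": 14}),
}
for mutation, place in place_mutations(donors, modal).items():
    print(mutation, place)
```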
DNA analysis cannot be done without well-proven traditional documentation. Mutations found in donors are assigned to earlier generations based primarily on traditional documentation. It is extremely important to be realistic about speculation being presented as well-proven connections. Any speculative traditional documentation that is treated as fact can pollute the analysis of the DNA submissions. At the same time, many speculative connections can be supported or rejected based on DNA evidence. The volunteers who analyze DNA submissions can barely keep up with DNA analysis and should not be expected to get involved in solving traditional documentation issues of lines that are not even related to their own. However, these volunteers do have an obligation to refute speculative connections based on DNA evidence, which is regularly done. Many speculative ancestries have been refuted by DNA evidence, and many speculative connections have been strengthened by it.