Bryan and Jordan Casey

Bryan Casey (age two days - left) and Jordan Casey (age two years), reaction to "that's your baby brother," all descendants of Robert Casey, photograph 1989, Dallas, TX
 



How To - YSNPs (Future Trends)

 

THE TESTING OF YDNA HOLDS A BRIGHT FUTURE

The future of YDNA analysis for genealogists is extremely promising and is now evolving at a very rapid pace due to low cost NGS and WGS tests priced between $500 and $1,000 (just two years ago these tests were $5,000 each). There are believed to be around 400 to 500 YSTRs that could serve genealogical needs, but genetic researchers are currently so busy analyzing the flood of YSNPs that the next step for YSTRs is uncertain. Over the next few years, YSNPs under L226 will grow from the paltry 25 branches to several hundred branches. As new L226 submissions are only growing at a steady 30% per year, the pace of YSNP discovery is really only limited by the number of new YSTR submissions that join the genetic YDNA community. Over 350 private YSNPs have already been discovered under L226 in the last year. This massive growth in YSNP testing will provide a wealth of new genetic information to analyze and will introduce some significant growing pains. Currently, there are around 450 high resolution 67 marker submissions predicted to be L226, and private YSNPs will soon exceed the number of 67 marker YSTR submissions. We also now have around 50 NGS tests completed under L226, which is over ten percent of the 67 marker submissions. Around 50 individuals have also ordered YSNP packs, YSNP panels or individual YSNPs. This is a tremendous amount of information to analyze and tools are pretty limited.
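To put the 30% annual growth figure in perspective, here is a minimal Python sketch that compounds the roughly 450 67 marker submissions quoted above. It assumes the 30% rate holds steady and applies to the whole pool of submissions, which is a simplification; the function names are hypothetical and the output is an illustration, not a forecast.

```python
import math


def project_submissions(current: float, annual_growth: float, years: int) -> list[float]:
    """Compound `current` forward; returns one value per year, year 0 included."""
    return [current * (1 + annual_growth) ** y for y in range(years + 1)]


def doubling_time(annual_growth: float) -> float:
    """Years needed for a quantity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # 450 submissions and 30% growth are the figures quoted in the text;
    # the five-year horizon is an arbitrary choice for illustration.
    for year, count in enumerate(project_submissions(450, 0.30, 5)):
        print(f"year {year}: about {round(count)} submissions")
    print(f"doubling time: {doubling_time(0.30):.1f} years")
```

At a constant 30% per year the pool doubles in a little under three years, which is why participation rather than lab capacity becomes the limiting factor.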

Within the next year or two, the scope of genetic genealogical testing will shift to Whole Genome Sequencing (WGS) tests instead of multiple tests for each individual being tested. These WGS tests will eventually decline to below $500 and will replace our leading edge NGS tests, since the cost of separating YDNA from the other chromosomes will not decline as rapidly due to smaller demand for YDNA only products. On the other hand, the explosive growth of WGS is being driven by the medical community, which will soon have WGS tests that are comparable in price to our current low resolution NGS tests. WGS tests are already being ordered for genetic genealogical purposes at Full Genomes Corporation. WGS will include even higher resolution YDNA coverage and higher quality information due to increases in read lengths, which will eventually allow accurate reading of all 111 YSTR markers. These WGS tests should drop below $500 within a year or two and will include: 1) the best test for YSNP discovery; 2) accurate scans of around 500 known useful YSTR structures; 3) atDNA tests that increase the resolution of the current amount of atDNA by 1,000 to 10,000 fold (it is not known if this will improve accuracy dramatically, and it may be an impractical amount of information to analyze); 4) a full mtDNA test, which really has limited genealogical usage; 5) if wanted, a full medical scan of your genes (this will probably only be the raw data, as analysis will be expensive and controlled by the medical community); 6) more atDNA AIM markers for much more accurate geographic admixture charts, although these have accuracy limitations that will not go away.

You will only have to test once and you will have all the tests you need, right? Not quite. Improved technology will continue to reveal a little more information with every advance, but the pace of expanded YDNA information will be much more limited since coverage is rapidly approaching 100% of what is believed to be useful for genealogical research. Still, the old saying "never say never" applies here, as the huge areas that are highly redundant and are currently believed to have little medical or ancestral information encoded may yield even more information as technology moves forward. Once YSNP branch discovery starts to peak, a near term bottleneck will be finding new and willing participants to test YDNA (participation has only grown at a steady 30% per year). There is a growing concern that 67/111 markers and numerous YSNPs will not provide enough resolution unless one of two things happens: 1) the pace of new YDNA submissions picks up dramatically (which could happen with such good results being produced these days); 2) more YSTRs are added to sort out those relatively few generations that include our genealogical brick walls.

The first complete scan of a person's entire genome (every human DNA marker that exists) cost far more than $10,000,000 in 2004. Just four years later, in 2008, many whole genome scans were conducted for less than $1,000,000 per individual. Just one year later, in 2009, dozens of genomes were scanned for less than $100,000 per genome. By 2011, hundreds of whole genome scans had been conducted and the cost had been reduced again to under $10,000 per scan. In early 2016, Full Genomes Corporation announced the first Direct to Consumer Whole Genome Sequence test for under $1,000. It is predicted that WGS will approach $500 either late this year or early next year. The WGS test will replace our current NGS tests by offering lower cost and higher functionality. The costs of future testing for genealogists will be primarily driven by the overhead of delivering this information to genealogists, and complex software will be required to analyze the huge volume of information being produced. As with most innovative technologies, IT costs will eventually become the dominant cost factor, with other labor being a distant second and actual testing costs heading towards only 10 or 20% of total costs. The specialized software needed to analyze the huge volume of data will make our spreadsheet approach unworkable (atDNA analysis has already reached that limit).
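The price history above can be turned into an average yearly rate of decline with a little arithmetic. The sketch below is only a rough illustration: the source gives each price as a bound ("far more than", "less than", "under"), so the inputs are treated as round approximations, and the function names are hypothetical.

```python
# Approximate price points quoted in the paragraph above: (year, US dollars).
# Each is a bound in the source, so the computed rates are rough.
PRICE_POINTS = [
    (2004, 10_000_000),
    (2008, 1_000_000),
    (2009, 100_000),
    (2011, 10_000),
    (2016, 1_000),
]


def annual_decline(start_cost: float, end_cost: float, years: int) -> float:
    """Average fraction by which the cost falls each year, compounded."""
    return 1 - (end_cost / start_cost) ** (1 / years)


def decline_by_interval(points):
    """Yield (start_year, end_year, annual_decline) for consecutive price points."""
    for (y0, c0), (y1, c1) in zip(points, points[1:]):
        yield y0, y1, annual_decline(c0, c1, y1 - y0)


if __name__ == "__main__":
    for y0, y1, rate in decline_by_interval(PRICE_POINTS):
        print(f"{y0} to {y1}: about {rate:.0%} cheaper each year")
    (first_year, first_cost), (last_year, last_cost) = PRICE_POINTS[0], PRICE_POINTS[-1]
    overall = annual_decline(first_cost, last_cost, last_year - first_year)
    print(f"{first_year} to {last_year} overall: about {overall:.0%} cheaper each year")
```

Taken at face value, a 10,000 fold drop over twelve years averages out to a price cut of roughly 54% per year.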

YDNA testing is in the early stages of the typical technology maturing cycle. Currently, the hardware costs of DNA scanners (and associated "consumable" supplies) are the dominant economic factor and the associated labor costs are not far behind. Software and software development expenses are in distant third place. Software development is currently limited to simple MRCA calculators, web access to place orders and provide information, relatively small databases (100s of TB) for repositories of YDNA submissions, and overly simplistic matching systems based only on genetic distance. This mixture of expenses behind testing your YDNA will radically shift over the next few years. YDNA testing technology is currently in the same state of affairs that corporate data processing was in around 20 to 30 years ago.
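To show why matching on genetic distance alone is simplistic, here is a small Python sketch. It assumes the plain stepwise model, where distance is the sum of the absolute differences in repeat counts over the markers two people share; the exact rules any testing company applies are not described here and may differ (multi-copy markers, for example, are ignored). The marker names are real YSTR names, but the repeat values and the three haplotypes are invented for illustration.

```python
def genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Stepwise genetic distance over the markers present in both haplotypes."""
    shared = a.keys() & b.keys()
    return sum(abs(a[marker] - b[marker]) for marker in shared)


# Hypothetical repeat counts, invented for this example only.
tester_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
tester_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}
tester_c = {"DYS393": 14, "DYS390": 24, "DYS19": 15, "DYS391": 11}

if __name__ == "__main__":
    print("A vs B:", genetic_distance(tester_a, tester_b))
    print("A vs C:", genetic_distance(tester_a, tester_c))
    print("B vs C:", genetic_distance(tester_b, tester_c))
```

In this made-up case B and C are equally distant from A yet further from each other, and the single number says nothing about which mutations they share or in what order the lines split.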

Early in the technology cycle, the costs of running corporate data centers shifted from hardware related costs to the labor costs of supporting these systems, due to hardware costs decreasing at staggering rates of 30% per year. Labor costs soared due to massive increases in software development to take advantage of massive increases in computing power and massive increases in the information required to run a business. Spending on generic software productivity tools greatly increased in order to reduce labor costs by increasing labor productivity. Today, the hardware and maintenance costs of the corporate data center are only a small fraction of overall IT expenses. This same technology maturing cycle will be repeated in the DNA testing industry. Hopefully genealogists will be able to gain a free ride on many of the complex software analysis tools required by the medical industry. Future DNA testing companies will become very dependent on software analysis tools and database analysis tools to analyze the massive amount of data that will become available. During this time of transition, major investments in software development (generic software as well as specialized, genealogically unique software) will quickly exceed 50% of the costs of YDNA testing. The costs of the FTDNA website, databases and internal analysis tools are probably already over 25% of the total even today.

The number of useful YSTR markers will increase to 400 to 500 with the availability of whole genome scans. I cannot imagine that genealogists will not take advantage of another 100 to 400 additional YSTRs when they become available at no additional charge. However, the emphasis is shifting from YSTR analysis to YSNP analysis. YSTR markers are relatively fast mutating markers and only produce clusters of related submissions. YSTRs rarely show how all the clusters are connected or the chronological order of each cluster. Fortunately, YSNPs form branches that have much less ambiguity, since they reveal how branches are connected and imply the relative age of each mutation. But the size of YSNP databases will explode to millions of mutations even for a small haplogroup project such as L226. Spreadsheet analysis will be very difficult to sustain since the growth of YSNP data is becoming staggering. We will need access to hierarchical database systems, which will be quite expensive for our genetic community. We already get most of our data delivered to us in raw format that we crudely massage with spreadsheets. This is already happening with NGS tests, as most real YSNP analysis is currently being done via manual spreadsheets along with specialized software that analyzes NGS BAM files (raw data). Fortunately, the specialized software to analyze NGS data is developed by medical research and several good quality programs are available for free via shareware. Better tools are very expensive.
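The hierarchical structure that YSNPs provide can be sketched in a few lines of Python. This is a toy illustration, not a description of any existing database: L226 is the real haplogroup discussed here, but the child branch names (SNP-A, SNP-A1 and so on) are placeholders invented for the example, as are the function names.

```python
# Each branch maps to its parent branch; the root has no parent.
# Only "L226" is a real name; the others are hypothetical placeholders.
PARENT = {
    "L226": None,
    "SNP-A": "L226",
    "SNP-B": "L226",
    "SNP-A1": "SNP-A",
    "SNP-A2": "SNP-A",
}


def lineage(branch: str, parent: dict = PARENT) -> list[str]:
    """Return the chain of branches from the root down to `branch`."""
    path = []
    while branch is not None:
        path.append(branch)
        branch = parent[branch]
    return path[::-1]


def shared_branch(x: str, y: str, parent: dict = PARENT):
    """Return the most recent branch that both `x` and `y` descend from."""
    last_common = None
    for bx, by in zip(lineage(x, parent), lineage(y, parent)):
        if bx != by:
            break
        last_common = bx
    return last_common


if __name__ == "__main__":
    print(" > ".join(lineage("SNP-A1")))
    print("A1 and A2 share:", shared_branch("SNP-A1", "SNP-A2"))
    print("A1 and B share:", shared_branch("SNP-A1", "SNP-B"))
```

Unlike a YSTR cluster, each position in this tree states exactly how two testers are connected, and a branch closer to the root is older than the branches beneath it. A flat spreadsheet has no natural way to hold this parent and child relationship once the tree reaches hundreds of branches.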

There is another major limit ahead where YDNA testing will eventually hit a brick wall. YDNA testing only works where there is a reasonable amount of traditional documentation available to assign names, places and specific dates to genetic connections. YDNA testing is a great complementary source of genealogical information in the 300 to 600 year time frame, where it can enhance our knowledge of how our oldest proven ancestors are connected within several generations of those ancestors. Eventually, our genealogical research will approach a time frame where 90% of the evidence will be genetic and only 10% of the evidence will be traditional genealogical documentation. We will be able to discover how our distant ancestors are connected, but will probably never be able to make this information very meaningful without names, specific dates and specific places, because the traditional genealogical documentation that provides this information is lacking. Of course, there will always be a set of lucky individuals who tie into more influential ancestors that left a better paper trail behind. As genetic genealogical research travels further back in time, new brick walls will become limiting factors again, since there is too little traditional genealogical documentation to add genealogical meaning to our genetic family histories.

Just as technology greatly enhanced our ability to access, research and document our family histories, YDNA testing is providing a new infusion of source documentation to complement our traditional documentation sources. Just as it took many years for many genealogists to embrace new computer technologies, it will probably take many years for most genealogists to develop and embrace new YDNA technologies. Previous generations saw other technology improvements that we take for granted today. Modern cars and improved highways allowed us to be more mobile and visit remote courthouses, lower cost long distance telephone service allowed us to call our distant cousins, and copiers enhanced our ability to duplicate critical source records to share with others. Personal computers and the internet databases supporting genealogists will also continue to improve over time, but the sheer magnitude of unreliable information continues to explode as well. It is naive to believe that YDNA testing for genealogists will be the last major quantum leap in enhancing our genealogical research. If anyone has any ideas of other near term quantum leaps on the horizon (other than DNA related), drop me a note so that I can start preparing for these new opportunities to learn yet more new analysis skills.