Instructions for use of the linked Excel spreadsheet
to calculate ages of subclades using the variance method
Tim Janzen M.D.
12367 SE Ridgecrest Rd.
Portland, OR 97236
E-mail: firstname.lastname@example.org or email@example.com
Web site: http://www.timjanzen.com
I would like to thank Ken Nordtvedt for posting information about the variance method on the Rootsweb Genealogy-DNA list and for helping to explain some of the nuances of this statistical method to me.1
The accompanying Excel spreadsheet has been designed to calculate the ages of Y chromosome DNA subclades and haplogroups using the variance method. I have formulated the spreadsheet so that it will handle up to 3000 haplotypes and up to 100 STR markers per run. The spreadsheet calculates the ages of subclades using John Chandler’s mutation rates for the first 37 markers in the Family Tree DNA panel and using Leo Little’s mutation rates for the last 30 markers in the 67 marker panel.2 3 I used 30 years as the generation interval in calculating the ages. If you find any problems with how the spreadsheet functions let me know.
Instructions for use of the spreadsheet:
- Download the Excel spreadsheet from this web site.
- Open the spreadsheet with Excel on your computer.
- Click on the “DataEntry” tab.
- Paste any number of Y chromosome STR haplotypes (up to 3000) into the spreadsheet. Paste the haplotypes into the spreadsheet such that the first marker value in the upper left portion of your spreadsheet is pasted into cell B2 in this spreadsheet. You may include the names of the markers on row 1 if you so desire. Bear in mind that all of the haplotypes must have 12, 37, or 67 marker values each and that the markers must be in the order that FTDNA uses. Refinements could be made to the spreadsheet so that it would handle 43 marker haplotypes from the Sorenson Molecular Genealogy Foundation, DNA Ancestry, or other companies. Mutation rates would need to be added for the markers these companies use. Mutation rates would also need to be added for anything more than 67 markers.
- The age in years and in number of generations for any set of haplotypes will be found in the spreadsheet on lines 3281 to 3297. These same calculations also will appear under the “Results” tab in the spreadsheet.
The use of variance method produces ages of subclades that are only as accurate as the underlying data. The larger the sample size the more statistically significant the resultant age will be that is generated by the spreadsheet. This method may produce ages that are lower than they actually should be. Potential causes for this could include genetic bottlenecks, rapid expansion of some but not all populations of a specific subclade in recent millennia, over sampling of subsets of the population of that subclade, and/or other factors.
It is particularly interesting to note that the ages calculated when using 37 or 67 marker haplotypes are consistently lower than those calculated from using only the first 12 markers in the FTDNA panel. This suggests that the mutation rates that Chandler and Little calculated for markers 13-67 could be inaccurate.
In any case, I think that this methodology shows great potential for helping us learn more about the ages of various subclades. The method could be refined, possibly by selecting only diverse haplotypes for study and by using more accurate mutation rates (at least for markers 13-67).
Also see this version of the Excel spreadsheet which includes 16 haplotypes using 37 markers as an example of how the spreadsheet is designed to function..