DNA Testing for Genealogy: How It Works and What It Reveals

DNA testing has become a foundational tool in genealogical research, enabling identification of biological relationships, ancestral origins, and migration patterns that documentary records alone cannot confirm. As of 2024, more than 50 million people have taken a consumer genetic genealogy test through companies such as AncestryDNA, 23andMe, FamilyTreeDNA, and MyHeritage DNA, according to the International Society of Genetic Genealogy (ISOGG). This page is a structured reference to the mechanics, classifications, regulatory landscape, and practical considerations that define DNA testing as a genealogical service category.

Definition and Scope

DNA testing for genealogy refers to the laboratory analysis of an individual's deoxyribonucleic acid to identify genetic markers that indicate biological kinship, ethnic admixture, and deep ancestral lineage. The field operates at the intersection of molecular biology, population genetics, and genealogical research methodology, and it is distinct from clinical genetic testing used for medical diagnostics.

Three principal test types serve the genealogical sector: autosomal DNA (atDNA), Y-chromosome DNA (Y-DNA), and mitochondrial DNA (mtDNA). Each test interrogates a different portion of the genome and answers different genealogical questions. A detailed comparison of these categories is available at Autosomal DNA vs. Y-DNA vs. Mitochondrial DNA.

The scope of DNA testing extends beyond simple ethnicity estimates. It encompasses cousin matching through shared DNA segment analysis, haplogroup assignment for deep maternal and paternal lineage tracing, and triangulation of shared segments to identify common ancestors. DNA evidence is recognized under the Genealogical Proof Standard as one category of evidence that, when correlated with documentary sources, can resolve questions of identity and kinship.

The regulatory environment for direct-to-consumer (DTC) genetic testing in the United States is shaped primarily by the Federal Trade Commission (FTC), which oversees advertising claims, and by state-level genetic privacy statutes. Maryland and New York, for example, impose restrictions on DTC genetic testing that differ from the majority of states. The Genetic Information Nondiscrimination Act (GINA) of 2008 (Public Law 110-233) provides federal protections against discrimination based on genetic information in health insurance and employment, though GINA does not cover life insurance, disability insurance, or long-term care insurance.

Core Mechanics or Structure

The biological foundation of genealogical DNA testing rests on inheritance patterns. Every individual inherits approximately 50% of autosomal DNA from each biological parent, though the specific segments inherited vary due to recombination during meiosis. Over generations, the total amount of shared DNA between relatives decreases in a roughly predictable pattern: in reported shared-cM totals, full siblings average about 2,600 centimorgans (cM) of autosomal DNA, first cousins about 866 cM, and second cousins about 229 cM, according to the Shared cM Project, compiled by Blaine Bettinger and available as an interactive tool on DNA Painter.
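The halving pattern described above can be sketched from first principles. The Python example below is an illustration only: it assumes a total of roughly 6,800 cM counted across both parental copies of the autosomes (a round figure chosen for the sketch; vendors' totals differ), and the function names are hypothetical. It computes theoretical expectations, which land near but not exactly on empirically observed averages, and real matches scatter widely around them because recombination is random.

```python
# Assumption: ~6,800 cM total across both parental copies of the autosomes.
# This is a round illustrative figure, not any vendor's official value.
TOTAL_CM = 6800.0


def expected_cousin_fraction(degree: int) -> float:
    """Expected autosomal fraction shared by full nth cousins (degree >= 1).

    Full nth cousins are linked through two shared ancestors (a couple),
    each by a path of 2 * (degree + 1) parent-child steps. Every step
    halves the expected sharing, so the total is 2 * (1/2) ** (2n + 2).
    """
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    return 2 * 0.5 ** (2 * (degree + 1))


def expected_cousin_cm(degree: int, total_cm: float = TOTAL_CM) -> float:
    """Convert the expected shared fraction into centimorgans."""
    return expected_cousin_fraction(degree) * total_cm


for degree in (1, 2, 3):
    print(f"cousin degree {degree}: ~{expected_cousin_cm(degree):.1f} cM expected")
```

Under these assumptions the sketch returns 850.0 cM for first cousins and 212.5 cM for second cousins.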

Sample collection and genotyping. DTC genealogical testing uses a saliva or cheek-swab sample collected at home and mailed to a genotyping laboratory. The standard platform used by most genealogical testing companies is a microarray chip: a small slide carrying probes for hundreds of thousands of known SNP (single nucleotide polymorphism) positions. AncestryDNA's chip, for instance, tests approximately 700,000 SNP positions. This is not whole-genome sequencing; it samples specific points known to be informative for ancestry and kinship analysis.
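The SNP calls a chip produces are usually made available to the customer as a plain-text export. The sketch below shows one way such a file might be read in Python. It assumes a four-column, tab-separated layout (rsid, chromosome, position, genotype) with `#` comment lines, as used in 23andMe-style exports; other providers use different layouts, and the function name and sample rows (including the rsIDs) are invented for illustration.

```python
import io
from typing import Dict, Iterable, Tuple


def parse_raw_genotypes(lines: Iterable[str]) -> Dict[str, Tuple[str, int, str]]:
    """Map each rsid to (chromosome, position, genotype).

    Assumes tab-separated rows of rsid, chromosome, position, genotype.
    Comment lines (starting with '#'), blank lines, and malformed rows
    are skipped rather than guessed at.
    """
    calls: Dict[str, Tuple[str, int, str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        rsid, chromosome, position, genotype = fields
        calls[rsid] = (chromosome, int(position), genotype)
    return calls


# Hypothetical sample data; these rsIDs and positions are made up.
sample = io.StringIO(
    "# hypothetical export\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tAG\n"
    "rs0000003\tX\t3000\t--\n"
)
calls = parse_raw_genotypes(sample)
# Count positions the chip could not read, marked here with '-'.
no_calls = sum(1 for _, _, genotype in calls.values() if "-" in genotype)
print(f"{len(calls)} SNPs parsed, {no_calls} no-call")
```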

Ethnicity estimation. Reference panels — curated databases of DNA samples from individuals with deep, documented roots in specific regions — serve as the comparison set for ethnicity estimates. Each company maintains a proprietary reference panel. AncestryDNA's panel comprised more than 68,000 reference samples across 84 regions as of its 2024 update. The algorithm assigns probability-weighted percentages by comparing an individual's SNP patterns against these reference populations.

Matching and segment analysis. When two tested individuals share identical-by-descent (IBD) DNA segments above a minimum threshold — typically 7 cM or higher — the testing platform reports them as genetic matches. The total shared cM and the number and size of shared segments help estimate the likely biological relationship.
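A minimal sketch of how a per-match summary could be derived from segment data, assuming the 7 cM cutoff mentioned above. The `Segment` class, its field names, and the sample values are hypothetical. Real platforms also run phasing and IBD-detection steps that this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Threshold cited in the text; actual platform cutoffs vary.
MIN_SEGMENT_CM = 7.0


@dataclass
class Segment:
    chromosome: str
    start: int   # start position in base pairs
    end: int     # end position in base pairs
    cm: float    # genetic length in centimorgans


def summarize_match(
    segments: List[Segment], min_cm: float = MIN_SEGMENT_CM
) -> Dict[str, Union[float, int, bool]]:
    """Drop segments below the threshold and total what remains."""
    kept = [s for s in segments if s.cm >= min_cm]
    return {
        "total_cm": sum(s.cm for s in kept),
        "segment_count": len(kept),
        "largest_cm": max((s.cm for s in kept), default=0.0),
        "is_match": bool(kept),
    }


# Hypothetical segments shared between two test-takers.
segments = [
    Segment("1", 10_000_000, 35_000_000, 24.5),
    Segment("7", 50_000_000, 62_000_000, 11.0),
    Segment("12", 5_000_000, 8_000_000, 4.5),  # below threshold, ignored
]
summary = summarize_match(segments)
print(summary)
```

Reporting the largest segment alongside the total matters because a few long segments and many short ones can add up to the same total while pointing to different relationships.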

Understanding how DNA testing fits into the broader genealogical research framework — alongside documentary analysis, oral history, and record correlation — is covered at How Family History Research Works: Conceptual Overview.

Causal Relationships or Drivers

The rapid adoption of genetic genealogy has been driven by four converging factors: falling laboratory costs, expanding match databases, high-profile forensic cases, and demand from people researching adoption or unknown parentage.

Cost reduction. The cost of genotyping a human DNA sample on a microarray dropped from over $300 per test in 2010 to under $100 at retail by 2019. This price decline mirrors broader trends in genomics; the cost of sequencing an entire human genome fell from roughly $100 million in 2001 to under $600 by 2022, as tracked by the National Human Genome Research Institute (NHGRI).

Database network effects. The utility of a DNA test increases as the testing company's database grows, because larger databases produce more matches. AncestryDNA, with a database exceeding 25 million test-takers, generates substantially more match results per individual than a smaller database. This network effect has concentrated the consumer market among a small number of major providers.

Forensic genealogy and public awareness. The 2018 arrest of Joseph James DeAngelo — the Golden State Killer — using investigative genetic genealogy (IGG) through the public GEDmatch database generated significant media coverage and public interest. Law enforcement use of genetic genealogy has since led to identifications in more than 500 cold cases, according to the DNA Doe Project and reporting by the Department of Justice.

Adoption and unknown parentage research. DNA testing has become a primary tool for individuals seeking biological relatives in cases of adoption or unknown parentage. Before genetic testing, these searches depended almost entirely on sealed court records and intermediary services.

Classification Boundaries

The genealogical DNA testing sector is distinguishable from adjacent fields by its purpose, methodology, and regulatory treatment.

Genealogical vs. clinical genetic testing. Clinical tests analyze specific genes for medically actionable variants (e.g., BRCA1/BRCA2 for breast cancer risk). The laboratories that run them are certified under the Clinical Laboratory Improvement Amendments (CLIA), administered by the Centers for Medicare & Medicaid Services, while the FDA regulates tests marketed as medical devices. Ancestry-only DNA tests are generally treated as low-risk consumer products that the FDA does not review before marketing, though 23andMe obtained FDA authorization for certain health-related reports beginning in 2017.

Genealogical vs. forensic genetic analysis. Forensic DNA testing, used in criminal investigations, relies on short tandem repeat (STR) analysis at 20 core loci specified by the FBI's Combined DNA Index System (CODIS). Genealogical testing uses SNP-based microarray technology that examines hundreds of thousands of positions. Investigative genetic genealogy (IGG) bridges both fields by uploading SNP data to genealogical databases for suspect identification.

DTC testing vs. whole-genome sequencing. Standard genealogical tests examine 600,000–900,000 SNPs. Whole-genome sequencing (WGS) services, such as those offered by Nebula Genomics or Dante Labs, sequence all 3.2 billion base pairs. WGS produces data compatible with genealogical analysis but at higher cost and with different privacy considerations.

These distinctions matter for research integrity. A comprehensive family history project, as described on the Genealogy Authority homepage, typically integrates DNA results with vital records, census data, and other documentary evidence.

Tradeoffs and Tensions

Privacy vs. discovery. Participating in DNA testing requires sharing genetic data with a commercial entity and, optionally, with a broader matching community. The 2019 policy change by GEDmatch — which shifted from an opt-out to an opt-in model for law enforcement access after the Golden State Killer case — illustrates the tension between privacy rights and the investigative value of genetic databases. Consumers must navigate differing company policies regarding data retention, third-party sharing, and law enforcement cooperation.

Ethnicity estimate precision vs. consumer expectations. Ethnicity estimates are statistical probabilities, not deterministic findings. Results vary between companies because each uses different reference panels and algorithms. An individual tested on two platforms may receive noticeably different regional percentages. This creates frustration when consumers interpret ethnicity estimates as definitive rather than probabilistic.

Endogamy and genetic genealogy. Populations with high rates of endogamy — including Ashkenazi Jewish communities, French-Canadian populations, and certain island populations — present analytical challenges. Endogamous groups share elevated amounts of background DNA, inflating predicted relationship estimates. A match showing 120 cM of shared DNA might indicate a second cousin in an outbred population but a fourth cousin or more distant relation in an endogamous one. Specialized approaches, such as those discussed in Jewish Genealogy Research, are required for accurate interpretation.

Consent across generations. A DNA test reveals genetic information not only about the tested individual but about biological relatives who did not consent to testing. This cross-generational disclosure raises ethical questions that remain unresolved in both regulatory and professional genealogical contexts.

Common Misconceptions

"DNA testing replaces paper research." DNA results without documentary context are genealogically ambiguous. A 350 cM match could represent a second cousin, a half first cousin once removed, or other relationship permutations. Resolving ambiguity requires correlation with genealogical records, family trees, and source citation.

"Ethnicity percentages identify specific countries or tribes." Ethnicity estimates reflect statistical similarity to reference populations grouped by broad regions. They cannot identify a specific tribe, village, or nation-state of origin. The categories themselves are defined by the testing company and are subject to revision as reference panels expand.

"Y-DNA and mtDNA tests reveal the full ancestral picture." Y-DNA traces only the direct paternal line (father's father's father, etc.) and mtDNA traces only the direct maternal line (mother's mother's mother, etc.). In the tenth generation back, an individual has up to 1,024 ancestors; Y-DNA and mtDNA each represent only one of those 1,024 lineages. Autosomal DNA captures a broader — but still incomplete — sample of ancestry.

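The arithmetic behind the Y-DNA and mtDNA limitation can be checked with a few lines of Python. This is a sketch with hypothetical function names. It treats 2 to the power g as an upper bound on ancestors, since cousin marriages in a tree (pedigree collapse) reduce the number of distinct people.

```python
def ancestors_in_generation(generations_back: int) -> int:
    """Upper bound on the number of ancestors g generations back (2 ** g)."""
    if generations_back < 1:
        raise ValueError("generations_back must be 1 or greater")
    return 2 ** generations_back


def uniparental_coverage(generations_back: int) -> float:
    """Fraction of ancestral lines in that generation reached by Y-DNA
    plus mtDNA together: one paternal-line and one maternal-line ancestor."""
    return 2 / ancestors_in_generation(generations_back)


def expected_autosomal_share(generations_back: int) -> float:
    """Expected autosomal fraction inherited from any single ancestor
    g generations back; the actual amount varies and can be zero."""
    return 1 / ancestors_in_generation(generations_back)


for g in (1, 5, 10):
    print(
        f"{g} generations back: up to {ancestors_in_generation(g)} ancestors, "
        f"Y-DNA + mtDNA cover {uniparental_coverage(g):.2%} of lines, "
        f"~{expected_autosomal_share(g):.3%} autosomal DNA expected per ancestor"
    )
```
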
"All shared DNA indicates a genealogical relationship." Segments below approximately 7 cM are frequently identical-by-state (IBS) — matching by chance due to population-level allele frequencies — rather than identical-by-descent (IBD) from a common ancestor. Testing platforms apply minimum thresholds to reduce false matches, but small-segment matches still require caution, especially when resolving conflicting evidence.

"Testing companies permanently own DNA samples." Major providers — AncestryDNA, 23andMe, and FamilyTreeDNA — each publish policies allowing consumers to request destruction of physical samples and deletion of genetic data. However, the timelines and processes differ by company, and previously shared or de-identified data may remain in aggregate research datasets.

Checklist or Steps (Non-Advisory)

The following sequence describes the standard process for integrating DNA testing into genealogical research:

  1. Identify the genealogical question. Define whether the research goal involves confirming a specific biological relationship, identifying unknown parentage, breaking through a brick wall in documentary research, or tracing deep maternal/paternal lineage.
  2. Select the appropriate test type. Autosomal DNA addresses broad relationship identification within approximately five to seven generations. Y-DNA addresses direct paternal lineage. mtDNA addresses direct maternal lineage.
  3. Choose a testing provider. Selection factors include database size, geographic coverage of the reference panel, availability of chromosome browsers, and data portability (ability to download raw data).
  4. Collect and submit a sample. Follow the kit instructions for saliva collection or cheek swab. Results typically arrive in four to eight weeks depending on the provider and laboratory load.
  5. Download raw DNA data. Most providers allow export of raw genotype data in a standard file format compatible with third-party analysis tools and alternate databases such as GEDmatch.
  6. Review match lists and shared cM values. Organize matches by predicted relationship category and total shared DNA, noting the largest segment size for each match.
  7. Correlate DNA matches with documentary evidence. Cross-reference match trees, shared surnames, and geographic origins against existing research. Tools such as family group sheets and pedigree charts support this correlation.
  8. Document findings with proper source citation. Record the test provider, kit number, date of results, and shared cM data as part of the evidence trail per source citation standards.
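Steps 6 through 8 can be sketched as a small record-keeping structure. The example below is only one possible layout: the `MatchRecord` class, its fields, the citation wording, and the sample provider and kit number are all hypothetical, chosen to mirror the items the checklist says to record (test provider, kit number, date of results, and shared cM data).

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class MatchRecord:
    provider: str
    kit_number: str
    results_date: date
    match_name: str
    total_shared_cm: float
    largest_segment_cm: float
    documentary_notes: str = ""

    def citation(self) -> str:
        """Format the evidence-trail fields as one citation line."""
        return (
            f"{self.provider}, kit {self.kit_number}, "
            f"results dated {self.results_date.isoformat()}; "
            f"match {self.match_name}: {self.total_shared_cm:g} cM shared, "
            f"largest segment {self.largest_segment_cm:g} cM"
        )


def rank_matches(records: List[MatchRecord]) -> List[MatchRecord]:
    """Order matches from most to least total shared DNA (step 6)."""
    return sorted(records, key=lambda r: r.total_shared_cm, reverse=True)


# Hypothetical matches for a single test-taker's kit.
records = [
    MatchRecord("ExampleDNA", "HYPO-0001", date(2024, 3, 1), "Match A", 48.5, 21.0),
    MatchRecord("ExampleDNA", "HYPO-0001", date(2024, 3, 1), "Match B", 212.0, 55.5,
                documentary_notes="Shared surname in 1880 census tree"),
]
ranked = rank_matches(records)
for record in ranked:
    print(record.citation())
```

Keeping the documentary notes on the same record as the cM values makes it harder for a DNA match to drift away from the paper evidence that explains it.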

Reference Table or Matrix

Test Type | Inheritance Pattern | Markers or Positions Analyzed | Genealogical Range | Applicable Lineages | Key Limitation
Autosomal DNA (atDNA) | Both parents; recombines each generation | 600,000–900,000 SNPs | ~5–7 generations reliably | All ancestral lines | Signal dilutes by ~50% per generation
Y-chromosome DNA (Y-DNA) | Father to son only; no recombination outside the pseudoautosomal regions | 400–700 STR markers or 10–25 million positions (Big Y) | Deep paternal lineage (thousands of years) | Strict paternal line only | Available only to biological males
Mitochondrial DNA (mtDNA) | Mother to all children; no recombination | ~16,569 base pairs (full mitogenome) | Deep maternal lineage (thousands of years) | Strict maternal line only | Mutates slowly; limited recent resolution
Whole-Genome Sequencing (WGS) | Both parents; full genome | ~3.2 billion base pairs | Comparable to atDNA for matching; deeper for rare variants | All ancestral lines | Higher cost; fewer matching databases

Major DTC Provider | Database Size (approx.) | Test Type Offered | Chromosome Browser | Raw Data Download | Law Enforcement Policy
AncestryDNA | 25+ million | atDNA | No | Yes | Requires valid legal process
23andMe | 14+ million | atDNA (+health option) | Yes | Yes | Requires valid legal process
FamilyTreeDNA | 2+ million | atDNA, Y-DNA, mtDNA | Yes | Yes | Opt-in for law enforcement matching
MyHeritage DNA | 7+ million | atDNA | Yes | Yes | Requires valid legal process

Professionals in the genealogical research sector, including those listed through genealogical societies and professional organizations, routinely incorporate DNA evidence alongside traditional record analysis. Those considering professional assistance with DNA interpretation can reference guidance on hiring a professional genealogist.

