Autosomal DNA in Genealogy: Matching Relatives Across Family Lines
Autosomal DNA testing has become the workhorse of genetic genealogy — the test most people take first, the one that surfaces thousands of potential relatives, and the one that generates the most questions about what those results actually mean. This page covers how autosomal DNA works as a genealogical tool, why shared centimorgans translate so imprecisely into relationship predictions, where the method excels and where it quietly fails, and how to interpret matches without being misled by the statistics.
- Definition and scope
- Core mechanics or structure
- Causal relationships or drivers
- Classification boundaries
- Tradeoffs and tensions
- Common misconceptions
- Checklist or steps
- Reference table or matrix
Definition and scope
Autosomal DNA refers to the genetic material contained on the 22 pairs of non-sex chromosomes — the ones numbered 1 through 22. Every person inherits one copy of each autosomal chromosome from each biological parent, which means autosomal DNA captures inheritance from all branches of the family tree simultaneously, not just the direct paternal or maternal line. This breadth is what makes it so useful for genealogy and so complicated to interpret.
Autosomal DNA testing for genealogy is offered by AncestryDNA, 23andMe, MyHeritage DNA, and FamilyTreeDNA, all of which genotype roughly 600,000 to 700,000 single-nucleotide polymorphisms (SNPs) across the genome. The test identifies segments of DNA shared between two individuals, quantifies those segments in centimorgans (cM), and generates a list of genetic matches. A full database comparison can return thousands of matches for a single individual — AncestryDNA alone reported more than 20 million customers as of 2020 (AncestryDNA via STAT News, 2020), meaning match lists for most testers extend well into the hundreds or thousands of names.
The scope of the test extends theoretically to about 5th or 6th cousins — relationships where shared DNA segments still appear above the noise floor of false positives. Beyond that range, shared segments become so small and infrequent that they are genealogically unreliable. The Shared cM Project, maintained by Blaine Bettinger and aggregated from more than 25,000 submitted data points, provides the most widely used empirical reference for expected cM ranges by relationship.
Core mechanics or structure
When an egg and sperm form during meiosis, each parent's paired chromosomes undergo a process called recombination — sections of homologous chromosomes cross over and exchange material before the chromosome is packaged into the gamete. The child does not receive an intact copy of any single grandparent's chromosome; instead, each chromosome is a mosaic of segments inherited from both of that parent's parents.
One copy of the autosomal genome spans approximately 3,400 centimorgans, which is why a parent and child, who share one full copy, are reported at roughly that figure. A child inherits 50% of their DNA from each parent, roughly 25% from each grandparent, and approximately 12.5% from each great-grandparent. The key word is "approximately": recombination introduces statistical variance, so the actual amount inherited from any one ancestor beyond the parents deviates from the theoretical average. Two siblings who share the same two parents will share on average 50% of their DNA with each other, but the actual range runs from roughly 33% to 54% (International Society of Genetic Genealogy Wiki, ISOGG).
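The halving per generation can be sketched numerically. The snippet below is a minimal illustration, assuming the convention of the reference table on this page, where a 50% share corresponds to roughly 3,400 cM. The function and constant names are hypothetical, and the result is the theoretical average only; it does not model the variance that recombination introduces.

```python
# Assumption: a 50% share (parent/child) is reported as ~3,400 cM,
# matching the reference table on this page.
CM_AT_50_PERCENT = 3400.0


def expected_cm_from_ancestor(generations_back: int) -> float:
    """Theoretical average cM shared with one ancestor n generations back."""
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    fraction = 0.5 ** generations_back  # halves with every generation
    return CM_AT_50_PERCENT * 2 * fraction


labels = ["parent", "grandparent", "great-grandparent"]
for generation, label in enumerate(labels, start=1):
    share = 0.5 ** generation
    print(f"{label}: {share:.1%} of DNA, about {expected_cm_from_ancestor(generation):.0f} cM")
```

Real matches scatter around these averages, which is why the empirical ranges in the reference table matter more than the theoretical figure.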
When two people compare their autosomal DNA, the testing platform identifies identical-by-descent (IBD) segments: regions where both individuals carry the same sequence because they inherited it from a common ancestor. These IBD segments are counted and summed in centimorgans to produce a total shared DNA figure. A match of around 3,400 cM indicates a parent-child relationship, and one near 1,800 cM points toward a grandparent, half-sibling, or aunt/uncle; a match of 25 cM might be a 3rd cousin, a 4th cousin, or several other distant configurations entirely.
Chromosome browsers — available at 23andMe, MyHeritage, and FamilyTreeDNA, though notably absent from AncestryDNA — display the specific chromosomes and positions where matches occur, enabling segment triangulation.
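Segment triangulation can be illustrated with a small sketch. The code below assumes three testers A, B, and C, and that each pair's shared segment has already been exported from a chromosome browser as a chromosome number plus start and end positions. The `Segment` type, the function names, and the sample coordinates are hypothetical. It only finds the region common to all three pairwise matches; it does not establish which ancestor the region came from.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region where two segments overlap, or None."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulated_region(ab: Segment, ac: Segment, bc: Segment) -> Optional[Segment]:
    """Region present in all three pairwise matches (A-B, A-C, B-C), or None."""
    first = overlap(ab, ac)
    return overlap(first, bc) if first else None


# Made-up coordinates for illustration only.
ab = Segment(7, 10_000_000, 45_000_000)
ac = Segment(7, 20_000_000, 50_000_000)
bc = Segment(7, 15_000_000, 40_000_000)
print(triangulated_region(ab, ac, bc))
```

Requiring the B-C comparison as well as A-B and A-C is the point of the exercise: two matches can overlap the same stretch of A's chromosome while sitting on opposite parental copies.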
Causal relationships or drivers
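The amount of DNA shared between two relatives is driven by two things: genealogical distance, measured as the number of meioses separating the two people through their common ancestor, and the random variance introduced by recombination at each of those meioses. Each additional meiosis halves the expected share, while the variance around that expectation grows in relative terms.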
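The link between meioses and expected sharing can be worked through for full cousins. This is a sketch of the theoretical average only, assuming no endogamy or pedigree collapse; the function name and its parameters are hypothetical.

```python
def expected_fraction_cousins(degree: int, common_ancestors: int = 2) -> float:
    """Theoretical average fraction of DNA shared by nth cousins.

    degree: 1 for 1st cousins, 2 for 2nd cousins, and so on.
    common_ancestors: 2 for full cousins (an ancestral couple),
                      1 for half cousins.
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    # Meioses on the path: up from one cousin to the ancestor, down to the other.
    meioses = 2 * (degree + 1)
    per_ancestor = 0.5 ** meioses
    return common_ancestors * per_ancestor


for degree in range(1, 5):
    print(f"cousin degree {degree}: {expected_fraction_cousins(degree):.2%}")
```

For degrees 1 through 4 this gives 12.5%, about 3.13%, about 0.78%, and about 0.20%, the theoretical percentages listed in the reference table on this page.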
Endogamy — the pattern of marrying within a defined community — inflates shared cM values dramatically. In communities with sustained endogamy, such as Ashkenazi Jewish, Colonial New England, and some island populations, two individuals who are 4th cousins on paper may share DNA amounts consistent with a 2nd cousin relationship because they share ancestry through multiple pathways simultaneously. ISOGG's documentation on endogamy effects notes that this can make relationship estimation from cM values nearly impossible without supplementary genealogical records (ISOGG Wiki, Endogamy).
Pedigree collapse — when ancestors appear more than once in a family tree because of cousin marriages across generations — produces a similar inflation effect. Colonial American and isolated rural communities frequently exhibit pedigree collapse, which surfaces as unexpectedly high cM values between testers who assumed a more distant relationship.
At the far end of the spectrum, segments smaller than 7 cM are increasingly considered unreliable as evidence of recent common ancestry because they may represent identical-by-state (IBS) DNA — identical by chance rather than by descent. The threshold varies slightly by platform and by researcher, but the genetic genealogy community broadly treats sub-7 cM segments with caution when working in non-endogamous populations.
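As a rough illustration of how a segment threshold changes a match total, the sketch below sums a list of segment lengths while discarding anything under 7 cM. The function name and the sample lengths are hypothetical, and each platform applies its own matching thresholds, so this is not a reproduction of any company's algorithm.

```python
from typing import Iterable, Tuple


def total_shared_cm(
    segment_lengths_cm: Iterable[float], min_segment_cm: float = 7.0
) -> Tuple[float, int, float]:
    """Return (total cM, segment count, longest segment) after filtering.

    Segments shorter than min_segment_cm are dropped because they may be
    identical by state rather than identical by descent.
    """
    kept = [cm for cm in segment_lengths_cm if cm >= min_segment_cm]
    return sum(kept), len(kept), max(kept, default=0.0)


# Made-up segment lengths for illustration only.
segments = [42.5, 18.0, 9.5, 6.2, 3.1]
total, count, longest = total_shared_cm(segments)
print(f"{total} cM across {count} segments, longest {longest} cM")
```

In an endogamous population a researcher might raise `min_segment_cm`, since small shared segments are more common there for reasons unrelated to a recent common ancestor.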
Classification boundaries
Autosomal DNA matches fall into a few broadly recognized tiers based on shared cM values. The ranges below reflect the empirical distribution from the Shared cM Project v4.0, which is the standard reference:
- Close family (roughly 2,200–3,700 cM): Parent, child, or full sibling. Highly reliable at this range.
- Close-to-medium relatives (roughly 1,150–2,650 cM): Half-siblings, grandparents, aunts/uncles, and double first cousins all overlap in this space. The ranges intersect substantially, and the upper end overlaps the lower end of the full-sibling range.
- 1st through 2nd cousins (roughly 200–1,200 cM): First cousins average around 850 cM; second cousins average around 230 cM, though individual second cousins can fall well below 200 cM. The distributions overlap enough that software predictions list multiple possible relationships.
- Distant relatives (under 200 cM): Third cousins and beyond. At 100 cM and below, the number of possible relationships expands sharply. A 60 cM match could be a 3rd cousin, 3rd cousin once removed, half-3rd cousin, 4th cousin — the list is genuinely long.
The ISOGG Wiki's autosomal DNA statistics page and Blaine Bettinger's Shared cM Project are the standard references for these ranges.
Tradeoffs and tensions
Autosomal DNA's primary tension in genealogy is the gap between genetic relationships and genealogical relationships. The test measures biology. A legally adopted individual who was raised as a biological child will produce autosomal results that reflect their birth family, not their legal family — a distinction that has surfaced unexpected non-paternity events and unknown-parentage situations at significant emotional cost. Research on adoptees using DNA testing, documented through groups like DNA Detectives and the work of researcher CeCe Moore, has shown that these surprises are not rare edge cases. The topic is treated in depth at unknown parentage research.
A second tension concerns database coverage. The utility of autosomal testing is almost entirely dependent on whether the right relatives have also tested. A researcher whose extended family has not tested at any of the major platforms will find a list of distant cousins with no paper trail connecting them, which is considerably less useful than a confirmed 2nd cousin match with a documented tree. Platform choice matters: as of 2023, AncestryDNA's database is the largest, which statistically improves the probability of finding useful matches, but its lack of a chromosome browser limits analytical depth compared to smaller databases where the tooling is more robust.
Privacy and law enforcement use of databases represent a third tension. The use of consumer genealogy databases in criminal investigations — particularly through the technique of investigative genetic genealogy — has introduced concerns about whether uploading DNA to public matching databases constitutes consent to uses beyond genealogical research. GEDmatch's 2019 policy change to opt-in for law enforcement access, followed by its acquisition by Verogen, marked a visible point of institutional conflict between open database principles and privacy expectations.
Common misconceptions
Misconception: A DNA match is proof of a genealogical relationship.
A shared segment indicates probable IBD inheritance, but it requires documentary corroboration. Triangulating a segment across three or more testers who share a known common ancestor strengthens the inference considerably; a single match without a paper trail is a hypothesis, not a conclusion. The Genealogical Proof Standard, maintained by the Board for Certification of Genealogists, applies to DNA evidence as it does to any other source.
Misconception: Ethnicity percentages are the same thing as DNA matching.
Ethnicity estimates and match lists are derived from the same raw genotype data but through entirely different computational processes. Ethnicity estimates compare an individual's SNP profile against reference populations assembled by each company — and those reference populations differ between companies, which is why the same person gets different ethnicity percentages from AncestryDNA and 23andMe. More information on this distinction appears at DNA ethnicity estimates. Match lists, by contrast, directly compare two individuals' SNP profiles for shared segments.
Misconception: If two people share 25% DNA, they must be grandparent-grandchild or aunt/uncle-niece/nephew.
25% is the theoretical average for those relationships, but half-siblings also average around 25%, and some unusual double-cousin configurations land in similar ranges. Always run cM values through the Shared cM Project tool and consult multiple possible relationship configurations before drawing conclusions.
Misconception: Autosomal DNA covers all ancestors equally.
With each generation, ancestors contribute a progressively smaller and more variable slice of the genome. Beyond 5 generations back, some ancestors may contribute no detectable DNA at all to a given descendant. A 6th-great-grandparent has approximately a 50% chance of contributing zero measurable autosomal DNA to a specific living descendant (ISOGG Wiki, Cousin statistics). Y-DNA and mitochondrial DNA testing address the specific paternal and maternal lines that autosomal testing can miss; more on those approaches appears at Y-DNA genealogy and mitochondrial DNA genealogy.
Checklist or steps
The following sequence describes the standard analytical workflow applied to autosomal DNA results in genealogical research. This is a descriptive account of practice, not a prescription.
Working through an autosomal match list:
- Record total shared cM for each match of interest, along with the number of shared segments and the length of the longest segment.
- Enter shared cM into the Shared cM Project tool at DNA Painter to generate a probability distribution across possible relationship types.
- Review the match's attached family tree, if present, for surnames, locations, and time periods that overlap with the researcher's known tree.
- Identify shared matches — individuals who appear in both the researcher's match list and the subject match's match list — to help identify which branch the match likely comes from.
- Cluster shared matches using a clustering method (such as the Leeds Method, developed by Dana Leeds) to sort matches into probable family groups, each representing a different ancestral line.
- Build out or request access to trees for top shared matches within the same cluster to identify the most recent common ancestor.
- If a chromosome browser is available, compare segments across multiple matches within the same cluster to triangulate specific chromosome positions to a specific ancestral couple.
- Document the analysis using the source citation standards described at citing genealogical sources, noting the testing platform, the test date, and the analytical method used to assign the match to a family line.
- Distinguish working hypotheses from confirmed conclusions — a match assigned to a probable line through clustering is a hypothesis until corroborated by documentary evidence.
- Repeat for adjacent matches within confirmed clusters to extend the reconstruction of that ancestral line.
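The clustering step in the list above can be sketched in code. This is a simplified take on the idea behind the Leeds Method, assuming the researcher has already selected matches in the 2nd-to-3rd-cousin range and has a shared-match list for each one. The function name, data layout, and tester names are hypothetical, and the sketch leaves out the manual judgment the method normally involves.

```python
from typing import Dict, List, Set


def leeds_clusters(
    matches: List[str], shared_matches: Dict[str, Set[str]]
) -> Dict[str, Set[int]]:
    """Assign cluster numbers (the 'colors' of a Leeds chart) to matches.

    matches: names ordered from most to least shared cM.
    shared_matches: for each name, the other matches shared with the tester.
    """
    clusters: Dict[str, Set[int]] = {name: set() for name in matches}
    next_cluster = 0
    for name in matches:
        if clusters[name]:
            continue  # already colored by an earlier match
        next_cluster += 1
        clusters[name].add(next_cluster)
        for other in shared_matches.get(name, set()):
            if other in clusters:
                clusters[other].add(next_cluster)
    return clusters


# Made-up match list for illustration only.
matches = ["Avery", "Blake", "Casey", "Devon", "Ellis"]
shared = {
    "Avery": {"Casey"},
    "Blake": {"Devon"},
    "Casey": {"Avery"},
    "Devon": {"Blake"},
    "Ellis": set(),
}
for name, groups in leeds_clusters(matches, shared).items():
    print(name, sorted(groups))
```

A match that ends up in more than one cluster may descend from more than one ancestral line, or may signal endogamy or pedigree collapse, and is worth examining by hand.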
A broader introduction to genetic testing methods and platform selection appears at DNA testing for genealogy. For researchers working through the full scope of genealogy methods that complement DNA analysis, genealogy research methods covers the documentary side of the equation. The full landscape of genealogical research dimensions is covered at genealogyauthority.com.
Reference table or matrix
Autosomal DNA: Expected shared cM by relationship
| Relationship | Theoretical % shared | Average cM (Shared cM Project v4.0) | Typical range (cM) | Overlapping relationships |
|---|---|---|---|---|
| Parent / Child | 50% | ~3,400 cM | 3,330–3,720 | None — fully reliable |
| Full sibling | ~50% | ~2,550 cM | 2,209–3,384 | Half-sibling (lower end) |
| Grandparent / Grandchild | 25% | ~1,765 cM | 1,156–2,311 | Half-sibling, aunt/uncle |
| Aunt / Uncle – Niece / Nephew | 25% | ~1,750 cM | 1,349–2,175 | Grandparent, half-sibling |
| Half-sibling | 25% | ~1,759 cM | 1,160–2,650 | Grandparent, aunt/uncle |
| 1st cousin | 12.5% | ~850 cM | 553–1,225 | Half-aunt/uncle, great-grandparent |
| 1st cousin once removed | ~6.25% | ~425 cM | 141–851 | 2nd cousin, half-1st cousin |
| 2nd cousin | ~3.13% | ~229 cM | 41–592 | 1st cousin once removed |
| 2nd cousin once removed | ~1.56% | ~122 cM | 0–316 | 3rd cousin, half-2nd cousin |
| 3rd cousin | ~0.78% | ~74 cM | 0–173 | Multiple distant configurations |
| 4th cousin | ~0.20% | ~35 cM | 0–139 | Nearly indistinguishable at this range |
Source: Blaine Bettinger, The Shared cM Project 4.0, aggregated from 25,000+ submitted data points. Ranges represent the 99th percentile of observed values.
Notes on endogamy adjustments: In endogamous populations, observed cM values frequently exceed the typical range by a factor of 1.5 to 2.5. Relationship predictions based on cM alone should be treated as unreliable without supplementary documentary evidence in these populations. The ISOGG Wiki's endogamy article is the standard reference for this effect.
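To show how the table is used in practice, the sketch below lists every relationship whose typical range contains a given cM value. The ranges are copied from the table above; the dictionary and function names are hypothetical. It is a plain range lookup, not the probability model behind the Shared cM Project tool at DNA Painter, and it ignores endogamy.

```python
from typing import Dict, List, Tuple

# Typical ranges in cM, copied from the reference table above.
TYPICAL_RANGES_CM: Dict[str, Tuple[int, int]] = {
    "Parent / Child": (3330, 3720),
    "Full sibling": (2209, 3384),
    "Grandparent / Grandchild": (1156, 2311),
    "Aunt / Uncle - Niece / Nephew": (1349, 2175),
    "Half-sibling": (1160, 2650),
    "1st cousin": (553, 1225),
    "1st cousin once removed": (141, 851),
    "2nd cousin": (41, 592),
    "2nd cousin once removed": (0, 316),
    "3rd cousin": (0, 173),
    "4th cousin": (0, 139),
}


def candidate_relationships(shared_cm: float) -> List[str]:
    """Relationships from the table whose typical range includes shared_cm."""
    return [
        relationship
        for relationship, (low, high) in TYPICAL_RANGES_CM.items()
        if low <= shared_cm <= high
    ]


print(candidate_relationships(1800))
print(candidate_relationships(60))
```

With these ranges, 1,800 cM returns three candidates (grandparent, aunt/uncle, half-sibling) and 60 cM returns four (2nd cousin through 4th cousin), which illustrates why a cM total alone rarely settles a relationship.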
References
- ISOGG Wiki — Autosomal DNA Statistics
- ISOGG Wiki — Endogamy
- ISOGG Wiki — Cousin Statistics
- The Shared cM Project 4.0 — DNA Painter (Blaine Bettinger)
- Board for Certification of Genealogists — Genealogical Proof Standard
- International Society of Genetic Genealogy — ISOGG Wiki Main Page
- AncestryDNA via STAT News — 20 million customers (2020)