DNA Health Report: How to Generate Genetic Health Insights From Your Raw DNA Data

TL;DR

  • Your raw DNA file from Ancestry or 23andMe contains ~700,000 SNP data points that most people never look at beyond ethnicity estimates
  • The DNA Health Report tool analyses your genome against three major databases - PharmGKB, ClinVar, and a curated SNP database - to generate actionable health insights
  • PharmGKB (Stanford University) maps how your genes affect drug metabolism, while ClinVar (NCBI/NIH) catalogues disease-associated genetic variants
  • The tool runs entirely on your own computer inside Docker containers - your DNA data never leaves your machine
  • Reports cover drug interactions, disease risk and carrier status, methylation, neurotransmitters, fitness genetics, nutrition, and personalised supplement and lifestyle protocols

The Problem: You Have Your DNA Data, Now What?

If you’ve done a DNA test through Ancestry.com.au or 23andMe, you’ve probably explored your ethnicity breakdown, maybe connected with some distant relatives, and then… that was it. But sitting inside that raw data file is a wealth of health-related genetic information that those platforms barely scratch the surface of.

Your raw DNA file contains roughly 700,000 data points called SNPs (Single Nucleotide Polymorphisms - pronounced “snips”). Each one records the two letters you carry at a specific position in your genome where people commonly differ from one another. These tiny variations influence everything from how you metabolise caffeine and medications, to your risk for certain diseases, to whether you’re a natural sprinter or an endurance runner.

The DNA Health Report is a tool I built that takes that raw file and cross-references it against three major genetic databases to produce a comprehensive, personalised health report - covering drug interactions, disease risk, carrier status, nutrition, fitness, neurotransmitters, and actionable protocols.

And critically, it all runs locally on your own computer. Your DNA data never touches the internet.

A Quick Primer: What Are Genes and SNPs?

Before diving into the tool, let’s cover some basics.

DNA, Genes, and the Genome

Your DNA (deoxyribonucleic acid) is the instruction manual for building and running your body. It’s a long molecule made up of four chemical bases - adenine (A), thymine (T), guanine (G), and cytosine (C) - arranged in pairs along a double helix. You have about 3 billion of these base pairs, and together they make up your genome.

A gene is a specific section of DNA that contains the instructions for making a particular protein. Proteins do most of the work in your cells - they’re enzymes that digest your food, receptors that respond to hormones, structural components of your muscles, and much more. You have roughly 20,000 protein-coding genes, but their protein-coding sequences account for only about 1-2% of your total DNA.

What Are SNPs?

A SNP (Single Nucleotide Polymorphism) is a position in the genome where people commonly differ by a single letter. For example, at a specific position on chromosome 15, most people might have a C, but some people have a T instead. These single-letter differences are the most common type of genetic variation between humans.

Most SNPs have no noticeable effect. But some fall within or near genes and can change how those genes work. A SNP in the CYP2D6 gene, for instance, can make the enzyme it produces work faster or slower - directly affecting how you break down certain medications.

Each SNP is identified by an rs number (like rs4680 or rs1801133). These are universal identifiers from the dbSNP database maintained by the US National Center for Biotechnology Information (NCBI). When you download your raw data from Ancestry, the file is essentially a long list of rs numbers paired with the two letters (alleles) you carry at each position - one inherited from each parent.

Genotypes and What They Mean

At each SNP position, you have two copies - one from your mother and one from your father. This gives you a genotype with two alleles. For example:

  • AA - You have the same variant on both copies (homozygous)
  • AG - You have one copy of each variant (heterozygous)
  • GG - You have the other variant on both copies (homozygous)

Whether a particular genotype matters depends on the gene and the trait. Some variants only have an effect when you carry two copies (recessive), while others have an effect with just one copy (dominant). Some show a dose-dependent effect - one copy has a moderate impact, two copies have a stronger one.
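
To make the terminology concrete, here is a minimal Python sketch that labels a two-letter genotype as homozygous or heterozygous and counts the copies of a chosen allele. The function name and the returned fields are my own illustration, not part of the tool.

```python
def describe_genotype(genotype: str, effect_allele: str) -> dict:
    """Describe a two-letter genotype relative to one allele of interest."""
    alleles = list(genotype.upper())
    # Number of copies of the allele we care about (0, 1 or 2).
    copies = alleles.count(effect_allele.upper())
    # Same letter on both copies means homozygous, otherwise heterozygous.
    zygosity = "homozygous" if len(set(alleles)) == 1 else "heterozygous"
    return {"zygosity": zygosity, "effect_copies": copies}


for g in ("AA", "AG", "GG"):
    print(g, describe_genotype(g, effect_allele="G"))
```

A dose-dependent effect is then simply something that scales with `effect_copies`, while a recessive effect only appears when it equals 2.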

How the DNA Health Report Works

Step 1: Download Your Raw DNA Data

On Ancestry.com.au, go to your DNA settings and look for the option to download your raw DNA data. You’ll get a text file (typically called something like AncestryDNA.txt) that’s about 15-25MB. The tool works with 23andMe raw data files as well.

The file looks something like this:

rsid        chromosome  position    allele1  allele2
rs4477212   1           82154       A        A
rs3094315   1           752566      A        G
rs3131972   1           752721      A        G

Each row is one SNP - the rs identifier, which chromosome it’s on, its position, and the two alleles you carry.
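
As a rough illustration of how a file like this can be read, the Python sketch below parses rows into two lookup tables, one keyed by rs number and one by chromosome and position. It is not the tool's actual parser. The `Snp` and `parse_raw_dna` names are hypothetical, and the four-column branch assumes the 23andMe layout, where both alleles sit in a single genotype column.

```python
import io
from typing import NamedTuple


class Snp(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str  # the two alleles joined together, e.g. "AG"


def parse_raw_dna(lines):
    """Parse a raw DNA export into two lookup indexes (illustrative only)."""
    by_rsid = {}
    by_position = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comment lines and the column header.
        if not line or line.startswith("#") or line.lower().startswith("rsid"):
            continue
        fields = line.split()
        if len(fields) == 5:    # Ancestry layout: allele1 and allele2 columns
            rsid, chrom, pos, a1, a2 = fields
            genotype = a1 + a2
        elif len(fields) == 4:  # assumed 23andMe layout: one genotype column
            rsid, chrom, pos, genotype = fields
        else:
            continue            # malformed row, ignored in this sketch
        snp = Snp(rsid, chrom, int(pos), genotype)
        by_rsid[rsid] = snp
        by_position[(chrom, snp.position)] = snp
    return by_rsid, by_position


sample = io.StringIO(
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs4477212\t1\t82154\tA\tA\n"
    "rs3094315\t1\t752566\tA\tG\n"
    "rs3131972\t1\t752721\tA\tG\n"
)
by_rsid, by_position = parse_raw_dna(sample)
print(by_rsid["rs3094315"].genotype)
print(by_position[("1", 82154)].rsid)
```

Keeping both indexes means a later step can look up a variant either by its rs number or by its genomic coordinates.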

Step 2: Run the Tool

The DNA Health Report runs inside Docker containers on your computer. After cloning the repository, you run a script to download the reference databases, place your genome file in the data directory, and start the application:

# Download the reference databases (PharmGKB and ClinVar)
bash scripts/download_data.sh

# Start the application
docker compose up

# Open your browser to http://localhost:3000

The web interface lets you drag and drop your genome file, give it a name, and click “Analyse Genome.” You’ll see a real-time progress tracker as it works through the analysis pipeline.

Step 3: The Analysis Pipeline

Behind the scenes, the tool runs through five stages:

  1. Genome Parsing (0-10%) - Reads your raw file and builds a fast lookup index of all ~700,000 SNPs, indexed by both rs number and chromosome position.

  2. PharmGKB Loading (10-20%) - Loads the drug-gene interaction database and matches your genotypes against known pharmacogenomic variants.

  3. Lifestyle & Health Analysis (20-30%) - Cross-references your genome against roughly 200 curated SNPs covering metabolism, methylation, neurotransmitters, fitness, nutrition, cardiovascular health, sleep, and more. Each finding is scored by magnitude (0-6) based on clinical significance.

  4. ClinVar Disease Scan (30-85%) - This is the big one. It scans your genome against approximately 341,000 variants from ClinVar, checking whether you carry any alleles classified as pathogenic, likely pathogenic, risk factors, or drug response variants. It also identifies protective variants and carrier status for recessive conditions.

  5. Report Generation (85-100%) - Compiles everything into structured reports and a JSON file that powers the interactive dashboard.
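
The percentage ranges above suggest that each stage owns a slice of the overall progress bar. Here is a minimal sketch of that idea, assuming each stage reports its own completion as a fraction between 0 and 1. The `STAGES` table and `overall_progress` function are hypothetical names, not taken from the tool.

```python
# (stage name, start %, end %) taken from the five stages listed above.
STAGES = [
    ("Genome Parsing", 0, 10),
    ("PharmGKB Loading", 10, 20),
    ("Lifestyle & Health Analysis", 20, 30),
    ("ClinVar Disease Scan", 30, 85),
    ("Report Generation", 85, 100),
]


def overall_progress(stage_name: str, fraction_done: float) -> float:
    """Map a stage's own completion (0.0-1.0) onto the overall percentage."""
    for name, start, end in STAGES:
        if name == stage_name:
            # Clamp so a misbehaving stage cannot push the bar out of range.
            fraction = min(max(fraction_done, 0.0), 1.0)
            return start + (end - start) * fraction
    raise ValueError(f"Unknown stage: {stage_name}")


print(overall_progress("ClinVar Disease Scan", 0.5))  # halfway through the scan
```

Giving the ClinVar scan the widest slice (30-85%) reflects that it is the longest-running stage.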

Step 4: Read Your Results

The results are presented through a web dashboard with four tabs:

  • Lifestyle & Health - Findings about how your genes affect day-to-day health, filterable by category (methylation, neurotransmitters, nutrition, fitness, etc.) and searchable by gene or SNP
  • Disease Risk - Any pathogenic or likely pathogenic variants found, carrier status for recessive conditions, and risk factors
  • Protocol - Personalised, actionable recommendations for supplements, diet, lifestyle, and monitoring - all derived from your specific genetic findings
  • Downloads - Three detailed markdown reports you can save and share with healthcare providers

The Databases: Where the Knowledge Comes From

The value of this tool comes from the quality of the databases it references. Here’s what each one is and who created it.

PharmGKB - The Pharmacogenomics Knowledge Base

Created by: Stanford University, funded by the US National Institutes of Health (NIH)

Website: pharmgkb.org

What it contains: PharmGKB is one of the world’s leading resources for understanding how genetic variation affects drug response. It’s maintained by a team of scientists and curators at Stanford who systematically review published pharmacogenomic research and distil it into structured, machine-readable annotations.

Each annotation links a specific genetic variant to a drug outcome - things like “people with this CYP2D6 genotype are poor metabolisers of codeine” or “this VKORC1 variant requires lower warfarin doses.” Every annotation is graded by evidence level:

  • Level 1A - Highest confidence. Supported by a Clinical Pharmacogenetics Implementation Consortium (CPIC) guideline with strong evidence
  • Level 1B - Strong evidence from multiple replicated studies
  • Level 2A - Moderate evidence for a variant in one of PharmGKB’s Very Important Pharmacogenes (VIPs)
  • Level 2B - Moderate evidence from individual studies
  • Levels 3 and 4 - Low or unsupported evidence, such as a single unreplicated study or conflicting results

The DNA Health Report uses two key files from PharmGKB: clinical_annotations.tsv (the metadata about each drug-gene interaction) and clinical_ann_alleles.tsv (the specific genotype-to-outcome mappings). It filters to evidence levels 1A through 2B to keep findings clinically relevant.
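
The evidence-level filter can be sketched in a few lines of Python. This is an illustration rather than the tool's code: the column name "Level of Evidence" and the sample rows are assumptions for the example, so check the header of your downloaded clinical_annotations.tsv before relying on them.

```python
import csv
import io

KEPT_LEVELS = {"1A", "1B", "2A", "2B"}


def filter_annotations(tsv_file, level_column="Level of Evidence"):
    """Keep only rows whose evidence level is 1A, 1B, 2A or 2B."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [row for row in reader if row[level_column].strip() in KEPT_LEVELS]


# Made-up sample rows: only the first mirrors an example from this article.
sample = io.StringIO(
    "Variant\tGene\tLevel of Evidence\tDrug\n"
    "rs4244285\tCYP2C19\t1A\tclopidogrel\n"
    "example-variant-1\tEXAMPLE1\t3\texample-drug-1\n"
    "example-variant-2\tEXAMPLE2\t2B\texample-drug-2\n"
)
kept = filter_annotations(sample)
print([row["Variant"] for row in kept])
```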

Practical example: The rs4244285 variant in CYP2C19 (the *2 allele) reduces the activity of the enzyme that converts clopidogrel (Plavix) into its active form. If you carry it, PharmGKB annotations will flag you as an intermediate metaboliser (one copy) or a poor metaboliser (two copies), meaning the drug may not work effectively for you. This is the kind of information that could genuinely matter if you’re ever prescribed that medication.

License: Creative Commons Attribution-ShareAlike 4.0

ClinVar - The Clinical Variant Database

Created by: The National Center for Biotechnology Information (NCBI), part of the US National Library of Medicine at the National Institutes of Health (NIH)

Website: ncbi.nlm.nih.gov/clinvar

What it contains: ClinVar is a freely accessible archive of reports describing the relationships between genetic variants and human health conditions. Clinical testing laboratories, research groups, and expert panels around the world submit their interpretations of genetic variants to ClinVar.

The database used by the DNA Health Report contains approximately 341,000 variant entries. Each entry includes:

  • Clinical significance - Is this variant pathogenic (disease-causing), likely pathogenic, benign, likely benign, a risk factor, protective, or of uncertain significance?
  • Associated condition - What disease or trait is it linked to?
  • Review status - Rated with gold stars (0-4) indicating how thoroughly the variant has been reviewed. Four stars means the classification comes from a practice guideline, three stars means it has been reviewed by an expert panel, and zero stars means no assertion criteria were provided.

The tool scans your genome against ClinVar by matching chromosome positions and checking whether you carry the alternate (non-reference) allele. It then classifies findings based on whether you’re heterozygous (one copy) or homozygous (two copies), and whether the condition follows dominant or recessive inheritance.
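
The classification logic described here can be approximated as follows. This is a simplified sketch, not the tool's implementation: it handles single-letter alternate alleles only, ignores strand differences, and assumes the inheritance mode is supplied by the caller. The function name and the returned labels are hypothetical.

```python
def classify_clinvar_hit(genotype: str, alt_allele: str, inheritance: str) -> str:
    """Classify a finding from the number of alternate alleles carried."""
    copies = genotype.upper().count(alt_allele.upper())
    if copies == 0:
        return "not carried"
    if inheritance == "recessive":
        # One copy of a recessive variant means carrier, two means likely affected.
        return "carrier" if copies == 1 else "likely affected"
    # Dominant inheritance: a single copy is enough to matter.
    return "likely affected"


print(classify_clinvar_hit("CT", "T", "recessive"))
print(classify_clinvar_hit("TT", "T", "recessive"))
print(classify_clinvar_hit("CT", "T", "dominant"))
```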

Practical example: ClinVar might identify that you carry one copy of a CFTR mutation (the gene associated with cystic fibrosis). Since CF is recessive, one copy makes you a carrier - you don’t have the condition, but you could pass the variant to your children. If both parents are carriers, each child has a 25% chance of being affected.
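
The 25% figure comes from enumerating the four equally likely combinations of one allele from each parent. A small Python sketch of that Punnett-square arithmetic, using "A" for the working copy and "a" for the variant (labels chosen here for illustration):

```python
from fractions import Fraction
from itertools import product


def offspring_probabilities(parent1: str, parent2: str) -> dict:
    """Enumerate every pairing of one allele from each parent."""
    outcomes = ["".join(sorted(pair)) for pair in product(parent1, parent2)]
    total = len(outcomes)
    return {g: Fraction(outcomes.count(g), total) for g in sorted(set(outcomes))}


# Two carrier parents: each has one working copy (A) and one variant copy (a).
probs = offspring_probabilities("Aa", "Aa")
for genotype, p in probs.items():
    print(genotype, p)
```

For two carriers this gives 1/4 unaffected non-carriers (AA), 1/2 carriers (Aa), and 1/4 affected (aa).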

The Curated SNP Database - Lifestyle and Health Genetics

Created by: Manually curated from peer-reviewed research literature

What it contains: This is a purpose-built database of approximately 200 well-studied SNPs, each with detailed annotations about what the variant does, its clinical mechanism, and actionable recommendations. It covers categories including:

  • Drug Metabolism - CYP1A2, CYP2C19, CYP2C9, CYP2D6, CYP3A5 (the cytochrome P450 enzymes that metabolise most medications)
  • Methylation - MTHFR, MTR, MTRR, CBS, PEMT (the folate cycle and methyl donor pathways critical for hundreds of biochemical reactions)
  • Neurotransmitters - COMT, BDNF, DRD2, OPRM1 (dopamine metabolism, brain plasticity, reward sensitivity, opioid response)
  • Caffeine & Stimulants - CYP1A2, ADORA2A, ADA (how fast you clear caffeine, how sensitive your adenosine receptors are)
  • Sleep & Circadian Rhythm - CLOCK, PER2, ARNTL (your genetic chronotype - morning lark vs night owl)
  • Fitness - ACTN3, ACE, PPARGC1A (fast-twitch vs slow-twitch muscle fibre composition, VO2 max potential)
  • Nutrition - FTO, TCF7L2, FADS1, MCM6/LCT, GC, BCMO1 (obesity risk, diabetes risk, omega-3 conversion, lactose tolerance, vitamin D binding, beta-carotene conversion)
  • Cardiovascular - APOE, F5 (Factor V Leiden), AGT, AGTR1 (Alzheimer’s and cholesterol risk, clotting disorders, blood pressure regulation)
  • Inflammation - IL6, TNF (inflammatory response genes relevant to autoimmune conditions and recovery)
  • Iron Metabolism - HFE (hereditary haemochromatosis - iron overload)
  • Alcohol Metabolism - ALDH2, ADH1B (the “Asian flush” gene and alcohol processing speed)

Each SNP entry includes the possible genotypes, a magnitude score (0-6) indicating clinical importance, the biological mechanism, health implications, and specific recommendations. For example, the MTHFR rs1801133 entry explains that the TT genotype reduces enzyme activity by about 70% and impairs folate-to-methylfolate conversion, and it recommends supplementing with methylfolate (L-5-MTHF) rather than folic acid.
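
One way to picture an entry is as a small record keyed by genotype. The schema below is hypothetical (the class and field names are mine, and the magnitude value is a placeholder rather than the tool's real score), but the MTHFR text is taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class GenotypeNote:
    magnitude: int            # 0-6 scale; the value used below is a placeholder
    summary: str
    recommendation: str = ""


@dataclass
class SnpEntry:
    rsid: str
    gene: str
    category: str
    genotypes: dict = field(default_factory=dict)

    def interpret(self, genotype: str):
        """Return the note for a genotype, or None if the entry has none."""
        key = "".join(sorted(genotype.upper()))  # treat "TC" and "CT" alike
        return self.genotypes.get(key)


mthfr = SnpEntry(
    rsid="rs1801133",
    gene="MTHFR",
    category="Methylation",
    genotypes={
        "TT": GenotypeNote(
            magnitude=3,  # placeholder, not the tool's actual score
            summary="Enzyme activity reduced by about 70%",
            recommendation="Methylfolate (L-5-MTHF) rather than folic acid",
        ),
    },
)
note = mthfr.interpret("tt")
print(note.summary if note else "No annotation for this genotype")
```

Raw data files sometimes report alleles on the opposite DNA strand (A in place of T, G in place of C), so a real lookup would also need to normalise strand before matching.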

What a Report Actually Looks Like

The tool generates three markdown reports:

1. Exhaustive Genetic Report

This is the deep dive. It starts with an executive summary showing how many high, moderate, and low-impact findings were identified. Priority findings (magnitude 3 or above) are highlighted first, followed by a pathway analysis that groups findings by biological system. Drug interactions from PharmGKB are listed by evidence level.
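
The ordering described here (priority findings first, then findings grouped by biological system) can be sketched as below. The findings list and its magnitude values are invented for illustration, and the helper names are hypothetical.

```python
from collections import defaultdict

# Invented sample findings: magnitudes are placeholders, not real scores.
findings = [
    {"gene": "MTHFR", "category": "Methylation", "magnitude": 3},
    {"gene": "COMT", "category": "Neurotransmitters", "magnitude": 2},
    {"gene": "MTRR", "category": "Methylation", "magnitude": 1},
    {"gene": "HFE", "category": "Iron Metabolism", "magnitude": 4},
]


def priority_findings(items, threshold=3):
    """Findings at or above the threshold, highest magnitude first."""
    selected = [f for f in items if f["magnitude"] >= threshold]
    return sorted(selected, key=lambda f: f["magnitude"], reverse=True)


def group_by_category(items):
    """Group gene names under their category (biological system)."""
    groups = defaultdict(list)
    for f in items:
        groups[f["category"]].append(f["gene"])
    return dict(groups)


print([f["gene"] for f in priority_findings(findings)])
print(group_by_category(findings))
```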

2. Disease Risk Report

This covers pathogenic variants, carrier status, risk factors, drug response variants, and protective variants. Findings are grouped by clinical significance and include the ClinVar gold star confidence rating. It distinguishes between variants where you’re likely affected versus those where you’re a carrier.

3. Actionable Health Protocol

This is the practical output - a personalised protocol based on your specific findings. It’s organised into sections:

  • Supplements - Gene-specific recommendations with doses, preferred forms, and reasoning (e.g., “methylfolate 400-800mcg daily” if you have MTHFR variants)
  • Dietary - Food-based recommendations tied to your genetics
  • Lifestyle - Exercise type, stress management, and circadian rhythm suggestions
  • Monitoring - Which blood tests to discuss with your doctor (homocysteine, vitamin D, fasting glucose, etc.)
  • Drug Interactions - Medications to be cautious with based on your metabolism genes
  • Carrier Notes - Implications of carrier status for family planning

Privacy: Why Local Processing Matters

DNA data is arguably the most sensitive personal information that exists. It can’t be changed, it reveals information about your relatives, and it has implications for insurance, employment, and family relationships.

The DNA Health Report is designed to run entirely on your own computer using Docker containers. When you start the application, it spins up a Python backend and a React frontend, both running locally. Your genome file is read from your local filesystem, processed in memory, and the results are written back to local files. No data is sent to any external server during analysis.

This matters because once your genetic data is uploaded to a cloud service, you lose control over it. Even services with good privacy policies can be breached, acquired, or compelled to share data. By keeping everything local, you maintain full control.

Important Caveats

This tool is for educational and informational purposes. It is not a medical diagnostic tool and should not replace professional genetic counselling or medical advice. A few important things to keep in mind:

  • Consumer DNA tests don’t cover everything. Ancestry and 23andMe test roughly 700,000 SNPs out of the 3+ billion base pairs in your genome. Many clinically important variants may not be included in your raw data.
  • Context matters. A single SNP rarely tells the whole story. Gene expression is influenced by multiple variants, epigenetics, environment, diet, and lifestyle. The tool does its best to provide context, but genetics is complex.
  • Database limitations. ClinVar contains variants of uncertain significance, and scientific understanding of many variants is still evolving. A variant classified as “likely pathogenic” today might be reclassified as benign tomorrow as more data accumulates.
  • Discuss findings with a professional. If the report flags something concerning - especially in the disease risk section - talk to a genetic counsellor or your doctor before making any health decisions.

Getting Started

If you want to try it yourself:

  1. Download your raw DNA data from Ancestry.com.au (Settings > DNA > Download Raw DNA Data)
  2. Clone the repository and run the database download script
  3. Start the application with docker compose up
  4. Open http://localhost:3000, upload your file, and run the analysis

The entire process takes a few minutes, and at the end you’ll have a detailed, personalised genetic health report that goes far beyond what the consumer DNA testing platforms show you.

Your DNA data has been sitting there the whole time. You might as well find out what it says.
