The value of genomic analysis

Genetic heritability is responsible for 30% of individual health outcomes, but it is rarely used to guide disease prevention and care. Each individual carries 4-5 million genetic variants, each with varying influence on traits related to health. The cost of sequencing a genome has fallen drastically in recent years, and sequence data shows potential for ubiquitous use. However, reading the sequence accurately and interpreting it meaningfully remain obstacles to broad adoption.

Improving the accuracy of genomic analysis

Sequencing genomes enables us to identify variants in a person’s DNA that indicate genetic disorders or an elevated risk of disease, such as breast cancer.

Highly accurate genomes with deep neural networks

Despite rapid advances in sequencing technologies, accurately calling the genetic variants present in an individual genome from billions of short, error-prone sequence reads remains challenging. As published in Nature Biotechnology, DeepVariant is an open-source variant caller that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. It significantly improves the accuracy of identifying variant locations, reducing the error rate by more than 50%.
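
For intuition only, here is a minimal Python sketch of a naive, count-based genotype call at a single genome position. It is not DeepVariant's method: where a simple rule like this depends on hand-picked thresholds, DeepVariant uses a deep neural network. The function name `call_genotype` and every threshold below are hypothetical, chosen only to show why error-prone reads make the decision hard.

```python
from collections import Counter


def call_genotype(ref_base, pileup_bases, min_depth=8, het_low=0.2, het_high=0.8):
    """Naively call a genotype from the bases that reads show at one position.

    All thresholds are illustrative assumptions, not values used by any real tool.
    """
    depth = len(pileup_bases)
    if depth < min_depth:
        # Too few reads cover this position to decide anything.
        return "no-call"

    counts = Counter(pileup_bases)
    # Keep only bases that differ from the reference.
    alt_counts = {base: n for base, n in counts.items() if base != ref_base}
    if not alt_counts:
        return "hom-ref"

    # Take the best-supported non-reference base and its share of the reads.
    alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
    alt_fraction = alt_n / depth

    if alt_fraction < het_low:
        # A few mismatching reads are treated as sequencing errors.
        return "hom-ref"
    if alt_fraction <= het_high:
        return f"het {ref_base}/{alt_base}"
    return f"hom-alt {alt_base}/{alt_base}"


if __name__ == "__main__":
    print(call_genotype("A", list("AAAAAAAAAG")))
    print(call_genotype("A", list("AAAAAGGGGG")))
    print(call_genotype("A", list("GGGGGGGGGA")))
```

A fixed cutoff like `het_low` cannot tell a true low-fraction variant from a cluster of read errors, which is the kind of ambiguity a learned model is meant to resolve.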

Winner in PrecisionFDA V2 Truth Challenge

DeepVariant won awards for Best Overall accuracy in 3 of 4 instrument categories in the PrecisionFDA V2 Truth Challenge. Compared to previous state-of-the-art models, DeepVariant v1.0 significantly reduces the errors for widely used sequencing data types, including Illumina and Pacific Biosciences.

Identifying disease-causing variants in cancer patients

Researchers wanted to understand whether incorporating automated deep learning technology would improve the detection of disease-causing variants in patients with cancer. In a cross-sectional study of 2,367 prostate cancer and melanoma patients in the US and Europe, published in JAMA, DeepVariant found disease-causing variants in 14% more individuals than prior state-of-the-art methods.

Building large-scale cohorts for genetic discovery research

Large cohorts of sequenced individuals are the foundation for discovering novel genetic associations with disease. We developed best practices for generating cohorts that substantially improve on previous methods and that have been adopted by the UK Biobank for its large-scale sequencing efforts.

Improving genetic association discovery with machine learning

Discovering genetic variants associated with a trait of interest requires a large cohort of individuals with both genetic and trait information. As published in AJHG, we demonstrate that using a machine learning model to predict eye-disease-related traits from fundus images significantly improves discovery of genetic variants influencing those traits.

Our partners in genomics research

Because genomic data is highly personal, to the greatest extent possible we use datasets that are fully public or are broadly available to qualified researchers. We also partner with trusted organizations that contribute scientific and technology development to improve standards in genomic analysis and enhance the utility of sequencing data.

Pacific Biosciences logo

DeepVariant’s precisionFDA Truth Challenge V2 submission using PacBio HiFi reads achieved the highest single-technology accuracy, which has been featured on the PacBio blog and in a Nature Biotechnology retrospective. The collaboration also successfully launched DeepConsensus, which improves HiFi yield and read quality compared to existing consensus basecalling methods.

Logo for Regeneron

The Regeneron Genetics Center, one of the world’s largest human genomic research efforts, has adopted DeepVariant and re-trained specialized models for both internal projects and the delivery of 200,000 exomes to UK Biobank.

Logo for University of California Santa Cruz Genomics Institute

Benedict Paten’s lab at UC Santa Cruz collaborated with Google on PEPPER-DeepVariant, which won best accuracy in the Oxford Nanopore Technologies category of the PrecisionFDA Truth Challenge V2. The accompanying paper was published in Nature Methods.

Logo for NVIDIA

NVIDIA Clara Parabricks Pipelines software provides a suite of GPU-accelerated bioinformatics tools to support DNA and RNA applications. Its implementation of DeepVariant processes a 30x whole human genome from FASTQ to VCF in less than 25 minutes using NVIDIA's latest A100 GPU.

Logo for GenapSys

GenapSys trained a custom DeepVariant model to provide a highly accurate variant caller for its new high-accuracy, low-cost benchtop sequencing instrument.

Logo for ATGenomix

ATGenomix builds a Spark framework that efficiently parallelizes DeepVariant for its work with several clinical partners.

Logo for DNAnexus

DNAnexus provides a secure and collaborative fit-for-purpose bioinformatics system that integrates cutting-edge tools like DeepVariant. They work with industry leaders like Google, the FDA, and UK Biobank to provide solutions to the scientific community.

Logo for DNAstack

DNAstack enables researchers to organize, share, and analyze genomics and biomedical data, using tools like DeepVariant, in an easy-to-use cloud environment. DNAstack's software products use open standards developed by the Global Alliance for Genomics & Health.