DADA2

Introduction

What is DADA2?

DADA2 is named for the second iteration of the Divisive Amplicon Denoising Algorithm (DADA) and was developed by Benjamin Callahan et al. in 2016. To quote from the DADA2 manuscript [1]:

DADA2 is a software package that models and corrects Illumina-sequenced amplicon errors. DADA2 infers sample sequences exactly, without coarse-graining into OTUs, and resolves differences of as little as one nucleotide.
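The difference between coarse-graining into OTUs and keeping exact sequences can be illustrated with a toy example. The Python sketch below is hypothetical and is not DADA2 code. DADA2 is an R package, and real reads contain errors that DADA2 models before it calls variants. Here the sequences are assumed to be error-free and pre-aligned. The 97% threshold is a conventional OTU cutoff, and `identity`, `greedy_otus` and `exact_variants` are illustrative names. Two 100 nt sequences that differ at one position are 99% identical, so a 97% OTU clusters them together while exact variants keep them apart.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this toy measure assumes equal-length, gap-free sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Toy greedy OTU clustering: each sequence joins the first centroid it matches at >= threshold."""
    centroids, clusters = [], []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            # No centroid was similar enough, so this sequence starts a new OTU.
            centroids.append(s)
            clusters.append([s])
    return clusters


def exact_variants(seqs):
    """Exact sequence variants: every distinct (assumed error-free) sequence is kept on its own."""
    return sorted(set(seqs))


if __name__ == "__main__":
    base = "ACGT" * 25                     # a made-up 100 nt sequence
    variant = base[:50] + "A" + base[51:]  # one substitution (G -> A at index 50)
    print(identity(base, variant))
    print(len(greedy_otus([base, variant])))
    print(len(exact_variants([base, variant])))
```

The single-nucleotide difference disappears inside one OTU but survives as two exact variants. This is the resolution the quote above refers to.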

Why use DADA2?

You are interested in assigning high-resolution taxonomy, down to the species level, to microbial 16S amplicon data generated by Illumina sequencing.

How does DADA2 work?

Again, quoting from the DADA2 manuscript [1]:

The DADA2 R package implements a complete pipeline to turn paired-end fastq files from the sequencer into merged, denoised, chimera-free, inferred sample sequences. Parts of this pipeline can be substituted with outside methods, but there are some structural differences between the DADA2 pipeline and most others. One such difference is that the DADA2 pipeline performs merging of paired-end reads after denoising. This is because the core denoising algorithm uses the empirical relationship between the quality score and the error rates. When reads are merged, this relationship will differ between the forward-only, overlapping, and reverse-only portions of the merged read. That variation interferes with the denoising algorithm, and therefore greater accuracy can be achieved by denoising before merging, albeit at some computational cost.

In other words, DADA2 uses an error model learned from the data to separate true biological sequences from sequencing errors while keeping false positives low. The resulting exact sequence variants can then be assigned taxonomy. Read here for a more technical description of how the DADA2 algorithm works.
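The "empirical relationship between the quality score and the error rates" can be made concrete with a small tally. The Python sketch below is illustrative only and is not DADA2's implementation. It assumes reads whose true sequences are known and which are aligned without gaps, and it compares the observed error rate at each quality score with the nominal Phred value 10^(-Q/10). DADA2 itself works differently: it estimates error rates separately for each substitution type (for example A→C) as a function of quality, and it needs no reference, because it alternates between estimating error rates and inferring sample composition. The function names and toy data here are hypothetical.

```python
from collections import defaultdict


def nominal_error(q: int) -> float:
    """Error probability that a Phred quality score nominally encodes: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def empirical_error_by_quality(observations):
    """Observed error rate for each quality score.

    `observations` yields (read, true_sequence, qualities) tuples. The read is assumed to
    be aligned to its true sequence without gaps, which is a simplification for this sketch.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for read, truth, quals in observations:
        for base, true_base, q in zip(read, truth, quals):
            totals[q] += 1
            if base != true_base:
                errors[q] += 1
    return {q: errors[q] / totals[q] for q in sorted(totals)}


if __name__ == "__main__":
    # Made-up toy data: two 4 nt reads with per-base quality scores.
    obs = [
        ("ACGT", "ACGA", [30, 30, 30, 10]),  # last base is a sequencing error
        ("ACGT", "ACGT", [30, 30, 30, 10]),
    ]
    for q, rate in empirical_error_by_quality(obs).items():
        print(f"Q{q}: observed {rate:.3f}, nominal {nominal_error(q):.3f}")
```

The table this produces is only meaningful if a given quality score carries the same error rate everywhere in a read. Merging breaks that assumption. In the overlapping portion of a merged read, quality reflects both the forward and reverse reads, so it no longer maps to error rate the way it does in the forward-only or reverse-only portions. This is why denoising before merging is more accurate.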

What’s needed to use DADA2?

  • Technical
  • Knowledge
    • Basic familiarity with R. If you need to brush up on your R skills, check out the R Basics page (coming soon).

    • Basic understanding of microbial bioinformatics

  • Data
    • Samples have been demultiplexed, i.e. split into individual per-sample fastq files

    • Non-biological nucleotides have been removed, e.g. primers, adapters, linkers, etc.

    • For paired-end sequencing data, the forward and reverse fastq files contain reads in matched order
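The last requirement can be checked before running DADA2. The Python sketch below is a hypothetical pre-flight check and is not part of DADA2. It walks a forward and a reverse fastq file in parallel and confirms that the read IDs line up record by record. It makes three assumptions: records are standard 4-line fastq entries without wrapped sequence lines, the read ID is the header text up to the first space, and older-style `/1` and `/2` mate suffixes should be ignored. The file paths and function names are illustrative.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed fastq file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_ids(handle):
    """Yield the read ID of each 4-line fastq record (header text up to the first space)."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1} is not a fastq header: {line!r}")
            read_id = line[1:].split()[0]
            # Older Illumina headers mark mates with a /1 or /2 suffix; drop it before comparing.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def check_matched_order(fwd_path, rev_path):
    """Return the number of read pairs, or raise ValueError if the files are out of sync."""
    with open_fastq(fwd_path) as f, open_fastq(rev_path) as r:
        n = 0
        # zip_longest pads the shorter file with None, so unequal record counts also fail.
        for n, (fid, rid) in enumerate(zip_longest(read_ids(f), read_ids(r)), start=1):
            if fid != rid:
                raise ValueError(f"record {n}: forward ID {fid!r} != reverse ID {rid!r}")
        return n
```

For example, `check_matched_order("sample1_R1.fastq.gz", "sample1_R2.fastq.gz")` (hypothetical file names) returns the pair count if the files are in sync and raises at the first mismatched record otherwise.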

Tutorials

Below are some of our favorite DADA2 tutorials, with a brief description of what makes each one stand out:

  1. The official DADA2 tutorial

    • This is a step-by-step walkthrough written by the developer of DADA2. You can follow along with this tutorial on your local computer or on a server. There is also a big data tutorial for instances where you have massive datasets that require powerful computational resources.

    • Highly recommended for first-time users.

  2. I-Hsuan Lin’s tutorial series on 16S rDNA V3-V4 amplicon sequencing analysis

    • This is part one of a series of tutorials in which Lin walks you through everything from downloading 16S data to using PICRUSt2 and ALDEx2.

    • Part one covers trimming PCR primer sequences or adapters from your reads with cutadapt, then processing the trimmed reads in DADA2 and building a phylogenetic tree and a phyloseq object. Much of the DADA2 portion repeats DADA2's official tutorial, with one main difference: Lin's tutorial assumes basic familiarity with DADA2 and omits background information from the official tutorial that more seasoned DADA2 users may not need.

    • We recommend you try this tutorial after you’ve successfully completed DADA2’s tutorial.

What next?

Alternatives

DADA2 is not the only software that analyzes 16S sequences. The following is a list of alternatives used by many microbial bioinformaticians: