Phyloseq

Introduction

What is Phyloseq?

Phyloseq is an R package for reproducible, interactive analysis and graphics of microbiome census data. It was developed by Paul McMurdie and Susan Holmes in 2013. Quoting from the Phyloseq website:

The phyloseq package is a tool to import, store, analyze, and graphically display complex phylogenetic sequencing data that has already been clustered into Operational Taxonomic Units (OTUs), especially when there is associated sample data, phylogenetic tree, and/or taxonomic assignment of the OTUs. This package leverages many of the tools available in R for ecology and phylogenetic analysis (vegan, ade4, ape, picante), while also using advanced/flexible graphic systems (ggplot2) to easily produce publication-quality graphics of complex phylogenetic data. phyloseq uses a specialized system of S4 classes to store all related phylogenetic sequencing data as single experiment-level object, making it easier to share data and reproduce analyses. In general, phyloseq seeks to facilitate the use of R for efficient interactive and reproducible analysis of OTU-clustered high-throughput phylogenetic sequencing data.

Why use Phyloseq?

You’ve generated OTU/ASV data and want to analyze it and create publication-quality figures.

How does Phyloseq work?

Quoting from the Phyloseq paper [1]:

The phyloseq package provides an object-oriented programming infrastructure that simplifies many of the common data management and preprocessing tasks required during analysis of phylogenetic sequencing data. This simplified syntax helps mitigate inconsistency errors and encourages interaction with the data during preprocessing. The phyloseq package also provides a set of powerful analysis and graphics functions, building upon related packages available in R and Bioconductor. It includes or supports some of the most commonly-needed ecology and phylogenetic tools, including a consistent interface for calculating ecological distances and performing dimensional reduction (ordination). The graphics functions allow users to interactively produce annotated publication-quality graphics in just one or two lines of code.

In other words, Phyloseq simplifies the analysis and visualization of microbial sequencing data.
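Phyloseq itself is an R package, so the following is only a language-neutral illustration in Python, not phyloseq code. It sketches Bray-Curtis dissimilarity, one of the commonly used ecological distances that such a distance interface covers, computed pairwise between the samples of a small, made-up OTU count table. The function name `bray_curtis` and all counts are hypothetical.

```python
from itertools import combinations


def bray_curtis(a: list[int], b: list[int]) -> float:
    """Bray-Curtis dissimilarity between two samples' OTU counts.

    Computed as sum(|a_i - b_i|) / sum(a_i + b_i); for non-negative counts it
    ranges from 0 (identical profiles) to 1 (no OTUs in common).
    Both lists must list the same OTUs in the same order.
    """
    if len(a) != len(b):
        raise ValueError("samples must cover the same OTUs in the same order")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical OTU table: sample name -> counts for OTU1..OTU4
otu_counts = {
    "S1": [10, 0, 5, 5],
    "S2": [6, 4, 5, 5],
    "S3": [0, 20, 0, 0],
}

# Pairwise distances between samples, i.e. the entries of a distance matrix
distances = {
    (s, t): bray_curtis(otu_counts[s], otu_counts[t])
    for s, t in combinations(otu_counts, 2)
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

Here S1 and S2 differ only in OTU1 and OTU2 and score 0.2, while S1 and S3 share no OTUs and reach the maximum of 1.0. A matrix of such distances is the usual input to ordination methods.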

What’s needed to use Phyloseq?

  • Technical
  • Knowledge
    • Basic familiarity with R. If you need to brush up on your R skills, check out the R Basics page (coming soon).

    • Basic understanding of microbial bioinformatics

    • Basic understanding of statistical tests

  • Data
    • Abundance and related data imported from a popular denoising or OTU-clustering pipeline or format (DADA2, UPARSE, QIIME, mothur, BIOM, PyroTagger, RDP, etc.), including:
      • Sample metadata

      • Taxonomic classification of microbial data

      • OTU/ASV counts

      • Phylogenetic tree (optional)
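The components listed above are what phyloseq combines into a single experiment-level object (using R S4 classes in the package itself). As a conceptual sketch only, written in Python and not reflecting phyloseq's API, the hypothetical `Experiment` class below bundles OTU counts, sample metadata and taxonomy, rejects mismatched OTU or sample IDs at construction time, and offers a hypothetical `keep_samples` method that filters every component together. Raising an error on mismatches is a choice made for this sketch; phyloseq's own constructor may handle mismatched names differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical bundle of OTU counts, sample metadata and taxonomy."""

    otu_counts: dict[str, dict[str, int]]   # OTU ID -> {sample ID -> count}
    sample_data: dict[str, dict[str, str]]  # sample ID -> metadata fields
    taxonomy: dict[str, dict[str, str]]     # OTU ID -> {rank -> name}

    def __post_init__(self) -> None:
        # Every OTU with counts must have a taxonomy entry, and vice versa.
        if set(self.otu_counts) != set(self.taxonomy):
            raise ValueError("OTU IDs differ between counts and taxonomy")
        # Every OTU's counts must cover exactly the samples in the metadata.
        samples = set(self.sample_data)
        for otu, counts in self.otu_counts.items():
            if set(counts) != samples:
                raise ValueError(f"{otu}: sample IDs differ from sample_data")

    def keep_samples(self, keep: set[str]) -> "Experiment":
        """Return a new Experiment restricted to `keep`, with all parts in sync."""
        return Experiment(
            otu_counts={
                otu: {s: n for s, n in counts.items() if s in keep}
                for otu, counts in self.otu_counts.items()
            },
            sample_data={s: m for s, m in self.sample_data.items() if s in keep},
            taxonomy=dict(self.taxonomy),
        )


# Made-up example data
exp = Experiment(
    otu_counts={"OTU1": {"S1": 10, "S2": 0}, "OTU2": {"S1": 3, "S2": 7}},
    sample_data={"S1": {"site": "gut"}, "S2": {"site": "skin"}},
    taxonomy={"OTU1": {"Phylum": "Firmicutes"}, "OTU2": {"Phylum": "Bacteroidetes"}},
)
gut_only = exp.keep_samples({"S1"})
print(gut_only.otu_counts)  # {'OTU1': {'S1': 10}, 'OTU2': {'S1': 3}}
```

Keeping the parts in one object means a subset cannot leave the count table and the metadata describing different samples, which is the kind of inconsistency error the phyloseq paper says its design helps mitigate.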

Tutorials

Below are some of our favorite Phyloseq tutorials, each with a brief description of what makes it stand out:

  1. The official Phyloseq tutorial

    • This is a step-by-step walkthrough written by the developer of Phyloseq. There are additional tutorials for different visualization methods.
      • Frequently Asked Questions: Answers common questions such as “How can I modify the plots?” and “How should I normalize my data?”

    • Highly recommended for first-time users.

  2. Vaulot Phyloseq Tutorial

    • This is a good tutorial for beginners. It demonstrates some slightly different ways of using phyloseq than the official tutorial, but is otherwise very similar. It does not go into as much depth about what each function option does, so we recommend trying it after you’ve successfully completed the official Phyloseq tutorial.

  3. Bioconductor workflow for microbiome data analysis: from raw reads to community analyses

    • This is a much more in-depth tutorial starting with importing data from DADA2 and proceeding through statistical analyses in Phyloseq. From the paper:

      In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, including both parameteric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models as well as nonparametric testing using community networks and the ggnetwork package.

    • We recommend this tutorial once you feel comfortable with the basics of phyloseq.

What next?

  • Differential abundance analysis (coming soon…)

  • Effect size analysis (coming soon…)

  • Phylogenomic analysis (coming soon…)

Alternatives

We are not aware of any package that is directly analogous to Phyloseq.