See the pkgdown documentation site here and the package paper in the Journal of Open Source Software.

Biological studies, especially in ecology, health sciences and taxonomy, need to describe the biological composition of samples. During the last twenty years, (i) the development of DNA sequencing, (ii) reference databases, (iii) high-throughput sequencing (HTS), and (iv) bioinformatics resources have allowed the description of biological communities through metabarcoding. Metabarcoding involves the sequencing of millions (meta-) of short regions of specific DNA (-barcoding, Valentini, Pompanon, and Taberlet (2009)) often from environmental samples (eDNA, Taberlet et al. (2012)) such as human stomach contents, lake water, soil and air.

MiscMetabar aims to facilitate the description, transformation, exploration and reproducibility of metabarcoding analysis using R. The development of MiscMetabar relies heavily on the R packages dada2 (Callahan et al. 2016), phyloseq (McMurdie and Holmes 2013) and targets (Landau 2021).

Installation

There is no CRAN version of MiscMetabar for now (work in progress). As MiscMetabar relies heavily on two Bioconductor packages (dada2 and phyloseq), you first need to install these two packages using BiocManager.
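
The Bioconductor step is not spelled out here, so the following is a minimal sketch of one common way to do it, assuming the standard BiocManager workflow and the Bioconductor package names dada2 and phyloseq:

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the two Bioconductor dependencies of MiscMetabar
BiocManager::install(c("dada2", "phyloseq"))
```

Running this once before the GitHub installation should be enough; BiocManager skips or updates packages that are already installed.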

You can install the stable version from GitHub with:

if (!require("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("adrientaudiere/MiscMetabar")

You can install the development version from GitHub with:

if (!require("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("adrientaudiere/MiscMetabar", ref = "dev")

Some uses of MiscMetabar

See the articles on the MiscMetabar website for more examples.

For an introduction to metabarcoding in R, please visit the state of the field article. The import, export and track article explains how to import and export phyloseq objects. It also shows how to summarize useful information (number of sequences, samples and clusters) across bioinformatic pipelines. The explore data article takes a closer look at different ways of exploring samples and taxonomic data from a phyloseq object.

If you are interested in ecological metrics, see the articles describing alpha-diversity and beta-diversity analysis. The filter taxa and samples article describes some data-filtering processes using MiscMetabar, and the reclustering tutorial introduces different ways of reclustering already-clustered OTUs/ASVs. The tengeler article explores the dataset from Tengeler et al. (2020) using some MiscMetabar functions.

For developers, I also wrote an article describing some coding rules.

Summarize a physeq object

Alpha-diversity analysis

library(MiscMetabar)
data("data_fungi") # example dataset shipped with MiscMetabar
p <- MiscMetabar::hill_pq(data_fungi, variable = "Height")
p$plot_Hill_0
Hill number 1

p$plot_tuckey
Result of the Tukey post-hoc test

Beta-diversity analysis

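# ggvenn_pq() relies on the ggVennDiagram package
if (!require("ggVennDiagram", quietly = TRUE)) {
  install.packages("ggVennDiagram")
}
# Venn diagram of the ASVs shared among the levels of Height
MiscMetabar::ggvenn_pq(data_fungi, fact = "Height") +
  ggplot2::scale_fill_distiller(palette = "BuPu", direction = 1) +
  ggplot2::labs(title = "Number of ASVs shared among Height levels in tree")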

Note for non-Linux users

Some functions may not work on Windows (e.g. [track_wkflow()], [cutadapt_remove_primers()], [krona()], [vsearch_clustering()], …). One solution is to use a Docker container, for example through the great rocker project.

Here is a list of functions that have some limitations or do not work at all on Windows:

  • [build_phytree_pq()]
  • [count_seq()]
  • [cutadapt_remove_primers()]
  • [krona()]
  • [merge_krona()]
  • [multipatt_pq()]
  • [plot_tsne_pq()]
  • [rotl_pq()]
  • [save_pq()]
  • [tax_datatable()]
  • [track_wkflow()]
  • [track_wkflow_samples()]
  • [tsne_pq()]
  • [venn_pq()]

MiscMetabar is developed under Linux, and the vast majority of its functions should work on Unix systems, but it has not been tested under macOS.

Installation of other software for Debian Linux distributions

If you encounter any errors or have any questions about the installation of these tools, please visit their dedicated websites.

blast+

sudo apt-get install ncbi-blast+

vsearch

sudo apt-get install vsearch

Another possibility is to install vsearch with conda.

swarm

git clone https://github.com/torognes/swarm.git
cd swarm/
make

Another possibility is to install swarm with conda.

mumu

git clone https://github.com/frederic-mahe/mumu.git
cd ./mumu/
make
make check
make install  # as root or sudo

cutadapt

conda create -n cutadaptenv cutadapt
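
After installing these tools, it can be useful to check from R whether they are visible on the PATH. The snippet below is only a sketch using base R's Sys.which(); the executable names (blastn for blast+, vsearch, swarm, mumu, cutadapt) are assumptions about how each tool is named once installed, and the check is not part of MiscMetabar.

```r
# Executable names assumed for each external tool (adjust if your install differs)
tools <- c("blastn", "vsearch", "swarm", "mumu", "cutadapt")

# Sys.which() returns the full path of each executable, or "" when it is not found
paths <- unname(Sys.which(tools))

# One row per tool, with a logical flag telling whether it was found on the PATH
status <- data.frame(tool = tools, path = paths, found = paths != "")
print(status)

# Report the tools that R cannot see
if (any(!status$found)) {
  message("Not found on PATH: ", paste(status$tool[!status$found], collapse = ", "))
}
```

Note that a tool installed in a conda environment (such as cutadaptenv above) is reported as missing unless R is started from a session where that environment is activated.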

Callahan, Benjamin J, Paul J McMurdie, Michael J Rosen, Andrew W Han, Amy Jo A Johnson, and Susan P Holmes. 2016. “DADA2: High-Resolution Sample Inference from Illumina Amplicon Data.” Nature Methods 13 (7): 581–83. https://doi.org/10.1038/nmeth.3869.
Landau, William Michael. 2021. “The targets R Package: A Dynamic Make-Like Function-Oriented Pipeline Toolkit for Reproducibility and High-Performance Computing.” Journal of Open Source Software 6 (57): 2959. https://doi.org/10.21105/joss.02959.
McMurdie, Paul J., and Susan Holmes. 2013. “phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8 (4): e61217. https://doi.org/10.1371/journal.pone.0061217.
Taberlet, Pierre, Eric Coissac, Mehrdad Hajibabaei, and Loren H Rieseberg. 2012. “Environmental DNA.” Molecular Ecology. Wiley Online Library. https://doi.org/10.1002/(issn)2637-4943.
Valentini, Alice, François Pompanon, and Pierre Taberlet. 2009. “DNA Barcoding for Ecologists.” Trends in Ecology & Evolution 24 (2): 110–17. https://doi.org/10.1016/j.tree.2008.09.011.