R ecosystem for metabarcoding
Source: vignettes/articles/states_of_fields_in_R.Rmd
This is a short introduction to other R packages in the field of metabarcoding analysis.
State of the Field in R
The metabarcoding ecosystem in the R language is mature, well-constructed, and relies on a very active community in both the Bioconductor and CRAN projects. Bioconductor even provides specific task views for Metagenomics and Microbiome.
The R package dada2 (Callahan et al. 2016) provides a highly cited and recommended clustering method (Pauvert et al. 2019). dada2 also provides tools to complete the metabarcoding analysis pipeline, including chimera detection and taxonomic assignment. phyloseq (McMurdie and Holmes 2013) (https://bioconductor.org/packages/release/bioc/html/phyloseq.html) facilitates metagenomics analysis by providing a way to store data (the phyloseq class) together with both graphical and statistical functions.
The phyloseq package introduces an S4 class (class phyloseq), which contains (i) an OTU-by-sample abundance matrix, (ii) a taxonomic table, (iii) a sample metadata table, and two optional slots for (iv) a phylogenetic tree and (v) reference sequences.
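The structure described above can be illustrated with a minimal sketch that builds a phyloseq object from the three required components. All counts, taxon names, and sample names here are invented for illustration, and the optional tree and reference-sequence slots are left empty.

```r
library(phyloseq)

# Toy OTU-by-sample count matrix (taxa as rows); values are made up
otu_mat <- matrix(
  c(10, 0, 3,
     5, 2, 0,
     0, 7, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:3), paste0("S", 1:3))
)

# Toy taxonomic table: one row per OTU, one column per rank
tax_mat <- matrix(
  c("Fungi", "Ascomycota",
    "Fungi", "Basidiomycota",
    "Fungi", "Ascomycota"),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:3), c("Kingdom", "Phylum"))
)

# Toy sample metadata: row names must match the sample names above
sam_df <- data.frame(
  site = c("A", "A", "B"),
  row.names = paste0("S", 1:3)
)

# Assemble the three components into a single S4 object
physeq <- phyloseq(
  otu_table(otu_mat, taxa_are_rows = TRUE),
  tax_table(tax_mat),
  sample_data(sam_df)
)

ntaxa(physeq)        # number of taxa (3)
nsamples(physeq)     # number of samples (3)
sample_sums(physeq)  # total reads per sample

# The optional tree slot was not supplied, so it is empty
is.null(phy_tree(physeq, errorIfNULL = FALSE))
```

Keeping the three tables in one object means that subsetting samples or taxa updates all components together, which is the main reason downstream packages build on this class.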
Some packages already extend the phyloseq package. For example, the microbiome package collection (Ernst et al. 2023) provides scripts and functions for manipulating microbiome datasets. The speedyseq package (McLaren 2020) provides faster versions of phyloseq's plotting and taxonomic merging functions, some of which (merge_samples2() and merge_taxa_vec()) are integrated in MiscMetabar (thanks to Mike R. McLaren). The phylosmith package (Smith 2023) also provides functions to extend and simplify the use of the phyloseq package.
Other packages (mia, which forms part of the microbiome package collection, and MicrobiotaProcess (Xu et al. 2023)) build on a newer data structure that uses the comprehensive Bioconductor ecosystem of the SummarizedExperiment family.
MiscMetabar enriches this R ecosystem by providing functions to (i) describe your dataset visually, (ii) transform your data, (iii) explore biological diversity (alpha, beta, and taxonomic diversity), and (iv) simplify reproducibility. MiscMetabar is designed to complement, not compete with, the other R packages mentioned above. For example, the mia package is recommended for studies focusing on phylogenetic trees, and phylosmith allows easy visualization of co-occurrence networks. Using the MicrobiotaProcess::as.MPSE function, most of the utilities in the MicrobiotaProcess package can be combined with functions from MiscMetabar.
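As a hedged sketch of the conversion mentioned above, the following converts a phyloseq object into the MPSE class used by MicrobiotaProcess. It assumes both packages are installed and uses the GlobalPatterns example dataset shipped with phyloseq as a stand-in for your own data.

```r
library(phyloseq)
library(MicrobiotaProcess)

# Example phyloseq object distributed with the phyloseq package
data("GlobalPatterns", package = "phyloseq")

# Convert the phyloseq object to the MPSE class of MicrobiotaProcess
mpse <- as.MPSE(GlobalPatterns)

# The result can now be passed to MicrobiotaProcess functions,
# while the original phyloseq object remains usable with MiscMetabar
class(mpse)
```

Keeping the phyloseq object as the primary data structure and converting only when needed avoids maintaining two copies of the dataset that could drift apart.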
I do not try to reinvent the wheel and prefer to rely on existing packages and classes rather than building a new framework. MiscMetabar is based on the phyloseq class from phyloseq, the most cited package in metagenomics (Wen et al. 2023). For a description and comparison of the integrated packages competing with phyloseq (e.g. microeco by C. Liu et al. (2020), EasyAmplicon by Y.-X. Liu et al. (2023) and MicrobiomeAnalystR by Lu et al. (2023)), see Wen et al. (2023). Note that some limitations of the phyloseq package are circumvented thanks to phylosmith (Smith 2023), microViz (Barnett, Arts, and Penders 2021) and MiscMetabar.
Some packages provide an interactive interface that is useful for rapid exploration and for biologists who are new to coding. Animalcules (Zhao et al. 2021) and microViz (Barnett, Arts, and Penders 2021) provide Shiny interactive interfaces, whereas MicrobiomeAnalystR (Lu et al. 2023) is a web-based platform.
Session information
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Debian GNU/Linux 12 (bookworm)
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
##
## locale:
## [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Paris
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 desc_1.4.3 R6_2.5.1 fastmap_1.2.0
## [5] xfun_0.48 cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1
## [9] rmarkdown_2.28 lifecycle_1.0.4 cli_3.6.3 sass_0.4.9
## [13] pkgdown_2.1.1 textshaping_0.4.0 jquerylib_0.1.4 systemfonts_1.1.0
## [17] compiler_4.4.1 rstudioapi_0.16.0 tools_4.4.1 ragg_1.3.3
## [21] bslib_0.8.0 evaluate_1.0.0 yaml_2.3.10 jsonlite_1.8.9
## [25] rlang_1.1.4 fs_1.6.4 htmlwidgets_1.6.4