Introduction to MiscMetabar: an R package to facilitate visualization and reproducibility in metabarcoding analysis
Raison d’être
- Complement the R packages dada2 and phyloseq
- Provide useful visualizations (biplot_pq, circle_pq, upset_pq, ggvenn_pq)
- Facilitate the use of the targets package
Quick overview
For an introduction to metabarcoding in R, please visit the state of the field vignette. The import, export and track vignette explains how to import and export phyloseq objects. It also shows how to summarize useful information (number of sequences, samples and clusters) across bioinformatic pipelines.
If you are interested in ecological metrics, see the vignettes describing alpha-diversity and beta-diversity analysis. The vignette filter taxa and samples describes some data-filtering processes using MiscMetabar, and the reclustering tutorial introduces the different ways of clustering already-clustered OTUs/ASVs. The vignette tengeler explores the dataset from Tengeler et al. (2020) using some MiscMetabar functions.
For developers, I also wrote a vignette describing some coding rules.
Summarize a physeq object
library("MiscMetabar")
library("phyloseq")
data("data_fungi")
summary_plot_pq(data_fungi)
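As a complement, here is a minimal sketch of how the basic counts mentioned earlier (number of sequences, samples and clusters) might be read directly with phyloseq accessor functions. It assumes the data_fungi object loaded above; the variable names n_samples, n_taxa and n_sequences are illustrative only and are not part of MiscMetabar.

```r
library("MiscMetabar")
library("phyloseq")
data("data_fungi")

# Number of samples in the phyloseq object
n_samples <- nsamples(data_fungi)
# Number of taxa (clusters: OTUs or ASVs)
n_taxa <- ntaxa(data_fungi)
# Total number of sequences, summed over all samples
n_sequences <- sum(sample_sums(data_fungi))

# Collect the three counts in a named vector
c(samples = n_samples, taxa = n_taxa, sequences = n_sequences)
```

These accessors do not depend on the orientation of the OTU table, so the same lines should work on any phyloseq object.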
Create an interactive table of the tax_table
data("GlobalPatterns", package = "phyloseq")
tax_datatable(subset_taxa(
GlobalPatterns,
taxa_sums(GlobalPatterns) > 100000 # keep only taxa with more than 100,000 sequences
))
Sankey diagram of the tax_table
# Keep only taxa assigned to Archaea (Kingdom is the first rank of the tax_table)
gp <- subset_taxa(GlobalPatterns, Kingdom == "Archaea")
sankey_pq(gp, taxa = 1:5)
UpSet plot to visualize the distribution of taxa across sample variables
upset_pq(gp, "SampleType", taxa = "Class")
References
Tengeler, A.C., Dam, S.A., Wiesmann, M. et al. Gut microbiota from persons with attention-deficit/hyperactivity disorder affects the brain in mice. Microbiome 8, 44 (2020). https://doi.org/10.1186/s40168-020-00816-x
Session information
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Debian GNU/Linux 12 (bookworm)
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
#>
#> locale:
#> [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
#> [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
#> [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Paris
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MiscMetabar_0.11.0 purrr_1.0.2 dplyr_1.1.4 dada2_1.32.0
#> [5] Rcpp_1.0.13-1 ggplot2_3.5.1 phyloseq_1.48.0
#>
#> loaded via a namespace (and not attached):
#> [1] bitops_1.0-9 deldir_2.0-4
#> [3] permute_0.9-7 rlang_1.1.4
#> [5] magrittr_2.0.3 ade4_1.7-22
#> [7] matrixStats_1.4.1 compiler_4.4.2
#> [9] mgcv_1.9-1 png_0.1-8
#> [11] systemfonts_1.1.0 vctrs_0.6.5
#> [13] reshape2_1.4.4 stringr_1.5.1
#> [15] pwalign_1.0.0 pkgconfig_2.0.3
#> [17] crayon_1.5.3 fastmap_1.2.0
#> [19] XVector_0.44.0 labeling_0.4.3
#> [21] utf8_1.2.4 Rsamtools_2.20.0
#> [23] rmarkdown_2.29 UCSC.utils_1.0.0
#> [25] ragg_1.3.3 xfun_0.49
#> [27] zlibbioc_1.50.0 cachem_1.1.0
#> [29] GenomeInfoDb_1.40.1 jsonlite_1.8.9
#> [31] biomformat_1.32.0 rhdf5filters_1.16.0
#> [33] DelayedArray_0.30.1 Rhdf5lib_1.26.0
#> [35] BiocParallel_1.38.0 jpeg_0.1-10
#> [37] parallel_4.4.2 cluster_2.1.6
#> [39] R6_2.5.1 RColorBrewer_1.1-3
#> [41] bslib_0.8.0 stringi_1.8.4
#> [43] ComplexUpset_1.3.3 GenomicRanges_1.56.2
#> [45] jquerylib_0.1.4 SummarizedExperiment_1.34.0
#> [47] iterators_1.0.14 knitr_1.49
#> [49] IRanges_2.38.1 Matrix_1.7-1
#> [51] splines_4.4.2 igraph_2.1.2
#> [53] tidyselect_1.2.1 rstudioapi_0.17.1
#> [55] abind_1.4-8 yaml_2.3.10
#> [57] vegan_2.6-8 codetools_0.2-20
#> [59] hwriter_1.3.2.1 lattice_0.22-6
#> [61] tibble_3.2.1 plyr_1.8.9
#> [63] Biobase_2.64.0 withr_3.0.2
#> [65] ShortRead_1.62.0 evaluate_1.0.1
#> [67] desc_1.4.3 survival_3.7-0
#> [69] RcppParallel_5.1.9 Biostrings_2.72.1
#> [71] pillar_1.9.0 MatrixGenerics_1.16.0
#> [73] DT_0.33 foreach_1.5.2
#> [75] stats4_4.4.2 generics_0.1.3
#> [77] S4Vectors_0.42.1 munsell_0.5.1
#> [79] scales_1.3.0 glue_1.8.0
#> [81] tools_4.4.2 interp_1.1-6
#> [83] data.table_1.16.4 GenomicAlignments_1.40.0
#> [85] fs_1.6.5 rhdf5_2.48.0
#> [87] grid_4.4.2 tidyr_1.3.1
#> [89] ape_5.8 crosstalk_1.2.1
#> [91] latticeExtra_0.6-30 colorspace_2.1-1
#> [93] patchwork_1.3.0 networkD3_0.4
#> [95] nlme_3.1-166 GenomeInfoDbData_1.2.12
#> [97] cli_3.6.3 textshaping_0.4.1
#> [99] fansi_1.0.6 S4Arrays_1.4.1
#> [101] gtable_0.3.6 sass_0.4.9
#> [103] digest_0.6.37 BiocGenerics_0.50.0
#> [105] SparseArray_1.4.8 farver_2.1.2
#> [107] htmlwidgets_1.6.4 htmltools_0.5.8.1
#> [109] pkgdown_2.1.1 multtest_2.60.0
#> [111] lifecycle_1.0.4 httr_1.4.7
#> [113] MASS_7.3-61