Why bootpq?
A single rarefaction to even sequencing depth throws away part of the data and depends on one random draw. The conclusion of a downstream test (PERMANOVA, a Hill-number group comparison, a variation partitioning) can therefore change from one rarefaction to the next.
bootpq removes this fragility by repeating the
analysis across many rarefaction permutations — each with a
different random seed — and summarising the results (mean and
quantiles). You get an estimate together with its variability across
rarefactions, instead of a single, possibly atypical, value.
bootpq is part of the pqverse ecosystem and
builds on MiscMetabar.
Installation
# install.packages("remotes")
remotes::install_github("adrientaudiere/bootpq")A worked example
We use data_fungi_mini, a small fungal metabarcoding
dataset shipped with MiscMetabar. We first drop samples
with missing covariates.
library(bootpq)
library(MiscMetabar)
data(data_fungi_mini)
data_fungi_woNA <- subset_samples(
data_fungi_mini,
!is.na(Time) & !is.na(Height)
)
data_fungi_woNA
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 45 taxa and 75 samples ]
#> sample_data() Sample Data: [ 75 samples by 7 sample variables ]
#> tax_table() Taxonomy Table: [ 45 taxa by 12 taxonomic ranks ]
#> refseq() DNAStringSet: [ 45 reference sequences ]PERMANOVA across rarefaction permutations
adonis_rarperm_pq() runs vegan::adonis2()
on each rarefied version of the object and summarises the results. We
keep nperm small here so the vignette builds quickly; in
practice use the default (nperm = 99) or more.
res <- adonis_rarperm_pq(
data_fungi_woNA,
"Time*Height",
na_remove = TRUE,
nperm = 9,
progress_bar = FALSE
)
#> Taxa are now in columns.
#> Removing NA from Time
#> Removing NA from Height
#> Taxa are now in columns.
#> Removing NA from Time
#> Removing NA from Height
#> Taxa are now in columns.
#> Removing NA from Time
#> Removing NA from Height
#> Taxa are now in columns.
#> Removing NA from Time
#> Removing NA from Height
#> Taxa are now in columns.
#> Removing NA from Time
#> Removing NA from Height
#> Taxa are now in columns.
#> Removing NA from Time
#> Removing NA from Height
#> Taxa are now in columns.
#> Removing NA from Time
#> Removing NA from Height
#> Taxa are now in columns.
#> Removing NA from Time
#> Removing NA from Height
#> Taxa are now in columns.
#> Removing NA from Time
#> Removing NA from Height
# Mean of the PERMANOVA table across the 9 rarefaction permutations
res$mean
#> Df SumOfSqs R2 F Pr(>F)
#> Model 5 2.429663 0.0687147 1.019227 0.4693333
#> Residual 69 32.933300 0.9312853 NA NA
#> Total 74 35.362963 1.0000000 NA NAres$quantile_min and res$quantile_max give
the corresponding lower and upper quantiles, so you can see how stable
each term is across rarefactions.
Hill-number group tests
hill_test_rarperm_pq() compares Hill numbers (q = 0, 1,
2) between the levels of a factor across rarefaction permutations. It
relies on ggstatsplot, so the chunk below is shown but not
run during the vignette build.
hill_test_rarperm_pq(data_fungi_woNA, fact = "Height", nperm = 99)Variation partitioning
var_par_rarperm_pq() repeats a variation partitioning
across rarefaction permutations, summarising how much variation each set
of explanatory variables explains.
var_par_rarperm_pq(
data_fungi_woNA,
list_component = list(
"Time" = c("Time"),
"Size" = c("Height", "Diameter")
),
nperm = 99
)Where to go next
- The function reference
documents every argument, including how to control the rarefaction depth
(
sample.size) and the quantiles reported (quantile_prob). -
bootpqworks on anyphyloseqobject, so the same workflow applies to your own data.