Remove potential contaminants by using known control taxa (e.g., spike-ins,
synthetic sequences) to estimate background contamination levels. For each
sample, a threshold is computed from the control taxa using a summary function
(default: max). Occurrences of non-control taxa that are at or below this
threshold are set to 0.
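The thresholding rule described above can be sketched in base R on a toy count matrix. This is only an illustration of the idea, not the package's implementation: the matrix `counts`, the taxa names, and the variable names are all hypothetical, and it assumes taxa in rows and samples in columns.

```r
# Toy count matrix (hypothetical data): taxa in rows, samples in columns.
counts <- matrix(
  c(5,  0, 2,   # ctrl1
    1,  3, 0,   # ctrl2
    5, 10, 1,   # taxA
    9,  2, 4),  # taxB
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ctrl1", "ctrl2", "taxA", "taxB"), c("S1", "S2", "S3"))
)
is_control <- rownames(counts) %in% c("ctrl1", "ctrl2")

# Per-sample threshold: summarize the control taxa within each sample.
threshold <- apply(counts[is_control, , drop = FALSE], 2, max)

# Set non-control counts at or below their sample's threshold to 0.
non_control <- counts[!is_control, , drop = FALSE]
at_or_below <- sweep(non_control, 2, threshold, FUN = "<=")
non_control[at_or_below] <- 0

cleaned <- counts
cleaned[!is_control, ] <- non_control
cleaned
```

In this toy case the per-sample thresholds are 5, 3 and 2, so `taxA` keeps only its count in `S2` and `taxB` keeps its counts in `S1` and `S3`.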
Usage
decontam_taxa_control(
  physeq,
  control_condition,
  fun = max,
  global_threshold = FALSE,
  remove_control_taxa = TRUE,
  clean_phyloseq_object = TRUE,
  verbose = TRUE
)
Arguments
- physeq
(phyloseq, required) A phyloseq object.
- control_condition
An expression evaluated on tax_table that returns TRUE for control taxa. Use . to refer to the phyloseq object. Examples: Genus == "Tintelnotia", Family == "Mitochondria", taxa_names(.) %in% c("ASV1", "ASV2").
- fun
(function, default max) A function to summarize the control taxa values for each sample (or globally if global_threshold = TRUE). Common choices: max (most conservative, default), mean, median.
- global_threshold
(logical, default FALSE) If TRUE, compute a single global threshold from all control taxa occurrences instead of per-sample thresholds.
- remove_control_taxa
(logical, default TRUE) Whether to remove the control taxa from the output phyloseq object after decontamination.
- clean_phyloseq_object
(logical, default TRUE) Whether to clean the resulting phyloseq object using clean_pq() to remove empty taxa/samples.
- verbose
(logical, default TRUE) Whether to print additional information.
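To make the interaction between fun and global_threshold concrete, here is a minimal base R sketch. The helper `threshold_from_controls()` is hypothetical and not part of MiscMetabar, and the toy matrix is invented for illustration. It also assumes that zero counts of control taxa are included when summarizing, which the description above does not specify and which matters for mean and median.

```r
# Toy count matrix (hypothetical data): taxa in rows, samples in columns.
counts <- matrix(
  c(5, 0, 2,   # ctrl1
    1, 3, 0,   # ctrl2
    5, 10, 1,  # taxA
    9, 2, 4),  # taxB
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ctrl1", "ctrl2", "taxA", "taxB"), c("S1", "S2", "S3"))
)
is_control <- rownames(counts) %in% c("ctrl1", "ctrl2")

# Hypothetical helper: returns one threshold per sample.
# Assumption: zeros in control taxa are kept when applying `fun`.
threshold_from_controls <- function(counts, is_control, fun = max, global = FALSE) {
  ctrl <- counts[is_control, , drop = FALSE]
  if (global) {
    # A single value from all control counts, reused for every sample.
    thr <- rep(fun(as.vector(ctrl)), ncol(counts))
  } else {
    # One value per sample (column).
    thr <- apply(ctrl, 2, fun)
  }
  names(thr) <- colnames(counts)
  thr
}

per_sample_max  <- threshold_from_controls(counts, is_control)
per_sample_mean <- threshold_from_controls(counts, is_control, fun = mean)
global_max      <- threshold_from_controls(counts, is_control, global = TRUE)
```

With max, a global threshold is never lower than any per-sample threshold, so it can only zero out as many or more occurrences; this is in line with the larger reductions reported for global_threshold = TRUE in the examples below.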
Examples
library(MiscMetabar)
# Using a condition on tax_table (e.g., select by Genus)
decontam_taxa_control(data_fungi, Genus == "Tintelnotia")
#> Decontamination complete.
#> Threshold type: per-sample
#> Control taxa: 1
#> Non-control taxa: 1419
#> Threshold function: max
#> Controls removed: TRUE
#> Sequences: 1839124 -> 1835765 (-3359)
#> Occurrences: 12499 -> 12317 (-182)
#> Taxa: 1420 -> 1417 (-3)
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 1417 taxa and 185 samples ]
#> sample_data() Sample Data: [ 185 samples by 7 sample variables ]
#> tax_table() Taxonomy Table: [ 1417 taxa by 12 taxonomic ranks ]
#> refseq() DNAStringSet: [ 1417 reference sequences ]
# Using taxa names directly
control_taxa <- phyloseq::taxa_names(data_fungi)[1:2]
decontam_taxa_control(data_fungi, taxa_names(.) %in% control_taxa)
#> Decontamination complete.
#> Threshold type: per-sample
#> Control taxa: 2
#> Non-control taxa: 1418
#> Threshold function: max
#> Controls removed: TRUE
#> Sequences: 1839124 -> 1578082 (-261042)
#> Occurrences: 12499 -> 6306 (-6193)
#> Taxa: 1420 -> 1351 (-69)
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 1351 taxa and 169 samples ]
#> sample_data() Sample Data: [ 169 samples by 7 sample variables ]
#> tax_table() Taxonomy Table: [ 1351 taxa by 12 taxonomic ranks ]
#> refseq() DNAStringSet: [ 1351 reference sequences ]
# Use a global threshold
decontam_taxa_control(data_fungi, Genus == "Tintelnotia", global_threshold = TRUE)
#> Decontamination complete.
#> Threshold type: global
#> Control taxa: 1
#> Non-control taxa: 1419
#> Threshold function: max
#> Global threshold value: 210
#> Controls removed: TRUE
#> Sequences: 1839124 -> 1615276 (-223848)
#> Occurrences: 12499 -> 1029 (-11470)
#> Taxa: 1420 -> 582 (-838)
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 582 taxa and 146 samples ]
#> sample_data() Sample Data: [ 146 samples by 7 sample variables ]
#> tax_table() Taxonomy Table: [ 582 taxa by 12 taxonomic ranks ]
#> refseq() DNAStringSet: [ 582 reference sequences ]
# Keep control taxa in output
decontam_taxa_control(data_fungi, Genus == "Tintelnotia", remove_control_taxa = FALSE)
#> Decontamination complete.
#> Threshold type: per-sample
#> Control taxa: 1
#> Non-control taxa: 1419
#> Threshold function: max
#> Controls removed: FALSE
#> Sequences: 1839124 -> 1835977 (-3147)
#> Occurrences: 12499 -> 12319 (-180)
#> Taxa: 1420 -> 1418 (-2)
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 1418 taxa and 185 samples ]
#> sample_data() Sample Data: [ 185 samples by 7 sample variables ]
#> tax_table() Taxonomy Table: [ 1418 taxa by 12 taxonomic ranks ]
#> refseq() DNAStringSet: [ 1418 reference sequences ]