Please cite Vsearch if you use this function to assign taxonomy.
Usage
assign_sintax(
physeq = NULL,
seq2search = NULL,
ref_fasta = NULL,
behavior = "return_matrix",
vsearchpath = "vsearch",
clean_pq = TRUE,
nproc = 1,
suffix = "",
taxo_rank = c("K", "P", "C", "O", "F", "G", "S"),
min_boostrap = 0.5,
keep_temporary_files = FALSE,
verbose = TRUE,
temporary_fasta_file = "temp.fasta",
cmd_args = "--sintax_random"
)
Arguments
- physeq
(required): a
phyloseq-class
object obtained using thephyloseq
package.- seq2search
A DNAStringSet object of sequences to search for.
- ref_fasta
(required) A link to a database in vsearch format The reference database must contain taxonomic information in the header of each sequence in the form of a string starting with ";tax=" and followed by a comma-separated list of up to nine taxonomic identifiers. Each taxonomic identifier must start with an indication of the rank by one of the letters d (for domain) k (kingdom), p (phylum), c (class), o (order), f (family), g (genus), s (species), or t (strain). The letter is followed by a colon (:) and the name of that rank. Commas and semicolons are not allowed in the name of the rank. Non-ascii characters should be avoided in the names.
Example:
\>X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,g:Escherichia/Shigella,s:Escherichia_coli,t:str._K-12_substr._MG1655
- behavior
Either "return_matrix" (default), "return_cmd", or "add_to_phyloseq":
"return_matrix" return a list of two matrix with taxonomic value in the first element of the list and bootstrap value in the second one.
"return_cmd" return the command to run without running it.
"add_to_phyloseq" return a phyloseq object with amended slot
@taxtable
. Only available if using physeq input and not seq2search input.
- vsearchpath
(default: "vsearch") path to vsearch
- clean_pq
(logical, default TRUE) If set to TRUE, empty samples and empty ASV are discarded before clustering.
- nproc
(default: 1) Set to number of cpus/processors to use
- suffix
(character) The suffix to name the new columns. If set to "" (the default), the taxo_rank algorithm is used without suffix.
- taxo_rank
A list with the name of the taxonomic rank present in ref_fasta
- min_boostrap
(Int. [0:1], default 0.5) Minimum bootstrap value to inform taxonomy. For each bootstrap below the min_boostrap value, the taxonomy information is set to NA.
- keep_temporary_files
(logical, default: FALSE) Do we keep temporary files?
temporary_fasta_file (default "temp.fasta") : the fasta file from physeq or seq2search
"output_taxo_vs.txt" : see Vsearch Manual for parameter –tabbedout
- verbose
(logical). If TRUE, print additional information.
- temporary_fasta_file
The name of a temporary_fasta_file (default "temp.fasta")
- cmd_args
Other arguments to be passed on to vsearch sintax cmd. By default cmd_args is equal to "–sintax_random" as recommended by Torognes.
Details
This function is mainly a wrapper of the work of others. Please cite vsearch.
Examples
# \donttest{
assign_sintax(data_fungi_mini,
ref_fasta = system.file("extdata", "mini_UNITE_fungi.fasta.gz", package = "MiscMetabar"),
behavior = "return_cmd"
)
#> Cleaning suppress 0 taxa and 0 samples.
#> Start Vsearch sintax
#> [1] "vsearch --sintax temp.fasta --db /tmp/RtmpbOMt4J/temp_libpath32c49814bbb/MiscMetabar/extdata/mini_UNITE_fungi.fasta.gz --tabbedout output_taxo_vs.txt --threads 1 --sintax_random"
data_fungi_mini_new <- assign_sintax(data_fungi_mini,
ref_fasta = system.file("extdata", "mini_UNITE_fungi.fasta.gz", package = "MiscMetabar"),
behavior = "add_to_phyloseq"
)
#> Cleaning suppress 0 taxa and 0 samples.
#> Start Vsearch sintax
assignation_results <- assign_sintax(data_fungi_mini,
ref_fasta = system.file("extdata", "mini_UNITE_fungi.fasta.gz", package = "MiscMetabar")
)
#> Cleaning suppress 0 taxa and 0 samples.
#> Start Vsearch sintax
left_join(
tidyr::pivot_longer(assignation_results$taxo_value, -taxa_names),
tidyr::pivot_longer(assignation_results$taxo_boostrap, -taxa_names),
by = join_by(taxa_names, name),
suffix = c("rank", "bootstrap")
) |>
mutate(name = factor(name, levels = c("K", "P", "C", "O", "F", "G", "S"))) |>
# mutate(valuerank = forcats::fct_reorder(valuerank, as.integer(name), .desc = TRUE)) |>
ggplot(aes(valuebootstrap,
valuerank,
fill = name
)) +
geom_jitter(alpha = 0.8, aes(color = name)) +
geom_boxplot(alpha = 0.3)
# }