Usage
assign_vsearch_lca(
physeq = NULL,
seq2search = NULL,
ref_fasta = NULL,
behavior = "return_matrix",
vsearchpath = "vsearch",
clean_pq = TRUE,
taxo_rank = c("K", "P", "C", "O", "F", "G", "S"),
nproc = 1,
suffix = "",
id = 0.5,
lca_cutoff = 1,
maxrejects = 32,
top_hits_only = TRUE,
maxaccepts = 0,
keep_temporary_files = FALSE,
verbose = TRUE,
temporary_fasta_file = "temp.fasta",
cmd_args = ""
)
Arguments
- physeq
(required) A phyloseq-class object obtained using the phyloseq package.
- seq2search
A DNAStringSet object of sequences to search for.
- ref_fasta
(required) Path to a reference database in vsearch format. The reference database must contain taxonomic information in the header of each sequence, in the form of a string starting with ";tax=" and followed by a comma-separated list of up to nine taxonomic identifiers. Each taxonomic identifier must start with an indication of the rank by one of the letters d (domain), k (kingdom), p (phylum), c (class), o (order), f (family), g (genus), s (species), or t (strain). The letter is followed by a colon (:) and the name of that rank. Commas and semicolons are not allowed in the name of the rank. Non-ASCII characters should be avoided in the names.
Example:
>X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,g:Escherichia/Shigella,s:Escherichia_coli,t:str._K-12_substr._MG1655
- behavior
Either "return_matrix" (default), "return_cmd", or "add_to_phyloseq":
"return_matrix" returns a list of two matrices, with taxonomic values in the first element and bootstrap values in the second.
"return_cmd" returns the command to run without running it.
"add_to_phyloseq" returns a phyloseq object with an amended @tax_table slot. Only available with the physeq input, not with seq2search.
- vsearchpath
(default: "vsearch") path to vsearch
- clean_pq
(logical, default TRUE) If set to TRUE, empty samples and empty ASVs are discarded before clustering.
- taxo_rank
A character vector with the names of the taxonomic ranks present in ref_fasta.
- nproc
(int, default: 1) Number of CPUs/processors to use.
- suffix
(character) The suffix appended to the names of the new columns. If set to "" (the default), the taxo_rank names are used without a suffix.
- id
(numeric in [0, 1], default 0.5) Default value is based on stampa. See the vsearch manual for parameter --id.
- lca_cutoff
(numeric, default 1) Fraction of matching hits required for the last common ancestor (LCA) output. For example, a value of 0.9 implies that the taxonomy is still filled in when fewer than 10% of the assigned species are incongruent. Default value is based on stampa. See the vsearch manual for parameter --lca_cutoff.
Text from the vsearch manual: "Adjust the fraction of matching hits required for the last common ancestor (LCA) output with the --lcaout option during searches. The default value is 1.0 which requires all hits to match at each taxonomic rank for that rank to be included. If a lower cutoff value is used, e.g. 0.95, a small fraction of non-matching hits are allowed while that rank will still be reported. The argument to this option must be larger than 0.5, but not larger than 1.0"
- maxrejects
(int, default: 32) Maximum number of non-matching target sequences to consider before stopping the search for a given query. Default value is based on stampa. See the vsearch manual for parameter --maxrejects.
- top_hits_only
(logical, default TRUE) Only the top hits with an equally high percentage of identity between the query and database sequence sets are written to the output. If you set top_hits_only you may need to set a lower maxaccepts and/or lca_cutoff. Default value is based on stampa. See the vsearch manual for parameter --top_hits_only.
- maxaccepts
(int, default: 0) Maximum number of matching target sequences to accept before stopping the search for a given query. Default value is based on stampa. See the vsearch manual for parameter --maxaccepts.
- keep_temporary_files
(logical, default: FALSE) Whether to keep the temporary files:
temporary_fasta_file (default "temp.fasta"): the fasta file built from physeq or seq2search
"out_lca.txt": see the vsearch manual for parameter --lcaout
"userout.txt": see the vsearch manual for parameter --userout
- verbose
(logical). If TRUE, print additional information.
- temporary_fasta_file
Name of the temporary fasta file. Only useful with keep_temporary_files = TRUE.
- cmd_args
Other arguments to pass to the vsearch --usearch_global command.
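The header format expected in ref_fasta can be checked with a few lines of base R before running a search. The following is a minimal sketch, not part of MiscMetabar: parse_tax_header() is a hypothetical helper that splits the ";tax=" annotation of one header into its rank letters and names, and it assumes the header contains exactly one well-formed ";tax=" field.

```r
# Hypothetical helper (not part of MiscMetabar or vsearch): split the
# ";tax=" annotation of a fasta header into a named character vector.
parse_tax_header <- function(header) {
  # Keep the text after ";tax=" up to the next semicolon or the end of line
  tax <- sub("^.*;tax=([^;]*).*$", "\\1", header)
  # Each comma-separated field has the form "letter:name"
  fields <- strsplit(tax, ",", fixed = TRUE)[[1]]
  ranks <- sub(":.*$", "", fields)
  taxa <- sub("^[^:]*:", "", fields)
  stats::setNames(taxa, ranks)
}

# Header taken from the ref_fasta example above
header <- paste0(
  ">X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,",
  "c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,",
  "g:Escherichia/Shigella,s:Escherichia_coli,t:str._K-12_substr._MG1655"
)
parsed <- parse_tax_header(header)
names(parsed)
#> [1] "d" "p" "c" "o" "f" "g" "s" "t"
parsed[["g"]]
#> [1] "Escherichia/Shigella"
```

Comparing names(parsed) with the ranks actually present in a database is one way to decide what to pass as taxo_rank.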
Examples
# \donttest{
data_fungi_mini_new <- assign_vsearch_lca(data_fungi_mini,
  ref_fasta = system.file("extdata", "mini_UNITE_fungi.fasta.gz", package = "MiscMetabar"),
  lca_cutoff = 0.9
)
#> Cleaning suppress 0 taxa and 0 samples.
# }
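The effect of lca_cutoff can be illustrated without running vsearch. The sketch below is a simplified illustration of a per-rank consensus with a cutoff, not vsearch's implementation: lca_consensus() is a hypothetical helper, the hits are made up, and treating an agreement equal to the cutoff as sufficient is an assumption.

```r
# Simplified illustration only; NOT the algorithm as implemented in vsearch.
# lca_consensus() is a hypothetical helper.
lca_consensus <- function(hits, cutoff = 1) {
  # hits: character matrix, one row per hit, one named column per rank,
  # ordered from the broadest rank to the most specific one
  consensus <- character(0)
  for (rank in colnames(hits)) {
    counts <- table(hits[, rank])
    best <- names(counts)[which.max(counts)]
    # Stop at the first rank where agreement falls below the cutoff
    if (max(counts) / nrow(hits) < cutoff) break
    consensus[rank] <- best
  }
  consensus
}

# Made-up hits: 19 agree down to species, 1 differs at the species rank
hits <- rbind(
  matrix(rep(c("Stereaceae", "Stereum", "Stereum_hirsutum"), 19),
    ncol = 3, byrow = TRUE
  ),
  c("Stereaceae", "Stereum", "Stereum_ostrea")
)
colnames(hits) <- c("F", "G", "S")

# With cutoff = 1 the species rank is dropped (only 95% of hits agree)
names(lca_consensus(hits, cutoff = 1))
#> [1] "F" "G"

# With cutoff = 0.9 the species rank is kept
names(lca_consensus(hits, cutoff = 0.9))
#> [1] "F" "G" "S"
```

This shows why a strict lca_cutoff of 1 can leave the finest ranks unassigned when a single hit disagrees, while a slightly lower value keeps them.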