lifecycle-experimental

This function is essentially a wrapper around DECIPHER::IdTaxa() and DECIPHER::LearnTaxa(); please cite the DECIPHER package if you use it. Note that to specify parameters for the learning step, you must use the trainingSet parameter instead of fasta_for_training. The training set can be obtained using the function learn_idtaxa().

It requires:

  • either a physeq or seq2search object.

  • either a trainingSet or a fasta_for_training
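
The two ways of supplying a reference can be sketched as follows. This is a hedged illustration rather than a definitive recipe: it reuses the mini UNITE file and the data_fungi_mini object from the Examples section, and it assumes that learn_idtaxa() accepts the fasta path as its first argument and returns an object usable as trainingSet. The code does nothing unless MiscMetabar and DECIPHER are installed.

```r
# Locate the example reference shipped with MiscMetabar ("" if not installed).
fasta_path <- system.file("extdata", "mini_UNITE_fungi.fasta.gz",
  package = "MiscMetabar"
)

if (nzchar(fasta_path) && requireNamespace("DECIPHER", quietly = TRUE)) {
  library(MiscMetabar)

  # Option 1: let assign_idtaxa() train on the fasta file itself.
  res_fasta <- assign_idtaxa(data_fungi_mini,
    fasta_for_training = fasta_path, threshold = 20
  )

  # Option 2: train once, then reuse the training set.
  # Assumption: learn_idtaxa() takes the fasta path as its first argument
  # and returns the trained object.
  training_mini <- learn_idtaxa(fasta_path)
  res_train <- assign_idtaxa(data_fungi_mini,
    trainingSet = training_mini, threshold = 20
  )
}
```

Training once and passing the result through trainingSet is the route to take when the learning step needs its own parameters, as noted above.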

Usage

assign_idtaxa(
  physeq,
  seq2search = NULL,
  trainingSet = NULL,
  fasta_for_training,
  behavior = "return_matrix",
  column_names = c("Kingdom", "Phyla", "Class", "Order", "Family", "Genus", "Species"),
  suffix = "_idtaxa",
  nproc = 1,
  unite = FALSE,
  verbose = TRUE,
  ...
)

Arguments

physeq

(required): a phyloseq-class object obtained using the phyloseq package.

seq2search

A DNAStringSet object of sequences to search for.

trainingSet

An object of class Taxa and subclass Train, compatible with the class of the sequences to classify (the test argument of DECIPHER::IdTaxa()).

fasta_for_training

A fasta file (may be gzip-compressed) used to build the trainingSet with the function learn_idtaxa(). Only used if trainingSet is NULL.

The reference database must contain taxonomic information in the header of each sequence in the form of a string starting with ";tax=" and followed by a comma-separated list of up to nine taxonomic identifiers.

The only exception is if unite=TRUE. In that case the UNITE taxonomy is automatically formatted.
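
To make the expected header format concrete, here is a small base R sketch that extracts the taxonomy from one header. The header itself is hypothetical, and the one-letter rank prefixes (k:, p:, ...) follow the usual SINTAX convention, which is an assumption here since the text above only specifies the ";tax=" marker and the comma-separated list.

```r
# Hypothetical header, for illustration only.
header <- ">seq1;tax=k:Fungi,p:Basidiomycota,c:Agaricomycetes,o:Agaricales"

# Keep everything after ";tax=" and drop an optional trailing ";".
tax_string <- sub("^.*;tax=", "", header)
tax_string <- sub(";$", "", tax_string)

# Split the comma-separated taxonomic identifiers.
taxa <- strsplit(tax_string, ",", fixed = TRUE)[[1]]

# Separate the rank prefix (assumed SINTAX style) from the taxon name.
ranks <- sub(":.*$", "", taxa)
taxon_names <- sub("^[^:]*:", "", taxa)
tax_vec <- setNames(taxon_names, ranks)

# The format allows at most nine identifiers per header.
stopifnot(length(taxa) <= 9)
print(tax_vec)
```

A check of this kind can be run over the headers of a custom reference file before training, to catch sequences whose header lacks the ";tax=" field.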

behavior

Either "return_matrix" (default), or "add_to_phyloseq":

  • "return_matrix" returns a list of two objects. The first element is the taxonomic matrix and the second element is the raw result of the DECIPHER::IdTaxa() function.

  • "return_cmd" returns the command to run without running it.

  • "add_to_phyloseq" returns a phyloseq object with an amended @tax_table slot. Only available with physeq input, not with seq2search input.

column_names

(character vector) Names for the columns of the taxonomy.

suffix

(character) The suffix used to name the new columns. Defaults to "_idtaxa".

nproc

(default: 1) Number of CPUs/processors to use.

unite

(logical, default FALSE). If TRUE, the fasta_for_training file is converted from the UNITE format to the sintax format required for fasta_for_training. Only used if trainingSet is NULL.

verbose

(logical). If TRUE, print additional information.

...

Additional arguments passed on to DECIPHER::IdTaxa().

Value

Either a new phyloseq object with additional information in the @tax_table slot (if behavior is "add_to_phyloseq") or a list of two objects (if behavior is "return_matrix").

Details

This function is mainly a wrapper around the work of others. Please cite DECIPHER::IdTaxa() if you use this function.

Author

Adrien Taudière

Examples

if (FALSE) { # \dontrun{
# /!\ The threshold value must be changed for a real database (recommended
#  values are between 50 and 70).

data_fungi_mini_new <- assign_idtaxa(data_fungi_mini,
  fasta_for_training = system.file("extdata", "mini_UNITE_fungi.fasta.gz",
    package = "MiscMetabar"
  ), threshold = 20, behavior = "add_to_phyloseq"
)

result_idtaxa <- assign_idtaxa(data_fungi_mini,
  fasta_for_training = system.file("extdata", "mini_UNITE_fungi.fasta.gz",
    package = "MiscMetabar"
  ), threshold = 20
)

plot(result_idtaxa$idtaxa_raw)
} # }