Format taxonomy headers for dada2::assignTaxonomy

Converts taxonomy headers to the format expected by dada2::assignTaxonomy(): semicolon-delimited ranks with no rank prefixes and a trailing semicolon (>Kingdom;Phylum;Class;Order;Family;Genus;). This function is a wrapper around format_fasta_db().
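
To make the target format concrete, here is a minimal base R sketch of the SINTAX-to-dada2 header conversion. It is not the implementation used by format2dada2() or format_fasta_db(); the helper name sintax_to_dada2() is hypothetical, and it assumes every header contains a "tax=" field with comma-separated, single-letter rank prefixes.

```r
# Hypothetical helper, for illustration only (not part of the package).
# Assumes headers look like "ID;tax=k:Kingdom,p:Phylum,...".
sintax_to_dada2 <- function(x) {
  # Drop everything up to and including "tax=" (sequence ID, separators)
  tax <- sub("^.*tax=", "", x)
  # Drop a trailing semicolon, if the header has one
  tax <- sub(";$", "", tax)
  # Split each header into its ranks
  ranks <- strsplit(tax, ",", fixed = TRUE)
  vapply(
    ranks,
    function(r) {
      # Strip the single-letter rank prefix ("k:", "p:", ...)
      unprefixed <- sub("^[a-z]:", "", r)
      # Join with semicolons and add the trailing semicolon
      paste0(paste(unprefixed, collapse = ";"), ";")
    },
    character(1)
  )
}

sintax_to_dada2("AB123;tax=k:Fungi,p:Ascomycota,c:Sordariomycetes")
#> [1] "Fungi;Ascomycota;Sordariomycetes;"
```

In practice format2dada2() should be preferred, since it also handles the other input formats and FASTA files.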

Usage

format2dada2(
  fasta_db = NULL,
  taxnames = NULL,
  input_format = "auto",
  output_path = NULL,
  pattern_to_remove = NULL
)

Arguments

fasta_db: (Character) Path to a FASTA file. Mutually exclusive with taxnames.
taxnames: (Character vector) Taxonomy header strings (without leading >). Mutually exclusive with fasta_db.
input_format: (Character, default "auto") Input taxonomy format. One of "auto", "sintax", "unite", "greengenes2".
output_path: (Character) If provided and fasta_db is used, the reformatted FASTA is written to this path and the DNAStringSet is returned invisibly.
pattern_to_remove: (Character) Optional regex pattern to remove from the reformatted names (applied after conversion).
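
As an illustration of what pattern_to_remove does, the sketch below applies a regex to names that are already in dada2 format. It assumes the removal behaves like gsub(pattern, "", names), which may differ from the package's internals; the example names and the pattern are made up for the illustration.

```r
# Assumption: removal is equivalent to gsub(pattern, "", names).
# The names and the pattern below are hypothetical examples.
reformatted <- c(
  "Fungi;Ascomycota;Sordariomycetes_cls_Incertae_sedis;",
  "Fungi;Basidiomycota;Agaricomycetes;"
)

# Remove "_<rank>_Incertae_sedis" suffixes left over in the converted names
pattern_to_remove <- "_[a-z]+_Incertae_sedis"
cleaned <- gsub(pattern_to_remove, "", reformatted)

print(cleaned)
#> [1] "Fungi;Ascomycota;Sordariomycetes;" "Fungi;Basidiomycota;Agaricomycetes;"
```

Because the pattern is applied after conversion, it should be written against the unprefixed, semicolon-delimited names rather than the original headers.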

Value

If taxnames is used, a character vector of reformatted names. If fasta_db is used, a DNAStringSet with reformatted names; when output_path is also provided, the DNAStringSet is returned invisibly.

Author

Adrien Taudière

Examples

# SINTAX format → dada2
format2dada2(
  taxnames = "AB123;tax=k:Fungi,p:Ascomycota,c:Sordariomycetes"
)
#> [1] "Fungi;Ascomycota;Sordariomycetes;"

# UNITE format → dada2
format2dada2(
  taxnames = "AB123;k__Fungi;p__Ascomycota;c__Sordariomycetes",
  input_format = "unite"
)
#> [1] "Fungi;Ascomycota;Sordariomycetes;"