lifecycle-maturing

Converts taxonomy headers to the format expected by dada2::assignTaxonomy(): unprefixed semicolon-delimited taxonomy (>Kingdom;Phylum;Class;Order;Family;Genus;). Wrapper around format_fasta_db().

Usage

format2dada2(
  fasta_db = NULL,
  taxnames = NULL,
  input_format = "auto",
  output_path = NULL,
  pattern_to_remove = NULL
)

Arguments

fasta_db

(Character) Path to a FASTA file. Mutually exclusive with taxnames.

taxnames

(Character vector) Taxonomy header strings (without leading >). Mutually exclusive with fasta_db.

input_format

(Character, default "auto") Input taxonomy format. One of "auto", "sintax", "unite", "greengenes2".

output_path

(Character) If provided and fasta_db is used, write the reformatted FASTA to this path. The DNAStringSet is returned invisibly.

pattern_to_remove

(Character) Optional regex pattern to remove from the reformatted names (applied after conversion).
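As a rough illustration of what "applied after conversion" means, the base R sketch below removes a regex match from an already converted string. The input string and the pattern are made up for illustration, and the use of gsub() is an assumption; the function's actual implementation may differ.

```r
# Hypothetical converted name; "_Incertae_sedis" is an invented suffix
converted <- "Fungi;Ascomycota;Sordariomycetes_Incertae_sedis;"

# Assumption: the pattern is stripped with a regex substitution
cleaned <- gsub("_Incertae_sedis", "", converted)
cleaned
```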

Value

If taxnames is used, a character vector of reformatted taxonomy strings. If fasta_db is used, a DNAStringSet with reformatted names; when output_path is also provided, the DNAStringSet is returned invisibly.

Author

Adrien Taudière

Examples

# SINTAX format → dada2
format2dada2(
  taxnames = "AB123;tax=k:Fungi,p:Ascomycota,c:Sordariomycetes"
)
#> [1] "Fungi;Ascomycota;Sordariomycetes;"

# UNITE format → dada2
format2dada2(
  taxnames = "AB123;k__Fungi;p__Ascomycota;c__Sordariomycetes",
  input_format = "unite"
)
#> [1] "Fungi;Ascomycota;Sordariomycetes;"
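
The SINTAX conversion shown above can be approximated in base R. This is a minimal sketch for intuition only: sintax_to_dada2_sketch() is a hypothetical helper, not part of the package, and it does not reproduce the behaviour of format2dada2() or format_fasta_db() beyond this simple case.

```r
# Hypothetical helper (not the package implementation): converts one
# SINTAX-style header to unprefixed, semicolon-terminated taxonomy.
sintax_to_dada2_sketch <- function(header) {
  # Keep only the part that follows "tax="
  tax <- sub("^.*tax=", "", header)
  # Drop a trailing semicolon if the header has one
  tax <- sub(";$", "", tax)
  # Split ranks on commas, then strip the one-letter rank prefix ("k:", "p:", ...)
  ranks <- strsplit(tax, ",", fixed = TRUE)[[1]]
  ranks <- sub("^[a-z]:", "", ranks)
  # Join with semicolons and add the trailing semicolon
  paste0(paste(ranks, collapse = ";"), ";")
}

sintax_to_dada2_sketch("AB123;tax=k:Fungi,p:Ascomycota,c:Sordariomycetes")
#> [1] "Fungi;Ascomycota;Sordariomycetes;"
```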