Converts taxonomy headers to the VSEARCH SINTAX format
(>ID;tax=k:Kingdom,p:Phylum,...). Wrapper around format_fasta_db().
Usage
format2sintax(
fasta_db = NULL,
taxnames = NULL,
input_format = "auto",
output_path = NULL,
id_prefix = "seq"
)Arguments
- fasta_db
(Character) Path to a FASTA file. Mutually exclusive with
taxnames.- taxnames
(Character vector) Taxonomy header strings (without leading
>). Mutually exclusive withfasta_db.- input_format
(Character, default
"auto") Input taxonomy format. One of"auto","unite","greengenes2","dada2".- output_path
(Character) If provided and
fasta_dbis used, write the reformatted FASTA to this path.- id_prefix
(Character, default
"seq") Prefix for synthetic sequence IDs generated when the input has none (e.g."dada2").
Value
If taxnames is used, a character vector of reformatted names.
If fasta_db is used, a DNAStringSet with reformatted names.
Examples
# UNITE format → SINTAX
format2sintax(taxnames = "AB123;k__Fungi;p__Ascomycota;c__Sordariomycetes")
#> [1] "AB123;tax=k:Fungi,p:Ascomycota,c:Sordariomycetes"
# Greengenes2 format → SINTAX
format2sintax(
taxnames = "abc123 d__Bacteria;p__Pseudomonadota",
input_format = "greengenes2"
)
#> [1] "abc123;tax=d:Bacteria,p:Pseudomonadota"
# dada2 trainset (taxonomy-only, positional) → SINTAX with synthetic IDs
format2sintax(
taxnames = "Bacteria;Pseudomonadota;Gammaproteobacteria;Vibrio;",
input_format = "dada2",
id_prefix = "SILVA_"
)
#> [1] "SILVA_000001;tax=d:Bacteria,p:Pseudomonadota,c:Gammaproteobacteria,o:Vibrio"