Extracts and counts occurrences of a given taxonomic rank from FASTA sequence headers. Supports both prefix-based formats (unite, sintax, greengenes2) and positional formats (dada2, pr2).
Arguments
- file
(Character, required) Path to a FASTA file (plain or gzip-compressed).
- rank_prefix
(Character, default "k__") The prefix identifying the taxonomic rank to extract (e.g., "k__" for kingdom, "p__" for phylum). Ignored if tax_format is provided.
- tax_format
(Character) If provided, one of "unite", "sintax", "greengenes2", or "pr2". Overrides rank_prefix with the first rank from tax_prefixes(). If NULL (default), rank_prefix is used as-is.
- rank_position
(Integer) For positional (prefix-less) taxonomy headers, the 1-based position of the rank to extract from the semicolon-delimited string. Can be used with tax_format = "pr2" or standalone (without tax_format). Ignored for prefix-based formats.
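The difference between prefix-based and positional extraction can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names (extract_by_prefix, extract_by_position, count_values), the delimiter set, and all headers are assumptions made up for the example.

```r
# Illustrative sketch only: NOT how list_ranks_db() is implemented.
# All headers below are invented for the example.
prefixed <- c(
  ">seq1|k__Fungi;p__Ascomycota;c__Sordariomycetes",
  ">seq2|k__Fungi;p__Basidiomycota;c__Agaricomycetes",
  ">seq3|k__Fungi;p__Ascomycota;c__Eurotiomycetes"
)
positional <- c(
  ">Eukaryota;Obazoa;Opisthokonta;Fungi",
  ">Eukaryota;Archaeplastida;Chlorophyta;Chlorophyceae",
  ">Eukaryota;Obazoa;Opisthokonta;Metazoa"
)

# Prefix-based (hypothetical helper): match the prefix plus everything up to
# the next delimiter. Headers without a match are dropped by regmatches().
extract_by_prefix <- function(headers, rank_prefix) {
  pattern <- paste0(rank_prefix, "[^;,|]*")
  regmatches(headers, regexpr(pattern, headers))
}

# Positional (hypothetical helper): drop the leading ">", split on ";", and
# take the field at the 1-based rank_position (NA if the header is too short).
extract_by_position <- function(headers, rank_position) {
  tax <- sub("^>", "", headers)
  parts <- strsplit(tax, ";", fixed = TRUE)
  vapply(
    parts,
    function(p) if (length(p) >= rank_position) p[[rank_position]] else NA_character_,
    character(1)
  )
}

# Tabulate values into a named integer vector sorted in decreasing order.
count_values <- function(values) {
  tab <- table(values)
  counts <- setNames(as.integer(tab), names(tab))
  sort(counts, decreasing = TRUE)
}

prefix_counts <- count_values(extract_by_prefix(prefixed, "p__"))
position_counts <- count_values(extract_by_position(positional, 2))
```

With these invented headers, prefix_counts holds p__Ascomycota with a count of 2 and p__Basidiomycota with 1, and position_counts holds Obazoa with 2 and Archaeplastida with 1. A prefix carries the rank identity with it, whereas a position is only meaningful when every header lists the same ranks in the same order.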
Value
A named integer vector of counts, sorted in decreasing order. Names are the taxonomic rank values.
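Because the result is an ordinary named integer vector, it can be handled with base R tools. The sketch below builds such a vector by hand, using the counts shown in the Examples section, rather than calling list_ranks_db(); the variable names are illustrative only.

```r
# Hand-built stand-in for a return value (counts copied from the Examples);
# the variable names here are illustrative, not part of the package.
counts <- c(p__Basidiomycota = 3L, p__Ascomycota = 2L)

# The vector is already sorted, so the first name is the most frequent value.
top_rank <- names(counts)[1]

# Relative abundance of each value.
shares <- counts / sum(counts)

# Long-format table, convenient for plotting or joining.
counts_df <- data.frame(
  rank_value = names(counts),
  n = unname(counts),
  stringsAsFactors = FALSE
)
```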
Examples
db <- system.file("extdata", "example_unite.fasta", package = "dbpq")
list_ranks_db(db, rank_prefix = "p__")
#> p__Basidiomycota    p__Ascomycota
#>                3                2
list_ranks_db(db, tax_format = "unite")
#> k__Fungi
#>        5