List and count taxonomic ranks from a FASTA database

Extracts and counts occurrences of a given taxonomic rank from FASTA sequence headers. Supports both prefix-based formats (unite, sintax, greengenes2) and positional formats (dada2, pr2).

Usage

list_ranks_db(
  file,
  rank_prefix = "k__",
  tax_format = NULL,
  rank_position = NULL
)

Arguments

file: (Character, required) Path to a FASTA file (plain text or gzip-compressed).
rank_prefix: (Character, default "k__") The prefix identifying the taxonomic rank to extract (e.g., "k__" for kingdom, "p__" for phylum). Ignored if tax_format is provided.
tax_format: (Character, default NULL) One of "unite", "sintax", "greengenes2", or "pr2". If provided, rank_prefix is replaced by the first rank prefix from tax_prefixes(). If NULL, rank_prefix is used as-is.
rank_position: (Integer) For positional (prefix-less) taxonomy headers, the 1-based position of the rank to extract from the semicolon-delimited string. Can be used with tax_format = "pr2" or standalone (without tax_format). Ignored for prefix-based formats.
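
To make the positional case concrete, here is a minimal base R sketch of picking the field at a given 1-based position from a semicolon-delimited taxonomy string. It is an illustration only, not the package's implementation. The header strings are made up, and the assumption that the taxonomy follows the first whitespace in the header is ours.

```r
# Hypothetical headers in a positional (prefix-less) layout; not from a real database.
headers <- c(
  ">seq1 Eukaryota;Alveolata;Dinoflagellata",
  ">seq2 Eukaryota;Stramenopiles;Ochrophyta",
  ">seq3 Eukaryota;Alveolata;Ciliophora"
)

# Assumption: the taxonomy string is whatever follows the first run of whitespace.
taxonomy <- sub("^\\S+\\s+", "", headers)

# Split each taxonomy string into its semicolon-delimited fields.
fields <- strsplit(taxonomy, ";", fixed = TRUE)

# Take the field at the requested 1-based position (NA if the header is too short).
rank_position <- 2L
values <- vapply(fields, function(x) x[rank_position], character(1))
values
```

With rank_position = 2L, each header contributes its second field, so values holds one entry per sequence.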

Value

A named integer vector of counts, sorted in decreasing order. Names are the extracted rank values, prefix included for prefix-based formats (e.g., "p__Ascomycota").
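
A return value of this shape can be reproduced with base R alone. The sketch below is one possible way to do it for the prefix-based case, not a description of how list_ranks_db() works internally. The headers are invented for illustration, and the set of characters assumed to end a rank value (";", "," or "|") is an assumption.

```r
# Hypothetical prefix-based headers; not taken from the package's example file.
headers <- c(
  ">s1|k__Fungi;p__Ascomycota;c__Sordariomycetes",
  ">s2|k__Fungi;p__Basidiomycota;c__Agaricomycetes",
  ">s3|k__Fungi;p__Basidiomycota;c__Agaricomycetes"
)

rank_prefix <- "p__"

# Match the prefix plus everything up to the next assumed delimiter (; , or |).
pattern <- paste0(rank_prefix, "[^;,|]*")
matches <- regmatches(headers, regexpr(pattern, headers))

# Tabulate, sort in decreasing order, and convert to a plain named integer vector.
tab <- sort(table(matches), decreasing = TRUE)
counts <- setNames(as.integer(tab), names(tab))
counts
```

Headers without the requested prefix are simply dropped by regmatches(), so they do not appear in the counts.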

Author

Adrien Taudière

Examples

db <- system.file("extdata", "example_unite.fasta", package = "dbpq")
list_ranks_db(db, rank_prefix = "p__")
#> p__Basidiomycota    p__Ascomycota 
#>                3                2 
list_ranks_db(db, tax_format = "unite")
#> k__Fungi 
#>        5