Downloads the Greengenes2 16S rRNA database. By default, downloads the dada2-formatted training sets from Zenodo (maintained by Benjamin Callahan). Alternatively, downloads backbone sequences from the Greengenes2 FTP server.
Note that Greengenes2 uses d__ (domain) instead of k__ (kingdom) as the first rank prefix. Pass tax_format = "greengenes2" to summarize_db() and list_ranks_db() so that they parse these ranks correctly.
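As a rough illustration of the rank-prefix convention noted above, the base R sketch below splits a prefixed lineage into a named vector keyed by prefix letter. The helper `parse_prefixed_lineage()` and the example lineage string are hypothetical and only illustrate the `d__` convention; this is not how summarize_db() or list_ranks_db() are implemented.

```r
# Hypothetical helper, for illustration only (not part of any package API).
# Splits a prefixed lineage such as "d__Bacteria; p__Bacillota" into a
# character vector named by the one-letter rank prefix.
parse_prefixed_lineage <- function(lineage) {
  fields <- trimws(strsplit(lineage, ";", fixed = TRUE)[[1]])
  # Keep only fields that carry a one-letter rank prefix ("d__", "p__", ...).
  fields <- fields[grepl("^[a-z]__", fields)]
  stats::setNames(sub("^[a-z]__", "", fields), substr(fields, 1, 1))
}

# Made-up example lineage in the prefixed style.
gg2 <- parse_prefixed_lineage("d__Bacteria; p__Bacillota; c__Bacilli")
names(gg2)   # "d" "p" "c": the first rank is the domain, not "k"
gg2[["d"]]   # "Bacteria"
```

A parser that expects `k__` as the first prefix would find no top-level rank in such a lineage, which is why the taxonomy format has to be stated explicitly.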
Arguments
- dest_dir
(Character, default ".") Directory to save the downloaded file.
- version
(Character, default "2024.09") Greengenes2 version in YYYY.MM format.
- format
(Character, default "dada2") One of:
  - "dada2": dada2-formatted training set from Zenodo (recommended for dada2::assignTaxonomy()).
  - "dada2_species": species-level training set for dada2::assignTaxonomy() (includes species).
  - "fasta": plain FASTA sequences from the FTP server.
- tax_format
(Character, default "dada2") How to write taxonomy in the headers of the "dada2"/"dada2_species" training set. The Greengenes2 trainset ships with d__/p__ rank prefixes, which dada2::assignTaxonomy() and MiscMetabar::add_new_taxonomy_pq() reject. One of:
  - "dada2": strip the prefixes to produce unprefixed, positional dada2 headers (>Bacteria;Pseudomonadota;...;).
  - "sintax": rewrite as >ID;tax=d:Bacteria,p:...
  - "keep": leave the original d__-prefixed headers untouched.

  Ignored for format = "fasta".
- verbose
(Logical, default TRUE) Print progress messages.
Details
The dada2-formatted files are maintained by Benjamin Callahan on Zenodo and come from the same source as the SILVA dada2 training sets. See https://benjjneb.github.io/dada2/training.html for details.
The Greengenes2 trainset uses d__/p__ rank prefixes. By default
(tax_format = "dada2") the prefixes are stripped so the file is directly
usable by dada2::assignTaxonomy() and add_new_taxonomy_pq().
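The header rewriting described here can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helpers `split_lineage()`, `strip_rank_prefixes()`, and `to_sintax()` are hypothetical names, the example lineage is made up, and the exact SINTAX header ending the function writes is not specified in this page.

```r
# Hypothetical helpers, for illustration only (not part of any package API).

# Split a semicolon-separated lineage into its rank fields.
split_lineage <- function(lineage) {
  ranks <- trimws(strsplit(lineage, ";", fixed = TRUE)[[1]])
  # Drop empty fields and unassigned ranks such as a bare "s__".
  ranks[nzchar(ranks) & !grepl("^[a-z]__$", ranks)]
}

# "dada2"-style output: unprefixed, positional ranks with a trailing ";".
strip_rank_prefixes <- function(lineage) {
  ranks <- sub("^[a-z]__", "", split_lineage(lineage))
  paste0(paste(ranks, collapse = ";"), ";")
}

# "sintax"-style output: ">ID;tax=d:...,p:..." (ending is an assumption).
to_sintax <- function(id, lineage) {
  ranks <- sub("^([a-z])__", "\\1:", split_lineage(lineage))
  paste0(">", id, ";tax=", paste(ranks, collapse = ","))
}

lineage <- "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria"
strip_rank_prefixes(lineage)
# "Bacteria;Pseudomonadota;Gammaproteobacteria;"
to_sintax("seq1", lineage)
# ">seq1;tax=d:Bacteria,p:Pseudomonadota,c:Gammaproteobacteria"
```

In the unprefixed form the rank is implied by position alone, so dropping unassigned ranks from the end of a lineage is safe, whereas dropping one from the middle would shift every later rank.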
Please cite: McDonald D et al. (2024) Greengenes2 unifies microbial data in a single reference tree. Nature Biotechnology 42:715-718. doi:10.1038/s41587-023-01845-1