Downloads the KSGP (Karst, SILVA, GTDB, and PR2) reference database for SSU rRNA taxonomic assignment, with a particular focus on Archaea communities. KSGP combines near full-length rRNA sequences from Karst et al. 2018, re-annotated SILVA prokaryote SSU sequences, cleaned GTDB 16S sequences, PR2 eukaryote 18S sequences, and MIDORI2 mitochondrial sequences. Taxonomy is based on GTDB, providing phylogenetically consistent classification.
Also provides access to the GTDB+ and GTDB_cleaned databases built as intermediate steps during KSGP construction.
Three annotation variants are available for database = "KSGP" and
file_type = "tax":
- "sintax": SINTAX-based taxonomic assignments (KSGP Sintax in the paper). Available for all versions.
- "lca" (default): Conservative lowest common ancestor assignments (KSGP LCA). Only available for version "3.1".
- "ksgp_plus": Similarity-clustered putative taxa (KSGP+). Only available for version "3.1".
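The version rule above ("sintax" for every version, "lca" and "ksgp_plus" only for "3.1") can be expressed as a small argument check. This is a hypothetical sketch: check_ksgp_annotation() is an illustrative name, not part of the package, and the package may validate its arguments differently.

```r
# Hypothetical validator (not the package's own code). It encodes the
# documented rule: "sintax" exists for every KSGP version, while "lca"
# and "ksgp_plus" exist only for version "3.1".
check_ksgp_annotation <- function(annotation, version) {
  annotation <- match.arg(annotation, c("lca", "sintax", "ksgp_plus"))
  if (annotation != "sintax" && version != "3.1") {
    stop(sprintf('annotation = "%s" is only available for version "3.1"',
                 annotation), call. = FALSE)
  }
  annotation
}

check_ksgp_annotation("sintax", "1.0")     # accepted
check_ksgp_annotation("ksgp_plus", "3.1")  # accepted
try(check_ksgp_annotation("lca", "1.0"))   # error: LCA requires version "3.1"
```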
Arguments
- dest_dir
(Character, default ".") Directory to save the downloaded file.
- database
(Character, default "KSGP") One of:
  - "KSGP": Full KSGP SSU database (Archaea + Bacteria + Eukaryota).
  - "GTDB_plus": Cleaned GTDB 16S sequences with PR2 and MIDORI2.
  - "GTDB_cleaned": Cleaned GTDB 16S sequences only (no eukaryote supplement).
- file_type
(Character, default "fasta") One of:
  - "fasta": FASTA file with SSU sequences.
  - "tax": Taxonomy file (.tax) with taxonomic annotations.
  - "archive": Complete .tar.gz archive (all KSGP files, all annotation variants). Only available for database = "KSGP".
- annotation
(Character, default "lca") Taxonomic annotation method. One of "lca", "sintax", or "ksgp_plus". Used to pick the matching .tax file when file_type = "tax", and to pick the taxonomy merged into the FASTA headers when file_type = "fasta" and tax_format != "none". The "lca" annotation has the broadest sequence coverage and is the default for a fully annotated KSGP FASTA. Only "sintax" is available for version "1.0".
- tax_format
(Character, default "dada2") When file_type = "fasta", also retrieve the companion .tax file and merge its taxonomy into the FASTA headers (matched by sequence ID), so that the file can be passed to MiscMetabar::add_new_taxonomy_pq(). One of "dada2", "sintax", or "none" (keep accession-only headers). Sequences whose ID is absent from the .tax file (e.g. the SILVA-derived portion) keep accession-only headers. Ignored for file_type = "tax" or "archive".
- version
(Character, default "3.1") KSGP version. Known versions: "3.1" (2025, recommended) and "1.0".
- verbose
(Logical, default TRUE) Print progress messages.
- timeout
(Numeric, default Inf) Timeout in seconds for each HTTP request. The default disables R's 60-second timeout so that large downloads (several hundred MB to several GB, e.g. the KSGP FASTA or the v3.1 archive) can complete. Set a positive number of seconds to restore a strict timeout.
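In base R, a per-call timeout like the one described for timeout can be scoped by temporarily changing options(timeout), which utils::download.file() honours, and restoring the previous value on exit. This is a hypothetical sketch: with_download_timeout() is not part of the package, and the package itself may set its timeout differently (for example through its HTTP client).

```r
# Hypothetical helper (not part of the package): evaluate `expr` while
# options(timeout = seconds) is in effect, then restore the old value.
with_download_timeout <- function(seconds, expr) {
  stopifnot(is.numeric(seconds), length(seconds) == 1, seconds > 0)
  old <- options(timeout = seconds)
  on.exit(options(old), add = TRUE)  # restored even if `expr` fails
  expr  # lazy argument: evaluated here, under the temporary timeout
}

# Real use would wrap a download, e.g. (not run; url/destfile are placeholders):
# with_download_timeout(3600, utils::download.file(url, destfile, mode = "wb"))
before <- getOption("timeout")
inside <- with_download_timeout(3600, getOption("timeout"))
```

Because on.exit() restores the option, a failed or interrupted download does not leave a modified global timeout behind.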
Details
When file_type = "fasta", the function downloads the matching
KSGP_v<version>.tar.gz archive (one HTTP request), extracts the
FASTA (and, when tax_format != "none", the chosen .tax file) to
dest_dir, and then removes the archive. The archive is roughly 3.5x
smaller than the raw FASTA (e.g. ~686 MB vs ~2.4 GB for v3.1), so
one archive request is both faster and lighter on the server than
two separate requests for the FASTA and .tax files. Otherwise, the
KSGP FASTA and taxonomy files are separate downloads.
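The single-archive strategy above can be sketched with base R's utils::untar(): list the archive members, extract only the FASTA and one chosen .tax file, then delete the archive. This is a hypothetical sketch, not the package's implementation. The helper name extract_ksgp_members() is illustrative, and recognising members by file extension (.fa/.fasta and .tax) is an assumption, since the member names inside KSGP_v<version>.tar.gz are not documented here. The demo builds a tiny local archive with made-up file names so that it runs offline.

```r
# Hypothetical sketch, not the package's implementation.
# Assumption: archive members are recognised by their file extension only.
extract_ksgp_members <- function(archive, dest_dir, tax_pattern = NULL) {
  members <- utils::untar(archive, list = TRUE, tar = "internal")
  # Keep the FASTA member(s)...
  wanted <- grep("\\.(fa|fasta)$", members, value = TRUE)
  # ...plus the .tax file matching the requested annotation, if any
  if (!is.null(tax_pattern)) {
    wanted <- c(wanted, grep(tax_pattern, members, value = TRUE))
  }
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  utils::untar(archive, files = wanted, exdir = dest_dir, tar = "internal")
  unlink(archive)  # the archive is no longer needed once its members are out
  file.path(dest_dir, wanted)
}

# Offline demo: build a tiny archive with three made-up members
src <- file.path(tempdir(), "ksgp_demo_src")
dir.create(src, showWarnings = FALSE)
writeLines(c(">seq1", "ACGT"), file.path(src, "KSGP_demo.fasta"))
writeLines("seq1\tk__Bacteria;", file.path(src, "KSGP_demo_lca.tax"))
writeLines("seq1\tk__Bacteria;", file.path(src, "KSGP_demo_sintax.tax"))
archive <- file.path(tempdir(), "KSGP_demo.tar.gz")
old_wd <- setwd(src)  # tar() stores paths relative to the working directory
utils::tar(archive, files = list.files(), compression = "gzip", tar = "internal")
setwd(old_wd)

out_dir <- file.path(tempdir(), "ksgp_demo_out")
extracted <- extract_ksgp_members(archive, out_dir, tax_pattern = "_lca\\.tax$")
print(basename(extracted))
```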
With tax_format = "sintax" (or "dada2"), the taxonomy is merged
into the FASTA headers (each FASTA sequence ID is matched against the
.tax file, which holds one sequence ID per row) and the .tax file is
removed, so the result is a single FASTA ready for VSEARCH or dada2.
In the SINTAX output, the original rank prefix letters from the .tax
file are preserved: a KSGP line starting with k__Bacteria; becomes
>ID;tax=k:Bacteria,..., not d:Bacteria,.... To use KSGP for taxonomic assignment:
- With VSEARCH SINTAX: download the FASTA (file_type = "fasta", tax_format = "sintax").
- With dada2: download the FASTA (file_type = "fasta", tax_format = "dada2").
- With LotuS2: the KSGP database is integrated directly.
- For a complete set of all files: use file_type = "archive".
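The SINTAX header merge described above can be sketched in base R. This is a hypothetical illustration, not the package's code: the helper names tax_to_sintax() and merge_tax_into_headers() are made up, and the sketch assumes each .tax row is a tab-separated sequence ID followed by a semicolon-separated k__/p__/... lineage, a layout this page does not spell out. Matched headers become >ID;tax=k:Bacteria,... with the original prefix letters kept, and headers whose ID is absent from the .tax table stay accession-only.

```r
# Hypothetical sketch, not the package's implementation.
# Assumed .tax row layout: "<ID>\t<x>__<name>;<x>__<name>;..."
tax_to_sintax <- function(lineage) {
  ranks <- trimws(strsplit(sub(";\\s*$", "", lineage), ";", fixed = TRUE)[[1]])
  ranks <- ranks[nzchar(ranks)]
  # Keep the original prefix letter: "k__Bacteria" -> "k:Bacteria"
  paste0("tax=", paste(sub("^([A-Za-z])__", "\\1:", ranks), collapse = ","))
}

merge_tax_into_headers <- function(fasta_lines, tax_lines) {
  fields <- strsplit(tax_lines, "\t", fixed = TRUE)
  # Named vector: sequence ID -> lineage string
  tax_map <- setNames(vapply(fields, `[`, character(1), 2),
                      vapply(fields, `[`, character(1), 1))
  is_header <- startsWith(fasta_lines, ">")
  headers <- fasta_lines[is_header]
  ids <- sub("^>(\\S+).*$", "\\1", headers)  # ID = first word of the header
  has_tax <- ids %in% names(tax_map)
  if (any(has_tax)) {
    sintax <- vapply(tax_map[ids[has_tax]], tax_to_sintax, character(1))
    headers[has_tax] <- paste0(">", ids[has_tax], ";", sintax)
  }
  # IDs absent from the .tax table keep their accession-only headers
  fasta_lines[is_header] <- headers
  fasta_lines
}

fasta <- c(">seq1", "ACGT", ">seq2", "GGCC")
tax <- "seq1\tk__Bacteria;p__Firmicutes;c__Bacilli;"
merged <- merge_tax_into_headers(fasta, tax)
print(merged)
```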
KSGP substantially improves Archaea annotation over SILVA and Greengenes2: Class and Order assignments increase by 2.7x and 4.2x respectively.
Please cite: Grant A et al. (2025) KSGP 3.1: improved taxonomic annotation of Archaea communities using LotuS2, the genome taxonomy database and RNAseq data. ISME Communications 5(1): ycaf094. doi:10.1093/ismeco/ycaf094
Examples
if (FALSE) { # \dontrun{
# Download the KSGP v3.1 FASTA with LCA taxonomy merged into dada2-format headers (defaults)
download_ksgp_db(dest_dir = "databases")
# Download KSGP v3.1 LCA taxonomy file
download_ksgp_db(
dest_dir = "databases",
file_type = "tax",
annotation = "lca"
)
# Download KSGP+ taxonomy file
download_ksgp_db(
dest_dir = "databases",
file_type = "tax",
annotation = "ksgp_plus"
)
# Download the complete KSGP archive (all annotation variants)
download_ksgp_db(dest_dir = "databases", file_type = "archive")
# Download GTDB+ (cleaned GTDB + PR2 + MIDORI2)
download_ksgp_db(dest_dir = "databases", database = "GTDB_plus")
} # }