Export a phyloseq object to GBIF MDT template CSV/TSV files
Source: R/phyloseq_to_MDT_csv.R
Write the OTU table, the taxonomy table and the sample data of a phyloseq
object to separate delimited text files, one per table, following the
GBIF Metabarcoding Data Toolkit (MDT) template
layout. This is the plain-text (CSV/TSV) counterpart of
phyloseq_to_MDT_excel(), which writes a single multi-sheet .xlsx file.
The MDT template expects the OTU table with OTU IDs in rows and sample IDs
in columns (sequence read counts in the cells), and the sample and
taxonomy tables keyed by a leading id column. The reference sequences,
when available, are appended as a DNA_sequence column of the taxonomy file
(Darwin Core DNA-derived-data extension term).
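The layout described above can be illustrated with a minimal base R sketch. It is not the function's actual implementation: the toy counts, the `otu_table_sketch.tsv` file name and the exact column name `id` are assumptions made for illustration only.

```r
# Toy counts with samples in rows and OTUs in columns (hypothetical data).
counts <- matrix(
  c(10L, 0L, 3L,
    5L, 7L, 0L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample_A", "sample_B"), c("OTU_1", "OTU_2", "OTU_3"))
)

# Transpose so that OTU IDs are in rows and sample IDs in columns.
otu_wide <- t(counts)

# Move the row names into a leading "id" column so the key survives export.
otu_df <- data.frame(
  id = rownames(otu_wide), otu_wide,
  row.names = NULL, check.names = FALSE
)

# Write a tab-separated file without quoting or row names.
out_file <- file.path(tempdir(), "otu_table_sketch.tsv")
write.table(otu_df, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the layout round-trips.
back <- read.delim(out_file, check.names = FALSE, stringsAsFactors = FALSE)
```

Here `back` has one row per OTU, a leading `id` column and one column of read counts per sample.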
When check_dwc = TRUE (the default), a lightweight Darwin Core compliance
check warns about recommended sample-level terms that are missing from
sample_data (decimalLatitude, decimalLongitude, eventDate). This is
a non-blocking helper, not a full validation of the GBIF MDT template.
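A non-blocking check of this kind could look like the following sketch. `check_terms_sketch()` and the toy `sample_df` are hypothetical and only illustrate the idea; the package's own check may be implemented differently.

```r
# Recommended sample-level terms named in the description above.
recommended_terms <- c("decimalLatitude", "decimalLongitude", "eventDate")

# Hypothetical sample metadata that carries only one of the three terms.
sample_df <- data.frame(
  id = c("sample_A", "sample_B"),
  eventDate = c("2021-06-01", "2021-06-02"),
  stringsAsFactors = FALSE
)

# check_terms_sketch() is a hypothetical helper, not part of the package.
check_terms_sketch <- function(df, terms = recommended_terms) {
  missing_terms <- setdiff(terms, colnames(df))
  if (length(missing_terms) > 0) {
    # Warn rather than stop, so the export can still proceed.
    warning(
      "Recommended Darwin Core sample terms missing: ",
      paste(missing_terms, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(missing_terms)
}

# suppressWarnings() is used here only to capture the return value quietly.
missing_terms <- suppressWarnings(check_terms_sketch(sample_df))
```

Because the helper warns instead of stopping, a caller can still write the files and add the missing columns later.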
Arguments
- physeq
(required) a phyloseq-class object.
- path
(character, default ".") Directory where the files are written. Created (recursively) if it does not exist.
- prefix
(character, default "") Optional prefix prepended to each output file name (e.g. "data_fungi_").
- sep
(character, default ",") Field separator. Use "\t" to write tab-separated (TSV) files, the format favored by the GBIF MDT validator. The file extension follows sep (.csv for ",", .tsv otherwise).
- check_dwc
(logical, default TRUE) If TRUE, warn about recommended Darwin Core sample-level terms missing from sample_data.
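How `path`, `prefix` and `sep` might combine into output file paths can be sketched as follows. `mdt_paths_sketch()` is a hypothetical helper, and the base file names are assumed to match the names of the returned vector (OTU_table, Taxonomy, Samples); the real file names may differ.

```r
# mdt_paths_sketch() is a hypothetical helper; base file names are assumptions.
mdt_paths_sketch <- function(path = ".", prefix = "", sep = ",") {
  # The extension follows the separator: .csv for ",", .tsv otherwise.
  ext <- if (identical(sep, ",")) ".csv" else ".tsv"
  tables <- c("OTU_table", "Taxonomy", "Samples")
  files <- file.path(path, paste0(prefix, tables, ext))
  names(files) <- tables
  files
}

# Comma separator with a prefix, then tab separator without one.
csv_files <- mdt_paths_sketch(path = "mdt", prefix = "data_fungi_")
tsv_files <- mdt_paths_sketch(path = "mdt", sep = "\t")
```

The sketch only builds paths; it does not create the directory or write any file.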
Value
Invisibly returns a named character vector of the written file paths
(OTU_table, Taxonomy, Samples).
Details
See the GBIF Metabarcoding Data Toolkit for the expected input format. The MDT accepts both TSV and XLSX uploads.
Examples
# \donttest{
out_dir <- file.path(tempdir(), "mdt_csv")
files <- phyloseq_to_MDT_csv(clean_pq(data_fungi_mini), path = out_dir)
#> Warning: ! Recommended Darwin Core sample terms missing from sample_data:
#> "decimalLatitude", "decimalLongitude", and "eventDate".
#> ℹ Add them before GBIF MDT submission if available.
#> MDT template files written to /tmp/RtmpMeUGPY/mdt_csv
file.exists(files)
#> [1] TRUE TRUE TRUE
unlink(out_dir, recursive = TRUE)
# }
if (FALSE) { # \dontrun{
# Tab-separated output (MDT-favored TSV)
phyloseq_to_MDT_csv(data_fungi_mini, path = "mdt", sep = "\t")
} # }