Check for taxa occurrences within a radius around samples using GBIF data
Source: R/tax_occur_check_pq.R
Lifecycle: experimental
This function performs a species range check for taxa contained in a phyloseq object (or supplied as a vector of names), using GBIF occurrences within a given radius around a point. The result can optionally be added to the phyloseq object's tax_table as new columns.
Usage
tax_occur_check_pq(
physeq = NULL,
taxnames = NULL,
taxonomic_rank = "currentCanonicalSimple",
longitude = NULL,
latitude = NULL,
radius_km = 50,
n_occur = 1000,
method = c("download", "search"),
circle_form = TRUE,
clean_coord = TRUE,
clean_coord_verbose = FALSE,
add_to_phyloseq = NULL,
col_prefix = NULL,
verbose = TRUE,
discard_genus_alone = identical(taxonomic_rank, "currentCanonicalSimple"),
discard_NA = TRUE,
...
)
Arguments
- physeq
(optional) A phyloseq object containing the taxa to check. Exactly one of `physeq` or `taxnames` must be provided.
- taxnames
(optional) A character vector of taxonomic names.
- taxonomic_rank
Character. The taxonomic rank to use for the check. Default is "currentCanonicalSimple", which corresponds to the cleaned scientific names in the phyloseq object if [gna_verifier_pq()] was used with default parameters.
- longitude
Numeric. Longitude of the test point in decimal degrees.
- latitude
Numeric. Latitude of the test point in decimal degrees.
- radius_km
Numeric. Search radius in kilometers (default: 50).
- n_occur
Numeric. Maximum number of occurrences to retrieve from GBIF for each taxon (default: 1000).
- method
(character, default `"download"`). How occurrences are fetched. `"download"` issues a single [rgbif::occ_download()] for all taxa around the point (**requires GBIF credentials**); `"search"` uses a per-taxon [rgbif::occ_search()] loop. See [tax_occur_check()].
- circle_form
(Logical, default: TRUE). Whether to use a circular search area. If FALSE, a square bounding box is used.
- clean_coord
(Logical, default: TRUE). Whether to clean coordinates using `CoordinateCleaner`.
- clean_coord_verbose
(Logical, default: FALSE). Whether to print messages from `CoordinateCleaner`.
- add_to_phyloseq
(Logical, default: NULL). Whether to add the results as new columns in the phyloseq object's tax_table. When left as NULL, it is automatically set to TRUE if a phyloseq object is provided and to FALSE if `taxnames` is provided. If TRUE, the results are appended to the tax_table with appropriate column names. Cannot be TRUE if `taxnames` is provided.
- col_prefix
A character string added as a prefix to the names of the new columns in the tax_table slot of the phyloseq object (default: NULL).
- verbose
(Logical, default: TRUE). Whether to print progress messages.
- discard_genus_alone
(logical, default `TRUE` when `taxonomic_rank == "currentCanonicalSimple"`). Passed to [taxonomic_rank_to_taxnames()].
- discard_NA
(logical, default `TRUE`). Passed to [taxonomic_rank_to_taxnames()].
- ...
Additional parameters passed to [tax_occur_check()].
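To make the difference between `circle_form = TRUE` and `circle_form = FALSE` concrete, here is a minimal base R sketch of the two kinds of search area. It is an illustration under stated assumptions, not the code used by this function: the helper names `haversine_km()` and `bbox_around()`, the spherical Earth radius, and the occurrence coordinates are all hypothetical.

```r
# Illustrative only: NOT the implementation used by tax_occur_check_pq().
earth_radius_km <- 6371 # assumption: spherical Earth, mean radius

# Great-circle (haversine) distance in km from one point to one or more points
haversine_km <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_radius_km * asin(pmin(1, sqrt(a)))
}

# Square bounding box whose half-side is radius_km around a point
bbox_around <- function(lon, lat, radius_km) {
  dlat <- (radius_km / earth_radius_km) * 180 / pi
  dlon <- dlat / cos(lat * pi / 180) # longitude degrees shrink with latitude
  c(
    lon_min = lon - dlon, lon_max = lon + dlon,
    lat_min = lat - dlat, lat_max = lat + dlat
  )
}

# Hypothetical occurrence records (made-up coordinates)
occ <- data.frame(
  lon = c(2.3, 2.9, 2.95),
  lat = c(48.0, 48.4, 48.0)
)
center_lon <- 2.3
center_lat <- 48
radius_km <- 50

# Square test: is each record inside the bounding box?
bb <- bbox_around(center_lon, center_lat, radius_km)
in_square <- occ$lon >= bb[["lon_min"]] & occ$lon <= bb[["lon_max"]] &
  occ$lat >= bb[["lat_min"]] & occ$lat <= bb[["lat_max"]]

# Circle test: is each record within radius_km of the center?
in_circle <- haversine_km(center_lon, center_lat, occ$lon, occ$lat) <= radius_km

data.frame(occ, in_square, in_circle)
```

In this sketch the second record sits in a corner of the square but farther than 50 km from the center, so it is counted by the square test and not by the circle test. A square area can therefore report more occurrences than a circle with the same `radius_km`.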
Value
Either a data frame (if add_to_phyloseq = FALSE) or a new phyloseq object (if add_to_phyloseq = TRUE).
Examples
if (FALSE) { # \dontrun{
  # Clean taxonomic names first so that "currentCanonicalSimple" is available
  data_fungi_mini_cleanNames <- gna_verifier_pq(data_fungi_mini)

  # Return a data frame instead of modifying the phyloseq object
  check_res <- tax_occur_check_pq(data_fungi_mini_cleanNames,
    longitude = 2.3,
    latitude = 48,
    radius_km = 100,
    n_occur = 50,
    add_to_phyloseq = FALSE
  )

  # Plot the number of occurrences found in the radius for each taxon
  check_res |>
    mutate(taxa_name = forcats::fct_reorder(taxa_name, count_in_radius)) |>
    ggplot(aes(x = count_in_radius, y = taxa_name, fill = total_count_in_world)) +
    geom_col()

  # Default behavior with a phyloseq input: results are added to the tax_table
  data_fungi_mini_cleanNames_range_verif <-
    tax_occur_check_pq(data_fungi_mini_cleanNames,
      longitude = 2.3,
      latitude = 48,
      radius_km = 50,
      n_occur = 10
    )

  # Tabulate how many taxa share each value of count_in_radius (NA included)
  df <- data_fungi_mini_cleanNames_range_verif@tax_table[, "count_in_radius"] |>
    table(useNA = "always") |>
    data.frame()
  colnames(df) <- c("count_in_radius", "n_taxa")
  df

  # Subset taxa with at least one occurrence in the radius
  cond_count_sup_0 <-
    data_fungi_mini_cleanNames_range_verif@tax_table[, "count_in_radius"] |>
    as.numeric() > 0
  # Taxa without a count (NA) are treated as absent from the radius
  cond_count_sup_0[is.na(cond_count_sup_0)] <- FALSE
  names(cond_count_sup_0) <- taxa_names(data_fungi_mini_cleanNames_range_verif)
  subset_taxa_pq(data_fungi_mini_cleanNames_range_verif, cond_count_sup_0) |>
    summary_plot_pq()
} # }
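The examples above rely on the default `method = "download"`, which requires GBIF credentials. The sketch below shows one possible way to fall back to `"search"` when credentials are missing. It assumes credentials are supplied through the `GBIF_USER`, `GBIF_PWD` and `GBIF_EMAIL` environment variables that rgbif reads for downloads. The helpers `gbif_credentials_set()` and `choose_gbif_method()` are hypothetical and not part of this package.

```r
# Hypothetical helper: TRUE only if all three GBIF variables are non-empty.
# Assumption: credentials are provided through environment variables.
gbif_credentials_set <- function() {
  vars <- c("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL")
  all(nzchar(Sys.getenv(vars)))
}

# Hypothetical helper: pick a value for the `method` argument
choose_gbif_method <- function() {
  if (gbif_credentials_set()) "download" else "search"
}

chosen_method <- choose_gbif_method()
chosen_method
```

The result could then be passed as `method = chosen_method` in a call to `tax_occur_check_pq()`.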