taxinfo 0.1.2 [CRAN]
Breaking changes
- The GBIF occurrence functions (tax_gbif_occur_coords(), tax_occur_check(), tax_occur_check_pq(), tax_occur_multi_check_pq()) now default to GBIF’s Download API (method = "download"), which requires GBIF credentials (GBIF_USER, GBIF_PWD, GBIF_EMAIL in your .Renviron). The previous rgbif::occ_search() behaviour is still available with method = "search" (no credentials, capped at 100,000 records). See https://docs.ropensci.org/rgbif/articles/gbif_credentials.html.
- tax_occur_check() (and its *_pq() wrappers) now report the true worldwide georeferenced count in total_count_in_world for taxa with no occurrence in the search radius, where the previous behaviour returned 0.
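Because the new default needs credentials, a script may want to check for them before choosing a method. The following is a minimal base-R sketch, not code from the package: the variable names come from the entry above, while `gbif_vars`, `missing_vars` and `gbif_method` are illustrative names introduced here.

```r
# Environment variables named in the changelog entry (normally set in .Renviron)
gbif_vars <- c("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL")

# Sys.getenv() returns "" for unset variables, so nzchar() flags the ones that are set
missing_vars <- gbif_vars[!nzchar(Sys.getenv(gbif_vars))]

# Assumption for this sketch: fall back to the credential-free path when anything is missing
gbif_method <- if (length(missing_vars) == 0) "download" else "search"

if (length(missing_vars) > 0) {
  message("Missing GBIF credentials: ", paste(missing_vars, collapse = ", "),
          "; using method = \"search\"")
}
```

The resulting `gbif_method` value could then be passed as the method argument of the occurrence functions listed above.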
Changes
- The WWF/TNC ecoregion layer downloaded on first use by the ecoregion functions (points_to_ecoregions(), tax_check_ecoregion(), tax_ecoregion_occur()) is now written to a stable per-user cache directory (tools::R_user_dir("taxinfo", "cache")) instead of an inst/extdata/downloads folder created under the current working directory.
- gna_verifier_pq() removes the stats and main_taxon_threshold parameters. These only affected kingdom-level summary metadata (not per-name results), and main_taxon_threshold was never forwarded to the API by taxize::gna_verifier() anyway.
- gna_verifier_pq() gains a problematic_chars parameter (default "[?\\\\#|&]") to detect taxonomic names containing characters that corrupt the GNA Verifier GET URL (?, \, #, |, &), and a clean_problematic_chars parameter (default FALSE). When problematic names are found, a warning reports their count and examples; set clean_problematic_chars = TRUE to replace matching cells with NA before verification, or clean the data upstream (e.g. with MiscMetabar::simplify_taxo()).
- gna_verifier_pq() gains a force_recompute parameter (default FALSE). When TRUE, existing result columns matching col_prefix are removed from the tax_table before re-adding them, avoiding duplicate-column errors on re-runs.
- select_taxa_pq() aborts with an explicit message naming the requested taxnames when none of them match the tax_table, instead of failing with an obscure OTU abundance data must have non-zero dimensions error; taxa_summary_text() inherits the same clear behaviour.
- New function tax_crosscheck_pq() compares name-verification results from GNA Verifier (taxize::gna_verifier() with data_sources = 11, i.e. GBIF Backbone Taxonomy) and rgbif::name_backbone_checklist(). Returns a per-taxon comparison with status labels (match, mismatch, gna_only, backbone_only, both_na), a summary count vector, and an optional Venn diagram via ggVennDiagram. Discrepancies between the two services highlight taxa that may need manual review.
- tax_ecoregion_occur() gains a method argument (forwarded to tax_gbif_occur_coords()) and keeps the credential-free rgbif::occ_search() path as its default, so ecoregion profiling and its wrappers (tax_ecoregion_occur_pq(), tax_check_ecoregion()) do not require GBIF credentials.
- tax_gbif_occur_coords() gains a method argument ("download", "download_sql", "search") and server-side filter arguments (country, year_gte, year_lte, geometry). The default "download" collapses the former per-taxon rgbif::occ_search() loop into a single, citable rgbif::occ_download() request and correctly retains infraspecific and higher-rank records.
- tax_occur_check() gains a method argument ("download", "search"); with method = "download" it issues a single rgbif::occ_download() constrained to the search bounding box instead of rgbif::occ_search().
- tax_occur_check_pq() and tax_occur_multi_check_pq() now issue a single GBIF download for all taxa (and, for tax_occur_multi_check_pq(), all GPS points) when method = "download", instead of one rgbif::occ_search() call per taxon per point, and expose method, circle_form, clean_coord and n_occur arguments.
- theme_idest() falls back to the graphics-device default font when a requested font family (Roboto Condensed, Linux Libertine G, Fira Code) is not installed, instead of failing with invalid font type when the plot is printed (for example during R CMD check examples or a pkgdown render).
- range_bioreg_pq() and tax_check_ecoregion() now call gbif.range::read_ecoreg() and gbif.range::check_and_get_ecoreg() instead of the removed read_bioreg() / check_and_get_bioreg().
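To make the problematic_chars entry above concrete, here is a small base-R sketch of what detecting and blanking such names could look like. It only reuses the default pattern quoted in the entry; the toy names and the variables `taxa`, `is_problematic` and `cleaned` are illustrative, and this is not the package's internal code.

```r
# Default pattern quoted in the changelog entry (as an R string literal)
problematic_chars <- "[?\\\\#|&]"

# Toy names, made up for illustration only
taxa <- c("Amanita muscaria", "Russula sp?", "Boletus #1", "Lactarius & co")

# Flag names containing ?, \, #, | or &
is_problematic <- grepl(problematic_chars, taxa)

if (any(is_problematic)) {
  warning(sum(is_problematic), " name(s) contain problematic characters, e.g. ",
          paste(head(taxa[is_problematic], 3), collapse = "; "))
}

# Rough equivalent of clean_problematic_chars = TRUE: replace matching cells with NA
cleaned <- taxa
cleaned[is_problematic] <- NA_character_
```

Cleaning upstream, as the entry suggests, avoids losing these names entirely, since replacing them with NA removes them from verification.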
Bug Fixes
- theme_idest(): when x_is_species_name = TRUE or y_is_species_name = TRUE is set, a message now indicates which axis will receive italic labels. This helps users catch the common mistake of passing x_is_species_name = TRUE when species names are on the y-axis (e.g. horizontal bar charts with aes(x = n, y = sp)), which previously caused ggplot2 to silently misinterpret the continuous x-axis as discrete and break the chart.
Breaking changes
- tax_check_ecoregion() no longer takes taxa_name as its first argument. The new signature follows the package-wide physeq = NULL, taxnames = NULL, taxonomic_rank pattern. Single-species positional calls like tax_check_ecoregion("Sp.", lon, lat) must become tax_check_ecoregion(taxnames = "Sp.", longitudes = lon, latitudes = lat). The return shape also changes: is_in_ecoregion is always a n_taxa × n_points logical matrix, and the full long tibble of (taxon × ecoregion) counts is available in the new taxon_ecoregions element.
Major Changes
- Changed default behavior: The add_to_phyloseq parameter now defaults to TRUE when a phyloseq object is provided, and FALSE when using the taxnames parameter. This makes the workflow more intuitive - when working with phyloseq objects, the enriched object is returned by default.
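The default rule can be pictured with a tiny base-R sketch. `resolve_add_to_phyloseq()` is a hypothetical helper written for this illustration, not a taxinfo function, and it assumes that an explicitly supplied value always overrides the default.

```r
# Hypothetical helper (not part of taxinfo): mirrors the documented default,
# TRUE when a phyloseq object is supplied, FALSE when only taxnames are given.
resolve_add_to_phyloseq <- function(physeq = NULL, taxnames = NULL,
                                    add_to_phyloseq = NULL) {
  if (is.null(add_to_phyloseq)) {
    # No explicit choice: the default depends on which input was provided
    add_to_phyloseq <- !is.null(physeq)
  }
  add_to_phyloseq
}

# Name-vector input: results are returned on their own
from_names <- resolve_add_to_phyloseq(taxnames = c("Amanita muscaria", "Boletus edulis"))

# Any non-NULL object stands in for a phyloseq object in this sketch
from_physeq <- resolve_add_to_phyloseq(physeq = list())
```

Here `from_names` is FALSE and `from_physeq` is TRUE, matching the two cases described above.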
New Features
- Add points_to_ecoregions() to locate a set of GPS points in the WWF/TNC terrestrial ecoregion layer. Returns a tibble with ECO_NAME, biome and realm columns; used internally by tax_check_ecoregion().
- tax_check_ecoregion() has been rewritten as a thin comparison wrapper on top of the new tax_ecoregion_occur() and points_to_ecoregions() functions. It now supports a vector of taxa (via taxnames) or a phyloseq object (via physeq + taxonomic_rank), always returns a n_taxa × n_points logical matrix in is_in_ecoregion, and caches the WWF/TNC shapefile across calls instead of re-downloading it through gbif.range::check_and_get_ecoreg() each time. The shapefile is read via sf::st_join() (boundary-safe) instead of sf::st_intersection().
- Add tax_ecoregion_occur() to return a long tibble of taxon_name × ECO_NAME × n_occur × prop_occur from GBIF occurrences, with min_nb_occur / min_proportion filters. Zero-occurrence taxa are kept with n_occur = 0L so downstream joins do not silently drop them.
- Add tax_ecoregion_occur_pq() as the phyloseq wrapper for tax_ecoregion_occur(). When add_to_phyloseq = TRUE, three columns are added to @tax_table: ecoregion_top (modal ecoregion), ecoregion_n (number of qualifying ecoregions) and ecoregion_list (semicolon-separated, ordered by descending occurrence count).
- Add tax_gbif_occur_coords() to fetch georeferenced GBIF occurrences for a vector of taxa (capped by n_occur). Taxa with zero valid occurrences are listed in attr(result, "missing_taxa").
- tax_photos_pq() now works correctly with gallery = TRUE regardless of the add_to_phyloseq value: when add_to_phyloseq = TRUE the gallery is printed as a side-effect and the updated phyloseq object is returned invisibly. The pixture package dependency has been removed; the gallery is now built with htmltools (available on CRAN). Two new parameters img_height and img_width replace the previous h / w arguments passed via ... to pixture::pixgallery().
- Add fungal_traits_guilds() to enrich a phyloseq tax_table with guild and trait information from both the FungalTraits and FUNGuild databases in a single call. The function automatically calls gna_verifier_pq() when currentCanonicalSimple is absent, and optionally produces consensus columns (cons_trophicMode, cons_trophicMode_agreement) comparing the two sources.
- All main functions (gna_verifier_pq(), tax_gbif_occur_pq(), tax_get_wk_info_pq(), tax_globi_pq(), tax_info_pq(), tax_iucn_code_pq(), tax_oa_pq(), tax_occur_check_pq(), tax_photos_pq()) now support the taxnames parameter, allowing users to query information for specific taxonomic names without a phyloseq object.
- Added comprehensive tests for taxnames parameter usage across all functions.
- Add functions extract_spores_mycodb() and tax_spores_size_pq() to retrieve spore size information from MycoDB.
- Add params year_col and authorship_col to gna_verifier_pq() to output year of publication and authorship information for each taxon.
- Add function intra_taxnames_dist() to compute pairwise DNA distances among taxa with the same taxonomic names.
- Add function cluster_sbc() to (post)cluster taxa into SBC (Species bound cluster) defined as “clusters that include all and only ESVs assigned to one species, the sequence similarity threshold can vary between these clusters” by Riley et al. 2025 (https://doi.org/10.1186/s12915-025-02284-x). Also add a new vignette to illustrate the use of cluster_sbc().
- Print information when using tax_info_pq() with add_to_phyloseq = TRUE to inform users that the phyloseq object is being updated.
- Add an example in tax_info_pq() manual with the EPPO database to determine if pest species regulated in France are found in the example phyloseq object.
- Change the result column genus into genusEpithet from the gna_verifier_pq() function to avoid confusion between “Genus” and “genus” columns and to debug the use of duckdb in taxinfo_pq().
- gna_verifier_pq() now adds a genusSpeciesEpithet column (when genus_species_canonical_col = TRUE) that copies currentCanonicalSimple but is NA for genus-only names (i.e. when specificEpithet is NA or empty).
- gna_verifier_pq() gains a species_only parameter (default TRUE): when TRUE, currentCanonicalSimple is set to NA for uninomial matches (matchedCardinality == 1, i.e. genus or higher-rank names with no species epithet). genusEpithet is always populated regardless of this setting; specificEpithet is always NA for uninomials independently of this parameter.
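The genusSpeciesEpithet and species_only rules described above can be traced on a toy table. The sketch below uses base R and the column names quoted in the entries; the two rows are invented, and the code illustrates the stated rules rather than the package's implementation.

```r
# Toy verification result: one binomial match and one uninomial (genus-only) match
res <- data.frame(
  currentCanonicalSimple = c("Amanita muscaria", "Russula"),
  matchedCardinality     = c(2L, 1L),
  genusEpithet           = c("Amanita", "Russula"),
  specificEpithet        = c("muscaria", NA),
  stringsAsFactors       = FALSE
)

# genusSpeciesEpithet copies currentCanonicalSimple, but is NA for genus-only names
is_genus_only <- is.na(res$specificEpithet) | res$specificEpithet == ""
res$genusSpeciesEpithet <- ifelse(is_genus_only, NA_character_,
                                  res$currentCanonicalSimple)

# species_only = TRUE: blank currentCanonicalSimple for uninomial matches
species_only <- TRUE
if (species_only) {
  res$currentCanonicalSimple[res$matchedCardinality == 1] <- NA_character_
}
```

After these steps the genus-only row keeps its genusEpithet while both currentCanonicalSimple and genusSpeciesEpithet are NA, which is the behaviour the two entries describe.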
Bug fix
- gna_verifier_pq(): fixed verbose summary always reporting 0 accepted/synonym names when add_to_phyloseq = FALSE (was incorrectly reading dropped columns from res_verifier_clean instead of res_verifier). Fixed matchedCardinality threshold used for “uninomial” reporting (was == 2 instead of == 1). Fixed genusEpithet and specificEpithet being absent from the return value when add_to_phyloseq = FALSE and genus_species_canonical_col = TRUE (the function was returning the raw GNA result instead of the cleaned tibble). Fixed potential many-to-many join when add_to_phyloseq = TRUE by deduplicating on submittedName after select() rather than before.
- Fixed issue in functions tax_gbif_occur_pq() and range_bioreg_pq() due to the loss of the column verbatim_index in rgbif::name_backbone_checklist() (commit c74602b).
Documentation
- Updated documentation for all functions to clarify the new default behavior of add_to_phyloseq.
- Added examples showing both phyloseq and taxnames usage patterns.
- Updated vignettes to demonstrate the dual-input capability (phyloseq objects vs. taxonomic name vectors).
- Updated README to highlight the flexible input options.
taxinfo 0.1.1
- Add list_keywords and n_citation columns in the return of tax_oa_pq().