The taxinfo package provides tools for augmenting phyloseq objects with taxonomy-based information from external data sources. It integrates data from GBIF, Wikipedia, GLOBI, OpenAlex, TAXREF, and other databases to enrich your taxonomic analyses.
Overview
taxinfo is designed to work with phyloseq objects and provides functions to:
- Verify and clean taxonomic names using the Global Names Architecture (GNA)
- Retrieve occurrence data from GBIF and other biodiversity databases
- Access taxonomic traits from various databases including FungalTraits
- Get Wikipedia information including page views, links, and content statistics
- Fetch scientific literature data from OpenAlex
- Access interaction data from GLOBI (Global Biotic Interactions)
- Validate geographic occurrences against ecoregions and biogeographic regions
- Retrieve taxonomic photos and media information
Installation
You can install the stable version of taxinfo from CRAN:
install.packages("taxinfo")
Or the development version from GitHub:
# Install from GitHub
devtools::install_github("adrientaudiere/taxinfo")
# Or using pak
pak::pkg_install("adrientaudiere/taxinfo")
Key Features
🔍 Data Verification & Quality Control
- gna_verifier_pq(): Verify and standardize taxonomic names using Global Names Architecture
🌍 Biodiversity Data Integration
- tax_gbif_occur_pq(): Retrieve GBIF occurrence data
- tax_globi_pq(): Access species interaction data from GLOBI
- tax_info_pq(): Add information from CSV files (TAXREF, traits databases)
📚 Knowledge Base Integration
- tax_get_wk_info_pq(): Get comprehensive Wikipedia data
- tax_oa_pq(): Retrieve scientific literature from OpenAlex
🗺️ Geographic Analysis
- range_bioreg_pq(): Analyze biogeographic ranges
- plot_tax_gbif_pq(): Create distribution maps
- tax_check_ecoregion(): Validate occurrences against ecoregions
🔬 Advanced Analysis Tools
- tax_retroblast_pq(): Sequence-based taxonomic verification
- tax_photos_pq(): Access taxonomic images and media
- tax_occur_check_pq(): Multi-source occurrence validation
Quick Start
Glossary
Following the Darwin Core standard, here are some key terms used in taxinfo (camel case naming convention):
- scientificName: The full scientific name with authorship and date information if known (e.g., “Stereum ostrea (Blume & T.Nees) Fr., 1838”)
- genus: Just the genus part (e.g., “Stereum”)
- specificEpithet: Just the species epithet part (e.g., “ostrea”)
- namePublishedInYear: The year the name was published (e.g., “1838”)
Other terms come from the Global Names verifier (camel case naming convention):
- currentCanonicalSimple: The simplified scientific name without authorship (e.g., “Stereum ostrea”). It corresponds to the concatenation of the genus and specificEpithet fields.
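The relationship between these fields can be sketched in base R. The helper `canonical_simple()` below is hypothetical and not part of taxinfo; it only illustrates the concatenation described in the glossary, including the genus-only case where the specific epithet is missing. The year extraction assumes the name ends with ", YYYY", which will not hold for every scientific name.

```r
# Hypothetical helper (not part of taxinfo): build a canonical name from
# genus and specific epithet, falling back to the genus alone when the
# epithet is missing, and to NA when the genus itself is missing.
canonical_simple <- function(genus, specific_epithet) {
  out <- ifelse(
    is.na(specific_epithet),
    genus,
    paste(genus, specific_epithet)
  )
  out[is.na(genus)] <- NA_character_
  out
}

genus <- c("Stereum", "Xylodon", NA)
specific_epithet <- c("ostrea", NA, NA)
canonical <- canonical_simple(genus, specific_epithet)
print(canonical)

# Extract a trailing four-digit year from a full scientific name.
# This simple pattern assumes the name ends with ", YYYY".
scientific_name <- "Stereum ostrea (Blume & T.Nees) Fr., 1838"
year <- sub("^.*,[[:space:]]*([0-9]{4})$", "\\1", scientific_name)
print(year)
```

In practice these fields are filled in by the name verification step, so a helper like this is mainly useful for checking that the columns of a tax_table are consistent with each other.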
Example Workflow
library(taxinfo)
#> Loading required package: MiscMetabar
#> Loading required package: phyloseq
#> Loading required package: ggplot2
#> Loading required package: dada2
#> Loading required package: Rcpp
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Loading required package: purrr
library(MiscMetabar)
# Load example data (fungal phyloseq object from MiscMetabar)
data("data_fungi_mini", package = "MiscMetabar")
# Step 1: Verify and clean taxonomic names
data_fungi_clean <- gna_verifier_pq(data_fungi_mini,
  data_sources = 210
)
#> ✔ GNA verification summary:
#> • Total taxa in phyloseq: 45
#> • Taxa submitted for verification: 37
#> • Genus-level only taxa: 2
#> • Total matches found: 25
#> • Synonyms: 4 (including 4 at genus level)
#> • Accepted names: 21 (including 15 at genus level)
# Step 2: Add GBIF occurrence data (add_to_phyloseq defaults to TRUE)
data_with_gbif <- tax_gbif_occur_pq(data_fungi_clean)
#> ℹ Processing GBIF occurrences for Stereum ostrea
#> ℹ Processing GBIF occurrences for Ossicaulis lachnopus
#> ℹ Processing GBIF occurrences for Stereum hirsutum
#> ℹ Processing GBIF occurrences for Basidiodendron eyrei
#> ℹ Processing GBIF occurrences for Sistotrema oblongisporum
#> ℹ Processing GBIF occurrences for Fomes fomentarius
#> ℹ Processing GBIF occurrences for Cerocorticium molare
#> ℹ Processing GBIF occurrences for Aporpium canescens
#> ℹ Processing GBIF occurrences for Hypochnicium analogum
#> ℹ Processing GBIF occurrences for Hyphoderma roseocremeum
#> ℹ Processing GBIF occurrences for Hyphoderma setigerum
#> ℹ Processing GBIF occurrences for Trametes versicolor
#> ℹ Processing GBIF occurrences for Peniophora versiformis
#> ℹ Processing GBIF occurrences for Exidia glandulosa
#> ℹ Processing GBIF occurrences for Peniophorella pubera
#> ℹ Processing GBIF occurrences for Auricularia mesenterica
#> ℹ Processing GBIF occurrences for Hericium coralloides
#> ℹ Processing GBIF occurrences for Xylodon flaviporus
# Step 3: Add trait information
fungal_traits <- system.file("extdata", "fun_trait_mini.csv", package = "taxinfo")
data_with_traits <- tax_info_pq(data_with_gbif,
  taxonomic_rank = "genusEpithet",
  file_name = fungal_traits,
  csv_taxonomic_rank = "GENUS",
  col_prefix = "ft_",
  sep = ";"
)
#> ✔ Added 18 columns from '/tmp/RtmpVMKfDC/temp_libpath1bc0c32ecb02dd/taxinfo/extdata/fun_trait_mini.csv' with information for 0 taxa in the tax_table slot of the phyloseq object
# Step 4: Add Wikipedia information (add_to_phyloseq defaults to TRUE)
data_final <- tax_get_wk_info_pq(data_with_traits)
#> ℹ Getting taxonomic IDs from Wikidata...
#> ℹ Getting page views from Wikipedia for Stereum ostrea
#> ℹ Getting page views from Wikipedia for Ossicaulis lachnopus
#> ℹ Getting page views from Wikipedia for Stereum hirsutum
#> ℹ Getting page views from Wikipedia for Basidiodendron eyrei
#> ℹ Getting page views from Wikipedia for Sistotrema oblongisporum
#> ℹ Getting page views from Wikipedia for Fomes fomentarius
#> ℹ Getting page views from Wikipedia for Mycena renatii
#> ℹ Getting page views from Wikipedia for Cerocorticium molare
#> ℹ Getting page views from Wikipedia for Aporpium canescens
#> ℹ Getting page views from Wikipedia for Hypochnicium analogum
#> ℹ Getting page views from Wikipedia for Hyphoderma roseocremeum
#> ℹ Getting page views from Wikipedia for Hyphoderma setigerum
#> ℹ Getting page views from Wikipedia for Trametes versicolor
#> ℹ Getting page views from Wikipedia for Peniophora versiformis
#> ℹ Getting page views from Wikipedia for Exidia glandulosa
#> ℹ Getting page views from Wikipedia for Peniophorella pubera
#> ℹ Getting page views from Wikipedia for Auricularia mesenterica
#> ℹ Getting page views from Wikipedia for Hericium coralloides
#> ℹ Getting page views from Wikipedia for Xylodon flaviporus
# View the enriched taxonomic table
head(data_final@tax_table)
#> Taxonomy Table: [6 taxa by 46 taxonomic ranks]:
#> ft_GENUS ft_Source ft_COMMENT.on.genus ft_primary_lifestyle
#> ASV7 "NA" NA NA NA
#> ASV8 "Stereum" NA NA NA
#> ASV12 "Xylodon" NA NA NA
#> ASV18 "Stereum" NA NA NA
#> ASV25 "Ossicaulis" NA NA NA
#> ASV26 "Stereum" NA NA NA
#> ft_Secondary_lifestyle ft_Comment_on_lifestyle_template
#> ASV7 NA NA
#> ASV8 NA NA
#> ASV12 NA NA
#> ASV18 NA NA
#> ASV25 NA NA
#> ASV26 NA NA
#> ft_Endophytic_interaction_capability_template
#> ASV7 NA
#> ASV8 NA
#> ASV12 NA
#> ASV18 NA
#> ASV25 NA
#> ASV26 NA
#> ft_Plant_pathogenic_capacity_template ft_Decay_substrate_template
#> ASV7 NA NA
#> ASV8 NA NA
#> ASV12 NA NA
#> ASV18 NA NA
#> ASV25 NA NA
#> ASV26 NA NA
#> ft_Decay_type_template ft_Aquatic_habitat_template
#> ASV7 NA NA
#> ASV8 NA NA
#> ASV12 NA NA
#> ASV18 NA NA
#> ASV25 NA NA
#> ASV26 NA NA
#> ft_Animal_biotrophic_capacity_template ft_Specific_hosts
#> ASV7 NA NA
#> ASV8 NA NA
#> ASV12 NA NA
#> ASV18 NA NA
#> ASV25 NA NA
#> ASV26 NA NA
#> ft_Growth_form_template ft_Fruitbody_type_template
#> ASV7 NA NA
#> ASV8 NA NA
#> ASV12 NA NA
#> ASV18 NA NA
#> ASV25 NA NA
#> ASV26 NA NA
#> ft_Hymenium_type_template ft_Ectomycorrhiza_exploration_type_template
#> ASV7 NA NA
#> ASV8 NA NA
#> ASV12 NA NA
#> ASV18 NA NA
#> ASV25 NA NA
#> ASV26 NA NA
#> ft_Ectomycorrhiza_lineage_template ft_primary_photobiont
#> ASV7 NA NA
#> ASV8 NA NA
#> ASV12 NA NA
#> ASV18 NA NA
#> ASV25 NA NA
#> ASV26 NA NA
#> ft_secondary_photobiont Domain Phylum Class
#> ASV7 NA "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV8 NA "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV12 NA "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV18 NA "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV25 NA "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV26 NA "Fungi" "Basidiomycota" "Agaricomycetes"
#> Order Family Genus Species Trophic.Mode
#> ASV7 "Russulales" "Stereaceae" NA NA "Saprotroph"
#> ASV8 "Russulales" "Stereaceae" "Stereum" "ostrea" "Saprotroph"
#> ASV12 "Hymenochaetales" "Schizoporaceae" "Xylodon" "raduloides" "Saprotroph"
#> ASV18 "Russulales" "Stereaceae" "Stereum" "ostrea" "Saprotroph"
#> ASV25 "Agaricales" "Lyophyllaceae" "Ossicaulis" "lachnopus" "Saprotroph"
#> ASV26 "Russulales" "Stereaceae" "Stereum" "hirsutum" "Saprotroph"
#> Guild Trait Confidence.Ranking
#> ASV7 "Wood Saprotroph-Undefined Saprotroph" "NULL" "Probable"
#> ASV8 "Undefined Saprotroph" "White Rot" "Probable"
#> ASV12 "Undefined Saprotroph" "White Rot" "Probable"
#> ASV18 "Undefined Saprotroph" "White Rot" "Probable"
#> ASV25 "Wood Saprotroph" "Brown Rot" "Probable"
#> ASV26 "Undefined Saprotroph" "White Rot" "Probable"
#> Genus_species currentName
#> ASV7 "NA_NA" NA
#> ASV8 "Stereum_ostrea" "Stereum ostrea (Blume & T.Nees) Fr., 1838"
#> ASV12 "Xylodon_raduloides" "Xylodon (Pers.) Gray, 1821"
#> ASV18 "Stereum_ostrea" "Stereum ostrea (Blume & T.Nees) Fr., 1838"
#> ASV25 "Ossicaulis_lachnopus" "Ossicaulis lachnopus (Fr.) Contu, 2000"
#> ASV26 "Stereum_hirsutum" "Stereum hirsutum (Willd.) Pers., 1800"
#> currentCanonicalSimple genusEpithet specificEpithet namePublishedInYear
#> ASV7 NA NA NA NA
#> ASV8 "Stereum ostrea" "Stereum" "ostrea" "1838"
#> ASV12 "Xylodon" "Xylodon" NA "1821"
#> ASV18 "Stereum ostrea" "Stereum" "ostrea" "1838"
#> ASV25 "Ossicaulis lachnopus" "Ossicaulis" "lachnopus" "2000"
#> ASV26 "Stereum hirsutum" "Stereum" "hirsutum" "1800"
#> authorship bracketauthorship scientificNameAuthorship Global_occurences
#> ASV7 NA NA NA NA
#> ASV8 "Fr." "Blume & T.Nees" "(Blume & T.Nees) Fr." " 10908"
#> ASV12 "Gray" "Pers." "(Pers.) Gray" NA
#> ASV18 "Fr." "Blume & T.Nees" "(Blume & T.Nees) Fr." " 10908"
#> ASV25 "Contu" "Fr." "(Fr.) Contu" " 223"
#> ASV26 "Pers." "Willd." "(Willd.) Pers." "121604"
#> taxa_name lang page_length page_views taxon_id
#> ASV7 "NA" NA NA NA NA
#> ASV8 "Stereum ostrea" " 1" " 0" " 0" "Q2710042"
#> ASV12 "Xylodon" NA NA NA NA
#> ASV18 "Stereum ostrea" " 1" " 0" " 0" "Q2710042"
#> ASV25 "Ossicaulis lachnopus" " 1" " 0" " 0" "Q10613125"
#> ASV26 "Stereum hirsutum" " 1" " 0" " 0" "Q557377"
# Alternative: Query specific taxa without a phyloseq object
taxa_info <- tax_gbif_occur_pq(
  taxnames = c("Amanita muscaria", "Boletus edulis"),
  by_country = TRUE
)
#> ℹ Processing GBIF occurrences for Amanita muscaria
#> ℹ Processing GBIF occurrences for Boletus edulis
# Returns a tibble instead of phyloseq object
head(taxa_info)
#> # A tibble: 2 × 13
#> # Groups: canonicalName [2]
#> canonicalName NL US GB DE CA SE DK RU AT AU
#> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 Amanita muscaria 129896 30189 26184 24417 9810 9132 8985 8746 7099 6430
#> 2 Boletus edulis 5810 5164 10629 4770 NA 12964 5964 4133 3805 NA
#> # ℹ 2 more variables: NO <int>, CH <int>
Data Sources
taxinfo integrates with multiple authoritative data sources:
| Source | Description | Functions |
|---|---|---|
| GBIF | Global biodiversity occurrence data | tax_gbif_occur_pq(), plot_tax_gbif_pq() |
| Wikipedia | Encyclopedia data and page statistics | tax_get_wk_info_pq(), tax_get_wk_pages_info() |
| GLOBI | Species interaction networks | tax_globi_pq() |
| OpenAlex | Scientific literature database | tax_oa_pq() |
| TAXREF | French national taxonomic reference | tax_info_pq() |
| GNA | Global Names Architecture for name verification | gna_verifier_pq() |
| Custom CSV | Any taxonomic database in CSV format | tax_info_pq() |
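To make the "Custom CSV" row concrete, here is a minimal base R sketch of the kind of lookup that joining a CSV to a taxonomic table by genus involves. It is not the implementation of tax_info_pq(); the file contents, the `primary_lifestyle` values, and the `taxa_genus` vector are invented for illustration, and only the `GENUS` key column, the `ft_` prefix, and the `;` separator mirror the workflow above.

```r
# Write a tiny semicolon-separated trait table to a temporary file.
# Column values are placeholders, not real trait data.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "GENUS;primary_lifestyle",
    "Stereum;lifestyle_a",
    "Fomes;lifestyle_b"
  ),
  csv_path
)

traits <- read.csv(csv_path, sep = ";", stringsAsFactors = FALSE)

# Stand-in for the genus column of a tax_table.
taxa_genus <- c("Stereum", "Xylodon", NA, "Fomes")

# Look up each genus in the CSV; unmatched or missing genera give NA.
idx <- match(taxa_genus, traits$GENUS)
added <- traits[idx, setdiff(names(traits), "GENUS"), drop = FALSE]
names(added) <- paste0("ft_", names(added))
rownames(added) <- NULL
added$genus <- taxa_genus
print(added)

# Share of taxa that received trait information.
match_rate <- mean(!is.na(idx))
print(match_rate)
```

Checking the match rate before interpreting the added columns is a cheap safeguard: a low or zero value is worth investigating, for example by comparing the spelling and case of the key columns on both sides.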
Contributing
We welcome contributions! Please open an issue or pull request on GitHub.
Related Packages
taxinfo works seamlessly with:
- MiscMetabar: Miscellaneous functions for metabarcoding analysis
- phyloseq: Analyze microbiome census data
- taxize: Taxonomic information from around the web
- rgbif: Interface to GBIF API
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
All the members of the DEFIS MITI project, especially Mélanie Roy and Benoît Perez-Lamarque who lead the project.
This project has received financial support from the CNRS (Centre National de la Recherche Scientifique) through the MITI interdisciplinary programs (Project DEFIS - Exploration of Evolutionary Diversity of Fungi and its Indicators through High-Throughput Sequencing: from multi-actors challenges to long-term monitoring).
The developers of the R packages used in this project. A special thanks to Joey McMurdie (phyloseq), John Waller (rgbif), and Zachary Foster (taxize) for maintaining those useful tools.