License: MIT

The taxinfo package provides tools for augmenting phyloseq objects with taxonomy-based information from external data sources. It integrates data from GBIF, Wikipedia, GLOBI, OpenAlex, TAXREF, and other databases to enrich your taxonomic analyses.

Overview

taxinfo is designed to work with phyloseq objects and provides functions to:

  • Verify and clean taxonomic names using the Global Names Architecture (GNA)
  • Retrieve occurrence data from GBIF and other biodiversity databases
  • Access taxonomic traits from various databases including FungalTraits
  • Get Wikipedia information including page views, links, and content statistics
  • Fetch scientific literature data from OpenAlex
  • Access interaction data from GLOBI (Global Biotic Interactions)
  • Validate geographic occurrences against ecoregions and biogeographic regions
  • Retrieve taxonomic photos and media information

Installation

You can install the stable version of taxinfo from CRAN:

install.packages("taxinfo")

Or the development version from GitHub:

# Install from GitHub
devtools::install_github("adrientaudiere/taxinfo")

# Or using pak
pak::pkg_install("adrientaudiere/taxinfo")

Key Features

🔍 Data Verification & Quality Control

  • gna_verifier_pq(): Verify and standardize taxonomic names using Global Names Architecture

🌍 Biodiversity Data Integration

📚 Knowledge Base Integration

🗺️ Geographic Analysis

🔬 Advanced Analysis Tools

🎯 Flexible Input Options

Most functions can work with either:

  • Phyloseq objects: Automatically enriches the tax_table (default behavior)
  • Taxonomic name vectors: Returns tibbles for standalone queries

Quick Start

Glossary

Following the Darwin Core standard, here are some key terms used in taxinfo (camel case naming convention):

  • scientificName: The full scientific name with authorship and date information if known (e.g., “Stereum ostrea (Blume & T.Nees) Fr., 1838”)
  • genus: Just the genus part (e.g., “Stereum”)
  • specificEpithet: Just the species epithet part (e.g., “ostrea”)
  • namePublishedInYear: The year the name was published (e.g., “1838”)

Other terms come from the globalnames verifier (camel case naming convention):

  • currentCanonicalSimple: The simplified scientific name without authorship (e.g., “Stereum ostrea”). It corresponds to the concatenation of the genus and specificEpithet fields.
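
To make the relationship between these terms concrete, here is a minimal base R sketch that splits a scientificName into the fields listed above. The helper parse_scientific_name() is hypothetical and deliberately naive. taxinfo obtains these fields through name verification, not through this kind of string splitting, so treat it only as an illustration of how the terms relate.

```r
# Hypothetical helper (not part of taxinfo): naive parsing of a name of the
# form "Genus epithet Authorship, Year".
parse_scientific_name <- function(scientific_name) {
  tokens <- strsplit(scientific_name, " ", fixed = TRUE)[[1]]
  genus <- tokens[1]

  # Treat the second token as the epithet only if it is all lowercase,
  # which excludes authorship such as "(Pers.)".
  has_epithet <- length(tokens) >= 2 && grepl("^[a-z-]+$", tokens[2])
  specific_epithet <- if (has_epithet) tokens[2] else NA_character_

  # The year is taken to be a run of four digits at the very end of the string.
  year_match <- regmatches(scientific_name, regexpr("[0-9]{4}$", scientific_name))
  year <- if (length(year_match) == 1) year_match else NA_character_

  list(
    genus = genus,
    specificEpithet = specific_epithet,
    namePublishedInYear = year,
    # Genus plus epithet when both exist, otherwise the genus alone
    currentCanonicalSimple = if (has_epithet) paste(genus, specific_epithet) else genus
  )
}

str(parse_scientific_name("Stereum ostrea (Blume & T.Nees) Fr., 1838"))
str(parse_scientific_name("Xylodon (Pers.) Gray, 1821"))
```

For a genus-level name such as “Xylodon (Pers.) Gray, 1821”, the sketch leaves specificEpithet as NA and currentCanonicalSimple reduces to the genus alone.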

Example Workflow

library(taxinfo)
#> Loading required package: MiscMetabar
#> Loading required package: phyloseq
#> Loading required package: ggplot2
#> Loading required package: dada2
#> Loading required package: Rcpp
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> Loading required package: purrr
library(MiscMetabar)

# Load example data (fungal phyloseq object from MiscMetabar)
data("data_fungi_mini", package = "MiscMetabar")

# Step 1: Verify and clean taxonomic names
data_fungi_clean <- gna_verifier_pq(data_fungi_mini,
  data_sources = 210
)
#> ✔ GNA verification summary:
#> • Total taxa in phyloseq: 45
#> • Taxa submitted for verification: 37
#> • Genus-level only taxa: 2
#> • Total matches found: 25
#> • Synonyms: 4 (including 4 at genus level)
#> • Accepted names: 21 (including 15 at genus level)

# Step 2: Add GBIF occurrence data (add_to_phyloseq defaults to TRUE)
data_with_gbif <- tax_gbif_occur_pq(data_fungi_clean)
#> ℹ Processing GBIF occurrences for Stereum ostrea
#> ℹ Processing GBIF occurrences for Ossicaulis lachnopus
#> ℹ Processing GBIF occurrences for Stereum hirsutum
#> ℹ Processing GBIF occurrences for Basidiodendron eyrei
#> ℹ Processing GBIF occurrences for Sistotrema oblongisporum
#> ℹ Processing GBIF occurrences for Fomes fomentarius
#> ℹ Processing GBIF occurrences for Cerocorticium molare
#> ℹ Processing GBIF occurrences for Aporpium canescens
#> ℹ Processing GBIF occurrences for Hypochnicium analogum
#> ℹ Processing GBIF occurrences for Hyphoderma roseocremeum
#> ℹ Processing GBIF occurrences for Hyphoderma setigerum
#> ℹ Processing GBIF occurrences for Trametes versicolor
#> ℹ Processing GBIF occurrences for Peniophora versiformis
#> ℹ Processing GBIF occurrences for Exidia glandulosa
#> ℹ Processing GBIF occurrences for Peniophorella pubera
#> ℹ Processing GBIF occurrences for Auricularia mesenterica
#> ℹ Processing GBIF occurrences for Hericium coralloides
#> ℹ Processing GBIF occurrences for Xylodon flaviporus

# Step 3: Add trait information
fungal_traits <- system.file("extdata", "fun_trait_mini.csv", package = "taxinfo")
data_with_traits <- tax_info_pq(data_with_gbif,
  taxonomic_rank = "genusEpithet",
  file_name = fungal_traits,
  csv_taxonomic_rank = "GENUS",
  col_prefix = "ft_",
  sep = ";"
)
#> ✔ Added 18 columns from '/tmp/RtmpVMKfDC/temp_libpath1bc0c32ecb02dd/taxinfo/extdata/fun_trait_mini.csv' with information for 0 taxa in the tax_table slot of the phyloseq object

# Step 4: Add Wikipedia information (add_to_phyloseq defaults to TRUE)
data_final <- tax_get_wk_info_pq(data_with_traits)
#> ℹ Getting taxonomic IDs from Wikidata...
#> ℹ Getting page views from Wikipedia for Stereum ostrea
#> ℹ Getting page views from Wikipedia for Ossicaulis lachnopus
#> ℹ Getting page views from Wikipedia for Stereum hirsutum
#> ℹ Getting page views from Wikipedia for Basidiodendron eyrei
#> ℹ Getting page views from Wikipedia for Sistotrema oblongisporum
#> ℹ Getting page views from Wikipedia for Fomes fomentarius
#> ℹ Getting page views from Wikipedia for Mycena renatii
#> ℹ Getting page views from Wikipedia for Cerocorticium molare
#> ℹ Getting page views from Wikipedia for Aporpium canescens
#> ℹ Getting page views from Wikipedia for Hypochnicium analogum
#> ℹ Getting page views from Wikipedia for Hyphoderma roseocremeum
#> ℹ Getting page views from Wikipedia for Hyphoderma setigerum
#> ℹ Getting page views from Wikipedia for Trametes versicolor
#> ℹ Getting page views from Wikipedia for Peniophora versiformis
#> ℹ Getting page views from Wikipedia for Exidia glandulosa
#> ℹ Getting page views from Wikipedia for Peniophorella pubera
#> ℹ Getting page views from Wikipedia for Auricularia mesenterica
#> ℹ Getting page views from Wikipedia for Hericium coralloides
#> ℹ Getting page views from Wikipedia for Xylodon flaviporus

# View the enriched taxonomic table
head(data_final@tax_table)
#> Taxonomy Table:     [6 taxa by 46 taxonomic ranks]:
#>       ft_GENUS     ft_Source ft_COMMENT.on.genus ft_primary_lifestyle
#> ASV7  "NA"         NA        NA                  NA                  
#> ASV8  "Stereum"    NA        NA                  NA                  
#> ASV12 "Xylodon"    NA        NA                  NA                  
#> ASV18 "Stereum"    NA        NA                  NA                  
#> ASV25 "Ossicaulis" NA        NA                  NA                  
#> ASV26 "Stereum"    NA        NA                  NA                  
#>       ft_Secondary_lifestyle ft_Comment_on_lifestyle_template
#> ASV7  NA                     NA                              
#> ASV8  NA                     NA                              
#> ASV12 NA                     NA                              
#> ASV18 NA                     NA                              
#> ASV25 NA                     NA                              
#> ASV26 NA                     NA                              
#>       ft_Endophytic_interaction_capability_template
#> ASV7  NA                                           
#> ASV8  NA                                           
#> ASV12 NA                                           
#> ASV18 NA                                           
#> ASV25 NA                                           
#> ASV26 NA                                           
#>       ft_Plant_pathogenic_capacity_template ft_Decay_substrate_template
#> ASV7  NA                                    NA                         
#> ASV8  NA                                    NA                         
#> ASV12 NA                                    NA                         
#> ASV18 NA                                    NA                         
#> ASV25 NA                                    NA                         
#> ASV26 NA                                    NA                         
#>       ft_Decay_type_template ft_Aquatic_habitat_template
#> ASV7  NA                     NA                         
#> ASV8  NA                     NA                         
#> ASV12 NA                     NA                         
#> ASV18 NA                     NA                         
#> ASV25 NA                     NA                         
#> ASV26 NA                     NA                         
#>       ft_Animal_biotrophic_capacity_template ft_Specific_hosts
#> ASV7  NA                                     NA               
#> ASV8  NA                                     NA               
#> ASV12 NA                                     NA               
#> ASV18 NA                                     NA               
#> ASV25 NA                                     NA               
#> ASV26 NA                                     NA               
#>       ft_Growth_form_template ft_Fruitbody_type_template
#> ASV7  NA                      NA                        
#> ASV8  NA                      NA                        
#> ASV12 NA                      NA                        
#> ASV18 NA                      NA                        
#> ASV25 NA                      NA                        
#> ASV26 NA                      NA                        
#>       ft_Hymenium_type_template ft_Ectomycorrhiza_exploration_type_template
#> ASV7  NA                        NA                                         
#> ASV8  NA                        NA                                         
#> ASV12 NA                        NA                                         
#> ASV18 NA                        NA                                         
#> ASV25 NA                        NA                                         
#> ASV26 NA                        NA                                         
#>       ft_Ectomycorrhiza_lineage_template ft_primary_photobiont
#> ASV7  NA                                 NA                   
#> ASV8  NA                                 NA                   
#> ASV12 NA                                 NA                   
#> ASV18 NA                                 NA                   
#> ASV25 NA                                 NA                   
#> ASV26 NA                                 NA                   
#>       ft_secondary_photobiont Domain  Phylum          Class           
#> ASV7  NA                      "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV8  NA                      "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV12 NA                      "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV18 NA                      "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV25 NA                      "Fungi" "Basidiomycota" "Agaricomycetes"
#> ASV26 NA                      "Fungi" "Basidiomycota" "Agaricomycetes"
#>       Order             Family           Genus        Species      Trophic.Mode
#> ASV7  "Russulales"      "Stereaceae"     NA           NA           "Saprotroph"
#> ASV8  "Russulales"      "Stereaceae"     "Stereum"    "ostrea"     "Saprotroph"
#> ASV12 "Hymenochaetales" "Schizoporaceae" "Xylodon"    "raduloides" "Saprotroph"
#> ASV18 "Russulales"      "Stereaceae"     "Stereum"    "ostrea"     "Saprotroph"
#> ASV25 "Agaricales"      "Lyophyllaceae"  "Ossicaulis" "lachnopus"  "Saprotroph"
#> ASV26 "Russulales"      "Stereaceae"     "Stereum"    "hirsutum"   "Saprotroph"
#>       Guild                                  Trait       Confidence.Ranking
#> ASV7  "Wood Saprotroph-Undefined Saprotroph" "NULL"      "Probable"        
#> ASV8  "Undefined Saprotroph"                 "White Rot" "Probable"        
#> ASV12 "Undefined Saprotroph"                 "White Rot" "Probable"        
#> ASV18 "Undefined Saprotroph"                 "White Rot" "Probable"        
#> ASV25 "Wood Saprotroph"                      "Brown Rot" "Probable"        
#> ASV26 "Undefined Saprotroph"                 "White Rot" "Probable"        
#>       Genus_species          currentName                                
#> ASV7  "NA_NA"                NA                                         
#> ASV8  "Stereum_ostrea"       "Stereum ostrea (Blume & T.Nees) Fr., 1838"
#> ASV12 "Xylodon_raduloides"   "Xylodon (Pers.) Gray, 1821"               
#> ASV18 "Stereum_ostrea"       "Stereum ostrea (Blume & T.Nees) Fr., 1838"
#> ASV25 "Ossicaulis_lachnopus" "Ossicaulis lachnopus (Fr.) Contu, 2000"   
#> ASV26 "Stereum_hirsutum"     "Stereum hirsutum (Willd.) Pers., 1800"    
#>       currentCanonicalSimple genusEpithet specificEpithet namePublishedInYear
#> ASV7  NA                     NA           NA              NA                 
#> ASV8  "Stereum ostrea"       "Stereum"    "ostrea"        "1838"             
#> ASV12 "Xylodon"              "Xylodon"    NA              "1821"             
#> ASV18 "Stereum ostrea"       "Stereum"    "ostrea"        "1838"             
#> ASV25 "Ossicaulis lachnopus" "Ossicaulis" "lachnopus"     "2000"             
#> ASV26 "Stereum hirsutum"     "Stereum"    "hirsutum"      "1800"             
#>       authorship bracketauthorship scientificNameAuthorship Global_occurences
#> ASV7  NA         NA                NA                       NA               
#> ASV8  "Fr."      "Blume & T.Nees"  "(Blume & T.Nees) Fr."   " 10908"         
#> ASV12 "Gray"     "Pers."           "(Pers.) Gray"           NA               
#> ASV18 "Fr."      "Blume & T.Nees"  "(Blume & T.Nees) Fr."   " 10908"         
#> ASV25 "Contu"    "Fr."             "(Fr.) Contu"            "   223"         
#> ASV26 "Pers."    "Willd."          "(Willd.) Pers."         "121604"         
#>       taxa_name              lang page_length page_views taxon_id   
#> ASV7  "NA"                   NA   NA          NA         NA         
#> ASV8  "Stereum ostrea"       " 1" " 0"        " 0"       "Q2710042" 
#> ASV12 "Xylodon"              NA   NA          NA         NA         
#> ASV18 "Stereum ostrea"       " 1" " 0"        " 0"       "Q2710042" 
#> ASV25 "Ossicaulis lachnopus" " 1" " 0"        " 0"       "Q10613125"
#> ASV26 "Stereum hirsutum"     " 1" " 0"        " 0"       "Q557377"

# Alternative: Query specific taxa without a phyloseq object
taxa_info <- tax_gbif_occur_pq(
  taxnames = c("Amanita muscaria", "Boletus edulis"),
  by_country = TRUE
)
#> ℹ Processing GBIF occurrences for Amanita muscaria
#> ℹ Processing GBIF occurrences for Boletus edulis

# Returns a tibble instead of phyloseq object
head(taxa_info)
#> # A tibble: 2 × 13
#> # Groups:   canonicalName [2]
#>   canonicalName        NL    US    GB    DE    CA    SE    DK    RU    AT    AU
#>   <chr>             <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 Amanita muscaria 129896 30189 26184 24417  9810  9132  8985  8746  7099  6430
#> 2 Boletus edulis     5810  5164 10629  4770    NA 12964  5964  4133  3805    NA
#> # ℹ 2 more variables: NO <int>, CH <int>
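
The per-country tibble can be summarised with ordinary base R. The sketch below rebuilds the printed result as a plain data frame and reports, for each species, the country with the most records and how many of the listed countries have any. Only the ten country columns printed above are copied. NO and CH are hidden in the printed tibble and are therefore left out, so the summary is partial.

```r
# Counts copied from the printed tibble; NO and CH are not shown there
# and are therefore omitted from this sketch.
counts <- data.frame(
  canonicalName = c("Amanita muscaria", "Boletus edulis"),
  NL = c(129896L, 5810L), US = c(30189L, 5164L), GB = c(26184L, 10629L),
  DE = c(24417L, 4770L), CA = c(9810L, NA), SE = c(9132L, 12964L),
  DK = c(8985L, 5964L), RU = c(8746L, 4133L), AT = c(7099L, 3805L),
  AU = c(6430L, NA)
)

country_cols <- setdiff(names(counts), "canonicalName")
country_mat <- as.matrix(counts[country_cols])

# Country with the most records for each species (which.max ignores NA cells).
top_country <- country_cols[apply(country_mat, 1, which.max)]

# Number of listed countries with a non-missing count per species.
n_countries <- rowSums(!is.na(country_mat))

data.frame(canonicalName = counts$canonicalName, top_country, n_countries)
```

Because missing countries appear as NA, not 0, any sum or mean over the country columns needs na.rm = TRUE.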

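In the enriched tax_table printed earlier in the workflow, every value is stored as character, including counts such as Global_occurences (note the padded strings like " 10908"). A phyloseq tax_table is a character matrix, so numeric columns need converting before sorting or plotting. The sketch below uses a small toy matrix with values copied from that output; it is one possible approach, not a taxinfo function.

```r
# Toy character matrix mimicking two columns of the enriched tax_table
# (values copied from the output shown in the workflow).
tax_mat <- matrix(
  c("Stereum ostrea",       " 10908",
    "Ossicaulis lachnopus", "   223",
    "Stereum hirsutum",     "121604",
    NA,                     NA),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("ASV8", "ASV25", "ASV26", "ASV7"),
    c("currentCanonicalSimple", "Global_occurences")
  )
)

tax_df <- as.data.frame(tax_mat, stringsAsFactors = FALSE)

# Strip the padding and convert to numeric; NA entries stay NA.
tax_df$Global_occurences <- as.numeric(trimws(tax_df$Global_occurences))

# Sort taxa by decreasing number of occurrences (NA rows go last).
tax_sorted <- tax_df[order(tax_df$Global_occurences, decreasing = TRUE), ]
tax_sorted
```

With a real phyloseq object, the character matrix can be obtained with as(phyloseq::tax_table(data_final), "matrix") before applying the same conversion.
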
Data Sources

taxinfo integrates with multiple authoritative data sources:

| Source | Description | Functions |
|---|---|---|
| GBIF | Global biodiversity occurrence data | tax_gbif_occur_pq(), plot_tax_gbif_pq() |
| Wikipedia | Encyclopedia data and page statistics | tax_get_wk_info_pq(), tax_get_wk_pages_info() |
| GLOBI | Species interaction networks | tax_globi_pq() |
| OpenAlex | Scientific literature database | tax_oa_pq() |
| TAXREF | French national taxonomic reference | tax_info_pq() |
| GNA | Global Names Architecture for name verification | gna_verifier_pq() |
| Custom CSV | Any taxonomic database in CSV format | tax_info_pq() |
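
The custom CSV route depends on matching a rank column of the tax_table against a key column of the file. As a rough base R illustration of that kind of lookup, the sketch below joins a toy trait table to a toy taxonomy table by genus. It makes no claim about how tax_info_pq() is implemented. The column names mirror the workflow (genusEpithet, GENUS, the ft_ prefix, the ";" separator), and the trait values are made up for illustration only.

```r
# Toy trait table shaped like a semicolon-separated CSV
# (illustrative values, not taken from any real trait database).
traits_csv <- "GENUS;primary_lifestyle
Stereum;wood_saprotroph
Xylodon;wood_saprotroph"
traits <- read.csv(text = traits_csv, sep = ";", stringsAsFactors = FALSE)

# Toy taxonomy table with one row per taxon and a genus column.
taxa <- data.frame(
  genusEpithet = c("Stereum", "Ossicaulis", NA, "Xylodon"),
  row.names = c("ASV8", "ASV25", "ASV7", "ASV12"),
  stringsAsFactors = FALSE
)

# Look up each taxon's genus in the trait table; unmatched taxa get NA.
idx <- match(taxa$genusEpithet, traits$GENUS)
added <- traits[idx, , drop = FALSE]

# Prefix the new columns and align row names before binding.
names(added) <- paste0("ft_", names(added))
rownames(added) <- rownames(taxa)
enriched <- cbind(taxa, added)

n_matched <- sum(!is.na(idx))  # number of taxa that received trait data
enriched
```

In this sketch the match is exact and case-sensitive, so checking n_matched after the join is a quick way to confirm that the two key columns actually share values.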

Contributing

We welcome contributions! Please open an issue or pull request on GitHub.

Citation

If you use taxinfo in your research, please cite:

citation("taxinfo")

taxinfo works seamlessly with:

  • MiscMetabar: Miscellaneous functions for metabarcoding analysis
  • phyloseq: Analyze microbiome census data
  • taxize: Taxonomic information from around the web
  • rgbif: Interface to GBIF API

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • All the members of the DEFIS MITI project, especially Mélanie Roy and Benoît Perez-Lamarque who lead the project.

  • This project has received financial support from the CNRS (Centre National de la Recherche Scientifique) through the MITI interdisciplinary programs (Project DEFIS - Exploration of Evolutionary Diversity of Fungi and its Indicators through High-Throughput Sequencing: from multi-actors challenges to long-term monitoring).

  • The developers of the R packages used in this project. A special thanks to Joey McMurdie (phyloseq), John Waller (rgbif), and Zachary Foster (taxize) for maintaining those useful tools.