<a href="https://adrientaudiere.github.io/MiscMetabar/articles/Rules.html#lifecycle"> <img src="https://img.shields.io/badge/lifecycle-experimental-orange" alt="lifecycle-experimental"></a>

A wrapper around [openalexR::oa_fetch()] that returns the number of scientific works (and a list of DOIs if `count_only` is FALSE) for each taxon of a phyloseq object. Each taxon name is searched for in the titles and abstracts of the works in the Open Alex database.

Usage

tax_oa_pq(
  physeq = NULL,
  taxnames = NULL,
  taxonomic_rank = "currentCanonicalSimple",
  count_only = FALSE,
  return_raw_oa = FALSE,
  add_to_phyloseq = NULL,
  col_prefix = NULL,
  type_works = c("article", "review", "book-chapter", "book", "letter"),
  verbose = TRUE,
  discard_genus_alone = identical(taxonomic_rank, "currentCanonicalSimple"),
  discard_NA = TRUE,
  ...
)

Arguments

physeq

(optional) A phyloseq object. Either `physeq` or `taxnames` must be provided, but not both.

taxnames

(optional) A character vector of taxonomic names.

taxonomic_rank

(Character, default "currentCanonicalSimple") The column(s) of the @tax_table slot of the phyloseq object used as taxon names. Can be a vector of two columns (e.g. c("Genus", "Species")).

count_only

(Logical, default FALSE) If TRUE, only the number of works for a given taxon is returned, leading to a faster call to `openalexR::oa_fetch()`. Note that if `count_only` is TRUE, all works (including e.g. preprints and datasets) are counted, leading to a higher number of works than if `count_only` is FALSE (see parameter `type_works`).

return_raw_oa

(Logical, default FALSE) If TRUE, return the raw list of publications from Open Alex for each taxon as a list of data.frame. This can be useful to filter works, for example by topic or by number of citations (see the Examples section). If TRUE, `add_to_phyloseq` is automatically set to FALSE.

add_to_phyloseq

(Logical, default TRUE when `physeq` is provided, FALSE when `taxnames` is provided or when `return_raw_oa` is TRUE) If TRUE, return a new phyloseq object with new columns in the tax_table slot. Cannot be TRUE if `taxnames` is provided.

col_prefix

(Character, default NULL) A string added as a prefix to the names of the new columns in the tax_table slot of the phyloseq object.

type_works

(Character vector, default c("article", "review", "book-chapter", "book", "letter")) The types of works to select. See the Open Alex [documentation](https://docs.openalex.org/api-entities/works/work-object#type). Only used if `count_only` is FALSE.

verbose

(Logical, default TRUE) If TRUE, print informative messages.

discard_genus_alone

(logical, default `TRUE` when `taxonomic_rank == "currentCanonicalSimple"`). Passed to [taxonomic_rank_to_taxnames()].

discard_NA

(logical, default `TRUE`). Passed to [taxonomic_rank_to_taxnames()].

...

Other parameters passed on to [openalexR::oa_fetch()].
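The interplay between `taxonomic_rank`, `discard_genus_alone` and `discard_NA` can be pictured with a small base R sketch. This is not the code of `taxonomic_rank_to_taxnames()`: the helper `build_taxnames()` and the mock table are hypothetical, and the sketch assumes that two rank columns are joined with a space and that a "genus alone" name is one containing no space.

```r
# Hypothetical taxonomy table standing in for the @tax_table slot
tax <- data.frame(
  Genus = c("Amanita", "Boletus", NA, "Russula"),
  Species = c("muscaria", NA, NA, "emetica"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not the MiscMetabar implementation)
build_taxnames <- function(tax, ranks, discard_genus_alone = FALSE,
                           discard_NA = TRUE) {
  # Join the non-missing parts of each row with a space
  nm <- apply(tax[, ranks, drop = FALSE], 1, function(row) {
    parts <- row[!is.na(row)]
    if (length(parts) == 0) NA_character_ else paste(parts, collapse = " ")
  })
  nm <- unname(nm)
  if (discard_NA) {
    # Drop rows where no name could be built
    nm <- nm[!is.na(nm)]
  }
  if (discard_genus_alone) {
    # Keep missing values and names made of at least two words
    nm <- nm[is.na(nm) | grepl(" ", nm, fixed = TRUE)]
  }
  nm
}

names_all <- build_taxnames(tax, c("Genus", "Species"))
names_binomial <- build_taxnames(tax, c("Genus", "Species"),
  discard_genus_alone = TRUE
)
```

Under these assumptions, `names_all` keeps "Boletus" while `names_binomial` drops it, which is the kind of single-word query that would otherwise match a very broad set of works.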

Value

Either a tibble (if `add_to_phyloseq` is FALSE) or a new phyloseq object (if `add_to_phyloseq` is TRUE). The result has 1 new column (`n_doi`) if `count_only` is TRUE, or 4 new columns (`n_doi`, `list_doi`, `n_citation` and `list_keywords`) if `count_only` is FALSE. For a phyloseq object, the new columns are added to the tax_table.

- n_doi: number of publications mentioning this taxon in their title or abstract
- list_doi: list of DOIs separated by ";"
- n_citation: total number of citations of all publications mentioning this taxon
- list_keywords: list of keywords from all publications mentioning this taxon
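Because `list_doi` packs all DOIs into a single ";"-separated string, it usually needs to be split before further use. The sketch below uses a mock one-row table; the `taxa` column name and the placeholder DOIs are invented for illustration. Counts are converted with `as.numeric()` because values stored in a tax_table are character strings.

```r
# Hypothetical row of the returned table; the DOIs are placeholders
res <- data.frame(
  taxa = "Amanita muscaria",
  n_doi = "3",
  list_doi = paste(
    "https://doi.org/10.0000/a",
    "https://doi.org/10.0000/b",
    "https://doi.org/10.0000/c",
    sep = ";"
  ),
  stringsAsFactors = FALSE
)

# Split the ";"-separated string back into one DOI per element
dois <- strsplit(res$list_doi, ";", fixed = TRUE)[[1]]

# Convert the count stored as text into a number
n_doi <- as.numeric(res$n_doi)

# Sanity check: the number of DOIs matches the reported count
count_matches <- length(dois) == n_doi
```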

Details

This function is mainly a wrapper around the work of others. Please cite the `openalexR` package.
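As noted for the `count_only` argument, counting all works can give a higher number than counting only the types listed in `type_works`. The sketch below illustrates the difference on a mock table; the `works` data frame and its values are invented, and only the `type` column name follows the Open Alex work object.

```r
# Mock works table (hypothetical data); `type` mirrors the Open Alex work type
works <- data.frame(
  doi = c("10.0000/a", "10.0000/b", "10.0000/c", "10.0000/d"),
  type = c("article", "preprint", "dataset", "review"),
  stringsAsFactors = FALSE
)

# Default value of `type_works` in tax_oa_pq()
type_works <- c("article", "review", "book-chapter", "book", "letter")

# Count without any type filter (the behaviour described for count_only = TRUE)
n_all <- nrow(works)

# Count after restricting to the selected types (count_only = FALSE)
works_kept <- works[works$type %in% type_works, , drop = FALSE]
n_kept <- nrow(works_kept)
```

Here `n_all` is 4 and `n_kept` is 2, since the preprint and the dataset are excluded by the default `type_works`.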

Author

Adrien Taudiere

Examples

if (FALSE) { # \dontrun{
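# Packages providing the functions used in the examples below
library(MiscMetabar)
library(phyloseq)
library(ggplot2)
library(dplyr)
library(purrr)

# Clean the taxonomic names, then query Open Alex for each taxon
data_fungi_mini_cleanNames <- gna_verifier_pq(data_fungi_mini) |>
  tax_oa_pq()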

ggplot(
  subset_taxa(data_fungi_mini_cleanNames, !is.na(n_doi))@tax_table,
  aes(
    x = log10(as.numeric(n_doi)),
    y = forcats::fct_reorder(currentCanonicalSimple, as.numeric(n_doi))
  )
) +
  geom_point(aes(col = Order)) +
  xlab("Number of Scientific Papers (log10 scale)")

# Restrict the search to works of type "dataset"
tax_oa_pq(data_fungi_mini_cleanNames, type_works = "dataset")


# Retrieve the raw Open Alex results as a list with one element per taxon
list_pub_raw <- tax_oa_pq(data_fungi_mini_cleanNames,
  col_prefix = "oa_",
  return_raw_oa = TRUE
)

# Keep only works with at least one topic in a "Health science" domain
list_pub_Health_science <- lapply(list_pub_raw, function(xx) {
  if (length(xx) == 0) {
    return(NULL)
  } else {
    filter(xx, map_lgl(topics, function(tibble_item) {
      if (is.null(tibble_item) || nrow(tibble_item) == 0) {
        return(FALSE)
      } else {
        any(grepl("Health science",
          tibble_item$display_name[tibble_item$type == "domain"],
          ignore.case = TRUE
        ))
      }
    }))
  }
})


# Keep only works with at least one topic in an "Ecology" subfield
list_pub_Ecology <- lapply(list_pub_raw, function(xx) {
  if (length(xx) == 0) {
    return(NULL)
  } else {
    filter(xx, map_lgl(topics, function(tibble_item) {
      if (is.null(tibble_item) || nrow(tibble_item) == 0) {
        return(FALSE)
      } else {
        any(grepl("Ecology",
          tibble_item$display_name[tibble_item$type == "subfield"],
          ignore.case = TRUE
        ))
      }
    }))
  }
})

# Keep only works cited at least ten times
list_pub_at_least_ten_citations <-
  lapply(list_pub_raw, function(xx) {
    if (length(xx) == 0) {
      return(NULL)
    } else {
      filter(xx, cited_by_count >= 10)
    }
  })
} # }
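After filtering, a common next step is to count the remaining works per taxon. The following base R sketch does this on a mock list shaped like the filtered lists above (one data.frame per taxon, `NULL` when nothing was found); the taxon names and citation counts are invented for illustration.

```r
# Mock filtered list (hypothetical data): one element per taxon
list_pub_filtered <- list(
  "Amanita muscaria" = data.frame(cited_by_count = c(120, 15)),
  "Boletus edulis" = NULL,
  "Russula emetica" = data.frame(cited_by_count = 42)
)

# Number of remaining works per taxon, counting NULL elements as zero
n_works <- vapply(
  list_pub_filtered,
  function(xx) if (is.null(xx)) 0L else nrow(xx),
  integer(1)
)
```

Using `vapply()` with `integer(1)` keeps the result a named integer vector even when some taxa have no work left.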