Compute accuracy metrics of taxonomic assignation using a mock (known) community for one rank
Source: R/compare_taxo.R
Compute numerous metrics comparing the computed taxonomic assignation to a true assignation.
Note that computing all metrics requires inserting fake
taxa (by shuffling sequences and/or by adding external sequences). The user
must add these fake taxa using the functions add_external_seq_pq() and/or
add_shuffle_seq_pq() before taxonomic assignation.
Usage
tc_metrics_mock_vec(
physeq,
taxonomic_rank,
true_values,
fake_taxa = TRUE,
fake_pattern = c("^fake_", "^external_"),
verbose = TRUE
)
Arguments
- physeq
(required) A phyloseq-class object obtained using the phyloseq package.
- taxonomic_rank
(required) Name (or number) of a taxonomic rank to count.
- true_values
(required) A vector with the true taxonomic assignation.
- fake_taxa
(logical, default TRUE) If TRUE, the fake_pattern vector is used to identify fake taxa, i.e. taxa that are not in the reference database (see add_external_seq_pq()) or taxa with fake sequences (see add_shuffle_seq_pq()).
- fake_pattern
(character vector, default c("^fake_", "^external_")) A vector of patterns used to identify the fake taxa using a regex search in their name.
- verbose
(logical, default TRUE) If TRUE, print informative messages.
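As an illustration of how fake_pattern can flag fake taxa by name, here is a minimal base R sketch. The taxon names are hypothetical, and combining the patterns with "|" is an assumption made for the example; the function's internal matching may differ.

```r
# Hypothetical taxon names; only the prefixes follow the default fake_pattern.
taxa_names_example <- c("fake_ASV1", "external_ASV2", "ASV3", "not_fake_ASV4")
fake_pattern <- c("^fake_", "^external_")

# Assumption: a name is fake if it matches at least one pattern.
# The patterns are joined into a single alternation for grepl().
is_fake <- grepl(paste(fake_pattern, collapse = "|"), taxa_names_example)

# Names flagged as fake under this assumption.
fake_names <- taxa_names_example[is_fake]
```

Because both default patterns are anchored with "^", a name such as "not_fake_ASV4" is not flagged even though it contains "fake_".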
Value
A list of metrics (see the confusion matrix article on Wikipedia):
TP (number of true positive)
FP (number of false positive)
FN (number of false negative)
FDR (false discovery rate) = FP / (FP + TP)
TPR (true positive rate, also named recall or sensitivity) = TP / (TP + FN)
PPV (positive predictive value, also named precision) = TP / (TP + FP)
F1_score (F1 score) = 2 * TP / (2 * TP + FP + FN)
If fake taxa are present and fake_taxa is TRUE, other metrics are computed:
TN (number of true negative)
ACC (Accuracy) = (TP + TN) / (TP + TN + FP + FN)
MCC (Matthews correlation coefficient) = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (FP + TN) * (TN + FN))
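The formulas above can be checked by hand with a short base R sketch. The counts below are hypothetical values chosen only for illustration; they are not output from tc_metrics_mock_vec().

```r
# Hypothetical confusion-matrix counts (not produced by the package).
TP <- 40
FP <- 10
FN <- 5
TN <- 45 # only available when fake taxa were added before assignation

metrics <- list(
  FDR = FP / (FP + TP),
  TPR = TP / (TP + FN),
  PPV = TP / (TP + FP),
  F1_score = 2 * TP / (2 * TP + FP + FN),
  # The two metrics below require TN, hence the need for fake taxa.
  ACC = (TP + TN) / (TP + TN + FP + FN),
  MCC = (TP * TN - FP * FN) /
    sqrt((TP + FP) * (TP + FN) * (FP + TN) * (TN + FN))
)
```

Note that FDR and PPV always sum to 1, since both share the denominator TP + FP.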