Re-cluster sequences of an object of class physeq or cluster a list of DNA sequences using SWARM
Source: R/vsearch.R
swarm_clustering.RdUsage
swarm_clustering(
physeq = NULL,
dna_seq = NULL,
d = 1,
swarmpath = "swarm",
vsearch_path = "vsearch",
nproc = 1,
fastidious = TRUE,
swarm_args = "",
tax_adjust = 0,
rank_propagation = FALSE,
return_swarm_df = FALSE,
keep_temporary_files = FALSE
)Arguments
- physeq
(required) a
phyloseq-classobject obtained using thephyloseqpackage.- dna_seq
NOT WORKING FOR THE MOMENT You may directly use a character vector of DNA sequences in place of physeq args. When physeq is set, dna sequences take the value of
physeq@refseq- d
(default: 1) maximum number of differences allowed between two amplicons, meaning that two amplicons will be grouped if they have
d(or less) differences- swarmpath
(default: swarm) path to swarm
- vsearch_path
(default: vsearch) path to vsearch, used only if physeq is NULL and dna_seq is provided.
- nproc
(default: 1) Set to number of cpus/processors to use for the clustering
- fastidious
(logical, default TRUE), perform a second clustering pass to reduce the number of small clusters (recommended option by swarm authors). Not that if d is different from 1, fastidious is automatically set to FALSE.
- swarm_args
a one length character element defining other parameters to passed on to swarm (e.g. "–mismatch-penalty 4"). See other possible methods in the SWARM pdf manual
- tax_adjust
(Default 0) See the man page of
merge_taxa_vec()for more details. To conserved the taxonomic rank of the most abundant ASV, set tax_adjust to 0 (default). For the moment only tax_adjust = 0 is robust.- rank_propagation
(logical, default FALSE). Do we propagate the NA value from lower taxonomic rank to upper rank? See the man page of
merge_taxa_vec()for more details.- return_swarm_df
(logical, default FALSE) Do we return the swarm dataframe instead of the phyloseq object ? Default FALSE return a phyloseq object if physeq is provided.
- keep_temporary_files
(logical, default: FALSE) Do we keep temporary files ?
temp.fasta (refseq in fasta or dna_seq sequences)
temp_output (classical output of SWARM)
temp_uclust (clusters output of SWARM)
Value
A new object of class physeq or a list of cluster if dna_seq
args was used or if return_swarm_df was set to TRUE.
Details
This function use the merge_taxa_vec function to
merge taxa into clusters. By default tax_adjust = 0 and rank_propagation = FALSE.
See the man page of merge_taxa_vec().
This function is mainly a wrapper of the work of others. Please cite SWARM.
References
SWARM can be downloaded from https://github.com/torognes/swarm/.
SWARM can be downloaded from https://github.com/torognes/swarm. More information in the associated publications doi:10.1093/bioinformatics/btab493 and doi:10.7717/peerj.593
Examples
summary_plot_pq(data_fungi)
system2("swarm", "-h")
data_fungi_swarm <- swarm_clustering(data_fungi)
summary_plot_pq(data_fungi_swarm)
sequences_ex <- c(
"TACCTATGTTGCCTTGGCGGCTAAACCTACCCGGGATTTGATGGGGCGAATTAATAACGAATTCATTGAATCA",
"TACCTATGTTGCCTTGGCGGCTAAACCTACCCGGGATTTGATGGGGCGAATTACCTGGTAAGGCCCACTT",
"TACCTATGTTGCCTTGGCGGCTAAACCTACCCGGGATTTGATGGGGCGAATTACCTGGTAGAGGTG",
"TACCTATGTTGCCTTGGCGGCTAAACCTACC",
"CGGGATTTGATGGCGAATTACCTGGTATTTTAGCCCACTTACCCGGTACCATGAGGTG",
"GCGGCTAAACCTACCCGGGATTTGATGGCGAATTACCTGG",
"GCGGCTAAACCTACCCGGGATTTGATGGCGAATTACAAAG",
"GCGGCTAAACCTACCCGGGATTTGATGGCGAATTACAAAG",
"GCGGCTAAACCTACCCGGGATTTGATGGCGAATTACAAAG"
)
sequences_ex_swarm <- swarm_clustering(
dna_seq = sequences_ex
)