


asv2otu(
  physeq = NULL,
  dna_seq = NULL,
  nproc = 1,
  method = "clusterize",
  id = 0.97,
  vsearchpath = "vsearch",
  tax_adjust = 0,
  vsearch_cluster_method = "--cluster_size",
  vsearch_args = "--strand both",
  keep_temporary_files = FALSE,
  swarmpath = "swarm",
  d = 1,
  swarm_args = "--fastidious",
  method_clusterize = "overlap",
  ...
)



(required): a phyloseq-class object obtained using the phyloseq package.


You may directly use a character vector of DNA sequences in place of the physeq argument. When physeq is set, the DNA sequences are taken from physeq@refseq.


(default: 1) Number of CPUs/processors to use for the clustering.


(default: clusterize) Set the clustering method.

  • clusterize uses the DECIPHER::Clusterize() function,

  • vsearch uses the vsearch software, with arguments --cluster_size by default (see argument vsearch_cluster_method) and --strand both (see argument vsearch_args),

  • swarm uses the swarm software.


(default: 0.97) Level of identity used to cluster sequences.
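
As a rough illustration of what an identity level of 0.97 means, the base R sketch below computes the proportion of matching positions between two toy sequences of equal length. The helper `simple_identity()` is hypothetical and not part of MiscMetabar; it ignores alignment and gaps, whereas vsearch and DECIPHER derive identity from alignments, so their values may differ.

```r
# Hypothetical helper (not part of MiscMetabar): proportion of identical
# positions between two sequences of equal length, without any alignment.
simple_identity <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  a_chars <- strsplit(a, "", fixed = TRUE)[[1]]
  b_chars <- strsplit(b, "", fixed = TRUE)[[1]]
  mean(a_chars == b_chars)
}

# Toy 100 bp reference sequence
ref <- strrep("ACGT", 25)

# Copy of the reference with three substitutions (positions 1, 50 and 100)
alt <- ref
substr(alt, 1, 1) <- "C"
substr(alt, 50, 50) <- "A"
substr(alt, 100, 100) <- "A"

# Number of mismatching positions and the resulting identity
n_mismatch <- sum(strsplit(ref, "")[[1]] != strsplit(alt, "")[[1]])
identity <- simple_identity(ref, alt)
print(n_mismatch)
print(identity)
```

Under this simplified definition, two 100 bp sequences with three substitutions sit right at the default threshold.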


(default: vsearch) path to vsearch


(default: 0) See the man page of merge_taxa_vec() for more details. To conserve the taxonomic rank of the most abundant ASV, set tax_adjust to 0 (default). For the moment, only tax_adjust = 0 is robust.


(default: "--cluster_size") See other possible methods in the vsearch manual (e.g. --cluster_size or --cluster_smallmem):

  • --cluster_fast : Clusterize the fasta sequences in filename, automatically sort by decreasing sequence length beforehand.

  • --cluster_size : Clusterize the fasta sequences in filename, automatically sort by decreasing sequence abundance beforehand.

  • --cluster_smallmem : Clusterize the fasta sequences in filename without automatically modifying their order beforehand. Sequences are expected to be sorted by decreasing sequence length, unless --usersort is used. In that case, you may set vsearch_args = "--strand both --usersort".
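
The difference between these options is mainly the order in which sequences are processed. The base R sketch below only illustrates the two sort orders described above, on a made-up table; the names and values are assumptions for illustration, and it does not call vsearch or show how asv2otu() passes data to it.

```r
# Toy table of sequences with made-up read abundances
asv <- data.frame(
  name = c("ASV1", "ASV2", "ASV3"),
  seq = c("ACGTACGTAC", "ACGTACGTACGT", "ACGTACGT"),
  abundance = c(50, 10, 200),
  stringsAsFactors = FALSE
)

# Order comparable to --cluster_fast: decreasing sequence length
by_length <- asv$name[order(nchar(asv$seq), decreasing = TRUE)]

# Order comparable to --cluster_size: decreasing abundance
by_abundance <- asv$name[order(asv$abundance, decreasing = TRUE)]

print(by_length)
print(by_abundance)
```

Because the first sequences processed are the first candidates to seed clusters, the two orders can lead to different centroids on the same input.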


(default: "--strand both") A character vector of length one defining other parameters to pass on to vsearch.


(logical, default: FALSE) Whether to keep the temporary files:

  • temp.fasta (refseq in fasta or dna_seq sequences)

  • cluster.fasta (centroid if method = "vsearch")

  • temp.uc (clusters if method = "vsearch")


(default: swarm) path to swarm


(default: 1) Maximum number of differences allowed between two amplicons, meaning that two amplicons will be grouped if they have d (or fewer) differences.
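
To make the pairwise criterion concrete, the base R sketch below uses `utils::adist()` (generalized Levenshtein distance) on toy sequences as a stand-in for the number of differences. This is an assumption for illustration only: it does not call swarm and does not reproduce its full algorithm.

```r
# Toy amplicons (made-up sequences)
seqs <- c(a = "ACGTACGT", b = "ACGTACGA", c = "ACGTTCGA", d = "TTTTACGT")

# Pairwise edit distances (substitutions, insertions, deletions)
dist_mat <- utils::adist(seqs)
dimnames(dist_mat) <- list(names(seqs), names(seqs))

# Pairs that differ by d or fewer differences (upper triangle only)
d <- 1
linked <- dist_mat <= d & upper.tri(dist_mat)
pairs <- which(linked, arr.ind = TRUE)
linked_pairs <- data.frame(
  from = rownames(dist_mat)[pairs[, "row"]],
  to = colnames(dist_mat)[pairs[, "col"]]
)
print(dist_mat)
print(linked_pairs)
```

Here a and c differ by two positions, so they are not linked directly with d = 1, but both are within one difference of b; grouping by chains of such links is how sequences further apart than d can still share a cluster.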


(default: "--fastidious") A character vector of length one defining other parameters to pass on to swarm. See other possible methods in the SWARM pdf manual.


(default: "overlap") The value of the method argument passed to DECIPHER::Clusterize().


Other arguments passed on to DECIPHER::Clusterize().


A new object of class phyloseq, or a list of clusters if the dna_seq argument was used.


This function uses the merge_taxa_vec() function to merge taxa into clusters. By default, tax_adjust = 0. See the man page of merge_taxa_vec().


VSEARCH can be downloaded from More information in the associated publication


Adrien Taudière


#> Partitioning sequences by 3-mer similarity:
#> ================================================================================
#> Time difference of 0.02 secs
#> Sorting by relatedness within 23 groups:
iteration 1 of up to 10 (100.0% stability) 
#> Time difference of 0.01 secs
#> Clustering sequences by 8-mer similarity:
#> Warning: object 'temp' not found
#> ================================================================================
#> Time difference of 0.06 secs
#> Clusters via relatedness sorting: 100% (0% exclusively)
#> Clusters via rare 3-mers: 100% (0% exclusively)
#> Estimated clustering effectiveness: 100%
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 32 taxa and 137 samples ]
#> sample_data() Sample Data:       [ 137 samples by 7 sample variables ]
#> tax_table()   Taxonomy Table:    [ 32 taxa by 12 taxonomic ranks ]
#> refseq()      DNAStringSet:      [ 32 reference sequences ]
# \donttest{
asv2otu(data_fungi_mini, method_clusterize = "longest")
#> Partitioning sequences by 3-mer similarity:
#> ================================================================================
#> Time difference of 0.01 secs
#> Sorting by relatedness within 23 groups:
iteration 1 of up to 10 (100.0% stability) 
#> Time difference of 0.01 secs
#> Clustering sequences by 8-mer similarity:
#> Warning: object 'temp' not found
#> ================================================================================
#> Time difference of 0.06 secs
#> Clusters via relatedness sorting: 100% (0% exclusively)
#> Clusters via rare 3-mers: 100% (0% exclusively)
#> Estimated clustering effectiveness: 100%
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 32 taxa and 137 samples ]
#> sample_data() Sample Data:       [ 137 samples by 7 sample variables ]
#> tax_table()   Taxonomy Table:    [ 32 taxa by 12 taxonomic ranks ]
#> refseq()      DNAStringSet:      [ 32 reference sequences ]

if (MiscMetabar::is_swarm_installed()) {
  d_swarm <- asv2otu(data_fungi_mini, method = "swarm")
}
if (MiscMetabar::is_vsearch_installed()) {
  d_vs <- asv2otu(data_fungi_mini, method = "vsearch")
}
# }