Filters a FASTA database by header, selecting the sequences whose header lines match a given pattern and writing them to a new FASTA file. The input file may be gzip-compressed. May not work on Windows.
Usage
filter_db(
  ref_fasta,
  pattern,
  output = NULL,
  force_two_lines_per_seq = TRUE,
  keep_temporary_files = FALSE
)
Arguments
- ref_fasta
(Character, required) Path to a FASTA file (plain or gzip).
- pattern
(Character, required) A pattern to search for in sequence headers.
- output
(Character, required) Path to the output FASTA file (must not be gzipped).
- force_two_lines_per_seq
(Logical, default TRUE) Force the FASTA file to have exactly two lines per sequence (one header, one nucleotide line). If FALSE, the input must already be in this format.
- keep_temporary_files
(Logical, default FALSE) If TRUE and force_two_lines_per_seq is TRUE, keep intermediate temporary files.
Examples
db <- system.file("extdata", "example_unite.fasta", package = "dbpq")
out <- tempfile(fileext = ".fasta")
filter_db(db, "Amanita", output = out)
count_seq_db(out)
#> [1] 2
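
The two-lines-per-sequence format that force_two_lines_per_seq refers to can be illustrated with a small base R sketch. This is not the package's implementation: to_two_lines() and keep_matching() are hypothetical helpers written only for this illustration, the toy records are made up, and the sketch assumes that records whose header matches the pattern are the ones kept, as the example above suggests.

```r
# Hypothetical helper (not part of the package): collapse wrapped sequence
# lines so each record is exactly one header line plus one sequence line.
to_two_lines <- function(lines) {
  lines <- lines[nzchar(lines)]          # drop empty lines
  is_header <- startsWith(lines, ">")
  record_id <- cumsum(is_header)         # each header starts a new record
  headers <- lines[is_header]
  seq_lines <- lines[!is_header]
  seq_ids <- record_id[!is_header]
  seqs <- vapply(
    seq_along(headers),
    function(i) paste(seq_lines[seq_ids == i], collapse = ""),
    character(1)
  )
  as.vector(rbind(headers, seqs))        # interleave header, sequence, ...
}

# Hypothetical helper: keep the records whose header matches `pattern`.
# Relies on the two-line layout: headers sit on odd lines, and each
# sequence is on the line right after its header.
keep_matching <- function(two_line, pattern) {
  if (length(two_line) == 0) {
    return(character(0))
  }
  header_idx <- seq(1, length(two_line), by = 2)
  hits <- header_idx[grepl(pattern, two_line[header_idx])]
  as.vector(rbind(two_line[hits], two_line[hits + 1]))
}

# Toy input with wrapped sequence lines (made-up data).
fasta <- c(
  ">seq1 Amanita muscaria", "ACGT", "TTGA",
  ">seq2 Boletus edulis", "GGCC",
  ">seq3 Amanita phalloides", "AACC", "GG"
)
two_line <- to_two_lines(fasta)
filtered <- keep_matching(two_line, "Amanita")
print(filtered)
```

Once every record occupies exactly two lines, selecting a record reduces to picking a header line and the line after it, which is presumably why the input must already be in that layout when force_two_lines_per_seq is FALSE.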