[Maturing]

This function filters and trims forward sequences, or paired-end sequences if the 'rev' parameter is set. Additional parameters are passed on to dada2::filterAndTrim(). It returns the list of files for subsequent analysis in a targets pipeline.

Usage

filter_trim(
  fw = NULL,
  rev = NULL,
  output_fw = paste(getwd(), "/output/filterAndTrim_fwd", sep = ""),
  output_rev = paste(getwd(), "/output/filterAndTrim_rev", sep = ""),
  ...
)
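
The default output paths in the signature above are built by pasting a sub-folder onto getwd(). The following is a minimal base-R sketch of how those defaults resolve, and of one way to build two distinct temporary output folders instead. The folder names "fw" and "rev" are illustrative assumptions, not package defaults. Note that tempdir() only takes a logical `check` argument, so sub-folders are built with file.path().

```r
# Default forward output path, exactly as written in the signature
default_fw <- paste(getwd(), "/output/filterAndTrim_fwd", sep = "")

# The same path built with file.path(), which joins components with "/"
same_fw <- file.path(getwd(), "output", "filterAndTrim_fwd")
identical(default_fw, same_fw)

# Separate temporary folders for forward and reverse output
# (the names "fw" and "rev" are illustrative)
tmp_fw <- file.path(tempdir(), "fw")
tmp_rev <- file.path(tempdir(), "rev")
```

These two paths could then be supplied as `output_fw` and `output_rev` so that forward and reverse files do not end up in the same folder.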

Arguments

fw

(required) A list of forward FASTQ files.

rev

A list of reverse FASTQ files, for paired-end trimming.

output_fw

Path to output folder for forward files. By default, this function will create a folder "output/filterAndTrim_fwd" in the current working directory.

output_rev

Path to output folder for reverse files. By default, this function will create a folder "output/filterAndTrim_rev" in the current working directory.

...

Other parameters passed on to dada2::filterAndTrim() function.
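
As an illustration of the `...` argument, the sketch below forwards three filtering parameters that dada2::filterAndTrim() documents (`truncLen`, `maxEE`, `truncQ`). The threshold values and the output folder name are arbitrary assumptions chosen for the example, not recommendations. The call is guarded so that the snippet does nothing when dada2 or filter_trim() is not available in the session.

```r
# Output folder for this example (the name is an illustrative assumption)
out_dir <- file.path(tempdir(), "filt_fw_strict")
filt_strict <- NULL

if (requireNamespace("dada2", quietly = TRUE) && exists("filter_trim")) {
  # Two forward FASTQ files shipped with dada2 (same files as in the Examples)
  fw_files <- c(
    system.file("extdata", "sam1F.fastq.gz", package = "dada2"),
    system.file("extdata", "sam2F.fastq.gz", package = "dada2")
  )

  filt_strict <- filter_trim(
    fw = fw_files,
    output_fw = out_dir,
    truncLen = 240, # forwarded: truncate reads to 240 bases, drop shorter reads
    maxEE = 2,      # forwarded: drop reads with more than 2 expected errors
    truncQ = 2      # forwarded: truncate at the first base with quality <= 2
  )
}
```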

Value

A list of files. If rev is set, the function returns a list of two lists: the first contains the forward files and the second contains the reverse files.
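
Because the shape of the return value depends on whether rev is set, downstream code may find it convenient to give the two parts explicit names. The helper below is a hypothetical sketch and is not part of the package. The stand-in values only mimic the documented shape, and their paths are made up.

```r
# Hypothetical helper (not part of the package): name the parts of the result.
# `paired` must be TRUE when `rev` was supplied to filter_trim().
name_filter_output <- function(res, paired) {
  if (paired) {
    # With rev set: first element is forward, second element is reverse
    list(fw = res[[1]], rev = res[[2]])
  } else {
    # Without rev: the whole result describes forward files
    list(fw = res, rev = NULL)
  }
}

# Stand-in values shaped like the documented return (paths are made up)
paired_res <- list(list("fw/sam1F.fastq.gz"), list("rev/sam1R.fastq.gz"))
single_res <- list("fw/sam1F.fastq.gz")

named_pe <- name_filter_output(paired_res, paired = TRUE)
named_se <- name_filter_output(single_res, paired = FALSE)
```

With this, `named_pe$fw` and `named_pe$rev` can be used instead of positional indexing with `[[1]]` and `[[2]]`.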

Author

Adrien Taudière

Examples

testFastqs_fw <- c(
  system.file("extdata", "sam1F.fastq.gz", package = "dada2"),
  system.file("extdata", "sam2F.fastq.gz", package = "dada2")
)
testFastqs_rev <- c(
  system.file("extdata", "sam1R.fastq.gz", package = "dada2"),
  system.file("extdata", "sam2R.fastq.gz", package = "dada2")
)

filt_fastq_fw <- filter_trim(testFastqs_fw, output_fw = tempdir())
derep_fw <- derepFastq(filt_fastq_fw[1])
derep_fw
#> $sam1F.fastq.gz
#> derep-class: R object describing dereplicated sequencing reads
#> $uniques: 1500 reads in 896 unique sequences
#>   Sequence lengths: min=250, median=250, max=250
#> $quals: Quality matrix dimension:  896 250
#>   Consensus quality scores: min=7, median=36, max=38
#> $map: Map from reads to unique sequences:  4 155 627 265 5 ...
#> 
#> $sam2F.fastq.gz
#> derep-class: R object describing dereplicated sequencing reads
#> $uniques: 1500 reads in 909 unique sequences
#>   Sequence lengths: min=250, median=250, max=250
#> $quals: Quality matrix dimension:  909 250
#>   Consensus quality scores: min=7, median=36, max=38
#> $map: Map from reads to unique sequences:  890 14 2 246 797 ...
#> 

# tempdir() only takes a logical `check` argument, so separate
# forward and reverse folders are built with file.path()
filt_fastq_pe <- filter_trim(testFastqs_fw,
  testFastqs_rev,
  output_fw = file.path(tempdir(), "fw"),
  output_rev = file.path(tempdir(), "rev")
)
derep_fw_pe <- derepFastq(filt_fastq_pe[[1]])
derep_rv_pe <- derepFastq(filt_fastq_pe[[2]])
derep_fw_pe
#> $sam1F.fastq.gz
#> derep-class: R object describing dereplicated sequencing reads
#> $uniques: 1500 reads in 896 unique sequences
#>   Sequence lengths: min=250, median=250, max=250
#> $quals: Quality matrix dimension:  896 250
#>   Consensus quality scores: min=7, median=36, max=38
#> $map: Map from reads to unique sequences:  4 155 627 265 5 ...
#> 
#> $sam2F.fastq.gz
#> derep-class: R object describing dereplicated sequencing reads
#> $uniques: 1500 reads in 909 unique sequences
#>   Sequence lengths: min=250, median=250, max=250
#> $quals: Quality matrix dimension:  909 250
#>   Consensus quality scores: min=7, median=36, max=38
#> $map: Map from reads to unique sequences:  890 14 2 246 797 ...
#> 
derep_rv_pe
#> $sam1R.fastq.gz
#> derep-class: R object describing dereplicated sequencing reads
#> $uniques: 1500 reads in 1373 unique sequences
#>   Sequence lengths: min=250, median=250, max=250
#> $quals: Quality matrix dimension:  1373 250
#>   Consensus quality scores: min=7, median=34, max=38
#> $map: Map from reads to unique sequences:  2 1029 664 983 156 ...
#> 
#> $sam2R.fastq.gz
#> derep-class: R object describing dereplicated sequencing reads
#> $uniques: 1500 reads in 1401 unique sequences
#>   Sequence lengths: min=250, median=250, max=250
#> $quals: Quality matrix dimension:  1401 250
#>   Consensus quality scores: min=7, median=34, max=38
#> $map: Map from reads to unique sequences:  1353 1171 1059 947 293 ...
#>