CLI usage examples
Warning
Remember: the pipeline does not concatenate the reads. Whenever you use a pattern such as * with unpaired reads, the pipeline will process each matching read file separately.
Illumina paired-end reads
This command will select all the read pairs that match the pattern “path-to/SRR*_{1,2}.fastq.gz” and process each pair separately.
./nextflow run fmalmeida/ngs-preprocess \
--max_cpus 3 \
--output illumina_paired \
--shortreads "path-to/SRR*_{1,2}.fastq.gz" \
--shortreads_type "paired" \
--fastp_merge_pairs
Note
Since --shortreads will always be a pattern match (for example, “illumina/SRR9847694_{1,2}.fastq.gz”), it MUST ALWAYS be double quoted, as in the example above.
Note
When using paired-end reads, inputs must be set with the “{1,2}” pattern. For example, “SRR6307304_{1,2}.fastq” will properly load the reads “SRR6307304_1.fastq” and “SRR6307304_2.fastq”.
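Before launching a paired-end run, it can help to confirm that every “_1” file has its “_2” mate, since the “{1,2}” pattern relies on both being present. The following is a minimal sketch, not part of the pipeline: the helper name missing_mates, the “.fastq.gz” suffix, and the demo file names are assumptions made for illustration.

```shell
# Hypothetical helper (not part of ngs-preprocess): print every *_1.fastq.gz
# file in a directory that has no matching *_2.fastq.gz mate.
missing_mates() {
  dir="$1"
  for r1 in "$dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue              # the glob matched nothing
    r2="${r1%_1.fastq.gz}_2.fastq.gz"     # derive the expected mate name
    [ -e "$r2" ] || basename "$r1"
  done
}

# Demo on empty throwaway files; the names are illustrative only.
demo_dir="$(mktemp -d)"
touch "$demo_dir/SRR6307304_1.fastq.gz" \
      "$demo_dir/SRR6307304_2.fastq.gz" \
      "$demo_dir/SRR6307305_1.fastq.gz"

missing_mates "$demo_dir"
```

In the demo, only the file without a mate is reported. An empty output suggests that all pairs are complete.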
Note
--fastp_merge_pairs triggers the Fastp module to merge read pairs.
Illumina single-end reads
This command will select all the reads that match the pattern “path-to/SRR*.fastq.gz” and process each one separately.
./nextflow run fmalmeida/ngs-preprocess \
--max_cpus 3 \
--output illumina_single \
--shortreads "path-to/SRR*.fastq.gz" \
--shortreads_type "single" \
--fastp_additional_parameters " --trim_front1 5 --trim_tail1 5 "
Note
In this example, we pass additional parameters (--trim_front1 5 --trim_tail1 5) to Fastp so that it trims a fixed number of bases from the head and tail of each read.
Note
If multiple unpaired read files are given as input at once, the pattern MUST be double quoted: “SRR9696*.fastq.gz”.
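The reason for the quoting rule is general shell behaviour: an unquoted pattern is expanded by the shell into several separate arguments before Nextflow ever sees it, so the parameter would not receive the pattern as a single value. The sketch below illustrates this with a hypothetical count_args helper and made-up file names. It does not call the pipeline.

```shell
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

# Two empty throwaway files; the names are illustrative only.
demo_dir="$(mktemp -d)"
touch "$demo_dir/SRR9696_a.fastq.gz" "$demo_dir/SRR9696_b.fastq.gz"

# Unquoted glob: the shell expands it to one argument per matching file.
unquoted="$(count_args "$demo_dir"/SRR9696*.fastq.gz)"

# Quoted glob: the pattern is passed through untouched as one argument.
quoted="$(count_args "$demo_dir/SRR9696*.fastq.gz")"

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```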
ONT reads (fastq)
This command will select all the reads that match the pattern “path-to/SRR*.fastq.gz” and process each one separately.
./nextflow run fmalmeida/ngs-preprocess \
--max_cpus 3 \
--output ONT \
--nanopore_fastq "path-to/SRR*.fastq.gz" \
--lreads_min_length 1000
Note
The parameter --lreads_min_length applies a minimum read length threshold to filter the reads.
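To make the idea of a minimum length threshold concrete, here is a small standalone sketch of such a filter. It is not the pipeline's implementation: the filter_min_length helper is hypothetical, it assumes plain four-line FASTQ records (no wrapped sequences), and it treats the cutoff as inclusive, which may differ from what the pipeline does.

```shell
# Hypothetical helper: read FASTQ on stdin and keep only records whose
# sequence has at least min_len bases. Assumes four lines per record.
filter_min_length() {
  min_len="$1"
  awk -v min="$min_len" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { bases  = $0 }
    NR % 4 == 3 { plus   = $0 }
    NR % 4 == 0 { if (length(bases) >= min) print header "\n" bases "\n" plus "\n" $0 }
  '
}

# Toy input: one 4-base read and one 8-base read, filtered at 5 bases.
printf '@short\nACGT\n+\nIIII\n@long\nACGTACGT\n+\nIIIIIIII\n' | filter_min_length 5
```

With the toy input, only the record named “@long” is kept.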
PacBio raw (subreads.bam) reads
This command will select all the reads that match the pattern “path-to/m140905_*.subreads.bam” and process each one separately.
./nextflow run fmalmeida/ngs-preprocess \
--max_cpus 3 \
--output pacbio_subreads \
--pacbio_bam "path-to/m140905_*.subreads.bam" \
--pacbio_get_hifi \
-with-report
Note
The parameter --pacbio_get_hifi makes the pipeline try to produce high-fidelity (HiFi) PacBio CCS reads.
Note
-with-report generates a Nextflow execution report.
Note
If multiple read files are given as input at once, the pattern MUST be double quoted: “m140905_*.subreads.bam”.
PacBio raw (legacy .bas.h5 to subreads.bam) reads
./nextflow run fmalmeida/ngs-preprocess \
--pacbio_h5 E01_1/Analysis_Results/ \
--output E01_1/Analysis_Results/preprocessed \
--max_cpus 3
Note
This example refers to the SMRT Cell data files available at: https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly. The path E01_1/Analysis_Results/ is the directory where the legacy *.bas.h5 and *.bax.h5 files are located. The pipeline will load the bas files available in the directory.
Note
A PacBio bas.h5 file and its related bax.h5 files MUST be in the same directory.
Running with an nf-core interactive graphical interface
./nf-core launch fmalmeida/ngs-preprocess
Running with a configuration file
./nextflow run fmalmeida/ngs-preprocess -c nextflow.config
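As a rough illustration of what such a configuration file might contain, the sketch below writes a small file that sets the same parameters as the Illumina paired-end example. It relies on the general Nextflow convention that a command-line option --name corresponds to params.name. The file name my_params.config is an assumption chosen to avoid overwriting an existing nextflow.config, and the pipeline may provide its own, more complete template.

```shell
# Write a minimal, illustrative config file.
# The quoted 'EOF' keeps the shell from altering the file contents.
cat > my_params.config <<'EOF'
// Sketch only: values mirror the Illumina paired-end example
params {
  max_cpus = 3
  output = "illumina_paired"
  shortreads = "path-to/SRR*_{1,2}.fastq.gz"
  shortreads_type = "paired"
}
EOF

# The file could then be passed with -c (not executed here):
#   ./nextflow run fmalmeida/ngs-preprocess -c my_params.config
```

Keeping parameters in a file like this makes a run easier to repeat than retyping a long command line.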