Manual

# Get help in the command line
nextflow run fmalmeida/ngs-preprocess --help

Tip

All these parameters are configurable through a configuration file. We encourage users to use the configuration file since it will keep your execution cleaner and more readable. See a config example.

Input description

path to fastq files containing sequencing reads
path to Pacbio .bam or .h5 files containing raw data
path containing list of SRA IDs

Watch your input

Users must never use hard or symbolic links. This will make nextflow fail.

Whenever using REGEX for a pattern match, for example "illumina/SRR9847694_{1,2}.fastq.gz" or "illumina/SRR*.fastq.gz", it MUST ALWAYS be inside double quotes.

Remember: the pipeline does not concatenate the reads. Whenever you use a pattern such as * the pipeline will process each read (or pair) that match this pattern separately.

Output options

Parameter	Required	Default	Description
`--output`		NA	Directory to store output files

Max job request

Parameter	Default	Description
`--max_cpus`	4	Max number of threads a job can use across attempts
`--max_memory`	6.GB	Max amount of memory a job can use across attempts
`--max_time`	40.h	Max amount of time a job can take to run

SRA IDs as input

As of version v2.5, users can also select data directly from SRA. One just need to provide a txt file containing SRA run ids, one per line, e.g. Example.

Parameter	Required	Default	Description
`--sra_ids`		NA	Path to txt file containing list of SRA run IDs

Short reads input

Parameter	Default	Description
`--shortreads`	NA	String Pattern to find short reads. Example: "SRR6307304_{1,2}.fastq"
`--shortreads_type`	NA	(single \| paired). Tells whether input is unpaired or paired end
`--fastp_average_quality`	20	Fastp will filter out reads with mean quality less than this
`--fastp_correct_pairs`	false	If set, tells Fastp to try to correct paired end reads. Only works for paired end reads
`--fastp_merge_pairs`	false	If set, tells Fastp to try to merge read pairs
`--fastp_additional_parameters`	false	Pass on any additional parameter to Fastp. The tool's parameters are described in their manual

Long reads input

Parameter	Default	Description
`--lreads_min_length`	500	Length min. threshold for filtering long reads (ONT or Pacbio)
`--lreads_min_quality`	5	Quality min. threshold for filtering long reads (ONT or Pacbio)
`--nanopore_fastq`	NA	Sets path to nanopore fastq files. Pre-processes basecalled long reads
`--nanopore_is_barcoded`	false	Tells whether your data (Nanopore or Pacbio) is barcoded or not. It will split barcodes into single files. Users with legacy pacbio data need to first produce a new barcoded_subreads.bam file
`--nanopore_sequencing_summary`	NA	Path to nanopore 'sequencing_summary.txt'. Using this will make the pipeline render a sequencing statistics report using pycoQC. pycoQC reports will be saved using the files basename, so please, use meaningful basename, such as: sample1.txt, sample2.txt, etc. Preferentially, using the same basename as the fastq
`--pacbio_bam`	NA	Path to Pacbio subreads.bam. Only used if user wants to basecall subreads.bam to FASTQ. Always keep subreads.bam and its relative subreads.bam.pbi files in the same directory
`--pacbio_h5`	NA	Path to directory containing legacy bas.h5 data file (1 per directory). It will be used to extract reads in FASTQ file. All its related files (e.g. bax.h5 files) must be in the same directory
`--pacbio_barcodes`	NA	Path to xml/fasta file containing barcode information. It will split barcodes into single files. Will be used for all pacbio inputs, h5 or bam
`--pacbio_barcode_design`	same	Select the combination of barcodes for demultiplexing. Options: same, different, any
`--pacbio_get_hifi`	false	Whether or not to try to compute CCS reads. Will be used for all pacbio inputs, h5 or bam

All this parameters are configurable through a configuration file. We encourage users to use the configuration file since it will keep your execution cleaner and more readable. See a config example.

Examples

For a better understanding of the usage we provided a feel examples. See some examples.