Manual
Tip
All these parameters are configurable through a configuration file. We encourage users to use the configuration file since it will keep your execution cleaner and more readable. See a config example.
Input description
- path to fastq files containing sequencing reads
- path to Pacbio .bam or .h5 files containing raw data
- path containing list of SRA IDs
Watch your input
Users must never use hard or symbolic links. This will make nextflow fail.
Whenever using REGEX for a pattern match, for example "illumina/SRR9847694_{1,2}.fastq.gz" or "illumina/SRR*.fastq.gz", it MUST ALWAYS be inside double quotes.
Remember: the pipeline does not concatenate the reads. Whenever you use a pattern such as * the pipeline will process each read (or pair) that match this pattern separately.
Output options
Parameter |
Required | Default | Description |
---|---|---|---|
--output |
NA | Directory to store output files |
Max job request
Parameter |
Required | Default | Description |
---|---|---|---|
--max_cpus |
4 | Max number of threads a job can use across attempts | |
--max_memory |
6.GB | Max amount of memory a job can use across attempts | |
--max_time |
40.h | Max amount of time a job can take to run |
SRA IDs as input
As of version v2.5, users can also select data directly from SRA. One just need to provide a txt file containing SRA run ids, one per line, e.g. Example.
Parameter |
Required | Default | Description |
---|---|---|---|
--sra_ids |
NA | Path to txt file containing list of SRA run IDs |
Short reads input
Parameter |
Required | Default | Description |
---|---|---|---|
--shortreads |
NA | String Pattern to find short reads. Example: "SRR6307304_{1,2}.fastq" | |
--shortreads_type |
NA | (single | paired). Tells whether input is unpaired or paired end | |
--fastp_average_quality |
20 | Fastp will filter out reads with mean quality less than this | |
--fastp_correct_pairs |
false | If set, tells Fastp to try to correct paired end reads. Only works for paired end reads | |
--fastp_merge_pairs |
false | If set, tells Fastp to try to merge read pairs | |
--fastp_additional_parameters |
false | Pass on any additional parameter to Fastp. The tool's parameters are described in their manual |
Long reads input
Parameter |
Required | Default | Description |
---|---|---|---|
--lreads_min_length |
500 | Length min. threshold for filtering long reads (ONT or Pacbio) | |
--lreads_min_quality |
5 | Quality min. threshold for filtering long reads (ONT or Pacbio) | |
--nanopore_fastq |
NA | Sets path to nanopore fastq files. Pre-processes basecalled long reads | |
--use_porechop_abi |
false | Tells the pipeline to use ab initio version of porechop. Incompatible with --nanopore_is_barcoded . |
|
--nanopore_is_barcoded |
false | Tells whether your data (Nanopore or Pacbio) is barcoded or not. It will split barcodes into single files. Users with legacy pacbio data need to first produce a new barcoded_subreads.bam file | |
--nanopore_sequencing_summary |
NA | Path to nanopore 'sequencing_summary.txt'. Using this will make the pipeline render a sequencing statistics report using pycoQC. pycoQC reports will be saved using the files basename, so please, use meaningful basename, such as: sample1.txt, sample2.txt, etc. Preferentially, using the same basename as the fastq | |
--pacbio_bam |
NA | Path to Pacbio subreads.bam. Only used if user wants to basecall subreads.bam to FASTQ. Always keep subreads.bam and its relative subreads.bam.pbi files in the same directory | |
--pacbio_h5 |
NA | Path to directory containing legacy bas.h5 data file (1 per directory). It will be used to extract reads in FASTQ file. All its related files (e.g. bax.h5 files) must be in the same directory | |
--pacbio_barcodes |
NA | Path to xml/fasta file containing barcode information. It will split barcodes into single files. Will be used for all pacbio inputs, h5 or bam | |
--pacbio_barcode_design |
same | Select the combination of barcodes for demultiplexing. Options: same, different, any | |
--pacbio_get_hifi |
false | Whether or not to try to compute CCS reads. Will be used for all pacbio inputs, h5 or bam |
All this parameters are configurable through a configuration file. We encourage users to use the configuration file since it will keep your execution cleaner and more readable. See a config example.
Examples
For a better understanding of the usage we provided a feel examples. See some examples.