Welcome to the ngs-preprocess pipeline documentation

About

ngs-preprocess is a pipeline that provides an easy-to-use framework for preprocessing sequencing reads from Illumina, PacBio and Oxford Nanopore platforms. It is developed with Nextflow and Docker.

Workflow

The pipeline wraps the following tools and analyses:

Software	Analysis
sra-tools & entrez-direct	Interaction with SRA database for fetching fastqs and metadata
fastp	Fast all-in-one preprocessing for FastQ files
porechop**	ONT reads trimming and demultiplexing
porechop ABI**	Ab initio version of porechop
pycoQC	ONT reads QC
NanoPack	Long reads QC and filter
bax2bam	Convert PacBio bax files to bam
bam2fastx	Extract reads from PacBio bam files
lima	PacBio reads demultiplexing
pacbio ccs	Generate PacBio Highly Accurate Single-Molecule Consensus Reads

About porechop

Although discontinued since 2018, porechop is included for legacy compatibility with old nanopore runs, old sequencing kit libraries and old sequencer versions.

However, the newest versions of MinKNOW are able to output trimmed and demultiplexed fastq data, meaning this step is no longer required.

Finally, it is also acceptable to leave adapters in the reads, as some assemblers are aware of adapter sequences and may even benefit from them.

Quickstart

A quickstart is available so you can quickly get the gist of the pipeline's capabilities.

Usage

Typical usage of the pipeline is simple, as shown below:

# usual command-line
nextflow run fmalmeida/ngs-preprocess \
    --sra_ids "list_of_sra.txt" \
    --lreads_min_length 750 \
    --output "./preprocessed_data" \
    ...
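
The `--sra_ids` option in the command above points to a text file. As a minimal sketch, assuming that file simply lists one SRA run accession per line (an assumption here; the manual reference is the authority on the exact format), it could be prepared and sanity-checked as follows. The accessions are hypothetical placeholders, not real runs.

```shell
# Hypothetical accessions for illustration only; replace with your own run IDs.
# Assumption: the pipeline expects one accession per line.
cat > list_of_sra.txt <<'EOF'
SRR0000001
SRR0000002
EOF

# Print any line that does not look like a run accession (SRR/ERR/DRR + digits).
# If every line matches, grep prints nothing and the confirmation is echoed.
grep -vE '^[SED]RR[0-9]+$' list_of_sra.txt || echo "all IDs look well-formed"
```

The resulting file can then be passed to the pipeline through `--sra_ids "list_of_sra.txt"`, as in the command above.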


Some parameters are required, some are not. Please read the pipeline's manual reference to understand each parameter.

Citation

In order to cite this pipeline, please refer to:

Almeida FMd, Campos TAd and Pappas Jr GJ. Scalable and versatile container-based pipelines for de novo genome assembly and bacterial annotation. F1000Research 2023, 12:1205 (https://doi.org/10.12688/f1000research.139488.1)

Support contact

If you have any questions, feel free to contact me via GitHub issues.