
Welcome to ngs-preprocess pipeline documentation


About

ngs-preprocess is a pipeline designed to provide an easy-to-use framework for preprocessing sequencing reads from Illumina, PacBio and Oxford Nanopore platforms. It is developed with Nextflow and Docker.

Workflow

The pipeline wraps up the following tools and analyses:

Software | Analysis
sra-tools & entrez-direct | Interaction with the SRA database for fetching FASTQs and metadata
fastp | Fast all-in-one preprocessing for FASTQ files
porechop (see About porechop) | ONT read trimming and demultiplexing
pycoQC | ONT read QC
NanoPack | Long-read QC and filtering
bax2bam | Convert PacBio bax files to BAM
bam2fastx | Extract reads from PacBio BAM files
lima | PacBio read demultiplexing
pacbio ccs | Generate PacBio Highly Accurate Single-Molecule Consensus Reads

About porechop

Although porechop was discontinued in 2018, it is included for legacy compatibility with older nanopore runs, sequencing kit libraries and sequencer versions.

However, recent versions of MinKNOW can output trimmed and demultiplexed FASTQ data, so this step is often no longer required.

Finally, it is also acceptable to leave adapters in the reads, as some assemblers can handle them and may even benefit from these sequences.

Quickstart

A quickstart guide is available to give you an overview of the pipeline's capabilities.

Usage

Typical usage of the pipeline is simple, as shown below:

# usual command-line
nextflow run fmalmeida/ngs-preprocess \
    --sra_ids "list_of_sra.txt" \
    --lreads_min_length 750 \
    --output "./preprocessed_data" \
    ...
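The `--sra_ids` parameter points to a text file of SRA accessions. As a minimal sketch, assuming the file holds one SRA run accession (SRR, ERR or DRR followed by digits) per line, a small hypothetical helper named `write_sra_list` (not part of ngs-preprocess) can build that file and skip malformed entries before launching the pipeline. Check the manual reference for the exact format the pipeline accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ngs-preprocess): writes SRA run accessions to a list file.
# ASSUMPTION: --sra_ids expects one run accession per line (SRR..., ERR... or DRR...).
set -euo pipefail

write_sra_list() {
  local out="$1"
  shift
  : > "$out"  # create or truncate the output file
  local id
  for id in "$@"; do
    # Keep only IDs shaped like INSDC run accessions (SRA, ENA or DDBJ prefixes).
    if [[ "$id" =~ ^[SED]RR[0-9]+$ ]]; then
      printf '%s\n' "$id" >> "$out"
    else
      printf 'skipping malformed accession: %s\n' "$id" >&2
    fi
  done
}
```

Calling it as `write_sra_list list_of_sra.txt <accession> <accession>` with your own accessions produces a file that can be passed to `--sra_ids "list_of_sra.txt"` in the command above. Malformed IDs are reported on stderr rather than silently written, so a typo is caught before the pipeline tries to fetch it.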


Some parameters are required and others are optional. Please read the pipeline's manual reference for a description of each parameter.
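When the command line grows long, the parameters can also be kept in a YAML file passed through Nextflow's own `-params-file` option. The sketch below reuses only the parameter names from the usage example above; any other keys should be taken from the manual reference, and the file name `params.yml` is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: store ngs-preprocess parameters in a YAML file for Nextflow's -params-file option.
# Only the parameters shown in the usage example are included; add others from the manual.
set -euo pipefail

params_file="${1:-params.yml}"  # output path; defaults to params.yml in the current directory

# The quoted heredoc delimiter ('EOF') prevents shell expansion inside the YAML.
cat > "$params_file" <<'EOF'
sra_ids: "list_of_sra.txt"
lreads_min_length: 750
output: "./preprocessed_data"
EOF

echo "Wrote ${params_file}. Launch the pipeline with:"
echo "  nextflow run fmalmeida/ngs-preprocess -params-file ${params_file}"
```

In Nextflow's parameter precedence, values given directly on the command line (for example `--output`) override those read from the params file, so a shared file can still be adjusted per run.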

Citation

To cite this pipeline, please refer to:

Almeida FMd, Campos TAd and Pappas Jr GJ. Scalable and versatile container-based pipelines for de novo genome assembly and bacterial annotation. F1000Research 2023, 12:1205 (https://doi.org/10.12688/f1000research.139488.1)

Support contact

If you have any questions, feel free to contact me via GitHub issues.