Output files

Here, using the results produced in the Non-bacterial dataset section, we give users a glimpse over the main outputs produced by ngs-preprocess. The command used in the quickstart wrote the results under the preprocessed_reads directory.

Note

Please take note that the pipeline uses the directory set with the --output parameter as a storage place in which it will create a folder for the final pre-processed reads and for the intermediate files, separated by sequencing technology.

Directory tree

After a successful execution, you will have something like this:

# Directory tree from the running dir
preprocessed_reads
# directory containing the final results of the data cleaning
├── final_output                   
│   └── nanopore
│       └── SRR23337893.filtered.fq.gz
# a template input ready for MpGAP
├── mpgap_samplesheet.yml
# directory containing the nextflow execution reports
├── pipeline_info
│   ├── ngs_preprocess_report_2023-11-18_10-07-36.html
│   ├── ngs_preprocess_timeline_2023-11-18_10-07-36.html
│   ├── ngs_preprocess_tracing_2023-11-18_10-07-36.txt
# directory containing the intermediate files produced by the tools used during pre-processing, and, QC
├── preprocessing_outputs
│   └── nanopore
│       ├── porechop
│       └── QC
# directory containing the intermediate files when downloading data from SRA
└── SRA_FETCH
    ├── FASTQ
    │   └── SRR23337893_data
    └── SRR23337893_sra_runInfo.csv

The pre-formatted MpGAP input samplesheet

Once finished, the pipeline also generates a file called mpgap_samplesheet.yml (showed below). Basically this samplesheet defines all the minimum definitions in order to assemble these reads using the MpGAP pipeline.

samplesheet:
  - id: SRR23337893
    nanopore: /workspace/ngs-preprocess/testing/preprocessed_reads/final_output/nanopore/SRR23337893.filtered.fq.gz

Note

One must keep in mind that, this template samplesheet contains only the bare minimum to launch MpGAP but many other customizations are possible. For example, the generated samplesheet will assemble each read separately, but, MpGAP can also perform hybrid assemblies. Therefore, users can/must use this output as a template for easily customization of the assembly pipeline input to use the results of ngs-preprocess pipeline.

For more information, please refer to the MpGAP documentation.

Example of QC outputs

Here I am going to display just a very few examples of results produced, focusing on the QC, as the main result is a cleaned FASTQ file.

Length versus Quality Scatterplot

NanoPlot Report HTML

Open it here.

NanoStats Report TXT

Open it here.