🧬 Nanopore SV Duplication Detection Pipeline

A complete pipeline for structural variant (SV) discovery from Oxford Nanopore data, focused on duplications (SVTYPE=DUP). The workflow covers read preprocessing, mapping, SV calling with eight tools, and performance reporting. Caller parameters are tuned for large SVs, duplications in particular.

🚀 Overview

This pipeline:

Downloads Nanopore reads using SRA tools.
Trims and filters reads for quality.
Aligns reads to a reference genome with minimap2.
Calls structural variants using eight SV callers:
- Sniffles1
- Sniffles2
- cuteSV
- DeBreak
- Dysgu
- SVIM
- pbsv
- SVDSS
Measures performance (time, memory, CPU) for each tool.
Produces VCF files and indexed BAMs.
Supports downstream benchmarking with Truvari.

🛠 Requirements

Unix/Linux system
conda (with conda-forge + bioconda)
Tools (installed in dedicated conda environments):

| Task | Tool(s) Used |
| --- | --- |
| Download | `prefetch`, `fasterq-dump` (from `sra-tools`) |
| QC | `NanoPlot`, `NanoFilt`, `Porechop` |
| Mapping | `minimap2`, `samtools` |
| SV Calling | `sniffles` (Sniffles1 and Sniffles2), `cuteSV`, `debreak`, `svim`, `pbsv`, `dysgu`, `SVDSS` |
| Evaluation | `Truvari`, `bcftools` |

📁 Inputs

sample.txt: One SRA accession per line.
Reference genome: FASTA format (*.fasta or *.fna).
- Make sure it's indexed if needed (samtools faidx, minimap2 -d).
Existing *.svsig.gz files are reused by pbsv, so the `pbsv discover` step is not repeated.
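The inputs above can be loaded with a few lines of Python. This is only a minimal sketch: the helper names `read_accessions` and `find_reference` are hypothetical, and skipping `#` comment lines is an assumption rather than documented behaviour of the pipeline.

```python
from pathlib import Path


def read_accessions(sample_file):
    """Return SRA accessions from a file with one accession per line."""
    accessions = []
    for line in Path(sample_file).read_text().splitlines():
        line = line.strip()
        # Skip blank lines; skipping '#' comments is an assumption.
        if not line or line.startswith("#"):
            continue
        accessions.append(line)
    return accessions


def find_reference(directory):
    """Return the first *.fasta or *.fna file in a directory, or None."""
    candidates = sorted(
        p for p in Path(directory).iterdir() if p.suffix in {".fasta", ".fna"}
    )
    return candidates[0] if candidates else None
```

Sorting the candidates makes the choice deterministic when a directory holds more than one reference file.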

🧬 Pipeline Structure

Step 1: Data Download and Preprocessing

Download reads with prefetch, convert with fasterq-dump.
Combine and QC reads (NanoPlot, Porechop, NanoFilt).
Trimming removes adapters and filters out reads with a low Q-score.
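Step 1 could be assembled roughly as follows. This sketch only builds the command lines and does not run them. The function names, the intermediate file names, and the default thresholds (`min_quality=7`, `min_length=500`) are illustrative assumptions, not the values used by the pipeline.

```python
import shlex


def download_commands(accession, outdir="reads", threads=4):
    """Commands to fetch one SRA accession and convert it to FASTQ."""
    return [
        ["prefetch", accession],
        ["fasterq-dump", accession, "--outdir", outdir, "--threads", str(threads)],
    ]


def trim_and_filter_commands(fastq_in, sample, min_quality=7, min_length=500, threads=4):
    """Shell commands for adapter trimming (Porechop) and filtering (NanoFilt)."""
    q = shlex.quote
    trimmed = f"{sample}.trimmed.fastq"    # hypothetical intermediate name
    filtered = f"{sample}.filtered.fastq"  # hypothetical output name
    return [
        f"porechop -i {q(fastq_in)} -o {q(trimmed)} --threads {threads}",
        # NanoFilt reads FASTQ from stdin and writes to stdout.
        f"NanoFilt -q {min_quality} -l {min_length} < {q(trimmed)} > {q(filtered)}",
    ]
```

The first function returns argument lists suitable for `subprocess.run(cmd, check=True)`; the second returns shell strings because of the redirections, so those would need `shell=True`.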

Step 2: Mapping

Align reads using minimap2 -x map-ont.
Convert, sort, index BAM files.
Generate per-base coverage.
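The mapping step might look like the sketch below, which builds shell commands without executing them. Output names follow the patterns listed under Output Per Sample. The function name is hypothetical, and using `samtools depth -a` for per-base coverage is an assumption, since the README does not name the coverage tool.

```python
import shlex


def mapping_commands(reference, reads, sample, threads=4):
    """Shell commands to align, sort, index, and compute per-base coverage."""
    q = shlex.quote
    bam = f"{sample}_mapped.sort.bam"
    coverage = f"{sample}_coverage.txt"
    return [
        # -a makes minimap2 emit SAM, which samtools sort converts to sorted BAM.
        f"minimap2 -ax map-ont -t {threads} {q(reference)} {q(reads)}"
        f" | samtools sort -@ {threads} -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # -a reports every position, including those with zero depth (assumed choice).
        f"samtools depth -a {q(bam)} > {q(coverage)}",
    ]
```

Piping minimap2 straight into `samtools sort` avoids writing an intermediate SAM file.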

Step 3: Structural Variant Calling

The pipeline uses eight SV callers, each with its own algorithm and strengths:

| Tool | Description |
| --- | --- |
| Sniffles1 | One of the first SV callers for long reads; identifies SVs using split-read and coverage signals. |
| Sniffles2 | Improved version of Sniffles; offers better genotyping and is optimized for high-throughput data. |
| cuteSV | Fast and memory-efficient SV caller that clusters signals and refines SV boundaries. |
| DeBreak | Accurate SV caller using partial order alignment and duplication recovery for noisy long reads. |
| Dysgu | Versatile SV tool supporting many sequencing types; efficient with nanopore data. |
| SVIM | Detects a wide range of SVs using a signal-based approach from long-read alignments. |
| pbsv | PacBio’s official SV caller; also works with aligned ONT reads and supports split-read analysis. |
| SVDSS | High-resolution caller that combines read smoothing with sample-specific strings (SFS) to detect SVs precisely. |

All tools support multi-threading and are configured for maximum SV size where applicable.
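As one example of how a caller step might be wired up, the sketch below shows the pbsv two-stage workflow with the `*.svsig.gz` reuse mentioned under Inputs. The function name and the file naming scheme are hypothetical; only the `pbsv discover` and `pbsv call` subcommands come from pbsv itself.

```python
from pathlib import Path


def pbsv_commands(bam, reference, sample, workdir="."):
    """Commands for pbsv, skipping 'discover' when the svsig file already exists."""
    svsig = Path(workdir) / f"{sample}.svsig.gz"  # hypothetical naming scheme
    vcf = Path(workdir) / f"{sample}_pbsv.vcf"    # hypothetical naming scheme
    commands = []
    if not svsig.exists():
        # Stage 1: extract SV signatures from the alignments.
        commands.append(["pbsv", "discover", str(bam), str(svsig)])
    # Stage 2: call variants from the signatures.
    commands.append(["pbsv", "call", str(reference), str(svsig), str(vcf)])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; on a rerun with the signature file in place, only the `call` stage is returned.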

Step 4: Performance Logging

Each SV caller logs:

Elapsed wall time
CPU usage
Peak memory
User/system time

Results are saved in <SAMPLE>_performance_report.csv.
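One way to collect these metrics is to run each caller under GNU `/usr/bin/time -v` and parse its report. The README does not say how the pipeline measures resources, so this approach, the function names, and the CSV column names are all assumptions.

```python
import csv
import io

# Labels printed by GNU time -v, mapped to hypothetical CSV column names.
FIELDS = {
    "Elapsed (wall clock) time (h:mm:ss or m:ss)": "elapsed_s",
    "Percent of CPU this job got": "cpu_percent",
    "Maximum resident set size (kbytes)": "peak_mem_kb",
    "User time (seconds)": "user_s",
    "System time (seconds)": "system_s",
}


def clock_to_seconds(text):
    """Convert 'h:mm:ss' or 'm:ss.ff' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_report(report):
    """Extract the five metrics of interest from a 'time -v' report."""
    metrics = {}
    for line in report.splitlines():
        key, sep, value = line.strip().rpartition(": ")
        if not sep or key not in FIELDS:
            continue
        name = FIELDS[key]
        if name == "elapsed_s":
            metrics[name] = clock_to_seconds(value)
        else:
            metrics[name] = float(value.rstrip("%"))
    return metrics


def report_csv(per_tool):
    """Render {tool: metrics} as CSV text with one row per caller."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["tool", *FIELDS.values()])
    writer.writeheader()
    for tool, metrics in per_tool.items():
        writer.writerow({"tool": tool, **metrics})
    return buffer.getvalue()
```

GNU time writes its report to stderr, so the text passed to `parse_time_report` would typically come from the captured stderr of the timed command.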

📤 Output Per Sample

| File/Folder | Description |
| --- | --- |
| `*_mapped.sort.bam` + `.bai` | Aligned BAM and index |
| `*_coverage.txt` | Per-base coverage |
| `*_caller.vcf` | VCF output from each SV tool |
| `<tool>_<SAMPLE>/` | Tool-specific outputs (e.g., `svim/`, `pbsv/`) |
| `*_performance_report.csv` | Time, memory, CPU stats per caller |
| `nanoplot_<SAMPLE>/` | Quality metrics and plots |

🔍 Post-processing (optional)

Run `truvari bench` to compare each caller's output against a gold-standard call set.
Keep only duplications by filtering the VCFs with `bcftools view -i 'INFO/SVTYPE=="DUP"'`.
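For quick inspection of an uncompressed VCF without bcftools, the same duplication filter can be approximated in plain Python. This is a sketch with hypothetical function names. Like the bcftools expression, it matches `SVTYPE=DUP` exactly, so subtype values such as `DUP:TANDEM` would not pass.

```python
def is_dup(record_line):
    """True if a VCF data line carries SVTYPE=DUP in its INFO column."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False
    # INFO is the eighth column; entries are separated by ';'.
    return "SVTYPE=DUP" in fields[7].split(";")


def filter_dup(vcf_lines):
    """Yield header lines unchanged plus the records that are duplications."""
    for line in vcf_lines:
        if line.startswith("#") or is_dup(line):
            yield line
```

It can be applied to an open file handle, for example `list(filter_dup(open(path)))`, where `path` points to one of the per-caller VCFs.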
