A complete pipeline for structural variant discovery, focusing on duplications (SVTYPE=DUP), using Oxford Nanopore data. The workflow includes read preprocessing, mapping, SV calling (8 tools), and performance reporting. Designed for large-scale SVs, with optimized parameters for duplications.
This pipeline:
- Downloads Nanopore reads using `sra-tools`.
- Trims and filters reads for quality.
- Aligns reads to a reference genome with `minimap2`.
- Calls structural variants using eight SV callers:
  - Sniffles1
  - Sniffles2
  - cuteSV
  - DeBreak
  - Dysgu
  - SVIM
  - pbsv
  - SVDSS
- Measures performance (time, memory, CPU) for each tool.
- Produces VCF files and indexed BAMs.
- Supports downstream benchmarking with Truvari.
Requirements:
- Unix/Linux system
- `conda` (with the `conda-forge` and `bioconda` channels)
- Tools (installed in dedicated conda environments):
| Task | Tool(s) Used |
|---|---|
| Download | prefetch, fasterq-dump (from sra-tools) |
| QC | NanoPlot, NanoFilt, Porechop |
| Mapping | minimap2, samtools |
| SV Calling | sniffles, cuteSV, debreak, svim, pbsv, dysgu, SVDSS |
| Evaluation | Truvari, bcftools |
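One way to create the dedicated environments is sketched below. The environment names, the grouping of tools, and the `sniffles=1` / `sniffles>=2` pins used to keep Sniffles1 apart from Sniffles2 are assumptions for illustration; the package names are the Bioconda ones as far as we know, so check them against your channel before relying on this.

```bash
#!/usr/bin/env bash
# Sketch: one conda environment per tool group (names and grouping are hypothetical).
set -euo pipefail

create_envs() {
    local channels=(-c conda-forge -c bioconda)
    # Read download, QC, and trimming
    conda create -y -n sv_qc "${channels[@]}" sra-tools nanoplot nanofilt porechop
    # Mapping and coverage
    conda create -y -n sv_map "${channels[@]}" minimap2 samtools
    # Sniffles1 and Sniffles2 both install a `sniffles` command, so keep them apart
    conda create -y -n sniffles1 "${channels[@]}" "sniffles=1"
    conda create -y -n sniffles2 "${channels[@]}" "sniffles>=2"
    # Remaining callers, one environment each to avoid dependency clashes
    local tool
    for tool in cutesv debreak dysgu svim pbsv svdss; do
        conda create -y -n "$tool" "${channels[@]}" "$tool"
    done
    # Evaluation
    conda create -y -n sv_eval "${channels[@]}" truvari bcftools
}
```

Run `create_envs` once, then `conda activate <env>` before each stage.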
Inputs:
- `sample.txt`: one SRA accession per line.
- Reference genome in FASTA format (`*.fasta` or `*.fna`).
  - Make sure it is indexed if needed (`samtools faidx`, `minimap2 -d`).
- Existing `*.svsig.gz` files are reused by pbsv to avoid repeating `pbsv discover`.
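The input checks can be scripted as below: a minimal sketch that reads accessions from `sample.txt` and builds the two reference indices only when they are missing. The function names, the skipping of blank and `#` comment lines, and the choice of `<ref basename>.mmi` as the minimap2 index path are assumptions, not conventions stated by the pipeline.

```bash
#!/usr/bin/env bash
# Sketch: list accessions and index the reference once (names are hypothetical).
set -euo pipefail

# Print one accession per line; drops CRs, blank lines, and '#' comments
read_accessions() {
    local list="$1"
    tr -d '\r' < "$list" | { grep -v -E '^[[:space:]]*(#|$)' || true; } | tr -d '[:blank:]'
}

# Create <ref>.fai and a minimap2 index next to the FASTA if they are missing;
# prints the path of the minimap2 index
index_reference() {
    local ref="$1" mmi="${1%.*}.mmi"
    [[ -s "${ref}.fai" ]] || samtools faidx "$ref"
    # The map-ont preset sets the k-mer/window sizes stored in the index
    [[ -s "$mmi" ]] || minimap2 -x map-ont -d "$mmi" "$ref"
    echo "$mmi"
}
```

For example, `while read -r acc; do echo "$acc"; done < <(read_accessions sample.txt)` iterates over the cleaned list.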
Preprocessing:
- Download reads with `prefetch` and convert them with `fasterq-dump`.
- Combine and QC reads (`NanoPlot`, `Porechop`, `NanoFilt`).
- Trimming: adapter removal plus low Q-score filtering.
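A per-accession sketch of these steps follows. The `preprocess_sample` function, the intermediate file names, and the default thresholds (mean read quality 10, minimum length 1,000 bp) are hypothetical placeholders; the pipeline's actual values are not stated here.

```bash
#!/usr/bin/env bash
# Sketch of download + QC + trimming for one SRA accession.
# Function name, file names, and default thresholds are hypothetical.
set -euo pipefail

preprocess_sample() {
    local acc="$1" threads="${2:-8}" minq="${3:-10}" minlen="${4:-1000}"
    local fq_dir="fastq_${acc}"
    mkdir -p "$fq_dir"
    # Download the run, then convert it to FASTQ
    prefetch "$acc"
    fasterq-dump --threads "$threads" --outdir "$fq_dir" "$acc"
    # Combine every FASTQ produced for this accession into one file
    cat "$fq_dir"/*.fastq > "${acc}.fastq"
    # QC report on the combined reads (matches the nanoplot_<SAMPLE>/ output folder)
    NanoPlot --fastq "${acc}.fastq" --threads "$threads" --outdir "nanoplot_${acc}"
    # Remove adapters, then keep reads with mean Q >= minq and length >= minlen
    porechop -i "${acc}.fastq" -o "${acc}.trimmed.fastq" --threads "$threads"
    NanoFilt -q "$minq" -l "$minlen" < "${acc}.trimmed.fastq" > "${acc}.filtered.fastq"
}
```

Call it as `preprocess_sample SRR0000001 16`; pass the third and fourth arguments to change the quality and length cutoffs.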
Mapping:
- Align reads using `minimap2 -x map-ont`.
- Convert, sort, and index BAM files.
- Generate per-base coverage.
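The mapping stage might look like the sketch below. The `--MD` flag (Sniffles1 relies on MD tags) and the read-group line (gives the BAM a sample name, which pbsv, for one, uses) are additions of ours, as are the function name and the use of `samtools depth -a` for per-base coverage.

```bash
#!/usr/bin/env bash
# Sketch: align, sort, index, and compute per-base coverage for one sample.
set -euo pipefail

map_and_cover() {
    local sample="$1" ref="$2" reads="$3" threads="${4:-8}"
    local bam="${sample}_mapped.sort.bam"
    # -a: SAM output; map-ont: Nanopore preset; --MD: emit MD tags (used by Sniffles1)
    # -R: read group carrying the sample name (SM)
    minimap2 -a -x map-ont --MD -t "$threads" -R "@RG\tID:${sample}\tSM:${sample}" \
        "$ref" "$reads" \
        | samtools sort -@ "$threads" -o "$bam" -
    samtools index "$bam"
    # -a also reports positions with zero depth, giving true per-base coverage
    samtools depth -a "$bam" > "${sample}_coverage.txt"
}
```

Usage: `map_and_cover SRR0000001 ref.mmi SRR0000001.filtered.fastq 16` (a FASTA path also works as `ref`).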
The pipeline uses 8 state-of-the-art SV callers, each with unique algorithms and strengths:
| Tool | Description |
|---|---|
| Sniffles1 | One of the first SV callers for long reads; identifies SVs using split-read and coverage signals. |
| Sniffles2 | Improved version of Sniffles; supports better genotyping and is optimized for high-throughput data. |
| cuteSV | Fast and memory-efficient SV caller that clusters signals and refines SV boundaries. |
| DeBreak | Accurate SV caller using partial order alignment and duplication recovery for noisy long reads. |
| Dysgu | Versatile SV tool supporting many sequencing types; efficient with nanopore data. |
| SVIM | Detects a wide range of SVs using a signal-based approach from long-read alignments. |
| pbsv | PacBio's official SV caller, works well with aligned reads including ONT; supports split-read analysis. |
| SVDSS | High-resolution caller that smooths reads and extracts sample-specific strings (SFS) to detect SVs precisely. |
All tools support multi-threading; where a tool has a maximum SV size parameter, it is set high enough to report large SVs.
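As an illustration, three of the eight callers could be wrapped as below. The function names and output file names are assumptions, and the flags follow each tool's documentation as we understand it, so verify them against your installed versions. In particular, `--max_size -1` is cuteSV's documented way to report SVs of any length. The pbsv wrapper also shows the `*.svsig.gz` reuse described above; the other callers follow the same pattern with their own options.

```bash
#!/usr/bin/env bash
# Sketch: wrappers for three callers on <sample>_mapped.sort.bam (names hypothetical).
set -euo pipefail

call_sniffles2() {
    local sample="$1" ref="$2" threads="${3:-8}"
    sniffles --input "${sample}_mapped.sort.bam" --vcf "${sample}_sniffles2.vcf" \
        --reference "$ref" --threads "$threads"
}

call_cutesv() {
    local sample="$1" ref="$2" threads="${3:-8}" workdir="cutesv_${1}"
    mkdir -p "$workdir"
    # Positional arguments: BAM, reference, output VCF, work directory;
    # --max_size -1 lifts the default size cap so very large DUPs are kept
    cuteSV "${sample}_mapped.sort.bam" "$ref" "${sample}_cutesv.vcf" "$workdir" \
        --threads "$threads" --max_size -1
}

call_pbsv() {
    local sample="$1" ref="$2" threads="${3:-8}" outdir="pbsv_${1}"
    local svsig="${outdir}/${sample}.svsig.gz"
    mkdir -p "$outdir"
    # Reuse signatures from an earlier run instead of repeating `pbsv discover`
    [[ -s "$svsig" ]] || pbsv discover "${sample}_mapped.sort.bam" "$svsig"
    pbsv call -j "$threads" "$ref" "$svsig" "${sample}_pbsv.vcf"
}
```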
Each SV caller logs:
- Elapsed wall time
- CPU usage
- Peak memory
- User/system time
Results are saved in `<SAMPLE>_performance_report.csv`.
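Statistics like these can be collected with GNU `time -v`, as in the sketch below. The CSV column names, the log-file naming, and the `run_timed`/`parse_time_log` helpers are hypothetical; the parser matches the field labels GNU time prints in verbose mode. Note that `/usr/bin/time` can only launch executables, not shell functions.

```bash
#!/usr/bin/env bash
# Sketch: time one SV caller with GNU time and append a row to the CSV report.
set -euo pipefail

# Turn a `time -v` log into: tool,wall_clock,cpu_percent,max_rss_kb,user_s,sys_s
parse_time_log() {
    local tool="$1" log="$2"
    awk -F': ' -v tool="$tool" '
        /Elapsed \(wall clock\) time/ { wall = $NF }
        /Percent of CPU this job got/ { cpu = $NF; sub(/%/, "", cpu) }
        /Maximum resident set size/   { rss = $NF }
        /User time \(seconds\)/       { usr = $NF }
        /System time \(seconds\)/     { sys = $NF }
        END { printf "%s,%s,%s,%s,%s,%s\n", tool, wall, cpu, rss, usr, sys }
    ' "$log"
}

# Usage: run_timed <tool> <sample> <command> [args...]
run_timed() {
    local tool="$1" sample="$2"
    shift 2
    local log="${tool}_${sample}.time.log" csv="${sample}_performance_report.csv"
    [[ -s "$csv" ]] || echo "tool,wall_clock,cpu_percent,max_rss_kb,user_s,sys_s" > "$csv"
    # Must be the GNU binary: the bash `time` keyword has no -v/-o options
    /usr/bin/time -v -o "$log" "$@"
    parse_time_log "$tool" "$log" >> "$csv"
}
```

For example: `run_timed cuteSV S1 cuteSV S1_mapped.sort.bam ref.fna S1_cutesv.vcf cutesv_S1 --threads 8`.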
Output files:

| File/Folder | Description |
|---|---|
| `*_mapped.sort.bam` + `.bai` | Aligned BAM and index |
| `*_coverage.txt` | Per-base coverage |
| `*_caller.vcf` | VCF output from each SV tool |
| `<tool>_<SAMPLE>/` | Tool-specific outputs (e.g., `svim/`, `pbsv/`) |
| `*_performance_report.csv` | Time, memory, CPU stats per caller |
| `nanoplot_<SAMPLE>/` | Quality metrics and plots |
Downstream analysis:
- Run `truvari` to compare the output against a gold-standard (truth) set.
- Filter only `SVTYPE=DUP` records from VCFs using `bcftools view -i 'INFO/SVTYPE=="DUP"'`.
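The two downstream steps might be scripted as follows. The function names and output paths are hypothetical. Extra arguments are passed straight to `truvari bench`, which matters here because Truvari applies a default maximum variant size (`--sizemax`) that may exclude very large duplications; check the default in your Truvari version.

```bash
#!/usr/bin/env bash
# Sketch: keep DUP calls and benchmark them against a truth set with Truvari.
set -euo pipefail

# Write only SVTYPE=DUP records as bgzipped VCF and build a tabix index
filter_dups() {
    local in_vcf="$1" out_vcf="$2"
    bcftools view -i 'INFO/SVTYPE=="DUP"' -Oz -o "$out_vcf" "$in_vcf"
    bcftools index -t "$out_vcf"
}

# Usage: bench_against_truth <truth.vcf.gz> <calls.vcf.gz> <ref.fa> <outdir> [extra truvari args]
bench_against_truth() {
    local base_vcf="$1" comp_vcf="$2" ref="$3" outdir="$4"
    shift 4
    # Truvari refuses to write into an existing output directory
    truvari bench -b "$base_vcf" -c "$comp_vcf" -f "$ref" -o "$outdir" "$@"
}
```

For example: `filter_dups S1_cutesv.vcf S1_cutesv.dup.vcf.gz` followed by `bench_against_truth truth.vcf.gz S1_cutesv.dup.vcf.gz ref.fna bench_S1 --sizemax 1000000`.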
Tips:
- Run on an HPC cluster or use `GNU parallel` to process multiple samples concurrently.
- Ensure your reference genome and read quality are appropriate.
- Tune parameters (e.g., `--max_size`) to the expected SV length.
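A sketch of the GNU `parallel` route: each accession in `sample.txt` becomes one job, and the available cores are split evenly between concurrent jobs. `run_sample.sh` is a hypothetical per-sample wrapper that takes an accession and a thread count; substitute the pipeline's own entry point.

```bash
#!/usr/bin/env bash
# Sketch: run a per-sample script for every accession with GNU parallel.
# ./run_sample.sh is hypothetical: it should accept <accession> <threads>.
set -euo pipefail

run_all_samples() {
    local list="$1" jobs="${2:-2}"
    # Share the machine's cores between concurrent jobs (at least 1 each)
    local per_job=$(( $(nproc) / jobs ))
    (( per_job >= 1 )) || per_job=1
    # :::: reads one argument per line from the file; --joblog records runtimes and exit codes
    parallel --jobs "$jobs" --joblog parallel.log \
        ./run_sample.sh {} "$per_job" :::: "$list"
}
```

On a 32-core node, `run_all_samples sample.txt 4` runs four samples at a time with 8 threads each.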
Bash version: fully tested