Phasing Pipeline

A Nextflow pipeline for phasing unphased genotype data using Beagle with reference panels from the 1000 Genomes Project.

Overview

This pipeline phases unphased VCF files with Beagle, a tool for haplotype phasing and genotype imputation. It preprocesses both the unphased data and the reference panels, phases each chromosome separately, and then concatenates the per-chromosome results into a single phased VCF.

Features

- Preprocessing: Quality control and preparation of unphased VCF files
- Reference Panel Processing: Indexing and preparation of 1000 Genomes reference panels
- Phasing: Chromosome-by-chromosome phasing using Beagle
- Postprocessing: Concatenation and final processing of phased results
- Containerized: Uses Singularity containers for reproducibility
- Scalable: Supports SLURM cluster execution

Requirements

- Nextflow (>= 22.04.0)
- Singularity (for containerized execution)
- SLURM (optional, for cluster execution)

Quick Start

Clone the repository:
```
git clone <repository-url>
cd Phasing
```

Configure parameters: Edit `params/params_beagle.yml` with your input files:
```
vcf_unphased: ./data/your_unphased.vcf.gz
refcsv: ./params/files_beagle.csv
outdir: ./results/
output_prefix: phased_output
```

Run the pipeline:
```
# Local execution
nextflow run main.nf -params-file params/params_beagle.yml -profile local

# SLURM cluster execution
nextflow run main.nf -params-file params/params_beagle.yml -profile kutral
```

Input Files

Required

Unphased VCF: A VCF/BCF file containing unphased genotypes (must be indexed)
Reference CSV: A CSV file with columns:
- chr: Chromosome identifier
- ref_vcf: Path to reference VCF file for that chromosome
- ref_vcf_index: Path to reference VCF index file
- gmap: Path to genetic map file

Example Reference CSV (`files_beagle.csv`):

```
chr,ref_vcf,ref_vcf_index,gmap
1,/path/to/chr1_ref.vcf.gz,/path/to/chr1_ref.vcf.gz.csi,/path/to/chr1.gmap
2,/path/to/chr2_ref.vcf.gz,/path/to/chr2_ref.vcf.gz.csi,/path/to/chr2.gmap
...
```
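Because every per-chromosome job depends on this file, it can help to check it before launching a run. The following is a minimal sketch and not part of the pipeline: the function name `check_reference_csv` and its `check_files` switch are hypothetical, and the only facts taken from this README are the four column names.

```python
import csv
import io
from pathlib import Path

# Column names as documented in this README.
REQUIRED_COLUMNS = ["chr", "ref_vcf", "ref_vcf_index", "gmap"]


def check_reference_csv(handle, check_files=True):
    """Return a list of human-readable problems found in a reference CSV.

    Hypothetical helper: an empty list means no problem was detected.
    """
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        chrom = (row["chr"] or "").strip()
        if not chrom:
            problems.append(f"line {line_no}: empty chr")
        elif chrom in seen:
            problems.append(f"line {line_no}: duplicate chr {chrom}")
        else:
            seen.add(chrom)
        for col in REQUIRED_COLUMNS[1:]:
            value = (row[col] or "").strip()
            if not value:
                problems.append(f"line {line_no}: empty {col}")
            elif check_files and not Path(value).is_file():
                problems.append(f"line {line_no}: {col} not found: {value}")
    return problems


# Usage on an in-memory sample; file existence checks are switched off
# because the paths below are placeholders.
sample = io.StringIO(
    "chr,ref_vcf,ref_vcf_index,gmap\n"
    "1,/path/to/chr1_ref.vcf.gz,/path/to/chr1_ref.vcf.gz.csi,/path/to/chr1.gmap\n"
)
print(check_reference_csv(sample, check_files=False))
```

For a real file, open it with `open(path, newline="")` and leave `check_files` at its default so that missing reference VCFs, indexes, or genetic maps are reported as well.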

Pipeline Workflow

Preprocessing:
- Index VCF files
- Fill AC (allele count) annotations
- Remove duplicate variants
- Remove missing genotypes
- Prepare reference panels
Phasing:
- Extract chromosome-specific regions
- Run Beagle phasing with reference panels
- Index phased output
Postprocessing:
- Concatenate chromosome-specific results
- Generate final phased VCF
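
The steps above can be sketched as command lines. This is an illustration only, not the pipeline's actual module code: the intermediate file names, the `beagle.jar` path, the choice of `bcftools norm -d exact` for duplicate removal, and the reading of "remove missing genotypes" as `bcftools view -g ^miss` (which drops sites with any missing genotype) are all assumptions. The functions only build the commands; nothing is executed.

```python
import shlex


def preprocess_commands(vcf, out="clean.vcf.gz"):
    """Build (not run) one plausible preprocessing sequence for the unphased VCF."""
    return [
        ["bcftools", "index", "-f", vcf],
        # Fill the AC (allele count) INFO tag with the fill-tags plugin.
        ["bcftools", "+fill-tags", vcf, "-Oz", "-o", "tagged.vcf.gz", "--", "-t", "AC"],
        # Assumption: duplicates are records identical in position and alleles.
        ["bcftools", "norm", "-d", "exact", "-Oz", "-o", "dedup.vcf.gz", "tagged.vcf.gz"],
        # Assumption: sites with any missing genotype are excluded.
        ["bcftools", "view", "-g", "^miss", "-Oz", "-o", out, "dedup.vcf.gz"],
        ["bcftools", "index", "-f", out],
    ]


def phase_commands(clean_vcf, chrom, ref_vcf, gmap, threads=16, jar="beagle.jar"):
    """Build the extract, phase, and index commands for one chromosome."""
    region = f"chr{chrom}.vcf.gz"  # hypothetical intermediate file name
    prefix = f"phased_{chrom}"     # Beagle writes <prefix>.vcf.gz
    return [
        ["bcftools", "view", "-r", str(chrom), "-Oz", "-o", region, clean_vcf],
        ["java", "-jar", jar, f"gt={region}", f"ref={ref_vcf}",
         f"map={gmap}", f"out={prefix}", f"nthreads={threads}"],
        ["bcftools", "index", "-f", f"{prefix}.vcf.gz"],
    ]


def concat_command(chroms, output_prefix):
    """Build the command that concatenates per-chromosome results in order."""
    inputs = [f"phased_{c}.vcf.gz" for c in chroms]
    return ["bcftools", "concat", "-Oz", "-o", f"{output_prefix}.vcf.gz", *inputs]


# Print the full sequence for two chromosomes with placeholder paths.
steps = preprocess_commands("your_unphased.vcf.gz")
for chrom in (1, 2):
    steps += phase_commands("clean.vcf.gz", chrom,
                            f"/path/to/chr{chrom}_ref.vcf.gz",
                            f"/path/to/chr{chrom}.gmap")
steps.append(concat_command((1, 2), "phased_output"))
for cmd in steps:
    print(shlex.join(cmd))
```

In the pipeline itself, Nextflow schedules the per-chromosome jobs in parallel inside Singularity containers; the sketch only shows the order of operations within one run.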

Output

The pipeline generates:

- Phased VCF files per chromosome: `phased_<chr>.vcf.gz`
- Concatenated phased VCF: `<output_prefix>.vcf.gz`
- Pipeline execution reports in `pipeline_info/`
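
After a run, a quick check can confirm that the expected files exist. The sketch below derives the file names from the patterns listed above; it assumes that all VCFs are written directly under `outdir`, which this README does not state, and the helper names are hypothetical.

```python
from pathlib import Path


def expected_outputs(outdir, output_prefix, chromosomes):
    """List the VCF paths a completed run is expected to produce.

    Assumption: per-chromosome and concatenated files sit directly in outdir.
    """
    base = Path(outdir)
    per_chrom = [base / f"phased_{c}.vcf.gz" for c in chromosomes]
    return per_chrom + [base / f"{output_prefix}.vcf.gz"]


def missing_outputs(outdir, output_prefix, chromosomes):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(outdir, output_prefix, chromosomes)
            if not p.exists()]


# Usage with the values from the Quick Start example and autosomes 1-22.
for path in missing_outputs("./results/", "phased_output", range(1, 23)):
    print(f"missing: {path}")
```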

Configuration

Profiles

- `local`: For local execution with Singularity
- `kutral`: For SLURM cluster execution on the `ngen-ko` queue

Resource Requirements

Default resource allocation:

- Memory: 120 GB per process
- CPUs: 16 per process

Adjust these values in `nextflow.config` if your data or hardware call for different limits.

Parameters

Key parameters (set in `params/params_beagle.yml`):

| Parameter | Description | Default |
|---|---|---|
| `vcf_unphased` | Path to unphased VCF file | - |
| `refcsv` | Path to reference CSV file | - |
| `outdir` | Output directory | `./results/` |
| `output_prefix` | Prefix for output files | - |
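
The table can be read as three parameters without a default plus one optional parameter. The sketch below encodes that reading; treating a default of `-` as "must be supplied" is an assumption, and `resolve_params` is a hypothetical helper rather than something the pipeline provides.

```python
# Assumption: parameters whose default is "-" in the table must be supplied.
REQUIRED = ("vcf_unphased", "refcsv", "output_prefix")
DEFAULTS = {"outdir": "./results/"}


def resolve_params(user_params):
    """Merge user-supplied parameters with the documented default.

    Raises ValueError if a parameter without a default is missing or empty.
    """
    missing = [key for key in REQUIRED if not user_params.get(key)]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    return {**DEFAULTS, **user_params}


# Usage: outdir is omitted, so the documented default is filled in.
params = resolve_params({
    "vcf_unphased": "./data/your_unphased.vcf.gz",
    "refcsv": "./params/files_beagle.csv",
    "output_prefix": "phased_output",
})
print(params["outdir"])
```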

Tools Used

- Beagle: Haplotype phasing and imputation
- bcftools: VCF/BCF manipulation and indexing
- Nextflow: Workflow orchestration

Citation

If you use this pipeline, please cite:

Beagle: Browning, B. L., & Browning, S. R. (2016). Genotype imputation with millions of reference samples. The American Journal of Human Genetics, 98(1), 116-126.

License

See LICENSE file for details.

Author

Gabriel Cabas

Support

For issues and questions, please open an issue on the repository.
