ANNSeq is a snakemake pipeline that takes Oxford Nanopore Sequencing (ONS) data (fastq) as input, generates fastq statistics using nanostat, processes and filters the reads using pychopper, maps the reads to the genome using minimap2, and uses talon to assemble and quantify transcripts. The DAG of the pipeline can be rendered with the snakemake --dag command given later in this document.

Input:
- ONS fastq reads
- Reference genome assembly in fasta format
- GTF: Gencode GTF; tested on v38 comprehensive CHR gene annotation

Dependencies:
- miniconda
- The rest of the dependencies (including snakemake) are installed via conda through the environment.yml file
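
Before installing, it can help to confirm that the prerequisites are on your PATH. The following is a minimal sketch, not part of ANNSeq: the helper name missing_tools is made up for illustration, and checking for git is an assumption based on the clone step rather than a listed dependency.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one name per line; prints nothing when all are present.
# NOTE: missing_tools is an illustrative helper, not part of ANNSeq.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command can be found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# conda provides the environment; git is assumed to be needed for cloning.
missing=$(missing_tools conda git)
if [ -n "$missing" ]; then
    printf 'Missing prerequisites:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, both commands were found.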
Clone the repository:
git clone --recursive https://github.com/sid-sethi/ANNSeq.git

Create a conda environment for the pipeline, which will install all the dependencies:
cd ANNSeq
conda env create -f environment.yml

Edit config.yml to set the working directory and the input files/directories. Issue snakemake commands from within the pipeline directory. Before running any snakemake command, first activate the conda environment with conda activate annseq.
cd ANNSeq
conda activate annseq
snakemake --use-conda -j <num_cores> all

It is a good idea to do a dry run first (using the -n parameter) to see what the pipeline would do before executing it:
snakemake --use-conda -n all

You can visualise the processes to be executed as a DAG:
snakemake --dag | dot -Tpng > dag.png

To stop a running snakemake pipeline, press ctrl+c in the terminal. If the pipeline is running in the background, you can send it a TERM signal, which stops the scheduling of new jobs and waits for all running jobs to finish:
killall -TERM snakemake

To deactivate the conda environment:
conda deactivate

Output:
working directory
|--- config.yml  # a copy of the parameters used in the pipeline
|--- Nanostat/
      |-- # output of nanostat - fastq stats
|--- Pychopper/
      |-- # output of pychopper - filtered fastq
|--- Mapping/
      |-- # output of minimap2 - aligned reads
|--- Talon/
      |-- # output of Talon
      |-- _talon.gtf  # assembled transcripts
      |-- _talon_abundance_filtered.tsv  # transcript abundance
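
After a run, the layout above can be checked with a few lines of shell. This is a rough sketch under stated assumptions: the function name missing_outputs is hypothetical, and it only tests that the top-level entries from the tree exist, not that the files inside them are complete.

```shell
#!/bin/sh
# List expected top-level outputs that are absent from a working directory.
# Usage: missing_outputs <workdir>
# NOTE: missing_outputs is an illustrative helper, not part of ANNSeq.
missing_outputs() {
    workdir=$1
    # Entry names are taken from the output tree in this document.
    for entry in config.yml Nanostat Pychopper Mapping Talon; do
        # -e is true for both files and directories
        [ -e "$workdir/$entry" ] || printf '%s\n' "$entry"
    done
}
```

Call it with the working directory set in config.yml, for example missing_outputs /path/to/working_directory (a placeholder path). It prints one line per missing entry and nothing when all five are present.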
