This repository contains the sequencing data analysis software for TAC-seq.
Requirements:
- Linux-based OS
- FASTX-Toolkit
- Git
Use the following commands to set up the TAC-seq data analysis software in a terminal:
- Install FASTX-Toolkit
- Install Git
- Download the analysis software using Git:
    git clone https://github.com/hindrek/TAC-seq-data-analysis
- Navigate to the analysis location:
    cd TAC-seq-data-analysis
- Make tacseq executable:
    chmod +x tacseq
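Before running tacseq, it can help to confirm that the required tools are on your PATH. The function below is a minimal sketch and is not part of this repository: check_deps is a hypothetical name, and it assumes FASTX-Toolkit can be detected through its fastx_barcode_splitter.pl script.

```shell
# Hypothetical helper (not shipped with tacseq): report required tools that
# are missing from PATH. With no arguments it checks git and
# fastx_barcode_splitter.pl (assumed here to indicate FASTX-Toolkit).
check_deps() {
    missing=""
    if [ "$#" -eq 0 ]; then
        set -- git fastx_barcode_splitter.pl
    fi
    for tool in "$@"; do
        # command -v succeeds only if the tool can be found
        command -v "$tool" > /dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing" >&2
        return 1
    fi
}
```

Running check_deps with no arguments prints nothing and returns 0 when both tools are found; otherwise it lists the missing ones on standard error.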
tacseq: Analyze TAC-seq data.
options:
  -h      display help and exit
commands:
  prep    prepare samples (FASTQ files) for counting
  count   count reads and molecules per sample and target
tacseq prep: Prepare samples (FASTQ files) for counting.
mandatory:
  -i    input file: gzip-compressed or uncompressed FASTQ file, or '-' to read from standard input (stdin)
  -t    target file: the target file format is based on the FASTX Barcode Splitter barcode file format
  -o    output directory
optional:
  -h    display help and exit
  -m    mismatches: number of allowed mismatches per target sequence (default: 5)
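The '-' input option makes it possible to feed tacseq prep from a pipe. As a hedged sketch, the hypothetical function below decompresses several gzip-compressed FASTQ files of one sample (for instance, one per sequencing lane) into a single stream. It assumes that tacseq prep accepts uncompressed FASTQ on stdin, and that TACSEQ points at the tacseq executable; prep_from_stdin and TACSEQ are illustrative names, not part of the tool.

```shell
# Hypothetical example (not part of tacseq): concatenate the decompressed
# contents of several FASTQ.gz files and pass them to tacseq prep via '-i -'.
# Assumption: tacseq prep reads uncompressed FASTQ from stdin.
TACSEQ="${TACSEQ:-./tacseq}"

prep_from_stdin() {
    targets=$1   # target file
    out=$2       # output directory for this sample
    shift 2      # remaining arguments: gzip-compressed FASTQ files
    # gzip -dc writes all decompressed inputs, in order, to stdout
    gzip -dc "$@" | "$TACSEQ" prep -i - -t "$targets" -o "$out"
}
```

For example: prep_from_stdin example/targets.txt example/output/sample1/ lane1.fastq.gz lane2.fastq.gz (the lane file names are placeholders). Omitting -m keeps the default of 5 mismatches.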
tacseq count: Count reads and molecules per sample and target.
mandatory:
  -i    input directory: the tacseq prep output directory
optional:
  -h    display help and exit
  -u    UMI threshold (default: 2)
The target file is a plain-text file listing the targets, one per line. Each line contains an alphanumeric target ID followed by the target sequence (only the characters A, C, G and T are allowed). The target ID and sequence are separated by a single TAB character.
Target file example (ID and sequence separated by a TAB):
TARGET1 TAGGATAGGTGGATTCGGGAACTCCCCGATAGTTTTGTCACATCGACATACTAA
TARGET2 CCAAAGCTTCAACGGACATAGTGTACATACCTACCGTGTTTCCCAGCACCTTCC
TARGET3 CTGCTGTTGCCGCCTGGGGTTTACGCGTGTTGGAGATTGAGTAGCCTCCTCGGC
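Because TABs are easily turned into spaces by editors or copy-paste, it may be worth checking a target file against the rules above before running tacseq prep. The awk-based function below is a minimal sketch; validate_targets is a hypothetical name and the checker is not part of tacseq.

```shell
# Hypothetical checker (not part of tacseq) for the target file rules:
# exactly two TAB-separated fields per line, an alphanumeric target ID,
# and a sequence made only of A, C, G and T.
# Prints one message per offending line; returns 1 if any line is invalid.
validate_targets() {
    awk -F '\t' '
        NF != 2 || $1 !~ /^[A-Za-z0-9]+$/ || $2 !~ /^[ACGT]+$/ {
            printf "line %d: invalid target entry: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For a valid file, validate_targets example/targets.txt prints nothing and returns 0. A line whose columns are separated by spaces is reported, since it yields a single field. To write an entry with a guaranteed TAB, printf 'TARGET1\tACGT\n' can be used.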
tacseq prep outputs a directory for each sample with:
- 3 sub-directories with files for each target:
  - loci
  - umis
  - merged
- 2 intermediate files:
  - trimmed.fasta
  - umi_joined.fasta
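A quick way to confirm that a prep run produced everything listed above is to test for each entry. The function below is a hedged sketch; check_prep_output is a hypothetical name, and it assumes the directory passed to it is the one given to tacseq prep with -o.

```shell
# Hypothetical sanity check (not part of tacseq): confirm that a sample's
# prep output directory contains the three sub-directories and the two
# intermediate files. Prints what is missing; returns 1 if anything is absent.
check_prep_output() {
    dir=$1
    rc=0
    for sub in loci umis merged; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing directory: $dir/$sub"
            rc=1
        fi
    done
    for f in trimmed.fasta umi_joined.fasta; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, check_prep_output example/output/sample1 can be run after the prep step and before counting.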
tacseq count writes read and molecule counts per target for each sample to standard output.
Example:
- Step 1: prepare samples:
    ./tacseq prep -i example/samples/sample1/sample1.fastq.gz -t example/targets.txt -o example/output/sample1/ -m 5
- Step 2: count molecules and write the results to counts.tsv:
    ./tacseq count -i example/output/sample1/ -u 2 > counts.tsv
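The two steps can be repeated for every sample with a loop. The function below is a minimal sketch, not part of tacseq: run_all_samples and TACSEQ are hypothetical names, it assumes each sample follows the example layout <samples_dir>/<name>/<name>.fastq.gz, and the per-sample <name>.counts.tsv file name is an illustrative choice.

```shell
# Hypothetical batch wrapper (not part of tacseq) running the two documented
# steps for every sample. Assumptions: samples follow the layout
# <samples_dir>/<name>/<name>.fastq.gz, and TACSEQ points at the tacseq
# executable (./tacseq unless set otherwise).
TACSEQ="${TACSEQ:-./tacseq}"

run_all_samples() {
    samples_dir=$1   # e.g. example/samples
    targets=$2       # e.g. example/targets.txt
    out_dir=$3       # e.g. example/output
    mkdir -p "$out_dir" || return 1
    for fq in "$samples_dir"/*/*.fastq.gz; do
        [ -e "$fq" ] || continue           # glob matched nothing
        name=$(basename "$fq" .fastq.gz)   # sample1.fastq.gz -> sample1
        # Step 1: prepare the sample (5 allowed mismatches, as in the example)
        "$TACSEQ" prep -i "$fq" -t "$targets" -o "$out_dir/$name/" -m 5 || return 1
        # Step 2: count with UMI threshold 2; one TSV per sample
        # (the <name>.counts.tsv naming is hypothetical)
        "$TACSEQ" count -i "$out_dir/$name/" -u 2 > "$out_dir/$name.counts.tsv" || return 1
    done
}
```

For example: run_all_samples example/samples example/targets.txt example/output. The loop stops at the first failing step so that a broken sample is not silently skipped.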