TEdetectionEvaluation

Tool comparison for detecting differentially expressed individual transposable elements.

This evaluation is based on raw count tables and supports those generated by SalmonTE, SQuIRE, TEtools, Telescope and TEtranscripts.

Detailed information can be found in the publication: Locus-specific expression analysis of transposable elements. Project-specific program calls are listed in Supplemental file 6.

Simulation

The script simulation_polyester.R uses the tool polyester to simulate a data set. Different arguments can be set to tailor the data set to your needs. The only required input is a FASTA file containing the sequences from which reads should be simulated; all other arguments are optional.

Rscript simulation_polyester.R --fa <fasta> [--dete <percentage>] [--replicates <replicates>] [--setup <single/paired>] [--length <read length>] [--output <outdir>]

| Argument | Definition |
|---|---|
| --fa | FASTA file that contains the reference sequences |
| --dete | percentage of elements that are simulated as differentially expressed (default: 5) |
| --replicates | number of replicates per condition (default: 5) |
| --setup | whether a single- or paired-end data set is simulated (options: single, paired; default: single) |
| --length | read length (default: 100) |
| --output | output directory (default: simulated_data_set) |
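To illustrate what --dete controls, here is a minimal Python sketch. It is hypothetical and not taken from simulation_polyester.R: it picks the given percentage of FASTA entries as differentially expressed and assigns each element one row of a polyester-style fold-change matrix with one column per condition. The fold change of 2, the even split between the two conditions and the function names are assumptions.

```python
import random


def read_fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    return [line[1:].split()[0] for line in lines if line.startswith(">")]


def pick_de_elements(ids, dete_percent=5.0, fold_change=2.0, seed=1):
    """Hypothetical: mark dete_percent % of the elements as DE.

    Returns {id: (fc_condition1, fc_condition2)}. Non-DE elements get
    (1.0, 1.0); DE elements alternate between being up in condition 1
    and up in condition 2. The fixed seed makes the choice reproducible.
    """
    rng = random.Random(seed)
    n_de = round(len(ids) * dete_percent / 100)
    de_ids = rng.sample(ids, n_de)
    fc = {te_id: (1.0, 1.0) for te_id in ids}
    for i, te_id in enumerate(de_ids):
        fc[te_id] = (fold_change, 1.0) if i % 2 == 0 else (1.0, fold_change)
    return fc
```

With the default of 5 %, a reference of 40 elements yields 2 differentially expressed elements. How the actual script chooses elements and fold changes may differ.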

Run tools

The tools were run with their default settings; however, some adaptations were made for SalmonTE, TEtranscripts and TEtools, which are explained in more detail in the publication.

| Tool | Source | DOI |
|---|---|---|
| SalmonTE | https://github.com/LiuzLab/SalmonTE | 10.1142/9789813235533_0016 |
| Telescope | https://github.com/mlbendall/telescope | 10.1101/398172 |
| TEtranscripts | https://github.com/mhammell-laboratory/TEtranscripts | 10.1093/bioinformatics/btv422 |
| SQuIRE | https://github.com/wyang17/SQuIRE | 10.1093/nar/gky1301 |
| TEtools | https://github.com/douglasgscofield/TEtools | 10.1093/nar/gkw953 |

Run evaluation

The paths to the directories containing the results, the count tables (for SQuIRE, the common file prefix) and any additional files have to be entered in dataInfo.csv. The evaluation also needs a reference against which the tool results are compared; this reference has to be stored under Simulation.
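Before starting the evaluation it can help to check that every count table listed in dataInfo.csv actually exists. The sketch below assumes hypothetical columns named `tool` and `count_table`; the real dataInfo.csv layout may differ. For SQuIRE the entry is treated as a common prefix that must match at least one file, for all other tools as a single file path.

```python
import csv
import glob
import os


def missing_count_tables(csv_path):
    """Return (tool, entry) pairs whose count tables cannot be found.

    Column names 'tool' and 'count_table' are assumptions for illustration.
    """
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            entry = row["count_table"]
            if row["tool"] == "SQuIRE":
                # SQuIRE: entry is a prefix shared by several count tables
                found = bool(glob.glob(glob.escape(entry) + "*"))
            else:
                found = os.path.isfile(entry)
            if not found:
                missing.append((row["tool"], entry))
    return missing
```

An empty return value means all listed tables were found.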

Additional Files:

SQuIRE needs a 'dictionary' to translate TE IDs. This file can be generated with generateDict.py, which is explained further down.
TEtools needs a file that lists the FASTQ files in the order of the original TEtools call, without the .fastq extension, e.g.:
```
sample_1
sample_2
sample_3
  .
  .
  .
```
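A minimal Python sketch for writing this file, assuming the FASTQ paths are passed in the same order as in the original TEtools call. The function name is hypothetical; it also drops any directory part, on the assumption that only sample names are expected, as in the example.

```python
from pathlib import Path


def write_tetools_order(fastq_files, out_path):
    """Write one sample name per line, keeping the given FASTQ order.

    Directory parts and a trailing '.fastq' extension are removed.
    """
    names = []
    for fq in fastq_files:
        name = Path(fq).name
        if name.endswith(".fastq"):
            name = name[: -len(".fastq")]
        names.append(name)
    Path(out_path).write_text("\n".join(names) + "\n")
    return names
```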

Once dataInfo.csv is filled in, run Rscript TEdetectEval.R to start the evaluation. Afterwards, run Rscript figures.R and Rscript tables.R to generate the figures and tables.
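The three steps can also be chained from Python. This is only a convenience sketch: it runs the R scripts in the order given above and stops at the first script that exits with an error. The `runner` parameter is a hypothetical hook that makes the sketch testable without R installed.

```python
import subprocess

# Evaluation first, then figures and tables (order as described above).
PIPELINE = ["TEdetectEval.R", "figures.R", "tables.R"]


def run_pipeline(runner=subprocess.run):
    """Run each script with Rscript; check=True raises on a non-zero exit."""
    commands = [["Rscript", script] for script in PIPELINE]
    for cmd in commands:
        runner(cmd, check=True)
    return commands
```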

Helper Scripts

Generation of SQuIRE dictionary

SQuIRE can generate a .bed file in which each TE instance gets an identifier that is also used in the resulting count table. The identifier of each TE has the following format:

However, the TE identifier used in the simulated data set is assembled as follows:

For the evaluation, it is necessary to know which simulated TE was detected by SQuIRE, so a dictionary is generated to translate between the two ID formats. Since chr, start and end together are unique for each TE, these three values are used to match the IDs and to build a table of corresponding TE IDs.

This can be done with the helper script generateDict.py, which takes as input the SQuIRE .bed file and your own .bed file.
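The matching idea can be sketched in a few lines of Python. This is not generateDict.py itself. It assumes both files are tab-separated BED with chrom, start and end in columns 1 to 3 and the TE ID in column 4, and that both use the same coordinate convention. The function names and the output header are hypothetical.

```python
import csv


def read_bed_ids(path):
    """Map (chrom, start, end) -> TE ID taken from BED column 4."""
    ids = {}
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            if not fields or fields[0].startswith(("#", "track", "browser")):
                continue  # skip empty, comment and header lines
            key = (fields[0], int(fields[1]), int(fields[2]))
            ids[key] = fields[3]
    return ids


def build_dictionary(squire_bed, own_bed, out_tsv):
    """Join both BED files on chrom/start/end and write matching ID pairs."""
    squire = read_bed_ids(squire_bed)
    own = read_bed_ids(own_bed)
    shared = sorted(squire.keys() & own.keys())
    with open(out_tsv, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["squire_id", "simulated_id"])
        for key in shared:
            writer.writerow([squire[key], own[key]])
    return {squire[k]: own[k] for k in shared}
```

Using exact coordinate tuples as dictionary keys makes the join a simple set intersection. If one of the two files used 1-based starts, no TEs would match, so both files must agree on the coordinate system.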

Generate TE reference library

Generating such a library requires an .align file produced by RepeatMasker and the reference genome in FASTA format. The helper script alignToFasta.sh builds the reference library from these two files:

bash alignToFasta.sh <.align-file> <referenceGenome.fa>
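The internals of alignToFasta.sh are not described here. The Python sketch below only illustrates the general step of cutting TE sequences out of the reference genome once the coordinates and strand have been read from the .align file; parsing the .align format itself is omitted. It assumes 1-based, inclusive coordinates and reverse-complements minus-strand hits. The function names and the interval tuple layout are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_tes(genome, intervals):
    """Yield FASTA records for (te_id, chrom, start, end, strand) tuples.

    start/end are assumed 1-based and inclusive; '-' hits are
    reverse-complemented.
    """
    for te_id, chrom, start, end, strand in intervals:
        seq = genome[chrom][start - 1:end]
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        yield f">{te_id}\n{seq}"
```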

Hoffmann-Lab/TEdetectionEvaluation
