Hoffmann-Lab/TEdetectionEvaluation

TEdetectionEvaluation

Tool comparison for detecting differentially expressed individual transposable elements.

This evaluation is based on raw count tables and supports those generated by SalmonTE, SQuIRE, TEtools, Telescope and TEtranscripts.

Detailed information can be found in the publication: Locus-specific expression analysis of transposable elements. Project-specific program calls can be found in Supplemental File 6.

Simulation

The script simulation_polyester.R simulates a data set using the tool polyester. Its arguments let you tailor the simulated data set to your needs. The only required input is a FASTA file containing the sequences from which reads are simulated; all other arguments are optional.

Rscript simulation_polyester.R --fa <fasta> [--dete <percentage>] [--replicates <replicates>] [--setup <single/paired>] [--length <read length>] [--output <outdir>]
Arguments Definition
--fa fasta file that contains reference sequences
--dete defines the percentage of elements that are simulated as differentially expressed (default: 5)
--replicates defines number of replicates per condition (default: 5)
--setup defines if a single- or paired-end data set is simulated (options: single, paired; default: single)
--length defines the read length (default: 100)
--output defines the output directory (default: simulated_data_set)
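
If the simulation is started from a Python pipeline, the call above can be assembled programmatically. The following is a minimal sketch and not part of this repository: the function name `build_simulation_command` and the file name `TEs.fa` are hypothetical, and only the flags and defaults documented in the table above are used.

```python
import shlex


def build_simulation_command(fasta, dete=5, replicates=5, setup="single",
                             length=100, output="simulated_data_set"):
    """Return the argument list for simulation_polyester.R.

    Defaults mirror the values documented in the argument table.
    """
    if setup not in ("single", "paired"):
        raise ValueError("setup must be 'single' or 'paired'")
    return [
        "Rscript", "simulation_polyester.R",
        "--fa", str(fasta),
        "--dete", str(dete),
        "--replicates", str(replicates),
        "--setup", setup,
        "--length", str(length),
        "--output", str(output),
    ]


if __name__ == "__main__":
    # "TEs.fa" is a placeholder file name used only for illustration.
    cmd = build_simulation_command("TEs.fa", setup="paired", replicates=3)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to actually launch the simulation.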

Run tools

The tools were run with their default settings. Some adaptations were made for SalmonTE, TEtranscripts and TEtools; these are explained in more detail in the publication.

Tool Source DOI
SalmonTE https://github.com/LiuzLab/SalmonTE 10.1142/9789813235533_0016
Telescope https://github.com/mlbendall/telescope 10.1101/398172
TEtranscripts https://github.com/mhammell-laboratory/TEtranscripts 10.1093/bioinformatics/btv422
SQuIRE https://github.com/wyang17/SQuIRE 10.1093/nar/gky1301
TEtools https://github.com/douglasgscofield/TEtools 10.1093/nar/gkw953

Run evaluation

The paths of the directories where the results are located, the count tables (for SQuIRE, the common prefix), and any additional files have to be entered in dataInfo.csv. The evaluation process needs a reference against which the results of the tools are compared. This reference has to be stored under Simulation.

Additional Files:

  • SQuIRE needs a 'dictionary' to translate TE ids. This file can be generated with generateDict.py, which is explained further down.

  • TEtools needs a file that lists the FASTQ files in the order of the original TEtools call, without the extension .fastq, e.g.:

    sample_1
    sample_2
    sample_3
      .
      .
      .
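
The sample-order file for TEtools can be written by hand or generated. The following is a minimal sketch under stated assumptions: the function names and the output file name `tetools_order.txt` are hypothetical, the input list is assumed to already be in the order used for the original TEtools call, and only a trailing `.fastq` is stripped (paths are otherwise kept as given).

```python
from pathlib import Path


def sample_names(fastq_files):
    """Strip a trailing '.fastq' from each entry, keeping the given order."""
    suffix = ".fastq"
    names = []
    for entry in fastq_files:
        entry = str(entry)
        if entry.endswith(suffix):
            entry = entry[:-len(suffix)]
        names.append(entry)
    return names


def write_sample_order(fastq_files, out_path):
    """Write one sample name per line to out_path."""
    Path(out_path).write_text("\n".join(sample_names(fastq_files)) + "\n")


if __name__ == "__main__":
    # Placeholder file names, in the order of the original TEtools call.
    fastqs = ["sample_1.fastq", "sample_2.fastq", "sample_3.fastq"]
    write_sample_order(fastqs, "tetools_order.txt")
```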
    

Once dataInfo.csv is filled in, run Rscript TEdetectEval.R to start the evaluation procedure. Afterwards, run Rscript figures.R and Rscript tables.R to generate the figures and tables.

Helper Scripts

Generation of SQuIRE dictionary

SQuIRE has a method to generate a .bed file in which each TE instance is given an identifier that is also used in the resulting count table. The identifier for each TE has the following format:

chr|start|end|TE-subfamily:TE-family:TE-repclass|score|strand

However, the TE identifier used in the simulated data set is assembled as follows:

chr|start|end|TE-repclass|TE-family|TE-subfamily|score|Kimura distance

The evaluation needs to know which simulated TE was detected by SQuIRE, so a dictionary is generated to translate the TE ids. Since chr, start and end are unique for each TE, these three values are used to match the ids and to build a table of the TE ids that belong together.

This can be done with the helper script generateDict.py, which takes as input the .bed file from SQuIRE and your own .bed file.
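
The matching idea can be illustrated in a few lines. This is a sketch of the principle only, not the code of generateDict.py: the function names and the example ids are made up, reading the two .bed files is left out, and it assumes both ids carry identical chr, start and end values.

```python
def position_key(te_id):
    """Return (chr, start, end), the first three '|'-separated fields."""
    chrom, start, end = te_id.split("|")[:3]
    return (chrom, start, end)


def build_dictionary(squire_ids, simulated_ids):
    """Map each SQuIRE id to the simulated id at the same position.

    SQuIRE ids without a simulated counterpart are skipped.
    """
    by_position = {position_key(sim_id): sim_id for sim_id in simulated_ids}
    dictionary = {}
    for squire_id in squire_ids:
        key = position_key(squire_id)
        if key in by_position:
            dictionary[squire_id] = by_position[key]
    return dictionary


if __name__ == "__main__":
    # Made-up ids that follow the two formats described above.
    squire = ["chr1|100|400|AluY:Alu:SINE|1500|+"]
    simulated = ["chr1|100|400|SINE|Alu|AluY|1500|10.2"]
    for squire_id, sim_id in build_dictionary(squire, simulated).items():
        print(squire_id, sim_id, sep="\t")
```

Keying on the position triple avoids having to reorder the subfamily, family and repclass fields, which appear in a different order and with different separators in the two formats.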

Generate TE reference library

An .align file generated by RepeatMasker is needed to generate such a library. The helper script alignToFasta.sh generates the reference library. In addition to the .align file, the reference genome in FASTA format is needed.

bash alignToFasta.sh <.align-file> <referenceGenome.fa>
