eTAPE: endometrial Tissue-AdaPtive autoEncoder for accurate deconvolution and gene expression analysis

This model deconvolves bulk RNA-seq data into cell-type fractions and predicts cell-type-specific gene expression from an scRNA-seq reference. It focuses on endometrial tissue and additionally predicts the cell development time for each cell type.

This repository contains the code for eTAPE, a modified version of the TAPE model. Highly experimental, proceed with caution.

NB: As of 12.2025, eTAPE is no longer developed, because the TAPE approach showed insufficient predictive power for dual-task learning of time-series prediction and deconvolution.

Setup

eTAPE uses PyTorch as its deep-learning framework, so a PyTorch build that matches your hardware will accelerate model training. We recommend installing PyTorch (>=1.8.0) for the right compute platform (CUDA, CPU or ROCm) from its official website in advance.
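Before training, it can help to confirm that the installed PyTorch meets the version requirement. The following is a minimal sketch using only the Python standard library; the helper names `parse_version` and `torch_status` are hypothetical and not part of eTAPE.

```python
from importlib import metadata

# Minimum version stated in this README.
MIN_TORCH = (1, 8, 0)


def parse_version(version):
    """Turn a string such as '2.1.0+cu118' into (2, 1, 0).

    Local build tags after '+' and non-numeric suffixes are ignored.
    """
    core = version.split("+")[0]
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def torch_status():
    """Report whether a PyTorch distribution >= 1.8.0 is installed."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "PyTorch is not installed"
    if parse_version(installed) < MIN_TORCH:
        return f"PyTorch {installed} is older than 1.8.0"
    return f"PyTorch {installed} satisfies >=1.8.0"


print(torch_status())
```

Once PyTorch imports correctly, `torch.cuda.is_available()` reports whether a CUDA device is visible to it.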

Usage

Required Files:

single-cell reference: txt format; indices are cell types, columns are gene names
bulk data: tabular format; the separator must be specified ('\t', ',' or others); indices are sample names, columns are gene names
gene length file: used to scale the expression values; columns should contain: [Gene name, Transcript start (bp), Transcript end (bp)]. This file is provided in the ./data/ directory.

Warning: single-cell reference and bulk samples should contain the same cell types
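The layout described above can be illustrated with a small sanity check. This is a sketch under assumptions, not eTAPE's own loader: it writes two toy tables (the gene names, cell types and sample name are made up), reads them back with the standard `csv` module, and lists the genes present in both. The helpers `read_table` and `shared_genes` are hypothetical, and the first column is assumed to hold the index.

```python
import csv
import os
import tempfile


def read_table(path, sep="\t"):
    """Return (row_labels, gene_names) for a table whose first column is the index."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter=sep))
    gene_names = rows[0][1:]  # header row, skipping the index column
    row_labels = [row[0] for row in rows[1:] if row]
    return row_labels, gene_names


def shared_genes(ref_genes, bulk_genes):
    """Genes found in both tables, in the order of the reference."""
    bulk_set = set(bulk_genes)
    return [gene for gene in ref_genes if gene in bulk_set]


# Toy inputs: a tab-separated single-cell reference and a comma-separated bulk table.
workdir = tempfile.mkdtemp()
ref_path = os.path.join(workdir, "sc_ref.txt")
bulk_path = os.path.join(workdir, "bulk.csv")

with open(ref_path, "w", newline="") as handle:
    csv.writer(handle, delimiter="\t").writerows([
        ["", "GeneA", "GeneB", "GeneC"],
        ["Stromal", 5, 0, 2],
        ["Epithelial", 1, 7, 0],
    ])

with open(bulk_path, "w", newline="") as handle:
    csv.writer(handle, delimiter=",").writerows([
        ["", "GeneB", "GeneC", "GeneD"],
        ["sample1", 10, 3, 8],
    ])

cell_types, ref_genes = read_table(ref_path, sep="\t")
samples, bulk_genes = read_table(bulk_path, sep=",")
common = shared_genes(ref_genes, bulk_genes)
print(cell_types, samples, common)
```

A very short list of shared genes is usually a sign that the two files use different gene identifiers or that the wrong separator was given.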

# basic example
from eTAPE import Deconvolution
SignatureMatrix, CellFractionPrediction = \
    Deconvolution(sc_ref, bulkdata, sep='\t', scaler='mms',
                  datatype='counts', genelenfile='./GeneLength.txt',
                  mode='overall', adaptive=True, variance_threshold=0.98,
                  save_model_name=None,
                  batch_size=128, epochs=128, seed=1)

parameters:

scaler: use the 'mms' or 'ss' scaler to preprocess datasets; 'mms' stands for min-max scaler, 'ss' stands for standard scaler. In the paper, all datasets were tested using 'mms'.
datatype: use 'counts'. Users can choose a different normalization method depending on the single-cell sequencing technique; if the single-cell data come from 10X Genomics, use 'counts' to keep the procedure reasonable. The explanation can be found on the webpage.
mode: 'overall' or 'high-resolution'. If you need a signature matrix for each sample, use 'high-resolution' mode.
adaptive: True or False. If False, the signature matrix is not predicted and None is returned in its place.
variance_threshold: float from 0 to 1; the proportion of genes to keep, ranked by variance from high to low.
batch_size: int, affects the training result. 32-128 is recommended. A smaller batch_size increases training time.
epochs: int, affects the training result. Typically, 5000-10000 iterations are enough for TAPE; the relation is $\text{epochs} = \frac{\text{iterations} \times \text{batch\_size}}{\text{sampling\_num}}$
seed: eTAPE supports fixing the random seed to make results reproducible.
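Two of the parameters above lend themselves to a small worked example. The sketch below is illustrative only and does not reproduce eTAPE's internals: `epochs_for` applies the epochs relation with rounding up, and `keep_high_variance` shows one plausible reading of variance_threshold (rounding the number of kept genes down is an assumption). The value 5000 for sampling_num and the gene names are made up for the example.

```python
import statistics


def epochs_for(iterations, batch_size, sampling_num):
    """epochs = iterations * batch_size / sampling_num, rounded up to an integer."""
    total = iterations * batch_size
    return -(-total // sampling_num)  # ceiling division on integers


def keep_high_variance(expression, threshold):
    """Keep the top `threshold` proportion of genes, ranked by variance (high to low).

    `expression` maps a gene name to its list of expression values.
    """
    ranked = sorted(
        expression,
        key=lambda gene: statistics.pvariance(expression[gene]),
        reverse=True,
    )
    n_keep = max(1, int(len(ranked) * threshold))
    return ranked[:n_keep]


# With an assumed sampling_num of 5000, 5000 iterations at batch_size=128
# correspond to the epochs=128 used in the basic example.
print(epochs_for(5000, 128, 5000))

toy = {
    "GeneA": [1, 1, 1],    # no variance
    "GeneB": [0, 10, 20],  # highest variance
    "GeneC": [4, 5, 6],
    "GeneD": [0, 2, 4],
}
print(keep_high_variance(toy, 0.5))
```

Reading the relation the other way round, halving batch_size at a fixed number of epochs doubles the number of iterations, which is why smaller batches take longer to train.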

Example

An example is placed in the Experiments directory. Please run the example to get familiar with eTAPE.

Issues

If you find any bugs or have problems when you are using eTAPE, feel free to raise issues.
