Description
I'm working with TCGA RNA-seq data (aligned to hg38) and realized there is no annotation for the CCL3L1 gene in the raw counts data from the GDC repository.
The Ensembl ID for CCL3L1 is ENSG00000277796, which is not present in the TCGA data. However, I can find ENSG00000276085, which corresponds to the CCL3L3 gene.
Looking at the Ensembl website, I noticed that CCL3L1 and CCL3L3 map to nearly the same location in the genome (http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000277796;r=CHR_HSCHR17_10_CTG4:36194906-36196795;t=ENST00000612067 and http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000276085;r=17:36194869-36196758).
My question is: can I use the CCL3L3 counts in place of the CCL3L1 counts to calculate the IPS?
Additionally, both the biomaRt package for R and the annotated GTF file from TCGA (https://api.gdc.cancer.gov/data/fe1750e4-fc2d-4a2c-ba21-5fc969a24f27) refer to the SDPR gene as CAVIN2. This is another source of error when running IPS with TCGA hg38 data.
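To make the naming problem concrete, here is a minimal sketch in Python of the kind of pre-processing I have in mind. It is written only for illustration; the same idea would apply in R. The alias map, the function names, and the toy counts are all hypothetical and not part of the IPS script. It renames CAVIN2 to SDPR and then reports which required genes are still missing. It deliberately does not map CCL3L3 to CCL3L1, since whether that substitution is valid is exactly what I am asking.

```python
# Hypothetical alias map (an assumption for illustration):
# symbol found in the hg38 annotation -> symbol expected by the IPS gene list.
ALIASES = {"CAVIN2": "SDPR"}


def rename_aliases(counts, aliases=ALIASES):
    """Return a copy of `counts` (symbol -> count) with aliased symbols renamed."""
    renamed = {}
    for symbol, value in counts.items():
        target = aliases.get(symbol, symbol)
        if target in renamed:
            # Both the alias and the expected symbol are present; refuse to guess.
            raise ValueError(f"duplicate entry for {target}")
        renamed[target] = value
    return renamed


def missing_genes(counts, required):
    """Return the sorted list of required symbols absent from `counts`."""
    return sorted(set(required) - set(counts))


# Toy counts, made up for this example (not real TCGA values).
toy_counts = {"CAVIN2": 120, "CCL3L3": 15}
fixed = rename_aliases(toy_counts)
still_missing = missing_genes(fixed, ["SDPR", "CCL3L1"])
print(still_missing)
```

With these toy values, SDPR is resolved by the rename, while CCL3L1 is still reported as missing.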
Looking forward to hearing from you.
Cheers!
daniel