Currently, the APARENT dataloader gets the PolyA sites from the transcript GTF annotation:
models/APARENT/veff/dataloader.py
Lines 87 to 111 in ae8cf12
```python
def get_roi_from_transcript(transcript_start: int, transcript_end: int, is_on_negative_strand: bool) -> (int, int):
    """
    Get region-of-interest for APARENT in relation to the 3'UTR of a transcript
    :param transcript_start: 0-based start position of the transcript
    :param transcript_end: 1-based end position of the transcript
    :param is_on_negative_strand: is the gene on the negative strand?
    :return: Tuple of (start, end) position for the region of interest
    """
    # CSE should be roughly around position 70 of the 205bp sequence.
    # Since CSE is likely 30bp upstream of the cut site, we shift the cut site
    # by 100bp upstream and 105bp downstream
    if is_on_negative_strand:
        end = transcript_start + 100
        # convert 0-based to 1-based
        end += 1
        start = end - 205
    else:
        start = transcript_end - 100
        # convert 1-based to 0-based
        start -= 1
        end = start + 205
    return start, end
```
@johli Is this a viable implementation of the strategy you explained here?
Do you maybe have some example data (vcf file + scores) against which we could compare the Kipoi predictions?
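
Until reference data is available, the coordinate arithmetic can at least be sanity-checked on its own. The sketch below repeats the same arithmetic as the snippet above and computes where the transcript's 3'-terminal base lands inside the 205bp window once the window is read in transcript orientation. It assumes the returned `(start, end)` pair is a 0-based, half-open interval, which is my reading of the comments and not something confirmed by the dataloader. `three_prime_offset` and `ROI_LENGTH` are hypothetical names introduced only for this check.

```python
from typing import Tuple

ROI_LENGTH = 205  # window length used in the dataloader snippet


def get_roi_from_transcript(transcript_start: int, transcript_end: int,
                            is_on_negative_strand: bool) -> Tuple[int, int]:
    """Same arithmetic as the quoted dataloader function."""
    if is_on_negative_strand:
        end = transcript_start + 100 + 1
        start = end - ROI_LENGTH
    else:
        start = transcript_end - 100 - 1
        end = start + ROI_LENGTH
    return start, end


def three_prime_offset(transcript_start: int, transcript_end: int,
                       is_on_negative_strand: bool) -> int:
    """Hypothetical helper: 0-based index of the transcript's 3'-terminal
    base within the window, read in transcript orientation.

    Assumes (start, end) is 0-based and half-open.
    """
    start, end = get_roi_from_transcript(
        transcript_start, transcript_end, is_on_negative_strand)
    if is_on_negative_strand:
        # The window is reverse-complemented, so count from its right edge.
        return (end - 1) - transcript_start
    # The last transcript base has 0-based index transcript_end - 1.
    return (transcript_end - 1) - start


if __name__ == "__main__":
    for negative in (False, True):
        roi = get_roi_from_transcript(1000, 5000, negative)
        offset = three_prime_offset(1000, 5000, negative)
        print(negative, roi, roi[1] - roi[0], offset)
```

Under that assumption, both strands give a 205bp window with the 3'-terminal base at index 100, so a CSE about 30bp upstream would sit near position 70, as the code comment intends. This only checks internal consistency between the strands; it does not say whether the window matches what APARENT expects.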
xref: #342
xref: johli/aparent#8