Description
Feature request for the early KSTAR stages
Purpose: Reduce the overall computation time needed to get meaningful results from KSTAR by helping people inspect the consequences of their thresholding choices.
Pipeline placement: This feature runs after mapping, once someone has an experimental dataset with KSTAR-mapped peptides and at least one data: column indicated.
Features/Tools
Aspects that matter to researchers:
- How does changing the method or values of binarization change the total number of sites in each data column?
- How does changing the method or values of binarization change the overlap in evidence between data columns?
The proposal is an interface that lets you change the binarization parameters and, for thresholding, explore a range of thresholds, showing the impact on evidence (e.g. a line plot with one line per experiment data column, the threshold on the x-axis, and the number of sites on the y-axis) and on overlap (measured by the Jaccard index and presented as a heatmap).
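A minimal sketch of the computation behind both plots. It assumes the mapped dataset has been reduced to a plain dict mapping each data column to {site ID: value} (missing values dropped) and that a site counts as evidence when its value is >= the threshold; for statistics where smaller is stronger, such as p-values, the comparison would flip. `Data`, `binarize`, `jaccard`, and `sweep` are hypothetical names, not part of KSTAR. Each result entry holds the per-column counts (one point per line in the line plot) and the pairwise Jaccard indices (one cell per pair in the heatmap).

```python
from itertools import combinations

# Hypothetical representation: data column name -> {site ID: value}.
Data = dict[str, dict[str, float]]


def binarize(data: Data, threshold: float) -> dict[str, set[str]]:
    """Return the sites in each column whose value is >= threshold (assumed direction)."""
    return {
        col: {site for site, value in values.items() if value >= threshold}
        for col, values in data.items()
    }


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; treated as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sweep(data: Data, thresholds: list[float]):
    """For each threshold, return (threshold, per-column site counts, pairwise JI)."""
    results = []
    for t in thresholds:
        evidence = binarize(data, t)
        counts = {col: len(sites) for col, sites in evidence.items()}
        ji = {
            (a, b): jaccard(evidence[a], evidence[b])
            for a, b in combinations(sorted(evidence), 2)
        }
        results.append((t, counts, ji))
    return results


if __name__ == "__main__":
    example = {
        "data:A": {"s1": 0.9, "s2": 0.4, "s3": 0.7},
        "data:B": {"s1": 0.8, "s2": 0.6, "s4": 0.3},
    }
    for t, counts, ji in sweep(example, [0.3, 0.5, 0.7]):
        print(t, counts, ji)
```

The sweep output can feed any plotting library directly: counts over thresholds for the line plot, and the JI dict at a chosen threshold for the heatmap.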
Vision for Thresholding
The threshold would have a minimum value entry, a maximum value entry, and a number of samples. It would default to "autorange" and then be modifiable by the user (taking care to report problems when the selected range leaves a data column empty, i.e. drops it out).
Threshold autorange detection: I propose seeding the threshold range with the minimum data value in the dataset as the lower bound and, as the upper bound, the highest threshold beyond which one or more data columns would have an empty set.
After autorange detection and the initial plots (e.g. sampling 10 points within the autorange), users could zoom in on a modified range or increase the resolution (i.e. the number of samples within the defined range).
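One way the min/max/number-of-samples entry and its range check could look, as a sketch. `ThresholdRange` and `validate` are hypothetical names; the column -> {site: value} layout and the rule that a site passes when its value is >= the threshold are assumptions. Under that rule the maximum is the strictest threshold, so a column that is non-empty there is non-empty everywhere in the range, and checking the maximum is enough to catch drop-outs.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRange:
    """Hypothetical user-editable threshold settings."""
    minimum: float
    maximum: float
    n_samples: int = 10

    def validate(self, data: dict[str, dict[str, float]]) -> list[str]:
        """Return warnings about the selected range (empty list if none)."""
        warnings = []
        if self.minimum >= self.maximum:
            warnings.append("minimum must be smaller than maximum")
        if self.n_samples < 2:
            warnings.append("n_samples must be at least 2 to span a range")
        # Assumed rule: a site passes when value >= threshold, so the maximum is
        # the strictest threshold; a column with no sites there drops out.
        dropped = sorted(
            col for col, values in data.items()
            if not any(v >= self.maximum for v in values.values())
        )
        if dropped:
            warnings.append(
                f"at threshold {self.maximum}, no sites remain in: {', '.join(dropped)}"
            )
        return warnings
```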
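A sketch of the autorange rule, assuming the same column -> {site: value} layout and that a site passes when its value is >= the threshold. Under that rule each column stays non-empty up to its own maximum value, so the upper bound is the smallest per-column maximum. `autorange` is a hypothetical name; for statistics where smaller values are stronger (e.g. p-values), the comparisons and bounds would be mirrored.

```python
def autorange(data: dict[str, dict[str, float]]) -> tuple[float, float]:
    """Seed (lower, upper) threshold bounds from the data.

    lower: the smallest value in any data column.
    upper: the highest threshold that leaves every column with at least one
           site, i.e. the smallest of the per-column maxima (assuming a site
           passes when value >= threshold).
    """
    if not data or any(not values for values in data.values()):
        raise ValueError("every data column needs at least one value")
    lower = min(min(values.values()) for values in data.values())
    upper = min(max(values.values()) for values in data.values())
    return lower, upper
```

Because the global minimum can never exceed any column's maximum, the returned lower bound is always <= the upper bound.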
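The initial sampling and the later refinement both reduce to generating evenly spaced thresholds between two bounds. A minimal sketch; `sample_thresholds` is a hypothetical helper, and its default of 10 samples follows the 10-point example in the text.

```python
def sample_thresholds(lower: float, upper: float, n_samples: int = 10) -> list[float]:
    """Return n_samples evenly spaced thresholds from lower to upper, inclusive."""
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    step = (upper - lower) / (n_samples - 1)
    # Set the last point to `upper` exactly so float rounding cannot overshoot it.
    return [lower + i * step for i in range(n_samples - 1)] + [upper]


if __name__ == "__main__":
    coarse = sample_thresholds(0.0, 0.9)  # initial view: 10 points over the autorange
    fine = sample_thresholds(0.3, 0.5, n_samples=21)  # zoomed in, higher resolution
    print(coarse)
    print(fine)
```

Zooming and increasing resolution are then just new calls with narrower bounds or a larger `n_samples`.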
Advanced Features
As in plotting, it could help to let people rearrange the columns of the heatmaps (by clustering or manually).
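For the clustering option, a small sketch of one possible approach: naive average-linkage agglomerative clustering on the distance 1 - JI, reading the heatmap column order off the merge sequence. `cluster_order` is a hypothetical name, and the pairwise JI dict is assumed to come from a threshold sweep. SciPy's `scipy.cluster.hierarchy.linkage` (with `method="average"`) together with `leaves_list` could be used instead; this pure-Python version is only meant for the handful of data columns a typical experiment has.

```python
from itertools import combinations


def cluster_order(columns: list[str], ji: dict[tuple[str, str], float]) -> list[str]:
    """Order heatmap columns by naive average-linkage agglomerative clustering.

    columns: data column names.
    ji: Jaccard index for each pair of columns, keyed (a, b) in either order.
    The distance between two columns is 1 - JI.
    """

    def dist(a: str, b: str) -> float:
        key = (a, b) if (a, b) in ji else (b, a)
        return 1.0 - ji[key]

    # Each cluster is an ordered list of columns; merging concatenates two lists,
    # so the final single cluster is the new column order.
    clusters = [[c] for c in columns]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster column pairs.
            total = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0] if clusters else []


if __name__ == "__main__":
    ji = {("A", "B"): 0.9, ("C", "D"): 0.8,
          ("A", "C"): 0.1, ("A", "D"): 0.1, ("B", "C"): 0.1, ("B", "D"): 0.1}
    print(cluster_order(["A", "C", "B", "D"], ji))
```

Manual rearrangement needs no computation: the user-supplied column list is simply used as the order.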
It could also help to let people define in-groups and out-groups and report the average Jaccard index (JI) overlap for each. For example, I might specify that data columns A, B, and C are replicates, so I want a threshold that gives high JI within that group but still shows distinct information relative to another group of replicates, D, E, and F. People can visualize this in the heatmap, but if we want to find, mathematically, the ideal threshold that meets some minimum total amount of data while balancing the ability to distinguish groups, then being able to define groups and examine in-group and out-group behavior is a step toward that.
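A sketch of the in-group/out-group idea and of a threshold search built on it. The scoring rule here (keep only thresholds where every column retains at least `min_sites` sites, then maximize mean within-group JI minus mean between-group JI) is an assumption chosen for illustration, not a settled criterion. `group_overlap` and `pick_threshold` are hypothetical names, and the column -> {site: value} layout and the value >= threshold rule are also assumed.

```python
import math
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index; treated as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _mean(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else math.nan


def group_overlap(
    evidence: dict[str, set[str]], groups: dict[str, list[str]]
) -> tuple[float, float]:
    """Mean JI over pairs in the same group and over pairs in different groups."""
    label = {col: name for name, cols in groups.items() for col in cols}
    within, between = [], []
    for a, b in combinations(sorted(label), 2):
        score = jaccard(evidence[a], evidence[b])
        if label[a] == label[b]:
            within.append(score)
        else:
            between.append(score)
    return _mean(within), _mean(between)


def pick_threshold(data, groups, thresholds, min_sites):
    """Return the threshold with the best (within - between) JI, or None.

    Thresholds leaving any column with fewer than min_sites sites are skipped.
    """
    best = None
    for t in thresholds:
        evidence = {
            col: {site for site, v in values.items() if v >= t}
            for col, values in data.items()
        }
        if min(len(sites) for sites in evidence.values()) < min_sites:
            continue
        within, between = group_overlap(evidence, groups)
        score = within - between
        if math.isnan(score):
            continue  # needs at least one within-group and one between-group pair
        if best is None or score > best[0]:
            best = (score, t)
    return None if best is None else best[1]
```

Columns not assigned to any group are ignored by `group_overlap` but still count toward the `min_sites` check, so ungrouped data cannot silently drop out.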
@candace-lei can add more detail to this ticket on batch effects and how we can help people explore and avoid batch-effect issues.