Description
Describe the bug
There is a quirk in the nextflow implementation of KSTAR: a data column is removed if none of its values are greater than or equal to 1. This causes an error downstream when trying to binarize the dataset, because the data column is no longer there and none of these functions flag its removal. The result is a silent error during the binarize experiment step: it can't find the data column it needs, and it is not obvious what is causing the issue.
I have traced this error back to lines 91-92 of nextflow/src/helpers.py (the check_data_columns function):
def check_data_columns(evidence, data_columns, logger=None):
    """
    Checks data columns to make sure column is in evidence and that evidence filtered on that data column
    has at least one point of evidence. Removes all columns that do not meet criteria
    """
    if logger is None:
        logger = get_logger("check_data", "check_data.log")
    new_data_columns = []
    for col in data_columns:
        if col in evidence.columns:
            if len(evidence[evidence[col] >= 1]) > 0:  #################### issue is here ##################
                new_data_columns.append(col)
            else:
                logger.warning(f"{col} does not have any evidence")
        else:
            logger.warning(f"{col} not in evidence")
    data_columns = new_data_columns
    return data_columns

To fix this, we should just need to add a threshold parameter to the function, change the >= 1 to >= threshold, and make sure main() passes in the correct threshold.
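As a rough illustration of the proposed change, here is a minimal sketch of the function with a threshold parameter. It assumes the threshold should default to 1 to preserve current behavior, and it uses the standard library `logging.getLogger` as a stand-in for the project's `get_logger` helper so the snippet runs on its own. The column names in the usage lines are made up.

```python
import logging

import pandas as pd


def check_data_columns(evidence, data_columns, threshold=1.0, logger=None):
    """Return the data columns that are present in evidence and have
    at least one value greater than or equal to threshold."""
    if logger is None:
        # Stand-in for get_logger("check_data", "check_data.log") from helpers.py
        logger = logging.getLogger("check_data")
    new_data_columns = []
    for col in data_columns:
        if col not in evidence.columns:
            logger.warning(f"{col} not in evidence")
        elif (evidence[col] >= threshold).any():
            new_data_columns.append(col)
        else:
            logger.warning(f"{col} does not have any evidence at threshold {threshold}")
    return new_data_columns


# Hypothetical evidence table where every value is below 1
evidence = pd.DataFrame({"data:a": [0.2, 0.6], "data:b": [0.1, 0.3]})
kept_default = check_data_columns(evidence, ["data:a", "data:b"])
kept_half = check_data_columns(evidence, ["data:a", "data:b", "data:c"], threshold=0.5)
```

With the default threshold both columns are dropped, which matches the behavior reported here. With `threshold=0.5` only `data:a` is kept, and `data:c` is reported as missing.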
However, this will still cause an error if any data columns are removed for lacking evidence at the given threshold. To fix this, we will also want to update the binarize experiment.py file in the nextflow implementation to actually use the updated data columns. In line 97 of the main() function, change binary_evidence = create_binary_evidence(evidence, results.data_columns, results.activity_agg, results.threshold, greater) to binary_evidence = create_binary_evidence(evidence, data_columns, results.activity_agg, results.threshold, greater)
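The difference between passing the requested columns and passing the validated columns can be sketched as follows. `binarize_stub` is a hypothetical stand-in written only for this example; it is not the KSTAR `create_binary_evidence` implementation and it omits the `activity_agg` argument. The column names are also made up.

```python
import pandas as pd


def binarize_stub(evidence, data_columns, threshold, greater=True):
    # Hypothetical stand-in for create_binary_evidence, for illustration only
    binary = evidence.copy()
    for col in data_columns:
        hits = binary[col] >= threshold if greater else binary[col] <= threshold
        binary[col] = hits.astype(int)
    return binary


evidence = pd.DataFrame({"data:a": [0.2, 0.6]})
requested = ["data:a", "data:b"]  # plays the role of results.data_columns
# Plays the role of the list returned by check_data_columns
data_columns = [col for col in requested if col in evidence.columns]

# Passing the validated list works
binary = binarize_stub(evidence, data_columns, threshold=0.5)

# Passing the original request fails on the column that is not available
try:
    binarize_stub(evidence, requested, threshold=0.5)
    failed = False
except KeyError:
    failed = True
```

In this sketch the second call raises a `KeyError` for `data:b`, which is the kind of missing-column failure that using the validated `data_columns` list avoids.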
To Reproduce
Steps to reproduce the behavior:
- Find a dataset in which quantification is less than 1 for all sites
- Map the dataset, and run the nextflow implementation as normal
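The column filtering can also be checked in isolation, without running the pipeline. This is a small sketch that applies the same condition as the quoted check to a made-up evidence table whose values are all below 1. The column names and numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical evidence table: every quantification value is below 1
evidence = pd.DataFrame({
    "data:treated": [0.15, 0.80],
    "data:control": [0.05, 0.40],
})
data_columns = ["data:treated", "data:control"]

# Same condition as the current check_data_columns (hard-coded >= 1)
surviving = [col for col in data_columns if len(evidence[evidence[col] >= 1]) > 0]

# Same condition with a user-supplied threshold of 0.5
surviving_half = [col for col in data_columns if len(evidence[evidence[col] >= 0.5]) > 0]

print(surviving)
print(surviving_half)
```

With the hard-coded cutoff no column survives, while a threshold of 0.5 keeps `data:treated`.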