-
Notifications
You must be signed in to change notification settings - Fork 0
Feature/refactor DIMS FillMissing #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Add hotfix DIMS/v3.3.1 to main
ALuesink
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General note: check the linter on the files changed.
| } | ||
| } | ||
| peakgroup_list <- cbind(peakgroup_list[, 1:6], ppmdev = ppmdev, peakgroup_list[, 7:ncol(peakgroup_list)]) | ||
| peakgroup_list <- cbind(peakgroup_list[, 1:4], ppmdev = ppmdev, peakgroup_list[, 5:ncol(peakgroup_list)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the indices be changes to column names, for readability and if column orders change in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ppmdev column is already present in the peak grouplist, so this section is refactored. Referring to columns by number has been removed.
| } | ||
|
|
||
| # replace missing intensities with random values around threshold | ||
| if (!is.null(peakgroup_list)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move if not null statement to main script. Also missing else, what happens if it is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep the main script as 'clean' as possible, so I prefer to keep the if statement inside the function.
If peakgroup_list is null, nothing happens; a null object is saved and passed to the next step.
| int_cols <- which(colnames(peakgroup_list) %in% names(repl_pattern)) | ||
| peakgroup_list <- cbind(peakgroup_list, "avg.int" = apply(peakgroup_list[, int_cols], 1, mean)) | ||
|
|
||
| return(peakgroup_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return is within the if-statement, what happens if peakgroup_list is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above; a null object is returned.
| source("../../preprocessing/fill_missing_functions.R") | ||
|
|
||
| # test fill_missing_intensities | ||
| testthat::test_that("missing values are corretly filled with random values", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add function name that is tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function names have not been added in the test_that line for any other unit test; this is a good idea for the general standardization for version 3.5.
| test_peakgroup_list <- data.frame(matrix(NA, nrow = 4, ncol = 23)) | ||
| colnames(test_peakgroup_list) <- c("mzmed.pgrp", "nrsamples", "ppmdev", "assi_HMDB", "all_hmdb_names", | ||
| "iso_HMDB", "HMDB_code", "all_hmdb_ids", "sec_hmdb_ids", "theormz_HMDB", | ||
| "C101.1", "C102.1", "P2.1", "P3.1", | ||
| "avg.int", "assi_noise", "theormz_noise", "avg.ctrls", "sd.ctrls", | ||
| "C101.1_Zscore", "C102.1_Zscore", "P2.1_Zscore", "P3.1_Zscore") | ||
| test_peakgroup_list[, c(1)] <- 300 + runif(4) | ||
| test_peakgroup_list[, c(2, 3)] <- runif(8) | ||
| test_peakgroup_list[, "HMDB_code"] <- c("HMDB1234567", "HMDB1234567_1", "HMDB1234567_2", "HMDB1234567_7") | ||
| test_peakgroup_list[, "all_hmdb_ids"] <- paste(test_peakgroup_list[, "HMDB_code"], | ||
| test_peakgroup_list[, "HMDB_code"], sep = ";") | ||
| test_peakgroup_list[, "all_hmdb_names"] <- paste(test_peakgroup_list[, "assi_HMDB"], | ||
| test_peakgroup_list[, "assi_HMDB"], sep = ";") | ||
| test_peakgroup_list[, grep("C", colnames(test_peakgroup_list))] <- 1000 * (1:16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same code as test_sum_intensities_adducts.R. Maybe make a txt file with the test_peakgroup_list table and load the table in the test_that() function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. test_sum_adducts.R will be modified in version 3.5.
…etics/CustomModules into feature/refactor_DIMS_FilMissing
Refactored FillMissing. 6 functions which were used in the old PeakFinding method have been replaced with 1 function, which has been moved to the preprocessing folder. Unit test was added for this function.
Identification of noise peaks has been removed; functions downstream in the pipeline which use the information on noise peaks have been modified.
The new FillMissing procedure is much simpler than before, while giving better results. All filled-in intensities are now centered around the same threshold value and intensities cannot be negative. The new method is also faster.