Skip to content

Conversation

@mraves2
Copy link
Contributor

@mraves2 mraves2 commented Dec 9, 2025

Refactored FillMissing. 6 functions which were used in the old PeakFinding method have been replaced with 1 function, which has been moved to the preprocessing folder. Unit test was added for this function.
Identification of noise peaks has been removed; functions downstream in the pipeline which use the information on noise peaks have been modified.

The new FillMissing procedure is much simpler than before, while giving better results. All filled-in intensities are now centered around the same threshold value and intensities cannot be negative. The new method is also faster.

Copy link
Contributor

@ALuesink ALuesink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General note: check the linter on the files changed.

}
}
peakgroup_list <- cbind(peakgroup_list[, 1:6], ppmdev = ppmdev, peakgroup_list[, 7:ncol(peakgroup_list)])
peakgroup_list <- cbind(peakgroup_list[, 1:4], ppmdev = ppmdev, peakgroup_list[, 5:ncol(peakgroup_list)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the indices be changes to column names, for readability and if column orders change in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ppmdev column is already present in the peak grouplist, so this section is refactored. Referring to columns by number has been removed.

}

# replace missing intensities with random values around threshold
if (!is.null(peakgroup_list)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move if not null statement to main script. Also missing else, what happens if it is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep the main script as 'clean' as possible, so I prefer to keep the if statement inside the function.
If peakgroup_list is null, nothing happens; a null object is saved and passed to the next step.

int_cols <- which(colnames(peakgroup_list) %in% names(repl_pattern))
peakgroup_list <- cbind(peakgroup_list, "avg.int" = apply(peakgroup_list[, int_cols], 1, mean))

return(peakgroup_list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return is within the if-statement, what happens if peakgroup_list is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above; a null object is returned.

source("../../preprocessing/fill_missing_functions.R")

# test fill_missing_intensities
testthat::test_that("missing values are corretly filled with random values", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add function name that is tested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function names have not been added in the test_that line for any other unit test; this is a good idea for the general standardization for version 3.5.

Comment on lines +8 to +21
test_peakgroup_list <- data.frame(matrix(NA, nrow = 4, ncol = 23))
colnames(test_peakgroup_list) <- c("mzmed.pgrp", "nrsamples", "ppmdev", "assi_HMDB", "all_hmdb_names",
"iso_HMDB", "HMDB_code", "all_hmdb_ids", "sec_hmdb_ids", "theormz_HMDB",
"C101.1", "C102.1", "P2.1", "P3.1",
"avg.int", "assi_noise", "theormz_noise", "avg.ctrls", "sd.ctrls",
"C101.1_Zscore", "C102.1_Zscore", "P2.1_Zscore", "P3.1_Zscore")
test_peakgroup_list[, c(1)] <- 300 + runif(4)
test_peakgroup_list[, c(2, 3)] <- runif(8)
test_peakgroup_list[, "HMDB_code"] <- c("HMDB1234567", "HMDB1234567_1", "HMDB1234567_2", "HMDB1234567_7")
test_peakgroup_list[, "all_hmdb_ids"] <- paste(test_peakgroup_list[, "HMDB_code"],
test_peakgroup_list[, "HMDB_code"], sep = ";")
test_peakgroup_list[, "all_hmdb_names"] <- paste(test_peakgroup_list[, "assi_HMDB"],
test_peakgroup_list[, "assi_HMDB"], sep = ";")
test_peakgroup_list[, grep("C", colnames(test_peakgroup_list))] <- 1000 * (1:16)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same code as test_sum_intensities_adducts.R. Maybe make a txt file with the test_peakgroup_list table and load the table in the test_that() function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. test_sum_adducts.R will be modified in version 3.5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants