Feature/refactor DIMS FillMissing #91

mraves2 · 2025-12-09T14:05:45Z

Refactored FillMissing. The six functions used by the old PeakFinding method have been replaced with a single function, which has been moved to the preprocessing folder. A unit test was added for this function.
Identification of noise peaks has been removed; downstream functions in the pipeline that used the noise-peak information have been modified accordingly.

The new FillMissing procedure is much simpler than before, while giving better results. All filled-in intensities are now centered around the same threshold value and intensities cannot be negative. The new method is also faster.
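
For readers who have not opened the diff, the idea can be sketched roughly as follows. This is a minimal illustration, not the function from this PR: the name `fill_missing_sketch`, the `thresh` and `rel_sd` parameters, the normal distribution, and the choice to treat both NA and 0 as missing are all assumptions made for the example.

```r
# Hypothetical sketch, not the code from this PR: replace missing intensities
# with random values centered on a threshold.
fill_missing_sketch <- function(peakgroup_list, sample_cols, thresh = 2000, rel_sd = 0.1) {
  for (col in sample_cols) {
    values <- peakgroup_list[[col]]
    # assumption: both NA and 0 count as "missing"
    missing <- is.na(values) | values == 0
    n_missing <- sum(missing)
    if (n_missing > 0) {
      # draw around the threshold; abs() keeps every filled value non-negative
      values[missing] <- abs(rnorm(n_missing, mean = thresh, sd = rel_sd * thresh))
      peakgroup_list[[col]] <- values
    }
  }
  peakgroup_list
}

set.seed(1)
example <- data.frame(mzmed.pgrp = c(300.1, 300.2, 300.3),
                      C101.1 = c(1000, NA, 3000),
                      P2.1 = c(0, 5000, NA))
filled <- fill_missing_sketch(example, sample_cols = c("C101.1", "P2.1"))
```

Measured intensities are left untouched; only the missing cells are overwritten, and every sample column is filled around the same threshold.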

Add hotfix DIMS/v3.3.1 to main

ALuesink

General note: check the linter on the files changed.

DIMS/FillMissing.R

ALuesink · 2025-12-10T13:59:43Z

DIMS/Utils/calculate_zscores.R

        }
      }
-      peakgroup_list <- cbind(peakgroup_list[, 1:6], ppmdev = ppmdev, peakgroup_list[, 7:ncol(peakgroup_list)])
+      peakgroup_list <- cbind(peakgroup_list[, 1:4], ppmdev = ppmdev, peakgroup_list[, 5:ncol(peakgroup_list)])


Could the indices be changed to column names, for readability and in case the column order changes in the future?

The ppmdev column is already present in peakgroup_list, so this section has been refactored. Columns are no longer referenced by number.
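
As an illustration of the name-based approach raised in this thread, a column can be inserted after a named column instead of at a hard-coded position. The helper `insert_column_after` below is hypothetical and is not part of the PR, where the insertion turned out to be unnecessary.

```r
# Hypothetical helper: insert a new column directly after a named column,
# so the code keeps working if the column order changes.
insert_column_after <- function(df, after, new_col, new_name) {
  pos <- match(after, colnames(df))
  if (is.na(pos)) {
    stop("column not found: ", after)
  }
  left <- df[, seq_len(pos), drop = FALSE]
  right <- df[, seq_len(ncol(df))[-seq_len(pos)], drop = FALSE]
  cbind(left, setNames(data.frame(new_col), new_name), right)
}

pgl <- data.frame(mzmed.pgrp = c(300.1, 300.2),
                  nrsamples = c(2, 2),
                  assi_HMDB = c("a", "b"))
pgl <- insert_column_after(pgl, after = "nrsamples",
                           new_col = c(1.5, -0.5), new_name = "ppmdev")
```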

DIMS/preprocessing/fill_missing_functions.R

ALuesink · 2025-12-10T14:10:40Z

DIMS/preprocessing/fill_missing_functions.R

+  }
+
+  # replace missing intensities with random values around threshold
+  if (!is.null(peakgroup_list)) {


Maybe move the not-null check to the main script. There is also no else branch: what happens if peakgroup_list is null?

I would like to keep the main script as 'clean' as possible, so I prefer to keep the if statement inside the function.
If peakgroup_list is null, nothing happens; a null object is saved and passed to the next step.
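
One way to keep the check inside the function while making the null case explicit is a guard clause. The sketch below is only an illustration; the function name `add_avg_int_sketch` and its `int_cols` argument are hypothetical and do not come from the PR.

```r
# Hypothetical sketch of a guard clause; names are illustrative only.
add_avg_int_sketch <- function(peakgroup_list, int_cols) {
  if (is.null(peakgroup_list)) {
    # explicit early return documents the NULL case instead of relying on
    # the invisible NULL that an if without else evaluates to
    return(NULL)
  }
  peakgroup_list[["avg.int"]] <- rowMeans(peakgroup_list[, int_cols, drop = FALSE])
  peakgroup_list
}

null_result <- add_avg_int_sketch(NULL, int_cols = c("C101.1", "P2.1"))
with_avg <- add_avg_int_sketch(data.frame(C101.1 = c(1, 2), P2.1 = c(3, 4)),
                               int_cols = c("C101.1", "P2.1"))
```

The behavior is the same as described in the reply (null in, null out), but the main script stays clean and the null path is visible to the reader.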

ALuesink · 2025-12-10T14:12:06Z

DIMS/preprocessing/fill_missing_functions.R

+    int_cols <- which(colnames(peakgroup_list) %in% names(repl_pattern))
+    peakgroup_list <- cbind(peakgroup_list, "avg.int" = apply(peakgroup_list[, int_cols], 1, mean))
+
+    return(peakgroup_list)


The return is inside the if statement; what happens if peakgroup_list is null?

See comment above; a null object is returned.

ALuesink · 2025-12-10T14:12:45Z

DIMS/tests/testthat/test_fill_missing.R

+source("../../preprocessing/fill_missing_functions.R")
+
+# test fill_missing_intensities
+testthat::test_that("missing values are corretly filled with random values", {


Add function name that is tested

Function names have not been added in the test_that line for any other unit test; this is a good idea for the general standardization for version 3.5.

ALuesink · 2025-12-10T14:17:17Z

DIMS/tests/testthat/test_fill_missing.R

+  test_peakgroup_list <- data.frame(matrix(NA, nrow = 4, ncol = 23))
+  colnames(test_peakgroup_list) <- c("mzmed.pgrp", "nrsamples", "ppmdev", "assi_HMDB", "all_hmdb_names",
+                                     "iso_HMDB", "HMDB_code", "all_hmdb_ids", "sec_hmdb_ids", "theormz_HMDB",
+                                     "C101.1", "C102.1", "P2.1", "P3.1",
+                                     "avg.int", "assi_noise", "theormz_noise", "avg.ctrls", "sd.ctrls",
+                                     "C101.1_Zscore", "C102.1_Zscore", "P2.1_Zscore", "P3.1_Zscore")
+  test_peakgroup_list[, c(1)] <- 300 + runif(4)
+  test_peakgroup_list[, c(2, 3)] <- runif(8)
+  test_peakgroup_list[, "HMDB_code"] <- c("HMDB1234567", "HMDB1234567_1", "HMDB1234567_2", "HMDB1234567_7")
+  test_peakgroup_list[, "all_hmdb_ids"] <- paste(test_peakgroup_list[, "HMDB_code"],
+                                                 test_peakgroup_list[, "HMDB_code"], sep = ";")
+  test_peakgroup_list[, "all_hmdb_names"] <- paste(test_peakgroup_list[, "assi_HMDB"],
+                                                   test_peakgroup_list[, "assi_HMDB"], sep = ";")
+  test_peakgroup_list[, grep("C", colnames(test_peakgroup_list))] <- 1000 * (1:16)


Same code as test_sum_intensities_adducts.R. Maybe make a txt file with the test_peakgroup_list table and load the table in the test_that() function

Done. test_sum_adducts.R will be modified in version 3.5.
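
A rough sketch of the fixture approach discussed here, using base R only. The file name, location, and the reduced set of columns are assumptions made for the example; the actual fixture in the PR may differ.

```r
# Hypothetical fixture round trip; the file name and columns are illustrative.
fixture_path <- file.path(tempdir(), "test_peakgroup_list.txt")

fixture <- data.frame(mzmed.pgrp = c(300.1, 300.2),
                      HMDB_code = c("HMDB1234567", "HMDB1234567_1"),
                      C101.1 = c(1000, 2000),
                      P2.1 = c(NA, 4000))
# one-off step: write the table to a tab-separated text file
write.table(fixture, fixture_path, sep = "\t", quote = FALSE, row.names = FALSE)

# in a test file, this call would replace the inline construction of the table
loaded <- read.table(fixture_path, header = TRUE, sep = "\t",
                     check.names = FALSE, stringsAsFactors = FALSE)
```

Several test files can then read the same table, so a change to the column layout only has to be made in one place.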

mraves2 and others added 6 commits October 13, 2025 11:19

Merge pull request #86 from UMCUGenetics/feature/hotfix_DIMS_3.3.1

4c937de

Add hotfix DIMS/v3.3.1 to main

old FillMissing functions replaced by new ones in preprocessing folder

4b395e1

removed identification of noise peaks

1f77b21

added as.data.frame to avoid error of non-numeric argument

d0aff6c

linting modifications

519c5b7

added unit test for FillMissing

4bbccd8

ALuesink requested changes Dec 10, 2025

mraves2 added 4 commits December 22, 2025 13:14

removed unused variables

7ea64f0

Merge branch 'feature/refactor_DIMS_FilMissing' of github.com:UMCUGen…

14ce7e6

…etics/CustomModules into feature/refactor_DIMS_FilMissing

moved CollectFilled functions to preprocessing folder

c386ea3

refactored collect_filled_functions

1ff7e0e
