Skip to content

Conversation

@mraves2
Copy link
Contributor

@mraves2 mraves2 commented Jul 11, 2025

The PeakFinding step in the DIMS pipeline has been refactored. Instead of averaging the intensities for technical replicates and doing PeakFinding on the averages, the new method will do PeakFinding for each technical replicate and then average the peak intensities for every biological sample. To do this, several scripts have been modified:

  • MakeInit: changed variable name 'tmp' to 'replicates_persample'
  • GenerateBreaks: breaks and trim parameters put into separate RData files
  • AssignToBins: trim parameters read in separately, 'sample_name' changed to 'techrep_name', weighted mean for half-bad TICs
  • AverageTechReplicates: averaging part removed, script renamed to EvaluateTics. Update txt file with info on samples and corresponding tech reps
  • PeakFinding: new simplified way to find peaks for every technical replicate. First step: find regions of interest (roi) with some intensity; second step: integrate intensity in each roi by fitting a Gaussian curve
    • preprocessing/peak_finding_functions: new functions for PeakFinding. Note that two functions are borrowed from other R packages which will be included in the Docker image in the future. These two functions may not adhere to our coding standards
  • AveragePeaks: averaging technical replicates after PeakFinding. Information for technical replicates from a txt file with scanmode included.
  • CollectAveraged: collect averaged peaks for all biological samples
  • PeakGrouping: input from CollectAveraged
  • tests/testthat/test_peak_finding_functions: unit tests for PeakFinding funtions. Note: no unit tests have been added for functions from external packages.

@mraves2 mraves2 marked this pull request as draft July 11, 2025 15:40
@mraves2 mraves2 marked this pull request as ready for review July 15, 2025 10:26
Copy link

@fdekievit fdekievit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Veel werk!

Ik heb hier en daar wat comments achter gelaten :)

Copy link

@fdekievit fdekievit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big improvement!

Left some minor comments :)

Copy link

@fdekievit fdekievit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Een aantal functies kan ik niet volgen omdat ik niet kan zien waar ze vandaan komen (omdat er door sources functies opeens 'bestaan') dus hier heb ik geen comments voor achter gelaten.

Er is een linter toegevoegd, maar zonder de linter warnings toe te passen voegt de linter nu nog niks toe. In de python repo's is volgens mij besloten om error on warning op true te zetten waardoor je geforceerd wordt om deze warnings aan te pakken (anders voegt het natuurlijk nog niet zoveel toe). De linter geeft nu > 500 warnings.

Als laatste wordt er soms over dataframes gelooped zonder te weten of ze gevuld zijn. Als dit geen probleem is, laat dit dan a.u.b. zien door een unittest te runnen op een leeg object b.v.

@mraves2
Copy link
Contributor Author

mraves2 commented Dec 22, 2025 via email

Copy link

@fdekievit fdekievit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants