

@sjavis sjavis commented Jun 9, 2025

I have added the pytest-profiling dependency and a profiling test. This can be run by calling pytest --profile-svg -m "profiling". This creates a call chart showing bottlenecks in 'prof/combined.svg'. It is also possible to visualise the results as a flamegraph by calling flameprof prof/combined.prof > prof/flamegraph.svg. Closes #22
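As far as I understand, pytest-profiling collects its data with the standard-library cProfile, so the same measurement can be sketched without pytest at all. A minimal stand-alone version (uptake_stub is a hypothetical stand-in for the model's uptake function, not code from this repository):

```python
import cProfile
import io
import pstats

def uptake_stub(n):
    # Hypothetical stand-in for the model's uptake function.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
uptake_stub(100_000)
profiler.disable()

# Print the five most expensive calls by cumulative time --
# the same information the SVG call chart shows graphically.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```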

The result of the profiling is shown below. The majority (90%) of the time is spent in the uptake function, most of which is spent in pandas indexing.
[Figure: profiling call chart, prof/combined.svg]


Mikolaj-A-Kowalski commented Jun 17, 2025

> The result of the profiling is shown below. The majority (90%) of the time is spent in the uptake function, most of which is spent in pandas indexing.

Had some time to dig a little deeper into this, and it would appear that the main performance hurdle is temporary allocations caused by indexing operations in pandas.

What I did was to use the memray memory profiler for Python and apply it to a "grassland" run with end_time in runtime.txt decreased to 60 (to save on turnaround time). Using the memray summary --temporary-allocations feature we can inspect the number of short-lived allocations and their cumulative size, broken down per function:
[Figure: per-function breakdown of temporary allocations reported by memray]

For context, the peak memory usage was only ~170 MB.
We can see that the size of allocated memory follows the runtime profile quite closely, so I would risk a guess that the temporaries are the main culprit for the performance bottleneck. This would also explain the high runtime variance from #36
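The mechanism memray exposes here, short-lived allocations inflating the allocation totals without raising peak memory much, can be illustrated with the standard-library tracemalloc. The list copy below is only an analogy for the temporary arrays pandas indexing creates, not the model's actual data:

```python
import tracemalloc

def sum_with_temporary(data):
    # data[:] allocates a full short-lived copy before summing,
    # analogous to the temporaries produced by pandas indexing.
    return sum(data[:])

def sum_without_temporary(data):
    # Iterates over the existing list; no bulk copy is made.
    return sum(data)

data = list(range(100_000))  # allocated before tracing starts

tracemalloc.start()
sum_with_temporary(data)
_, peak_tmp = tracemalloc.get_traced_memory()

tracemalloc.reset_peak()
sum_without_temporary(data)
_, peak_no_tmp = tracemalloc.get_traced_memory()
tracemalloc.stop()

# The temporary copy dominates the traced peak even though it is
# freed immediately after the call returns.
```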

EDIT:
So it turns out that pandas was actually telling us most of the problem all along via the FutureWarning ("Setting an item of incompatible dtype...") [at least in version 2.3.0]... :-/

There is a conversion taking place (float64 -> float32 I believe) in this line.

Switching here:

-         Max_Uptake_array = np.zeros((self.n_monomers*self.gridsize,self.n_taxa), dtype='float32')
+         Max_Uptake_array = np.zeros((self.n_monomers*self.gridsize,self.n_taxa), dtype='float64')

removes it, and with it the majority of the temporary copies. On my machine the runtime of the test case improved from ~870s to ~110s.
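The effect of the one-line dtype change can be reproduced with plain NumPy: pandas stores column data in NumPy buffers, and writing float64 values into a float32 buffer forces an element-wise conversion (in pandas, a dtype-changing copy of the whole block, which is what the FutureWarning was pointing at). The array names below are illustrative:

```python
import numpy as np

n = 100_000
rng = np.random.default_rng(0)
values64 = rng.random(n)  # float64 by default, like most pandas columns

a32 = np.zeros(n, dtype="float32")  # the original dtype in the PR diff
a64 = np.zeros(n, dtype="float64")  # the fixed dtype

# float64 -> float32: silent per-element downcast (lossy, extra work,
# and in pandas a full dtype-changing copy of the block)
a32[:] = values64
# float64 -> float64: dtypes match, a plain contiguous copy
a64[:] = values64
```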

@jgwalkup merged commit 10731c0 into main Nov 12, 2025
3 checks passed


Development

Successfully merging this pull request may close these issues.

Performance profiling