

@sjavis sjavis commented Jun 9, 2025

I have added the pytest-profiling dependency and a profiling test. This can be run by calling pytest --profile-svg -m "profiling". This creates a call chart showing bottlenecks in 'prof/combined.svg'. It is also possible to visualise the results as a flamegraph by calling flameprof prof/combined.prof > prof/flamegraph.svg. Closes #22
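As far as I understand, pytest-profiling collects its data with the standard-library cProfile, so the same measurement can be sketched without pytest at all. A minimal stand-alone version (uptake_stub is a hypothetical stand-in for the model's uptake function, not code from this repository):

```python
import cProfile
import io
import pstats

def uptake_stub(n):
    # Hypothetical stand-in for the model's uptake function.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
uptake_stub(100_000)
profiler.disable()

# Print the five most expensive calls by cumulative time --
# the same information the SVG call chart shows graphically.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```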

The result of the profiling is shown below. The majority (90%) of the time is spent in the uptake function, most of which is spent in pandas indexing.
[Figure: profiling call chart, prof/combined.svg]


Mikolaj-A-Kowalski commented Jun 17, 2025

> The result of the profiling is shown below. The majority (90%) of the time is spent in the uptake function, most of which is spent in pandas indexing.

Had some time to dig a little deeper into this, and it would appear that the main performance hurdle is temporary allocations caused by indexing operations in pandas.

What I did was to use the memray memory profiler for Python and apply it to a "grassland" run with end_time in runtime.txt decreased to 60 (to save on turnaround time). Using the memray summary --temporary-allocations feature we can inspect the number of short-lived allocations and their cumulative size, broken down per function:
[Figure: per-function breakdown of temporary allocations reported by memray]

For context, the peak memory usage was only ~170 MB.
We can see that the size of allocated memory follows the runtime profile quite closely, so I would risk a guess that the temporaries are the main culprit for the performance bottleneck. This would also explain the high runtime variance from #36
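The mechanism memray exposes here, short-lived allocations inflating the allocation totals without raising peak memory much, can be illustrated with the standard-library tracemalloc. The list copy below is only an analogy for the temporary arrays pandas indexing creates, not the model's actual data:

```python
import tracemalloc

def sum_with_temporary(data):
    # data[:] allocates a full short-lived copy before summing,
    # analogous to the temporaries produced by pandas indexing.
    return sum(data[:])

def sum_without_temporary(data):
    # Iterates over the existing list; no bulk copy is made.
    return sum(data)

data = list(range(100_000))  # allocated before tracing starts

tracemalloc.start()
sum_with_temporary(data)
_, peak_tmp = tracemalloc.get_traced_memory()

tracemalloc.reset_peak()
sum_without_temporary(data)
_, peak_no_tmp = tracemalloc.get_traced_memory()
tracemalloc.stop()

# The temporary copy dominates the traced peak even though it is
# freed immediately after the call returns.
```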

EDIT:
So it turns out that pandas was actually telling us most of the problem all along via the FutureWarning ("Setting an item of incompatible dtype...") [at least in version 2.3.0]... :-/

There is a conversion taking place (float64 -> float32 I believe) in this line.

Switching here:

-         Max_Uptake_array = np.zeros((self.n_monomers*self.gridsize,self.n_taxa), dtype='float32')
+         Max_Uptake_array = np.zeros((self.n_monomers*self.gridsize,self.n_taxa), dtype='float64')

removes it, and with it the majority of the temporary copies. On my machine the runtime of the test case improved from ~870s to ~110s.
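The effect of the one-line dtype change can be reproduced with plain NumPy: pandas stores column data in NumPy buffers, and writing float64 values into a float32 buffer forces an element-wise conversion (in pandas, a dtype-changing copy of the whole block, which is what the FutureWarning was pointing at). The array names below are illustrative:

```python
import numpy as np

n = 100_000
rng = np.random.default_rng(0)
values64 = rng.random(n)  # float64 by default, like most pandas columns

a32 = np.zeros(n, dtype="float32")  # the original dtype in the PR diff
a64 = np.zeros(n, dtype="float64")  # the fixed dtype

# float64 -> float32: silent per-element downcast (lossy, extra work,
# and in pandas a full dtype-changing copy of the block)
a32[:] = values64
# float64 -> float64: dtypes match, a plain contiguous copy
a64[:] = values64
```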

@jgwalkup merged commit 10731c0 into main Nov 12, 2025
3 checks passed


Development

Successfully merging this pull request may close these issues.

Performance profiling