
Conversation

@stephprince (Contributor) commented Dec 2, 2025

Add a couple of updates to the visualization and data processing modules:

  • add an option to filter results by time (older results were using incorrect ophys files for LINDI, as described in Regenerate the lindi ophys file #161)
  • add figures/quantifications comparing streaming vs. download times (note that these will be much more informative once we have results from the scaling datasets described in Add datasets to test scaling for ephys, ophys, and icephys #159 and Write scaling tests #162, since we will then be able to generate estimates for different file sizes)
  • fix plotting of performance across different package versions (if we want to dig into differences across specific libraries, we may need additional tests that change a single package version while holding the other versions constant)
  • replace the figure-generation script with a CLI subcommand
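The time-filter option above could look something like the following sketch, assuming each parsed result record carries a run timestamp (the field name "timestamp" and the record layout are assumptions for illustration, not the actual results schema):

```python
from datetime import datetime, timezone

def filter_results_by_time(results, cutoff):
    """Keep only result records recorded at or after `cutoff`.

    Hypothetical sketch: assumes each record is a dict with a
    "timestamp" key holding a timezone-aware datetime.
    """
    return [r for r in results if r["timestamp"] >= cutoff]

# Example: drop results recorded before the LINDI ophys files were regenerated
# (the cutoff date here is made up for illustration).
cutoff = datetime(2025, 6, 1, tzinfo=timezone.utc)
results = [
    {"test": "lindi_ophys_read", "timestamp": datetime(2025, 3, 1, tzinfo=timezone.utc)},
    {"test": "lindi_ophys_read", "timestamp": datetime(2025, 9, 1, tzinfo=timezone.utc)},
]
recent = filter_results_by_time(results, cutoff)  # keeps only the later run
```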

These updates should now generate figures that cover all the main questions outlined in #89, though some refinement is needed before publication.

To generate the figures locally, run nwb_benchmarks generate_figures. This uses the latest results stored in the ~/.cache/nwb-benchmarks/nwb-benchmarks-results/ folder (cloning it if it does not exist), or you can optionally specify a different results folder to use. Ideally we could add functionality to provide user-specific recommendations and figure reports, but I would address that in a separate PR and will leave this with the default LBL Mac results filtering for now.

@CodyCBakerPhD (Collaborator)

> add option to filter results by time (older results were using incorrect ophys files for lindi as described in #161)

While a time filter is great to have in general, that specific issue is better solved by filtering on the database_version field in each result file.
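A minimal sketch of that approach, assuming each result file is a JSON object with a top-level database_version key (the field name comes from the comment above, but the file layout here is an assumption):

```python
import json
from pathlib import Path

def load_results(results_dir, database_version):
    """Load only the result files matching the given database_version.

    Sketch only: assumes one JSON object per file with a top-level
    "database_version" key; the real results layout may differ.
    """
    kept = []
    for path in sorted(Path(results_dir).glob("*.json")):
        record = json.loads(path.read_text())
        if record.get("database_version") == database_version:
            kept.append(record)
    return kept
```

Files written against an older database version (e.g. before the LINDI files were regenerated) would then simply be skipped rather than silently mixed into the plots.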

@CodyCBakerPhD (Collaborator)

> Ideally we could add additional functionality to provide user-specific recommendations and figure reports, but I would address that in a separate PR and will leave this with the default LBL Mac results filtering for now.

Cool - looking forward to a follow-up that alleviates the hard-coded LBL machine ID and uses the entire database (of a particular database version, of course)

@CodyCBakerPhD (Collaborator)

New plots and generation framework look good to me

One question that I will leave here (though it might belong in a separate issue): what is with the 3-out-of-5 outlier data points in all the S3-based fsspec cases? Is it possible to determine whether those were earlier or later runs (or independent of time)? For example, by varying the shade of each scatter point based on the date/time of the run.
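One way to encode run date as shade is to normalize each run's timestamp into an opacity value before plotting; a small sketch of just the normalization step (the helper name is hypothetical, and the output could feed the `alpha` argument, or a colormap, of a matplotlib scatter call):

```python
from datetime import datetime, timezone

def shades_for_runs(timestamps):
    """Map run datetimes to alpha values in [0.2, 1.0], newest = most opaque.

    Sketch only: linear normalization of each run's epoch time between
    the oldest and newest run in the set.
    """
    seconds = [t.timestamp() for t in timestamps]
    lo, hi = min(seconds), max(seconds)
    span = (hi - lo) or 1.0  # avoid division by zero when all runs share a time
    return [0.2 + 0.8 * (s - lo) / span for s in seconds]

runs = [
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 6, 1, tzinfo=timezone.utc),
    datetime(2025, 12, 1, tzinfo=timezone.utc),
]
alphas = shades_for_runs(runs)  # oldest -> 0.2, newest -> 1.0
```

If the outliers all came out faint (or all dark), that would immediately answer the earlier-vs-later question.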

[figure omitted]

@CodyCBakerPhD (Collaborator)

I guess I do have one more question:

What do the connected lines on the performance over time (package versions) plot mean? And what do the different colors represent?

[figure omitted]

Something probably easy to explain in the caption of such a figure

[figure omitted]

Also, I assume the vertical line above is to be interpreted as an error bar?

@CodyCBakerPhD (Collaborator)

More questions:

For slicing_vs_download_time.pdf, should these all be flat? I seem to recall this should (at least conceptually) be a line whose slope eventually intercepts a threshold.

That seems to be what slicing_vs_time.pdf shows (which is more interpretable), but all the other 'vs.' plots (local/remote read) are much harder to interpret as-is.

@stephprince (Contributor, Author)

Thanks for the feedback @CodyCBakerPhD!

Based on your feedback and some further discussion with @oruebel, I have updated several plots and added captions to help with interpretability.

> While a time filter is great to have in general, specifically solving for that issue is better done through filtering of the database_version field in each result file

Good point, I will open an issue or a separate PR to add a database_version filtering parameter to the main CLI. I think we may not have updated the database version when we updated the LINDI files, so filtering by the latest database version now may exclude some of our results, but I would have to double-check.

> Cool - looking forward to a follow-up that alleviates the hard-coded LBL machine ID and uses the entire database (of a particular database version, of course)

Yes, I'm planning to address the hardcoded machine ID this week so we can include the dandihub runs. At the very least I will open a related issue about using the entire database.

> what is with the 3 out of 5 outlier data points for all the S3-based fsspec cases? Is it possible to determine if those were earlier or later runs (or independent of time)? For example, varying the shade of the individual scatter color based on the date/time of the run

These outliers seem to be a systemic issue with the S3-based fsspec cases and are also present in the network tracking results. Now that we have results from older package versions, it could also be a package version issue. I like the idea of varying the shade based on either run time or version info; I will open an issue to address it in the future.

> What do the connected lines on the performance over time (package versions) mean? What do the different colors represent?

These colors represented the different streaming methods. However, after further consideration, it doesn't make as much sense to separate the individual packages into their own subplots, since we did not run tests that specifically controlled for one package version; rather, we ran environments at different timepoints, across which multiple package versions could have changed. The new performance_over_time_* plots attempt to represent this concept more accurately.

> for slicing_vs_download_time.pdf, should these all be flat? I seem to recall this should (at least conceptually) be a line whose slope eventually intercepts a threshold

The new "slicing_with_extrapolation" plot should match that initial conceptualization more closely.
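The underlying break-even idea (the cumulative cost of streamed slice reads eventually crossing the one-time cost of downloading the whole file) can be sketched numerically; this helper is hypothetical and not part of nwb_benchmarks:

```python
import math

def break_even_slices(time_per_slice, download_time):
    """Number of slice reads after which streaming costs more than a
    one-time full download (a simple linear extrapolation sketch).
    """
    return math.ceil(download_time / time_per_slice)

# e.g. if one remote slice read takes 0.5 s and a full download takes 60 s,
# streaming only wins while fewer than 120 slices are read
n = break_even_slices(0.5, 60.0)
```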

> which seems to be what slicing_vs_time.pdf is (more interpretable) but all the other 'vs.' plots (local/remote read) are much harder to interpret as-is

I've updated the captions and the plot names to help with interpretability. Let me know how that works, or if I can add any additional clarifications.

@stephprince (Contributor, Author)

The latest results figures can be found in the Google Drive here.

@oruebel oruebel enabled auto-merge December 17, 2025 00:01
@oruebel oruebel merged commit d583edc into main Dec 17, 2025
3 checks passed
@oruebel oruebel deleted the update-figures branch December 17, 2025 00:02