
Conversation

@JulianKunkel
Contributor

For the sake of discussion, this adds a phase in which multiple benchmarks are executed concurrently.
It requires at least 5 procs and executes:
benchmark 0 - 20% procs - parallel write - ior easy
benchmark 1 - 40% procs - parallel rnd1MB read
benchmark 2 - 40% procs - md-workbench for concurrent usage
It does provoke errors in the md-workbench cleanup phase and some extra output; these are not score-relevant, though.
The score is computed as the geometric mean of the individual benchmark scores, weighted by the proc counts involved in each benchmark. It is only included in the extended mode run.
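
Not the patch's actual code, but for illustration, a minimal C sketch of this split, assuming the first two benchmarks get the floored share and md-workbench gets the remainder (which matches the proc counts quoted later in this thread):

/* Hypothetical sketch (not the patch itself): derive the 20%/40%/40% proc
 * split, giving the remainder to md-workbench. With 5 procs this yields
 * 1/2/2, with 8 procs 1/3/4. */
#include <stdio.h>

static void concurrent_split(int nprocs, int out[3]) {
  out[0] = nprocs * 20 / 100;        /* benchmark 0: ior easy write      */
  out[1] = nprocs * 40 / 100;        /* benchmark 1: ior rnd1MB read     */
  out[2] = nprocs - out[0] - out[1]; /* benchmark 2: md-workbench (rest) */
}

int main(void) {
  for (int n = 5; n <= 10; n++) {
    int s[3];
    concurrent_split(n, s);
    printf("%2d procs -> %d / %d / %d\n", n, s[0], s[1], s[2]);
  }
  return 0;
}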

@JulianKunkel
Contributor Author

I believe a subset of the patch is useful, as it prepares the parallel execution and anyone can use it to create a meaningful parallel benchmark from IO500.
Here, by the way, is an example output with the score:

[concurrent]
exe-easy-write = ./ior --dataPacketType=timestamp -C -Q 1 -g -G 1836270349 -k -e -o ./datafiles/ior-easy/ior_file_easy -t 2m -b 9920000m -F -w -D 1 -a POSIX -O saveRankPerformanceDetailsCSV=./results/concurrent-ior-easy-write.csv
exe-rnd1MB-read = ./ior --dataPacketType=timestamp -Q=1 -g -G=-1368305808 -z --random-offset-seed=11 -e -o=./datafiles/ior-rnd1MB/file -O stoneWallingStatusFile=./results/ior-rnd1MB.stonewall -k -t=1048576 -b=1073741824 -s=10000000 -r -R -a POSIX -O saveRankPerformanceDetailsCSV=./results/concurrent-ior-rnd1MB-read.csv
exe-md-workbench = ./md-workbench --dataPacketType=timestamp --process-reports -a POSIX -o=./datafiles/mdworkbench -t=0.000000 -O=1 --run-info-file=./results/mdworkbench.status -D=10 -G=413508310 -P=5027 -I=5027 -R=1 -X -w=1 -o=./datafiles/mdworkbench --run-info-file=./results/mdworkbench.status -2
score-ior-easy-write = 0.688013
score-ior-rnd1MB-read = 7.019916
score-ior-md-workbench = 132.352509
score = 7.996812

Note that it was run with 5 procs; hence, the overall score is calculated with the proc-count weighting as follows:
((0.688013 * 1) * (7.019916 * 2) * (132.352509 * 2) / 5)^(1/3) ≈ 7.996812
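
A minimal C sketch (an illustration, not the io500 code itself) that reproduces this number from the per-benchmark scores and their proc counts:

/* Hypothetical sketch: weight each benchmark score by the procs it used
 * (1/2/2 out of 5 here), multiply, divide by the total proc count and take
 * the cube root; prints roughly 7.996812 as in the result file above. */
#include <math.h>
#include <stdio.h>

int main(void) {
  const double scores[3] = {0.688013, 7.019916, 132.352509};
  const int    procs[3]  = {1, 2, 2};
  const int    total     = 5;

  double product = 1.0;
  for (int i = 0; i < 3; i++)
    product *= scores[i] * procs[i];

  printf("concurrent score = %f\n", cbrt(product / total));
  return 0;
}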

Increasing the proc count will first add procs to the last benchmark, then the second, then the first.
With 8 procs, one gets 1, 3, and 4 procs in the individual benchmarks.

@adilger
Contributor

adilger commented May 20, 2022

Isn't md-workbench itself already a concurrent IO workload? I think with the built-in workloads of IOR and mdtest plus a hard stonewall timer it would be possible to generate arbitrary small/large/random read/write + create/stat/find/unlink workloads as needed.

It definitely has some interesting potential, both as a stress test and as a way of measuring the overall capabilities of the storage system for more "real world" production workloads where there are dozens of jobs doing uncoordinated IO.

"It requires at least 5 procs and executes: 20% procs - parallel write, 40% procs - parallel rnd1MB read, 40% procs - md-workbench"

This ratio should definitely be configurable, at least during testing, even if we eventually require a specific ratio for submission. There would likely also need to be some coordination between the workloads (i.e. it isn't possible to read from files that haven't been written yet), so there may need to be an unmeasured "warmup" time; or possibly this time is still included in the measurement, but the read workload cannot start until some fraction of the runtime has elapsed (e.g. 25%) so that the files can be created/written.

@JulianKunkel
Contributor Author

The purpose is to simulate a "used system" where some nodes run a parallel write, others a random read and users work interactively.

Note that no coordination between the workloads is necessary in this phase. It simply runs three benchmarks at the same time, and all of them use the artifacts created before.
Indeed, using md-workbench removes the need to also run even more benchmarks such as mdtest create, delete, etc. Only one metadata benchmark needs to be run now, and it synchronizes itself.

We will evaluate the influence of this on an isolated system.
A configurable ratio for the benchmarks is a good extension, but assigning the ranks is tricky.
Maybe one could set how many out of every 10 procs run a certain benchmark.
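
For illustration only (not how the patch assigns ranks), a small C/MPI sketch of deriving the groups from such an out-of-10 ratio, e.g. 2/4/4 for the current 20%/40%/40% split, with each group running its benchmark on its own sub-communicator:

/* Hypothetical sketch: split MPI_COMM_WORLD into three groups according to a
 * configurable "x out of 10 procs" ratio; leftover ranks fall into the last
 * group, as with the fixed 20%/40%/40% split. */
#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int ratio[3] = {2, 4, 4};   /* procs per 10: write / rnd1MB read / md-workbench */
  int n0 = size * ratio[0] / 10;
  int n1 = size * ratio[1] / 10;
  int color = (rank < n0) ? 0 : (rank < n0 + n1) ? 1 : 2;

  MPI_Comm sub;                     /* one communicator per benchmark group */
  MPI_Comm_split(MPI_COMM_WORLD, color, rank, &sub);

  /* color 0: ior easy write, color 1: ior rnd1MB read, color 2: md-workbench */

  MPI_Comm_free(&sub);
  MPI_Finalize();
  return 0;
}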

Missing features:

  • proper cleanup of metadata (doesn't harm score though)
  • configurable ratio

Contributor

@gflofst left a comment

Can you check this against the latest changes? There are merge conflicts now.

@JulianKunkel
Contributor Author

Can you check this against the latest changes? There are merge conflicts now.

Resolved. I would still like these changes to be in the code: since this is the "extended" version, it doesn't affect the ranking or such, but it is great to have for testing!

gflofst self-requested a review January 28, 2025 17:31
gflofst previously approved these changes Jan 28, 2025
@Obihoernchen

I think the concurrent test can cause ior-easy-read to fail afterwards, because the ior-easy-write part of concurrent is missing -O stoneWallingWearOut=1, so the files might not be fully written, like this:

...
-rw-r----- 1 bzextbench bzextbench 41943040000 Jul  4 22:04 ior_file_easy.00000020
-rw-r----- 1 bzextbench bzextbench 41943040000 Jul  4 22:04 ior_file_easy.00000021
-rw-r----- 1 bzextbench bzextbench 41943040000 Jul  4 22:04 ior_file_easy.00000022
-rw-r----- 1 bzextbench bzextbench 37094424576 Jul  4 22:04 ior_file_easy.00000023
-rw-r----- 1 bzextbench bzextbench 41943040000 Jul  4 22:04 ior_file_easy.00000024
-rw-r----- 1 bzextbench bzextbench 38117834752 Jul  4 22:04 ior_file_easy.00000025
-rw-r----- 1 bzextbench bzextbench 38000394240 Jul  4 22:04 ior_file_easy.00000026
-rw-r----- 1 bzextbench bzextbench 41943040000 Jul  4 22:04 ior_file_easy.00000027
-rw-r----- 1 bzextbench bzextbench 41943040000 Jul  4 22:04 ior_file_easy.00000028
...

Then you will hit "returned EOF prematurely":

[RESULT]                concurrent      136.378053 score : time 63.995 seconds
WARNING: read(164, 0x2e45000, 16777216) returned EOF prematurely
ERROR: cannot read from file, (ior.c:1718)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 7 in communicator MPI_COMM_WORLD
  Proc: [[59987,1],7]
  Errorcode: -1

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[gcn1222:00000] *** An error occurred in Socket closed
[gcn1222:00000] *** reported by process [3931308033,151]
[gcn1222:00000] *** on a NULL communicator
[gcn1222:00000] *** Unknown error
[gcn1222:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[gcn1222:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun has exited due to process rank 7 with PID 0 on node gcn1209 calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------

Without the concurrent phase, everything works as expected.
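
If that diagnosis is right, the fix would presumably amount to adding the wear-out option to the concurrent easy-write invocation, i.e. something like this hypothetical variant of the example command above (the actual fix may differ):

exe-easy-write = ./ior --dataPacketType=timestamp -C -Q 1 -g -G 1836270349 -k -e -o ./datafiles/ior-easy/ior_file_easy -t 2m -b 9920000m -F -w -D 1 -O stoneWallingWearOut=1 -a POSIX -O saveRankPerformanceDetailsCSV=./results/concurrent-ior-easy-write.csv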

Updating all necessary code.
@JulianKunkel
Contributor Author

Can you try it again? This bug should have been fixed.

@Obihoernchen

Thank you! Works very well now 👍

Contributor

@adilger left a comment

Julian, I don't have a major objection to including a new experimental phase in the next benchmark release, but this patch series is very messy. Many patches are just bugfixes to the previous patch because it missed a few lines of change, and the series contains a significant merge in the middle.

Rather than dragging all of this baggage into the main repository, it would be better to rebase the series to the tip of the io500 repository and squash all of the bugfix patches into the patch that introduced the bug(s). That would leave 3-4 complete patches that implement isolated functionality.

@JulianKunkel
Contributor Author

I am closing this pull request now in favor of the new merge request #93.
That one squashes the changes together into one big bulk and includes an output fix on a clean state.
