Contur3/Rivet4/Yoda2 upgrade (i.e. the long promised thread safe rivet) #531
Still needs testing and possibly ironing out rivet config issues
…only build LHC rivet analyses
…_2_4_1_upgrade_try2
This sounds amazing, @tprocter46! Thanks a lot! This is certainly in time for SUSYRun2 postprocessing -- we're just at the stage of starting main scans.
Interesting. Could this maybe be related to the rare segfaults you have seen in SUSYRun2 scans, @ChrisJChang? Probably unrelated, but who knows...
I think merging into SUSYRun2 makes sense. But @ChrisJChang should decide, since he is currently doing all the work with the SUSYRun2 scans.
Would be cool to be able to do this during scans. While postprocessing may technically be faster in terms of the number of CPU hours used, doing everything at once would certainly reduce the human hours spent on annoying things like moving around and combining hdf5 files, correcting datasets, etc. Rather than running Rivet only on a subset of threads, I suspect it would be easier to just say that Rivet is run on the first M events of the total N generated events?
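To make the "first M of N events" idea concrete, here is a minimal Python sketch. It is only a toy: the function name `simulate_and_analyse`, the strided split of events over workers, and the counters standing in for the collider simulation and for Rivet are all assumptions for illustration, not GAMBIT or Rivet code.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_and_analyse(n_events, m_rivet, n_workers=4):
    """Process n_events in parallel; only events with global index < m_rivet
    are handed to the (toy) measurement analysis."""

    def worker(worker_id):
        seen_by_collider = 0
        seen_by_rivet = 0
        # Strided assignment: worker k handles events k, k + n_workers, ...
        for event_index in range(worker_id, n_events, n_workers):
            seen_by_collider += 1      # stand-in for the full collider simulation
            if event_index < m_rivet:
                seen_by_rivet += 1     # stand-in for passing the event to Rivet
        return seen_by_collider, seen_by_rivet

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(worker, range(n_workers)))

    total_collider = sum(r[0] for r in results)
    total_rivet = sum(r[1] for r in results)
    return total_collider, total_rivet


print(simulate_and_analyse(1000, 100))
```

With a strided split, every worker contributes to the first M events, so no thread sits idle for the measurement part and no separate sample is needed. Whether the real event loop can be gated this simply is exactly the open question.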
Thanks Tomek, I agree that SUSYRun2 is the right branch to merge into. This fastjet thing could be related to the rare segfaults; I never proved where they were coming from. In terms of putting this into scans, we are already running large-scale scans, so wouldn't post-processing still be the way to go? Also, do we have a way of only applying Contur to the first million events in a collider sim, without generating a separate sample? This is interesting, but will probably require some more thought/discussion.
Yes, for SUSYRun2 it probably should be, since we're already running scans (and we haven't done any HPC-level testing of this PR). I was mostly thinking about future studies -- then it would be nice if we can avoid the postprocessing step.
I need to remind myself how the code logic goes -- it may well be more tricky than I thought. But yes, something to be discussed. But probably not worth holding up this PR for that. I'd suggest we rather implement something like that as the next step.
Brilliant to have this in -- thanks @tprocter46! As you know, but others might be hazier on, we've now released Rivet 4.1.0, which has more features useful for searches. I wouldn't call this a critical update, but it's certainly nice to have. I'm hoping, though, that the interface has barely changed for these purposes between 4.0.3 and 4.1.0, so that'll be a very straightforward update once this one is merged.
I did a quick grep and I can't see any analyses in SUSY Run 2 which use the n-subjettiness plugin that caused that issue, so I'm afraid I don't think that's the solution to whatever's happening on lumi.
Yeah, sorry, I was kind of hoping you wouldn't manage to snipe me this week! This has been 95% ready for a month or so; between starting in Krakow and gambit build problems on my new laptop, it just took a while to get it over the line. Turned out to be sufficiently painless that I just did it! Contur 3.1.0 might be a bit trickier, but I'll give it a butchers next week (that might have to wait until this PR is done, though).
Contur 3.1.0 appears to break something (independent of thread-safety) in the Contur-Gambit interface. It will probably need a follow-up with Jon, and maybe even a wait until a 3.1.1/3.2.0 release: definitely not going into this PR.
So, with that, I think it should hopefully be ready for review @ChrisJChang
Awesome, thanks. I will try to get to this.
Hi Tomek,
When trying to build rivet on my laptop, CastXML fails. It seems like it cannot find the hdf5 include file H5Ipublic.h.
However, I can confirm that gambit includes the directory for this, and that this file is present there. Did you see this at all in your testing? Or did you change anything relating to how BOSS/CastXML would use hdf5?
Error message:
Running command: /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../castxml/bin/castxml --castxml-gccxml -x c++ --castxml-cc-gnu "(" /usr/bin/c++ -std=c++17 ")" -I/home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include -I/home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../contrib/HepMC3-3.2.5/local/include -I/home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../contrib/YODA-2.1.0/local/include -I/home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/fastjet/3.4.0/local/include -I/usr/include -I/usr/include/eigen3 -I/usr/include /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/AnalysisHandler.hh -o BOSS_temp/Rivet_4_1_0/tempfile_0_AnalysisHandler_hh.xml
START CASTXML OUTPUT
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/AnalysisHandler.hh:5:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/Config/RivetCommon.hh:18:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/Math/Math.hh:6:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/Math/Vectors.hh:5:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/Math/VectorN.hh:7:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/Math/eigen3/Dense:1:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/Math/eigen3/Core:250:
/home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/Math/eigen3/src/Core/arch/Default/GenericPacketMathFunctions.h:623:16: warning: unknown attribute 'optimize' ignored [-Wunknown-attributes]
__attribute__((optimize("-fno-unsafe-math-optimizations")))
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/AnalysisHandler.hh:8:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../Backends/installed/rivet/4.1.0/include/Rivet/Tools/RivetYODA.hh:6:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../contrib/YODA-2.1.0/local/include/YODA/AnalysisObject.h:14:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../contrib/YODA-2.1.0/local/include/YODA/Utils/H5Utils.h:12:
In file included from /home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../contrib/YODA-2.1.0/local/include/YODA/highfive/H5File.hpp:14:
/home/s4358844/GAMBIT/CB_Development/Contur_Rivet_Yoda_Upgrade_PR/gambit/Backends/scripts/BOSS/modules/../../../../contrib/YODA-2.1.0/local/include/YODA/highfive/H5Object.hpp:13:10: fatal error: 'H5Ipublic.h' file not found
#include <H5Ipublic.h>
 ^~~~~~~~~~~~~
1 warning and 1 error generated.
The initial CastXML command failed. Some problems can be solved by simply specifying
Traceback (most recent call last):
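One quick way to narrow down a "file not found" like this is to check which of the `-I` directories on the CastXML command line actually contain the header. The following is a small diagnostic sketch in Python; the helper names `include_dirs_from_command` and `dirs_providing_header` are made up for this example and are not part of BOSS or CastXML.

```python
import shlex
from pathlib import Path


def include_dirs_from_command(command):
    """Extract the directories passed as -I<dir> in a compiler-style command line."""
    return [tok[2:] for tok in shlex.split(command)
            if tok.startswith("-I") and len(tok) > 2]


def dirs_providing_header(header, include_dirs):
    """Return, in order, the include directories that contain `header`."""
    return [d for d in include_dirs if (Path(d) / header).is_file()]


# Example with two of the system directories from the command above.
candidate_dirs = ["/usr/include", "/usr/include/eigen3"]
print(dirs_providing_header("H5Ipublic.h", candidate_dirs))
```

An empty list means none of the directories handed to CastXML provides the header, even if the main build knows about the HDF5 include directory through some other route.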
I didn't touch anything between castxml and hdf5 in particular. However, I did use the castxml from |
Hi Tomek,
The castxml version has nothing to do with the problem (I tested this). I can confirm that rivet seems to build if I go to backends.cmake and, in the rivet entry, change
BOSS_backend(${name} ${ver})
to
BOSS_backend(${name} ${ver} "-I${HDF5_INCLUDE_DIR} -I${HDF5_INCLUDE_DIRS}")
There are two HDF5 include variables, depending on newer and older cmake versions, so I just included them both. I suspect you have your hdf5 include directory in your main path, so castxml picked it up automatically. However, it appears that with this upgrade the rivet BOSS step now depends on hdf5, so we need to manually add it as an extra include to the BOSS step. I suggest making this change and testing that it does not break your build. I will continue to review and check whether it compiles for me.
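One thing to watch with passing both variables is that whichever one is unset expands to an empty string, leaving a bare `-I` on the command line, and a CMake list value would arrive as semicolon-separated paths. The Python sketch below shows one way such a flag string could be assembled more defensively. The function `hdf5_include_flags` is hypothetical and only illustrates the logic; it is not something GAMBIT or BOSS provides.

```python
def hdf5_include_flags(include_dir="", include_dirs=""):
    """Build de-duplicated -I flags from the two possible HDF5 include values.

    Each argument mimics the expanded value of a CMake variable: it may be
    empty, end in NOTFOUND, or be a semicolon-separated list of paths.
    """
    paths = []
    for value in (include_dir, include_dirs):
        for path in value.split(";"):
            path = path.strip()
            if path and not path.endswith("NOTFOUND") and path not in paths:
                paths.append(path)
    return " ".join(f"-I{p}" for p in paths)


# Only one of the two values is populated; no bare "-I" is emitted.
print(hdf5_include_flags("", "/opt/hdf5/include"))
```

The same filtering could equally be done on the CMake side before the string reaches the BOSS step.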
That does make sense: YODA now has an HDF5 I/O format (and dependency) and Rivet can read HDF5 auxiliary analysis data... so it's a dependency, and one that's (unavoidably) exposed into the API. |
Works on my laptop, have added it. |
I've now tested this more thoroughly. After fixing a higgsbounds bug I found while testing, I am happy with this. I will merge it into SUSY run 2. |
Upgrade Rivet (and associated tools) to OpenMP thread-safe versions. Also changed the logic of rivet_measurements so that each thread has its own analysis handler running in parallel, and the in-memory YODA objects are merged back together afterwards.
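The per-thread-handler-then-merge pattern described above can be sketched in a few lines of Python. This is a toy illustration only: `ToyHistogram` stands in for an in-memory YODA object, and a Python thread pool stands in for the OpenMP threads. None of the names below are Rivet, YODA, or GAMBIT API.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyHistogram:
    """Minimal fixed-binning histogram holding a sum of weights per bin."""

    def __init__(self, n_bins, lo, hi):
        self.n_bins, self.lo, self.hi = n_bins, lo, hi
        self.sumw = [0.0] * n_bins

    def fill(self, x, weight=1.0):
        if self.lo <= x < self.hi:
            i = int((x - self.lo) / (self.hi - self.lo) * self.n_bins)
            self.sumw[i] += weight

    def merge(self, other):
        # Bin-by-bin addition; assumes identical binning in both histograms.
        self.sumw = [a + b for a, b in zip(self.sumw, other.sumw)]


def analyse_chunk(values):
    hist = ToyHistogram(4, 0.0, 4.0)   # private to this worker: no shared state
    for x in values:
        hist.fill(x)
    return hist


def analyse_in_parallel(values, n_workers=4):
    chunks = [values[k::n_workers] for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(analyse_chunk, chunks))
    merged = ToyHistogram(4, 0.0, 4.0)
    for hist in partial:
        merged.merge(hist)             # serial merge after all workers finish
    return merged


print(analyse_in_parallel([0.5, 1.5, 1.7, 2.2, 3.9, 3.1, 0.1, 5.0]).sumw)
```

Because each worker fills only its own object, nothing is shared during the event loop; the only serial step is the merge at the end.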
Rivet 4.0.3 also contains quite a significant number of new Run 2 measurements relative to our old Rivet 3.1.5 setup, so should increase contur's sensitivity.
For the latest commit, I've tested O(100k) events in parallel with no problems, segfaults, etc. I tested maybe another 200-300k across earlier versions. If it's available, I wouldn't say no to a truly massive test just to make sure there isn't anything super-rare, but I think we're probably good.
Also upgrades fastjet (upgrade required for Rivet) and fjcontrib (versions 1.048 and below contain a thread-safety issue in the n-subjettiness plugin, which we discovered while debugging thread-safe Rivet).
I think a remaining question is which branch I should merge into. I think this branch came off SUSYRun2 at some point: will this be in time for some SUSYRun2 post-processing, @anderkve / @ChrisJChang? I can also imagine we might eventually want to use this on the VLQs, @ajueid?
Finally, I think there's probably still some optimisation/thought needed on how we deploy this. If we're just doing post-processing, then the only effect might be a speed-up, but in principle could we now put the measurements into scans? But then, for Contur, O(1M+) events is massive overkill, so do we only run Rivet on a subset of threads, or something like that?
(Also, if this is merged, we can close #412, as it is superseded.)