@anderkve
Collaborator

@anderkve anderkve commented Nov 29, 2023

[Work in progress -- just using the PR to easily run CI tests]

This is a PR to split the CI job into separate subjobs, i.e.

  • one job for building gambit + scanners and running a spartan.yaml test run
  • one job for building the standalones
  • one job for building backends
  • one job for running some test runs (ColliderBit_CMSSM.yaml and WC.yaml)

This is to avoid the current situation where our single "Build GAMBIT" CI job often fails just because e.g. some backend failed to download.

I'm currently working on the Ubuntu CI job, and then @ChrisJChang will take a look at making similar modifications for the Mac CI jobs.

@anderkve anderkve added Core Core group task WIP work in progress labels Nov 29, 2023
@anderkve anderkve self-assigned this Nov 29, 2023
@anderkve
Collaborator Author

OK, I think I'm happy with the changes to the Ubuntu CI jobs now. Hopefully all the Ubuntu CI jobs complete, except maybe the WC.yaml test run, which has been acting up lately. (Hence this PR, to more easily isolate such issues.)

So when you have time, @ChrisJChang, you can take a look and see what's the best way to propagate a similar job splitting to the Mac CI jobs. The backends_build job in the Ubuntu CI script now uses the new shell script we discussed, build_backends.sh, which is here: https://github.com/GambitBSM/gambit/blob/split_CI_jobs/cmake/scripts/build_backends.sh

I think we have discussed this type of CI job splitting in some previous Core meetings, but tagging @tegonzalo, @patscott and @pstoecker in case any of you have opinions/suggestions on this.

@anderkve anderkve added the CI Continuous Integration (tests done via GitHub Actions) label Nov 29, 2023
@ChrisJChang
Collaborator

So when you have time, @ChrisJChang, you can take a look and see what's the best way to propagate a similar job splitting to the Mac CI jobs.

Sure. Once the Mac CI jobs pass (the Mac mini one not yet up and running), I'll be happy for this to merge.

@anderkve
Collaborator Author

Sure. Once the Mac CI jobs pass (the Mac mini one not yet up and running), I'll be happy for this to merge.

Thanks for making the changes to the Mac CI job scripts!

Below is a summary of the current status of the Ubuntu and Mac x64 jobs. Looks like we currently have three separate issues on Mac x64 that we don't see on Ubuntu. We should divide the work on trying to sort these out, so that you're not stuck with all of these, @ChrisJChang. (Also tagging @pstoecker: below there's a build issue with classy that I think is partially related to our classy patch -- maybe you have some input on that?)

  • We shouldn't expect the test_runs CI jobs to pass, due to this current issue on master: Ubuntu CI jobs segfaulting during test runs #463. However, on Mac x64 it currently fails at an earlier stage when trying to build Rivet, due to the problem you mentioned in the office recently:
Making install in rivet
/usr/local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
error: could not delete '///usr/local/lib/python3.10/site-packages/rivet/plotinfo.py': Permission denied

This is an existing problem on the Mac x64 CI job for master.

  • The gambit_build jobs should be working on both Ubuntu and Mac x64, and it looks like they are.

  • The standalones_build job is working on Ubuntu, but on Mac x64 it currently fails with a linking issue:

Undefined symbols for architecture x86_64:
  "Gambit::get_pp_reader()", referenced from:
      bool Gambit::pp_reader_retrieve<double>(double&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) in RelicDensity.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [/Users/gambitremotecontrol/GAMBIT/Runners/Organisation_level/actions-runner/_work/gambit/gambit/DarkBit_standalone_MSSM]
  • The backends_build is working on Ubuntu, but on Mac x64 it fails to build classy, see error below. Not sure if this is a known classy problem or not, but we're probably seeing it in the CI jobs now since on master the entire CI job would stop earlier due to the Rivet issue.

Build error for classy:

Error compiling Cython file:
------------------------------------------------------------
...
    cdef output op
    cdef lensing le
    cdef distortions sd
    cdef file_content fc

    cpdef int computed # Flag to see if classy has already computed with the given pars
          ^
------------------------------------------------------------

classy.pyx:102:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.

Error compiling Cython file:
------------------------------------------------------------
...
    cdef lensing le
    cdef distortions sd
    cdef file_content fc

    cpdef int computed # Flag to see if classy has already computed with the given pars
    cpdef int allocated # Flag to see if classy structs are allocated already
          ^
------------------------------------------------------------

classy.pyx:103:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.

Error compiling Cython file:
------------------------------------------------------------
...
    cdef distortions sd
    cdef file_content fc

    cpdef int computed # Flag to see if classy has already computed with the given pars
    cpdef int allocated # Flag to see if classy structs are allocated already
    cpdef object _pars # Dictionary of the parameters
          ^
------------------------------------------------------------

classy.pyx:104:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.

Error compiling Cython file:
------------------------------------------------------------
...
    cdef file_content fc

    cpdef int computed # Flag to see if classy has already computed with the given pars
    cpdef int allocated # Flag to see if classy structs are allocated already
    cpdef object _pars # Dictionary of the parameters
    cpdef object ncp   # Keeps track of the structures initialized, in view of cleaning.
          ^
------------------------------------------------------------

classy.pyx:105:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.

Error compiling Cython file:
------------------------------------------------------------
...
        self.set(**_pars)

    # (JR) added to get information from cosmo object
    # whether class re-computed or not
    #recomputed = True
    cpdef int recomputed
          ^
------------------------------------------------------------

classy.pyx:132:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.

Error compiling Cython file:
------------------------------------------------------------
...
    def set_cosmo_update(self,update):
        self.recomputed = update
    # ------------------

    def __cinit__(self, default=False):
        cpdef char* dumc
              ^
------------------------------------------------------------

classy.pyx:142:14: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.

Error compiling Cython file:
------------------------------------------------------------
...
                Desired redshift
        """
        cdef double tau
        cdef int last_index #junk
        cdef double * pvecback
        cpdef double t
              ^
------------------------------------------------------------

classy.pyx:1193:14: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
warning: classy.pyx:373:76: local variable 'errmsg' referenced before assignment
warning: classy.pyx:374:39: local variable 'errmsg' referenced before assignment
Traceback (most recent call last):
Compiling /Users/gambitremotecontrol/GAMBIT/Runners/Organisation_level/actions-runner/_work/gambit/gambit/Backends/installed/classy/3.1.0/python/../python/classy.pyx because it changed.
  File "/Users/gambitremotecontrol/GAMBIT/Runners/Organisation_level/actions-runner/_work/gambit/gambit/Backends/installed/classy/3.1.0/python/autosetup.py", line 49, in <module>
    setup(
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
[1/1] Cythonizing /Users/gambitremotecontrol/GAMBIT/Runners/Organisation_level/actions-runner/_work/gambit/gambit/Backends/installed/classy/3.1.0/python/../python/classy.pyx
    return run_commands(dist)
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/usr/local/lib/python3.10/site-packages/setuptools/dist.py", line 1213, in run_command
    super().run_command(command)
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
    self.run_command(cmd_name)
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/usr/local/lib/python3.10/site-packages/setuptools/dist.py", line 1213, in run_command
    super().run_command(command)
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/usr/local/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/Users/gambitremotecontrol/Library/Python/3.10/lib/python/site-packages/Cython/Distutils/build_ext.py", line 130, in build_extension
    new_ext = cythonize(
  File "/Users/gambitremotecontrol/Library/Python/3.10/lib/python/site-packages/Cython/Build/Dependencies.py", line 1154, in cythonize
    cythonize_one(*args)
  File "/Users/gambitremotecontrol/Library/Python/3.10/lib/python/site-packages/Cython/Build/Dependencies.py", line 1321, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: /Users/gambitremotecontrol/GAMBIT/Runners/Organisation_level/actions-runner/_work/gambit/gambit/Backends/installed/classy/3.1.0/python/../python/classy.pyx
usage: autosetup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: autosetup.py --help [cmd1 cmd2 ...]
   or: autosetup.py --help-commands
   or: autosetup.py cmd --help

error: option --user not recognized

@pstoecker
Member

So the classy issue seems straightforward. In version 3 of Cython, support for declaring variables with "cpdef" was dropped, and it can simply be solved by using "cdef" instead. Since we already apply patches to classy, I will update the patch files accordingly.
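
For illustration, the cpdef-to-cdef substitution described above could be sketched as the small Python helper below. This is only a sketch under stated assumptions: the function name `cpdef_vars_to_cdef` is hypothetical, it works on plain .pyx source text rather than on the GAMBIT patch files, and it is not how the classy patches were actually updated. It deliberately leaves lines containing "(" or ending in ":" alone, since `cpdef` remains valid for functions and methods.

```python
import re

# Matches a (possibly indented) `cpdef` keyword at the start of a line.
_CPDEF_DECL = re.compile(r"^(\s*)cpdef(\s+)")


def cpdef_vars_to_cdef(source: str) -> str:
    """Rewrite `cpdef` variable declarations to `cdef` (hypothetical helper).

    Lines that look like function/method or block declarations (they contain
    "(" or end with ":" before any comment) are left untouched.
    """
    out = []
    for line in source.splitlines(keepends=True):
        code = line.split("#", 1)[0]  # ignore any trailing comment
        is_decl = _CPDEF_DECL.match(code) is not None
        looks_like_function = "(" in code or code.rstrip().endswith(":")
        if is_decl and not looks_like_function:
            line = _CPDEF_DECL.sub(r"\1cdef\2", line, count=1)
        out.append(line)
    return "".join(out)
```

Applied to the lines flagged in the log above (e.g. `cpdef int computed` or `cpdef char* dumc`), this would produce the `cdef` form that the Cython error message asks for.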

@anderkve
Collaborator Author

Great, thanks @pstoecker!

	modified:       Backends/patches/classy/2.6.3/classy_2.6.3.diff
	modified:       Backends/patches/classy/2.9.3/classy_2.9.3.diff
	modified:       Backends/patches/classy/2.9.4/classy_2.9.4.diff
	modified:       Backends/patches/classy/3.1.0/classy_3.1.0.diff
	modified:       Backends/patches/classy/exo_2.7.2/classy_exo_2.7.2.diff
@ChrisJChang
Collaborator

  • The standalones_build job is working on Ubuntu, but on Mac x64 it currently fails with a linking issue:

Undefined symbols for architecture x86_64:
  "Gambit::get_pp_reader()", referenced from:
      bool Gambit::pp_reader_retrieve<double>(double&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) in RelicDensity.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [/Users/gambitremotecontrol/GAMBIT/Runners/Organisation_level/actions-runner/_work/gambit/gambit/DarkBit_standalone_MSSM]

It's worth noting that I had never had the standalones build working on the Macs, and had previously commented it out in the CI (I uncommented it in my changes to this branch). I will look again into why this is.

@anderkve
Collaborator Author

anderkve commented Dec 1, 2023

It's worth noting that I had never had the standalones build working on the Macs, and had previously commented it out in the CI (I uncommented it in my changes to this branch). I will look again into why this is.

Ah, I see. It would certainly be nice to sort it out, but I don't think this is urgent then. It's probably more important that we solve the Rivet issue, since that affects the use of GAMBIT/ColliderBit proper, not only the standalones. I'll take a quick look at the Rivet stuff now to see if I can understand what's going on.

@anderkve
Collaborator Author

anderkve commented Dec 1, 2023

@tprocter46 and @agbuckley, perhaps you have some input on this: Rivet fails to build on our Mac x64 CI job because it at some point tries to delete /usr/local/lib/python3.10/site-packages/rivet/plotinfo.py, which gives a permission denied error. See the detailed error output below.

Is this a known issue? Can we tell the Rivet build system to not install anything in /usr/local/lib/python3.10/site-packages/? According to the output below we are passing in a --prefix flag, but we still get this error. (On the Ubuntu CI job it seems to work just fine.)

CC="/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++" CXX="/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++" CXXFLAGS="-isysroot/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -std=c++14 -fPIC  -I/usr/local/opt/libomp/include -Wall -Wextra -Wno-misleading-indentation -Wno-deprecated-declarations -Wno-deprecated-copy -I/Users/gambitremotecontrol/GAMBIT/Runners/Organisation_level/actions-runner/_work/gambit/gambit/Backends/installed/rivet/3.1.5/include/Rivet -O3 -Wno-deprecated-declarations -Wno-deprecated-copy -Wno-type-limits -Wno-unused-parameter -Wno-ignored-qualifiers" ARCHFLAGS="" /usr/local/bin/python3 \
    /Users/gambitremotecontrol/GAMBIT/Runners/Organisation_level/actions-runner/_work/gambit/gambit/Backends/installed/rivet/3.1.5/pyext/setup.py install \
    --root "///" \
    --prefix "/Users/gambitremotecontrol/GAMBIT/Runners/Organisation_level/actions-runner/_work/gambit/gambit/Backends/installed/rivet/3.1.5/local" \
      --verbose \
    --skip-build \
    --force
/usr/local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
error: could not delete '///usr/local/lib/python3.10/site-packages/rivet/plotinfo.py': Permission denied

I changed the pip installed version of setuptools to 58.2.0.

I also added explicit python paths to protect against future updates changing the default python path.
@ChrisJChang
Collaborator

The Mac x64 job can now build Rivet. I fixed this by using pip to change the installed version of setuptools to 58.2.0. It's not clear to me why this is necessary now but wasn't previously. I also hard-set the python paths (which had no effect on the Rivet build), so that when the Macs automatically update, changed default python locations won't mess things up (this happened constantly). I can check whether this will also fix the Arm64 Mac CI. Those jobs currently fail the gambit build because of the issue fixed in PR #468. Once that is merged into master, and master is merged into this branch, I will check whether this fixes that one.
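
As a rough illustration of the two changes described above (pinning setuptools and using an explicit interpreter path), a minimal Python sketch might look like the following. The helper names `pip_pin_command` and `needs_pin` are hypothetical and are not part of the GAMBIT CI scripts; the only values taken from this thread are the version 58.2.0 and the interpreter path /usr/local/bin/python3 seen in the Rivet build log.

```python
from __future__ import annotations

import sys
from importlib import metadata

PINNED_SETUPTOOLS = "58.2.0"  # version quoted in this thread


def pip_pin_command(package: str, version: str,
                    python: str = sys.executable) -> list[str]:
    """Build a pip command that installs exactly `version` of `package`,
    run through an explicit interpreter rather than whatever `pip` is on PATH."""
    return [python, "-m", "pip", "install", f"{package}=={version}"]


def needs_pin(package: str, version: str) -> bool:
    """Return True if `package` is missing or installed at another version
    in the interpreter running this script."""
    try:
        return metadata.version(package) != version
    except metadata.PackageNotFoundError:
        return True
```

A CI step could, for example, pass the list from `pip_pin_command("setuptools", PINNED_SETUPTOOLS, python="/usr/local/bin/python3")` to `subprocess.run`, so that the pin applies to the same interpreter that later runs the backend builds.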

@anderkve
Collaborator Author

@ChrisJChang, I merged #468 to master now and then master into this branch, so let's see if the gambit_build job for Arm64 works now.

@anderkve
Collaborator Author

Current status of CI jobs, after merging in #468:

  • gambit_build:

    • Ubuntu: works
    • Mac Arm64: works
    • Mac x64: to be tested
  • backends_build (not including backends tested in test_runs):

    • Ubuntu: works
    • Mac Arm64: works
    • Mac x64: to be tested
  • standalones_build:

    • Ubuntu: works
    • Mac Arm64: fails, due to known issue with printers + standalones + the suspicious_point functionality
    • Mac x64: fails, due to known issue with printers + standalones + the postprocessor reader pp_reader_retrieve
  • test_runs:

    • Ubuntu: fails when building superiso: alphas.o: file not recognized: file format not recognized
      (This does not always happen...)
    • Mac Arm64: fails, due to the contur backend getting status absent/broken. Not sure why, but I suspect it will be fixed by PR Contur 2.1.1->2.4.4, Rivet 3.1.5->3.1.8 upgrade #412.
    • Mac x64: to be tested

I will reactivate the currently disabled CI jobs for Mac x64, so we'll get the full picture.

Assuming that the gambit_build job will work also for Mac x64, my suggestion would be that we then review and merge this PR to master. Then we can propagate this CI job split to the dedicated PRs/branches where we work on solving the individual issues above.

@anderkve
Collaborator Author

@ChrisJChang, about the issue with not finding contur: I wouldn't be surprised if this is just the symbol visibility issue: #453 (comment)

Actually, as a first test I'll just add the option -DCMAKE_CXX_FLAGS="-rdynamic" (from #460) to the CI job scripts to see if that solves it. Will do that right away.

@anderkve
Collaborator Author

@ChrisJChang, never mind about the Contur issue. I remembered now that it is just this problem, #412 (comment), about a missing python module. So this will be fixed with PR #412, and we should ignore it here.

@ChrisJChang
Collaborator

Ok great. Since the contur issue will be solved in PR #412, and the standalones issue will be solved when I create a PR to include the printers in the standalones, are there any remaining things to wait on before this is merged?

@anderkve
Collaborator Author

Ok great. Since the contur issue will be solved in PR #412, and the standalones issue will be solved when I create a PR to include the printers in the standalones, are there any remaining things to wait on before this is merged?

No, I don't think so. Though we should have a quick code review, since I've added some new cmake functions/scripts.

@pstoecker, could I ask you to do a quick review? The key changes are in cmake/externals.cmake and cmake/scripts/build_backends.sh, in addition to the three yaml files for the CI jobs. If you don't have time to do the review, just let me know and I'll assign someone else.

Btw, I did one last change to the CI job scripts now, to set the following order of the subjobs: gambit_build, backends_build, test_runs, standalones_build.

Member

@pstoecker pstoecker left a comment

This looks good. A few CI tasks are failing, but I still approve this PR, since it seems from the comments on this PR that these problems are already known and being taken care of in other issues and PRs. Code-wise, there is nothing that requires a change.

@anderkve
Collaborator Author

Thanks for doing the review, @pstoecker! I've replied to your comments and resolved the conversations above. Merging this now.

@anderkve anderkve merged commit 95089e1 into master Jan 12, 2024
@anderkve anderkve deleted the split_CI_jobs branch January 12, 2024 00:16