Skip to content

Conversation

@hppritcha
Copy link
Member

@hppritcha hppritcha commented May 27, 2025

Two external MPI libraries are now created: libmpi.so and libmpi_abi.so.
Backend code that was originally in libmpi.la has been extracted into
libopen-mpi.la to be linked into both libraries.

Parts of the Open MPI C interface are now being generated by a python
script (abi.py) from modified source files (named with *.in). This
script generates files for both the ompi ABI and the standard ABI from
the same source file, also including new bigcount interfaces.

To compile standard ABI code, there's a new mpicc_abi compiler wrapper.

A few todos left:

  • simplify python framework in particular c_type.py
  • add a bit of plumbing to assist with cleanup of resources used by the attributes callback function wrappers
  • write up docs to explain ABI support both for users and for developers
  • add man pages
  • add fortran entry points for MPI_Abi functions
  • open separate PR to deal with the libnbc release_arrays problem
  • Update python framework to use refactored libnbc release arrays code

todos for follow-up PRs:

  • integrate the self standing mpi.h/abi.h generator into the python framework

This PR supercedes #12033

@dalcinl
Copy link
Contributor

dalcinl commented Jul 11, 2025

Maybe you should somehow vendor the mpi.h header from https://github.com/mpi-forum/mpi-abi-stubs and use it as the baseline to extract values for handles and constants? That's what MPICH is doing.
Alternatively, the mpi.h header the mpi-abi-stubs repo could be downloaded on the fly by some Python script and the values in that header used to update stuff to be committed in the ompi repo.

In short, I think it would be in everyone's convenience to use https://github.com/mpi-forum/mpi-abi-stubs as the "source of truth" for ABI-related stuff, avoiding manual synchronization of handle/constant values.

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

f7d94fa: WIP: explain issue with pympistandard for callback...

  • check_signed_off: does not contain a valid Signed-off-by line

4d79937: WIP: fix JSONs

  • check_signed_off: does not contain a valid Signed-off-by line

ec9c45a: WIP: fix typo in pympistd arg

  • check_signed_off: does not contain a valid Signed-off-by line

b047b19: WIP: add JSONs for ABI and API

  • check_signed_off: does not contain a valid Signed-off-by line

c9d0c7a: WIP: bump pympistandard commit for profiling embig...

  • check_signed_off: does not contain a valid Signed-off-by line

a1255ce: WIP: move Aint helper macros under ifndef OMPI_NO_...

  • check_signed_off: does not contain a valid Signed-off-by line

342b072: WIP: add some workarounds for MPI_Fint and MPI_Inf...

  • check_signed_off: does not contain a valid Signed-off-by line

53aeac5: WIP: mangle some more functions

  • check_signed_off: does not contain a valid Signed-off-by line

93c0a39: WIP: avoid double inclusion of abi.h

  • check_signed_off: does not contain a valid Signed-off-by line

afd9eb2: WIP: use pympistandard by editing PYTHONPATH (inst...

  • check_signed_off: does not contain a valid Signed-off-by line

d397bfc: WIP: fix some bugs in mangling names

  • check_signed_off: does not contain a valid Signed-off-by line

da2630f: WIP: fix typo for MPI_internal

  • check_signed_off: does not contain a valid Signed-off-by line

956dded: WIP: add additional types and functions to be mang...

  • check_signed_off: does not contain a valid Signed-off-by line

0b10ce6: WIP: temp fix for Aint problems

  • check_signed_off: does not contain a valid Signed-off-by line

9830327: WIP: add input for abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

68c69df: WIP: move abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

702cb23: WIP: add in 5.0 apis.json

  • check_signed_off: does not contain a valid Signed-off-by line

6b2784f: WIP: move code out of consts.py

  • check_signed_off: does not contain a valid Signed-off-by line

22d4cb2: WIP: call c_header from Makefile

  • check_signed_off: does not contain a valid Signed-off-by line

b0d0ff2: WIP: mangle names for internal usage

  • check_signed_off: does not contain a valid Signed-off-by line

96a33c3: WIP: generate callback function prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

d2dd7ed: WIP: remove comment function

  • check_signed_off: does not contain a valid Signed-off-by line

45d8789: WIP: print out embiggened versions of functions

  • check_signed_off: does not contain a valid Signed-off-by line

b081aa2: WIP: add MPI and ABI versions

  • check_signed_off: does not contain a valid Signed-off-by line

80ebfe7: WIP: generate API prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

93e6375: WIP: Comment out a Fortran-only category

  • check_signed_off: does not contain a valid Signed-off-by line

db8a2b5: WIP: add comment pointing back to MPI standard

  • check_signed_off: does not contain a valid Signed-off-by line

08dfb42: WIP: use enums for most int values

  • check_signed_off: does not contain a valid Signed-off-by line

aab2023: WIP: create ABI header file from template with cat...

  • check_signed_off: does not contain a valid Signed-off-by line

46fae8e: WIP: generate header with ABI values for #defines

  • check_signed_off: does not contain a valid Signed-off-by line

c15a05d: WIP: remove abi.py

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

1 similar comment
@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

f7d94fa: WIP: explain issue with pympistandard for callback...

  • check_signed_off: does not contain a valid Signed-off-by line

4d79937: WIP: fix JSONs

  • check_signed_off: does not contain a valid Signed-off-by line

ec9c45a: WIP: fix typo in pympistd arg

  • check_signed_off: does not contain a valid Signed-off-by line

b047b19: WIP: add JSONs for ABI and API

  • check_signed_off: does not contain a valid Signed-off-by line

c9d0c7a: WIP: bump pympistandard commit for profiling embig...

  • check_signed_off: does not contain a valid Signed-off-by line

a1255ce: WIP: move Aint helper macros under ifndef OMPI_NO_...

  • check_signed_off: does not contain a valid Signed-off-by line

342b072: WIP: add some workarounds for MPI_Fint and MPI_Inf...

  • check_signed_off: does not contain a valid Signed-off-by line

53aeac5: WIP: mangle some more functions

  • check_signed_off: does not contain a valid Signed-off-by line

93c0a39: WIP: avoid double inclusion of abi.h

  • check_signed_off: does not contain a valid Signed-off-by line

afd9eb2: WIP: use pympistandard by editing PYTHONPATH (inst...

  • check_signed_off: does not contain a valid Signed-off-by line

d397bfc: WIP: fix some bugs in mangling names

  • check_signed_off: does not contain a valid Signed-off-by line

da2630f: WIP: fix typo for MPI_internal

  • check_signed_off: does not contain a valid Signed-off-by line

956dded: WIP: add additional types and functions to be mang...

  • check_signed_off: does not contain a valid Signed-off-by line

0b10ce6: WIP: temp fix for Aint problems

  • check_signed_off: does not contain a valid Signed-off-by line

9830327: WIP: add input for abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

68c69df: WIP: move abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

702cb23: WIP: add in 5.0 apis.json

  • check_signed_off: does not contain a valid Signed-off-by line

6b2784f: WIP: move code out of consts.py

  • check_signed_off: does not contain a valid Signed-off-by line

22d4cb2: WIP: call c_header from Makefile

  • check_signed_off: does not contain a valid Signed-off-by line

b0d0ff2: WIP: mangle names for internal usage

  • check_signed_off: does not contain a valid Signed-off-by line

96a33c3: WIP: generate callback function prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

d2dd7ed: WIP: remove comment function

  • check_signed_off: does not contain a valid Signed-off-by line

45d8789: WIP: print out embiggened versions of functions

  • check_signed_off: does not contain a valid Signed-off-by line

b081aa2: WIP: add MPI and ABI versions

  • check_signed_off: does not contain a valid Signed-off-by line

80ebfe7: WIP: generate API prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

93e6375: WIP: Comment out a Fortran-only category

  • check_signed_off: does not contain a valid Signed-off-by line

db8a2b5: WIP: add comment pointing back to MPI standard

  • check_signed_off: does not contain a valid Signed-off-by line

08dfb42: WIP: use enums for most int values

  • check_signed_off: does not contain a valid Signed-off-by line

aab2023: WIP: create ABI header file from template with cat...

  • check_signed_off: does not contain a valid Signed-off-by line

46fae8e: WIP: generate header with ABI values for #defines

  • check_signed_off: does not contain a valid Signed-off-by line

c15a05d: WIP: remove abi.py

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

e86186b: WIP: add JSONs for ABI and API

  • check_signed_off: does not contain a valid Signed-off-by line

24417ec: WIP: bump pympistandard commit for profiling embig...

  • check_signed_off: does not contain a valid Signed-off-by line

718b1d0: WIP: move Aint helper macros under ifndef OMPI_NO_...

  • check_signed_off: does not contain a valid Signed-off-by line

3431c8d: WIP: add some workarounds for MPI_Fint and MPI_Inf...

  • check_signed_off: does not contain a valid Signed-off-by line

566fdaa: WIP: mangle some more functions

  • check_signed_off: does not contain a valid Signed-off-by line

8396bef: WIP: avoid double inclusion of abi.h

  • check_signed_off: does not contain a valid Signed-off-by line

39c20b7: WIP: use pympistandard by editing PYTHONPATH (inst...

  • check_signed_off: does not contain a valid Signed-off-by line

d1aece4: WIP: fix some bugs in mangling names

  • check_signed_off: does not contain a valid Signed-off-by line

c7c1809: WIP: fix typo for MPI_internal

  • check_signed_off: does not contain a valid Signed-off-by line

2bbf9eb: WIP: add additional types and functions to be mang...

  • check_signed_off: does not contain a valid Signed-off-by line

1db8082: WIP: temp fix for Aint problems

  • check_signed_off: does not contain a valid Signed-off-by line

870e925: WIP: add input for abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

a56da85: WIP: move abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

4c2aee1: WIP: add in 5.0 apis.json

  • check_signed_off: does not contain a valid Signed-off-by line

3d85943: WIP: move code out of consts.py

  • check_signed_off: does not contain a valid Signed-off-by line

4e85726: WIP: call c_header from Makefile

  • check_signed_off: does not contain a valid Signed-off-by line

8d8f554: WIP: mangle names for internal usage

  • check_signed_off: does not contain a valid Signed-off-by line

5f29a48: WIP: generate callback function prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

c5d5f30: WIP: remove comment function

  • check_signed_off: does not contain a valid Signed-off-by line

1613149: WIP: print out embiggened versions of functions

  • check_signed_off: does not contain a valid Signed-off-by line

3f229b3: WIP: add MPI and ABI versions

  • check_signed_off: does not contain a valid Signed-off-by line

584aeb7: WIP: generate API prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

7c73091: WIP: Comment out a Fortran-only category

  • check_signed_off: does not contain a valid Signed-off-by line

72d2bff: WIP: add comment pointing back to MPI standard

  • check_signed_off: does not contain a valid Signed-off-by line

c303345: WIP: use enums for most int values

  • check_signed_off: does not contain a valid Signed-off-by line

7d79681: WIP: create ABI header file from template with cat...

  • check_signed_off: does not contain a valid Signed-off-by line

99e9992: WIP: generate header with ABI values for #defines

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@jsquyres
Copy link
Member

I wonder if we should make the bot not complain about unsigned commits on draft PRs. That would reduce some of the noise on PR's like this.

hppritcha added a commit to hppritcha/ompi that referenced this pull request Aug 25, 2025
Turns out that in commit 6bd36a7 we had a function that is not part of the MPI standard.
This showed while working on ABI support - which requires us to pay attention to the truth
rather than make stuff up.

This commit removes our made up MPI_Session_set_info method.
Turns out who ever was doing the fortran bindings knew this wasn't a method in the standard
so there's no need to change the fortran bindings.  Same thing applies to the man pages.

Related to open-mpi#13280

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@dalcinl
Copy link
Contributor

dalcinl commented Aug 27, 2025

@hppritcha There is some issue with out-of-source builds, i.e

git clone ... ompi-main
...
mkdir -p ompi-BUILD/main
cd ompi-BUILD/main
../../ompi-main/configure ...
make install
Making install in mpi/c
make[2]: Entering directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c'
mkdir -p standard_abi
  GENERATE abi.h
  GENERATE standard_abi/mpi.h
Traceback (most recent call last):
  File "/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c/../../../../../ompi-main/ompi/mpi/bindings/c_header.py", line 262, in <module>
    with open(OUTPUT, 'tw') as header_out:
         ~~~~^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '../../../../../ompi-main/ompi/mpi/c/standard_abi/mpi.h'
make[2]: *** [Makefile:23335: standard_abi/mpi.h] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c'
make[1]: *** [Makefile:2785: install-recursive] Error 1
make[1]: Leaving directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi'
make: *** [Makefile:1526: install-recursive] Error 1

@hppritcha
Copy link
Member Author

interesting distcheck didn't check this.

@hppritcha
Copy link
Member Author

jenkins ci runs make distcheck

@dalcinl
Copy link
Contributor

dalcinl commented Aug 28, 2025

The ABI mpi.h header is missing some MPI_T_XXX types. Also, the Status f08/c converters are declared, and they should not.

I did the following manual edits to the installed mpi.h header:

diff -up ./mpi.h.orig ./mpi.h
--- ./mpi.h.orig	2025-08-28 18:55:49.842968779 +0300
+++ ./mpi.h	2025-08-28 19:03:07.192957305 +0300
@@ -490,6 +490,13 @@ enum {
 /* C preprocessor constants and Fortran parameters */
 /* $CATEGORY:C_PREPROCESSOR_CONSTANTS_FORTRAN_PARAMETERS$ */
 
+typedef struct MPI_T_enum_t* MPI_T_enum;
+typedef struct MPI_T_cvar_handle_t* MPI_T_cvar_handle;
+typedef struct MPI_T_pvar_handle_t* MPI_T_pvar_handle;
+typedef struct MPI_T_pvar_session_t* MPI_T_pvar_session;
+typedef struct MPI_T_event_registration_t* MPI_T_event_registration;
+typedef struct MPI_T_event_instance_t* MPI_T_event_instance;
+
 /* Handles used in the MPI tool information interface */
 #define MPI_T_ENUM_NULL                       ((MPI_T_enum) 0)
 #define MPI_T_CVAR_HANDLE_NULL                ((MPI_T_cvar_handle) 0)
@@ -558,20 +565,20 @@ enum {
 };
 
 /* Source event ordering guarantees in the MPI tool information interface */
-enum {
+typedef enum MPI_T_source_order {
     MPI_T_SOURCE_ORDERED                      = 1,
     MPI_T_SOURCE_UNORDERED                    = 2,
-};
+} MPI_T_source_order;
 
 /*
  * Callback safety requirement levels used in the MPI tool information interface
  */
-enum {
+typedef enum MPI_T_cb_safety {
     MPI_T_CB_REQUIRE_NONE                     = 0x00,
     MPI_T_CB_REQUIRE_MPI_RESTRICTED           = 0x03,
     MPI_T_CB_REQUIRE_THREAD_SAFE              = 0x0F,
     MPI_T_CB_REQUIRE_ASYNC_SIGNAL_SAFE        = 0x3F,
-};
+} MPI_T_cb_safety;
 
 
 /* Callback functions */
@@ -1107,13 +1114,13 @@ int MPI_Ssend_init(const void* buf, int
 int MPI_Ssend_init_c(const void* buf, MPI_Count count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request* request);
 int MPI_Start(MPI_Request* request);
 int MPI_Startall(int count, MPI_Request array_of_requests[]);
-/* int MPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
- */int MPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
-int MPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
-/* int MPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
- *//* int MPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
- *//* int MPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
- */int MPI_Status_get_error(const MPI_Status* status, int* err);
+// /* int MPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
+//  */int MPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
+// int MPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
+// /* int MPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
+//  *//* int MPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
+//  *//* int MPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
+//  */int MPI_Status_get_error(const MPI_Status* status, int* err);
 int MPI_Status_get_source(const MPI_Status* status, int* source);
 int MPI_Status_get_tag(const MPI_Status* status, int* tag);
 int MPI_Status_set_cancelled(MPI_Status* status, int flag);
@@ -1799,13 +1806,13 @@ int PMPI_Ssend_init(const void* buf, int
 int PMPI_Ssend_init_c(const void* buf, MPI_Count count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request* request);
 int PMPI_Start(MPI_Request* request);
 int PMPI_Startall(int count, MPI_Request array_of_requests[]);
-/* int PMPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
- */int PMPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
-int PMPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
-/* int PMPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
- *//* int PMPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
- *//* int PMPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
- */int PMPI_Status_get_error(const MPI_Status* status, int* err);
+// /* int PMPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
+//  */int PMPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
+// int PMPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
+// /* int PMPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
+//  *//* int PMPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
+//  *//* int PMPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
+//  */int PMPI_Status_get_error(const MPI_Status* status, int* err);
 int PMPI_Status_get_source(const MPI_Status* status, int* source);
 int PMPI_Status_get_tag(const MPI_Status* status, int* tag);
 int PMPI_Status_set_cancelled(MPI_Status* status, int flag);

I'm able to compile (using mpicc_abi) the pure-C demo/helloworld.c file from mpi4py sources using gcc-15.
Note however that the executable fails to link:

$ mpicc_abi helloworld.c
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpi_info_null'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpi_info_env'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_isendrecv'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_nthkey'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_memkind_copy_or_set'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_set'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_delete'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_sendrecv'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_bool'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_free'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_memkind_cb'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_memkind_process'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_dup'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpiinfo_init'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_nkeys'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_allocate'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpiinfo_init_env'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_valuelen'
collect2: error: ld returned 1 exit status

@dalcinl
Copy link
Contributor

dalcinl commented Sep 2, 2025

@hppritcha Actually, take a look at mpi-forum/mpi-abi-stubs#63

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

The installed mpi.h header is still broken, I had to fix it as per the patch in my previous comment.
Afterwards, my basic helloworld program compiles and links, but it fails to run:

$ mpicc_abi helloworld.c
$ mpiexec -n 1 ./a.out 
[optiplex:00000] *** An error occurred in MPI_Init_thread
[optiplex:00000] *** reported by process [3633774593,0]
[optiplex:00000] *** on a NULL communicator
[optiplex:00000] *** MPI_ERR_ARG: invalid argument of some other kind
[optiplex:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[optiplex:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 0 on node optiplex calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

@hppritcha The following definitions in the generated mpi.h file are wrong, they should be like ((MPI_Comm_copy_attr_function*) 0), that is, add the missing * to get a function POINTER type.
https://github.com/mpi-forum/mpi-abi-stubs/blob/main/mpi.h#L470

/* Predefined functions */
#define MPI_COMM_NULL_COPY_FN                 ((MPI_Comm_copy_attr_function) 0)
#define MPI_COMM_DUP_FN                       ((MPI_Comm_copy_attr_function) 1)
#define MPI_COMM_NULL_DELETE_FN               ((MPI_Comm_delete_attr_function) 0)
#define MPI_WIN_NULL_COPY_FN                  ((MPI_Win_copy_attr_function) 0)
#define MPI_WIN_DUP_FN                        ((MPI_Win_copy_attr_function) 1)
#define MPI_WIN_NULL_DELETE_FN                ((MPI_Win_delete_attr_function) 0)
#define MPI_TYPE_NULL_COPY_FN                 ((MPI_Type_copy_attr_function) 0)
#define MPI_TYPE_DUP_FN                       ((MPI_Type_copy_attr_function) 1)
#define MPI_TYPE_NULL_DELETE_FN               ((MPI_Type_delete_attr_function) 0)
#define MPI_CONVERSION_FN_NULL                ((MPI_Datarep_conversion_function) 0)
#define MPI_CONVERSION_FN_NULL_C              ((MPI_Datarep_conversion_function_c) 0)

/* Deprecated predefined functions */
#define MPI_NULL_COPY_FN                      ((MPI_Copy_function) 0)
#define MPI_DUP_FN                            ((MPI_Copy_function) 1)
#define MPI_NULL_DELETE_FN                    ((MPI_Delete_function) 0)

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

PS: You may have a similar problems in some MPI_T_XXX routines having arguments of type MPI_T_xxx_function, I believe you should add the * .


__opal_attribute_always_inline__ static inline int ompi_convert_abi_ts_level_intern_ts_level(int ts_level)
{
if (MPI_THREAD_SINGLE_ABI_INTERNAL == ts_level) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a switch statement be better? It may ultimately be a matter of taste (optimizing compilers may generate similar binary code). Just a mild observation.

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

@hppritcha What's your plan for stuff introduced in MPI 4.1 and MPI 5.0? The generated mpi.h header says MPI_VERSION=5, however the MPI library libmpi_abi.so.0 seems to miss a bunch of stuff from MPI 4.1 and 5.0.

This is a list of what I managed to detect as missing from mpi4py's configuration machinery for missing MPI stuff.
Note there are a few that are quite old, like MPI_Aint_add/diff, MPI_Wtick, and MPI_Pcontrol.

MPI_Aint_add
MPI_Aint_diff
MPI_Type_get_value_index
MPI_Type_get_envelope
MPI_Type_get_contents
MPI_Type_get_envelope_c
MPI_Type_get_contents_c
MPI_Buffer_flush
MPI_Buffer_iflush
MPI_Comm_attach_buffer
MPI_Comm_detach_buffer
MPI_Comm_flush_buffer
MPI_Comm_iflush_buffer
MPI_Session_attach_buffer
MPI_Session_detach_buffer
MPI_Session_flush_buffer
MPI_Session_iflush_buffer
MPI_Comm_errhandler_fn
MPI_Comm_attach_buffer_c
MPI_Comm_detach_buffer_c
MPI_Session_attach_buffer_c
MPI_Session_detach_buffer_c
MPI_Remove_error_class
MPI_Remove_error_code
MPI_Remove_error_string
MPI_Abi_get_fortran_info
MPI_Get_hw_resource_info
MPI_Wtick
MPI_Pcontrol
MPI_Comm_toint
MPI_Errhandler_toint
MPI_File_toint
MPI_Group_toint
MPI_Info_toint
MPI_Message_toint
MPI_Op_toint
MPI_Request_toint
MPI_Session_toint
MPI_Type_toint
MPI_Win_toint
MPI_Comm_fromint
MPI_Errhandler_fromint
MPI_File_fromint
MPI_Group_fromint
MPI_Info_fromint
MPI_Message_fromint
MPI_Op_fromint
MPI_Request_fromint
MPI_Session_fromint
MPI_Type_fromint
MPI_Win_fromint

@dalcinl
Copy link
Contributor

dalcinl commented Sep 3, 2025

Current status regarding mpi4py:

  • I'm able to build mpi4py using export MPICC=mpicc_abi.
  • However, due to missing symbols, I'm not able to import the Python module.
  • I can switch via LD_LIBRARY_PATH to MPICH's libmpi_abi.so, but there are a few issues related to datatype handles and a few failing tests involving MPI callbacks (errhandlers and attributes). I need to investigate further.

@hppritcha
Copy link
Member Author

Thanks for checking this stuff out in its early state @dalcinl. some of the above functions have been implemented but are sitting in various states in PRs. some of these should be defined - like MPI_Wtick and the aint related functions so i'll see what's going on. As for the fromint/toint that seems like an excellent project for a AI LLM. I'll see what can be done there.

We still need to add some plumbing in the ompi internals for callbacks. that will come in as part of this PR.

The NERSC folks would really like ABI working for Doudna so to the extent there's a "plan" it would be nice to get this working sooner than later.

@rhc54
Copy link
Contributor

rhc54 commented Nov 27, 2025

The libtool version rules SHOULD NOT be abused to mitigate that problem.

Perhaps we should leave it as "we agree to disagree". We feel we are not abusing the libtool version rules - we would say that you are failing to follow the libtool objective by not properly respecting an ABI's nature. 🤷‍♂️ Probably makes no difference as long as users know what was done and what they are getting into.

Comment on lines +16 to +17
#define MPI_ABI_VERSION 1
#define MPI_ABI_SUBVERSION 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set these to -1 if !OMPI_ABI_SRC? Or is this file only used in ABI scenarios?

If this file is only used in ABI scenarios, do we define MPI_ABI_[SUB]VERSION to -1 in some other file that is used in non-ABI scenarios? (I didn't find one, but I could have missed it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these should not be defined in the non-ABI header.

This is what the MPI standard says, though the wording is not great IMO.

The ABI version macros MPI_ABI_VERSION and MPI_ABI_SUBVERSION are present in
the MPI header and modules so that applications can check for consistency between the
compilation environment and the properties of the implementation at runtime.

# define MPI_ABI_VERSION 1
# define MPI_ABI_SUBVERSION 0

The idea is that users can do conditional compilation

#if defined(MPI_ABI_VERSION)
// code that assumes being compiled under the MPI ABI
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But MPI-5.0 p844:23-24 says:

MPI_ABI_GET_VERSION produces the standard ABI version, if supported. Other-
wise, the values of the major and minor version are set to−1.

Shouldn't the same be true for the compile-time constants MPI_[SUB]VERSION?

I.e., your example conditional compilation should be:

#if MPI_ABI_VERSION > 0
// code that assumes being compiled under the MPI ABI
#endif

Copy link
Member

@jsquyres jsquyres Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe a more complete conditional compilation could be:

#if defined(MPI_ABI_VERSION) && MPI_ABI_VERSION > 0
// code that assumes being compiled under the MPI ABI
#endif

Or:

// If MPI_VERSION is greater than 5, then MPI_ABI_VERSION must exist
#if MPI_VERSION >= 5 && MPI_ABI_VERSION > 0
// code that assumes being compiled under the MPI ABI
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the same be true for the compile-time constants MPI_[SUB]VERSION?

Look, I think this was not the original intention. Nonetheless, you have a point, I'm not really opposed to your proposal. The only caveat is that the current MPI standard may not be absolutely clear about it, and most likely we need an errata entry to clarify this matter (whatever the final decision is).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeffhammond @hzhou @jsquyres Can we reach an agreement in advance about whether the MPI_ABI_VERSION/SUBVERSION macros should or should not go to the non-ABI mpi.h header?

IIRC, I originally advocated for these macros to NOT be defined in the non-ABI header, but there is API symmetry arguments to have them always defined (with {1,0} or {-1/-1} values for ABI and non-ABI respectively), so maybe we should consider adding them for the non-ABI header. The fix in MPICH and this PR is quite trivial and mostly inconsequential.

Once we reach an agreement between implementations, we can think of publishing an errata to the MPI standard document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abi.h.in is only used by the code in ompi/mpi/c and ompi/mpi/tool and is also used to generate ompi/mpi/c/standard_abi/mpi.h.

But i could see the use of defining them in the OMPI mpi.h.in as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay i don't think we should add this to the OMPI mpi.h till we discuss this in the forum.

See mpi-forum/mpi-issues#1098

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay discussion is happening. rather than fill this PR with mpi forum type comments please chime in on the forum issue cited above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i answered on mpi-forum/mpi-issues#1098 so i won't repost them here

@dalcinl
Copy link
Contributor

dalcinl commented Nov 27, 2025

Perhaps we should leave it as "we agree to disagree".

I would, except that you keep commenting and throwing fear and uncertainty on what we are trying to accomplish.

We feel we are not abusing the libtool version rules

Your claim is hard to disproof. You talk in circles, and never specifically state what exact rule are you following to update libtool's C/R/A tuple. At this point I have to suspect you do that to hide the fact that you are indeed breaking the rules.

we would say that you are failing to follow the libtool objective by not properly respecting an ABI's nature.

You keep speaking in abstract terms, not specifying clearly how and what libtool objective I'm not following, nor what ABI's nature mean.

I already stated many times: backward-compatible additions to the MPI API/ABI must bump both the current and age numbers of the c/r/a libtool tuple, and therefore shared library SONAME must not change. If this statement is incorrect, please indicate clearly and precisely how it is incorrect with proper references to published standards or documentation.

🤷‍♂️ Probably makes no difference as long as users know what was done and what they are getting into.

I cannot confirm that, because you have not yet stated clearly what's the alternative.

Again: what are you proposing exactly? Are you arguing that every time the MPI standard add new features (handle types/values, functions) without making backward incompatible changes (like removing functions) then the MPI shared library should bump the SONAME from libmpi_abi.so.{N} to libmpi_abi.so.{N+1} ?

jtronge and others added 15 commits December 1, 2025 08:51
Two external MPI libraries are now created: libmpi.la and libmpi_abi.la.
Backend code that was originally in libmpi.la has been extracted into
libopen-mpi.la to be linked into both libraries.

Parts of the Open MPI C interface are now being generated by a python
script (abi.py) from modified source files (named with *.in). This
script generates files for both the ompi ABI and the standard ABI from
the same source file, also including new bigcount interfaces.

To compile standard ABI code, there's a new mpicc_abi compiler wrapper.
The standard ABI does not yet include all functions or symbols, so more
complicated source files will not compile. ROMIO must be disabled for
the code to link, since it's relying on the external MPI interface.

Signed-off-by: Jake Tronge <jtronge@lanl.gov>
Implement lots of missing functionality in the original
api.py script.  The functionality is now part of the
bindings generation framework used for Big Count.

Number of todos still to do, in particular provide
support for wrapping user supplied callback functions.
Also, the sendrecv_replace, etc. code needs to be
refactored to work with the bindings framework.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
to heed the --enable-abi-standard config option

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
for comm/type/win attributes.

Not clear if this is the way to handle attr copy/del call
backs compiled againt the mpi.h ABI header file.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This approach works for attributes but other approaches will
be needed for error handlers and datatype conversion related
functions.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This mod switches from using "synthetic" defined values and handles for
the ones specified in the MPI 5.1 standard.

The python infrastructure included in this PR for generating both a
"canonical" abi standard compatible MPI 5.1 mpi.h using two json files:

1) mpi-standard-abi.json
2) mpi-standard-apis.json

2 is generated as part of building the MPI standard.  We import that into our project
for use in generating both the mpi.h as well as interface definitions in the man pages.

1 is generated using a separate script that processes the tables in Appendix A of
standard.  Ideally this script will be merged into the MPI standard code base at some
point.  This script is currently at https://github.com/Joe-Downs/mpi-standard/tree/pr/handle-constant-tool/const-tool .
It is used to generate the portion of mpi.h where defined values and handles are specified.

The converter functions that had been generated as part of the build out of the abi variants
of the 'c' MPI interfaces are no define in persistent file.  The methods
defined in this file will be optimized to make use of the Huffman code characteristics of
the predefined values in a subsequent PR.

This commit also enables generation of the abi interfaces and header files by default.

Signed-off-by: Joseph Downs <joe.downs@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
in the library as they are not part of the ABI.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@hppritcha hppritcha force-pushed the abi-generate-ver2 branch 3 times, most recently from c56056c to 428c81a Compare December 8, 2025 22:08
fix req converter abi to ompi

fix problem with request_get_status_some

bindings: fix up REQUEST_CONST for abi

tools: first steps to add support for ABI

Turns out that the ABI standard impacts MPI_T related constants, etc.
and we don't want to switch to this for our OMPI standard, so we need
to add support for MPI_T_ ABI compliant functions.

Additions to the binding infrastructure to support generation of ABI
bindings will be introduced in a subsequent commit.

fix problem with undefined symbols in libmpi_abi

hack on abi json file

The MPI_T source order and MPI_T cb named enum types
and their enum constants need special treatment.

Ultimately the tool used to generate this json file -

https://github.com/Joe-Downs/mpi-standard/blob/pr/handle-constant-tool/const-tool/test.py

needs to find a home either in the MPI standard or in the pympistandard repo or elsewhere.

abi.h template - add MPI_T related structs

binding framework: fixes for MPI T stuff

Add various MPI_T related structs to the manglizer.
Also more workarounds for pympistandard not knowing about
MPI 5.1 chapter sections 19.3.4 and 19.3.5 not being part
of the ABI standard.

binding framework: add a TS_LEVEL type

more hacks on the mpi-standard-5.0-abi.json

json file

fix for enable-mca-dso

makefile changes to add symbols to libmpi_abi

abi: fix problems with error handler converters

Fix problem with the abi -> ompi and ompi -> abi
converters for errhandler handles.

Also fix the bindings code where the original problem
was.  We aren't using the bindings framework now to
generate the converters but we may possibly return
to that approach at some point so fix that code in
this commit.

fix problems with makefiles and some symbols

being doubly defined in the profile/non-profile abi mpi libs.

abi: move mpi_type_get_envelope etc. into templates

MPI_Type_get_envelop and MPI_Type_get_contents are a big
mess as the MPI Forum decided not just to add big variants
but also add additional arguments for the big count variants.

So this necessitated enhancements to the binding infrastructure
to support optional suppressing of bc and non-bc variants of
prototype files.

rebase fixup

add abi variantes of mpi_aint_diff and add

add abi_set/get_fortran_info

implementation of abi_set_fortran_info is not
complete.

abi_fortran_stuff: fix up the imp of these

abi_converters: add fortran datatypes to

the converters.

fix for mac-os CI

pr feedback on add/diff for aints

configury: discover fortran logical false

rather than assuming its '0'/

configury fix

for case of configure with fortran

squashme: temporary commit

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
abi_fortran: add support for LOGICAL16

logical16 patch

missed something

add comm_from/toint

simpler to hard code abi/ompi variants.
For ABI version add an offset to the values normally
used for original "f2c/c2f" ops to push the values normally
returned by "c2f" functions to be greater than any ABI
defined constants.

complete toint/fromint interfaces

basic implementation checks if constant is <= 16384
(the largest value reserved for mpi abi constants)
and returns the constant back or looks up the value
in the relevant array of pointers and offsets by 16385.

fix error return values for ABI routines

somehow was using the wrong converter to translate
ompi intern error values with abi ones.
Many are actually the same but that's just by chance.

minor fixup for toint/fromint

rebase fix

handle TAG more correctly

even though currently MPI_ANY_TAG is the same in ABI and OMPI

squash compiler warning

add hooks for TAG_OUT type

add better support for MPI_ROOT and source

since MPI_ROOT for ABI is not equal to ompi.
MPI_ANY_SOURCE is but still generate conversion code
rather than just beining lucky.

squash a compiler warning

this fix kind of messes up the indenter so the resulting line
in the generated code isn't nicely indented but that can be fixed later.

some fixes to comm attributes wrappers

also see about fixing an issue with some of the ompi
mca components when using --enable-mca-dso.

fix bug in comm attr copy code

some fixes for attributes and more

to start using the ompi-tests/ibm had to complete adding
MPI_T_ to the abi.

Also some other fixes.

Attributes still need more work.

checkpoint

many changes.  go back to generating abi_converters.h
fix up tools for abi case, etc.

fix for datatype converters

distcheck fix

EVENT_INSTANCE: arg type cast fix

and add in a file that wasn't in the makefile!

MPI_ROOT: capture proc null type too

needed to get a lot of intercommunicator collectives to pass.

requests: fixes to some multirequest test functions

weights and source out support/fixes

fixes problems with dist graph constructors
and cart_shift output.

some fixes for message related functions

also add better support for the 'some' test/wait functions.

fix testall and startall

fix mpi4py break

fix a few problems with datatype bindings

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
add replacements to code bodies for various string lengths

We now scan the generated internal 'c' body of each
function for replaceable variants of

MPI_MAX_DATAREP_STRING
MPI_MAX_ERROR_STRING
MPI_MAX_INFO_KEY
MPI_MAX_INFO_VAL
MPI_MAX_LIBRARY_VERSION_STRING
MPI_MAX_OBJECT_NAME
MPI_MAX_PORT_NAME
MPI_MAX_PROCESSOR_NAME
MPI_MAX_STRINGTAG_LEN
MPI_MAX_PSET_NAME_LEN

This way the code body sees the same constant values as the
app sees using either the OMPI or ABI variants of the mpi.h

add support for distrib array and order

for subarray type datatype constructors

add support for mode bits - in only

add support for amode out

add support for whence

used by some MPI I/O functions

add support for some win attributes

handle special case of MPI_DISPLACEMENT_CURRENT

add support for combiner, typeclass, win lock assert

types

c_header: comment out deprecated functions

The ABI doesn't support any of the functions deprecated in MPI 3.1
or earlier.  See section 20.2.1 of the MPI 5.0 standard.

fix problem with special attrs for windows

fix for win_shared_query

fixes to rget/rget_accumulate

fix problem with code gen for win create keyval

fix rank problem in rput/raccumulate

add MPI_GROUP_EMPTY to predefined group handles

handle user error classes and codes

temporary WAR for non-blocking alltoallw

variants.

We'll need to append the temporary arrays holding
translated datatype handles to the nbc request for
cleanup after the non-blocking op is completed or
the persistent request has been freed.

add support for comm topos

support INOUT attribute for all handles

thanks to dalcinl for help here

fix problem with attribute callback handling

need to convert the ompi internal keyval to abi one
before invoking user supplied callbacks for win, type, comm

catch use of special buffer consts

like MPI_BUFFER_AUTOMATIC

remove some debug statements

swat nit

patch get_address

add new type to handle out void stars

from buffer detach operations.

fixes for datatypes for neighbor collectives

add inouts for op, errhandler, info

cleanup datatype tmps for ialltoallw and friends

leverage nbc infrastructure

a logical16 fix

Thanks to dalcinl for finding.

abi_get_version/get_info add to ompi abi lib

and correction major/minor versoin returned from the ABI
build.

toint fixes

thanks to dalcinl !

fix issue with ASYNC data arrays cleanup

various fixes from dalcinl

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
ompi-codegen.patch from dalcinl
ompi-abiinfo.patch from dalcinl
ompi-status.patch from dalcinl
fixes to setting/getting status fields and some more
move some deprecated funcs out of libmpi-abi
add support for typeclass
ops: convert data from internal to abi
for user ops
split rdma modes out from modes for files

the ompi internal rdma and file mode bits overlap,
so we need to treat them that way in the bindings
generation code too.

add SOURCE_ARRAY type

to help out MPI_Group_translate_ranks

fixes to abi_get_fortran_info

for logicals

apply patch omp-op-inout.patch

from dalcinl

apply patch ompi-abi-fortran.patch

from dalcinl

fix for FD_INOUT

fix for isend dst arg

needs to go through translation

apply patch ompi-query-thread.patch

from dalcinl

various pt2pt fixes to handle proc_null etc.

eventually we'll use separate SOURCE and DEST (or something like that)
types.

rma: updates to args to handle proc_null etc

adjustments for improbe and iprobe

fix for status array for MPI_Request_testsome

switch to using a malloc wrapper

per dalcinl suggestion

plug memory leak handling REQUEST_INOUT type

patch status out to handle copy in

there are many functions where a status is conditionally
returned.  Trying to treat each of these would be too
complex so instead copy contents of the status supplied as
input into any temporary status variables then copy back
out.

handle dargs for darray_create correctly

fixes for spawn multiple - array of info args

and array of errorcodes

disable async NBC-based cleanup for now
turn back on async array cleanup stuff
and add a check that the returned request isn't
complete.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
mpi_info_toint/fromint prior to MPI_Init or MPI_Session_init
first pass at errhandler support
simplify python support for user-defined err handlers
VERSION - add support for versioning libmpi_abi
add a blurb about how current and age and revision
should work for this library.
add man pages for new MPI_Abi and fromint/toint functions
attributes - clean up extra helper data
using the bindings_extra_state arg to ompi_attr_create_keyval
add a readme about the MPI ABI support
fortran: add the MPI_Abi entry points
python framework: improve debugging
also import util module

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
irrespective of types of datatyps feed into non-blocking/persistent
alltoallw.

We'll replace this with something better in a separate PR.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants