-
-
Notifications
You must be signed in to change notification settings - Fork 25
Fix rclpy events executor #137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Yes, you need to add: at the end of https://github.com/Flova/ros-jazzy/blob/fix/events_executor/pkg_additional_info.yaml#L159 . Can you also provide a reproducer or some more hints on the problem that this fixes? |
|
Another problem to understand is if this changes the ABI of the |
As far as I know the changed ABI is exclusivly used inside the rclpy package. All other packages use it indirectly via a Python API that is expose using pybind11. So it should still be ABI compatible to all other packages. |
|
We could also do a "hack" and set the default timepoint using a magic value that is smaller than chrono max(). That would keep the existing API. But I would argue against this and would just use the upstream PR as a patch. |
Okay. I will do that 👍 |
The default executor (which is responsible for scheduling callbacks etc.) is famously quite slow. Therefore, recently a new implementation based on an events queue was added. This implementation is significantly more efficient and similar to the events executor in rclcpp. When trying out Pixi and Robostack with our software, which uses the events executor, I noticed some deadlock issues (or so I thought) with it. I first thought it might be an incompatible dependency, but after a lot of investigation using gdb and valgrind it turned out to be a "memory" issue that was present in the implementation all along, leading to undefined behavior. In the past this was not noticed and the undefined behavior didn't cause any problems, but due to the sligthly different compilation setup used by robostack a more violent failure occurs more frequently. The issue is as follows: The events queue dispatches new events from its queue one after the other. If there are no new events it waits using a conditional variable. This wait has an optional timeout. If no timeout is given, the latest time in the epoch is used, which too far in the future for the underlying syscall, leading to the crash. Solutions:
|
|
The following example does not print the string msgs from To publish msgs to the topic use: from rclpy.experimental.events_executor import EventsExecutor
from rclpy.node import Node
import rclpy
from std_msgs.msg import String
rclpy.init()
a = Node("test")
def t(msg):
print(msg)
a.create_subscription(String, "chatter", t, 1)
ex = EventsExecutor()
ex.add_node(a)
ex.spin() |
1f0646f to
5523559
Compare
|
The page is a bit buggy for me right now, but the failure of the macos action seems to be unrelated right?
|
Yes, changing to |
Done |
Thanks, this is clear. So this is basically a mitigation for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58931. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113327 and other related compiler/libstdc++ issues? |
Exactly. |
|
Everything seems to pass 🎉 |
|
Btw. this will be relevant for kilted too. Should I open another PR there? |
Sure! I still want to double check the ABI thing, but if you need it in kilted feel free to open a PR. |
|
A quick github search shows that the events queue header is only referenced by rclpy itself: https://grep.app/search?q=events_queue.hpp The other hits are all for the rclcpp implementation, which is a different thing. Imo. it also doesn't really make sense to use this API at a different place. Or did I missunderstand your concerns and they are regarding the internals of the rclpy build? |
That sounds reasonable, I just want to double check in detail as ABI issues are quite nasty, |
|
The modified header is private and not installed, so we should be safe to merge. I will merge soon unless @Tobias-Fischer or @wolfv object. |
|
Thanks for the quick merge! |
|
Quick question: When I install rclpy as a standalone package I get the new version. But the ros-jazzy-desktop package uses the ros mutex 0.11, so it is not installed. Is this intended behavior? |
As described in ros2/rclpy#1563 rclpy currently contains a memory issue. This leads to undefined behavior, which is especially noticable in the robostack builds. I would suggest that we add this as a patch now, as it is currently broken and upstream backporting etc. can take a while.
Do I need to bump the build number or something like that?