-
Notifications
You must be signed in to change notification settings - Fork 130
Revert Data fields to bytes and add get_data_as_view for zero-copy access #390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Revert Data fields to bytes and add get_data_as_view for zero-copy access #390
Conversation
|
retrieved the build artifact and tested with my downstream, appear to fix the bug. can also verify the newly added api appears to function as intended. |
capnp/lib/capnp.pyx
Outdated
| WARNING: The memoryview is tied to the underlying Cap'n Proto message. | ||
| If the message is destroyed, accessing this view will cause a crash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds unsafe.
Can the memoryview hold a strong reference to the underlying capnp message?
this should tie the message object to the lifespan of the memoryview.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that Cap'n Proto relies on the C++ layer allocator (Arena) for lifecycle management. We should avoid letting Python-level strong references implicitly interfere with this lifecycle.
-
Since Cap'n Proto uses arena allocation, holding a strong reference on a tiny data field would implicitly keep the entire underlying Message (potentially massive) alive in memory.
-
If the message object is destroyed, the expectation is that the underlying memory is reclaimed. Users should access views using
get_data_as_view()within the scope of the message's lifespan, rather than relying on a detached view to indefinitely extend the message's life. -
Reference counting introduces unnecessary CPU overhead. If a user specifically needs the view to outlive the message (safety over performance), they should rely on the original methods (i.e., making a copy) rather than this zero-copy optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by 'crash' do we mean segfault or would a python exception be raised?
if the latter then this is fine.
if the former, that may lead to a vulnerability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will cause undefined behaviour.
Now I agree that adding a reference is necessary to ensure user's memory safety and have updated the code.
Thanks for the great advice!
Thanks for testing! Yeah I have already added test cases to cover all the changes. |
theunkn0wn1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concern with raising raw Exception is not blocking. would be good if its fixed.
Otherwise LGTM.
capnp/lib/capnp.pyx
Outdated
| # Return read-only memoryview | ||
| cdef Py_buffer buf | ||
| if PyBuffer_FillInfo(&buf, self, <void*>temp_data.begin(), temp_data.size(), 1, PyBUF_CONTIG_RO) < 0: | ||
| raise Exception("Failed to create buffer info") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we raise a more specific exception?
raw Exceptions are bad practice to catch in downstream code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out, yes I have updated.
theunkn0wn1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Summary
This PR addresses the regression discussed in issue #385 regarding
Datafield access. It restores backward compatibility by ensuring standard field access returnsbytes, while introducing a new API for high-performance zero-copy access.Reason
As pointed out in comment, returning
memoryviewby default forDatafields is a breaking change.However, the ability to access data without copying (zero-copy) is still valuable for performance-critical applications.
Key Changes
msg.dataField) now returns abytesobject (a copy). This ensures full backward compatibility with existing code.get_data_as_view(field_name): Introduced a new method to explicitly request amemoryview.Usage Example