Skip to content

Conversation

@BrianXu0623
Copy link
Contributor

@BrianXu0623 BrianXu0623 commented Dec 30, 2025

Summary

This PR addresses the regression discussed in issue #385 regarding Data field access. It restores backward compatibility by ensuring standard field access returns bytes, while introducing a new API for high-performance zero-copy access.

Reason

As pointed out in comment, returning memoryview by default for Data fields is a breaking change.

However, the ability to access data without copying (zero-copy) is still valuable for performance-critical applications.

Key Changes

  1. Reverted Default Behavior: Accessing a data field (e.g., msg.dataField) now returns a bytes object (a copy). This ensures full backward compatibility with existing code.
  2. Added get_data_as_view(field_name): Introduced a new method to explicitly request a memoryview.
    • This view is writable (if the message is a builder), allowing in-place modification of the underlying buffer.
    • This provides a zero-copy path for users who specifically opt-in for performance.

Usage Example

# 1. Standard usage (Backward Compatible)
msg.dataField = b"hello"
assert isinstance(msg.dataField, bytes)  # Now returns bytes again

# 2. Zero-copy usage (New API)
view = msg.get_data_as_view("dataField")
view[0] = ord('H')  # In-place modification
assert msg.dataField == b"Hello"

@theunkn0wn1
Copy link

retrieved the build artifact and tested with my downstream, appear to fix the bug.
Thanks!

can also verify the newly added api appears to function as intended.

Comment on lines 1232 to 1233
WARNING: The memoryview is tied to the underlying Cap'n Proto message.
If the message is destroyed, accessing this view will cause a crash.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds unsafe.
Can the memoryview hold a strong reference to the underlying capnp message?
this should tie the message object to the lifespan of the memoryview.

Copy link
Contributor Author

@BrianXu0623 BrianXu0623 Dec 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that Cap'n Proto relies on the C++ layer allocator (Arena) for lifecycle management. We should avoid letting Python-level strong references implicitly interfere with this lifecycle.

  • Since Cap'n Proto uses arena allocation, holding a strong reference on a tiny data field would implicitly keep the entire underlying Message (potentially massive) alive in memory.

  • If the message object is destroyed, the expectation is that the underlying memory is reclaimed. Users should access views using get_data_as_view() within the scope of the message's lifespan, rather than relying on a detached view to indefinitely extend the message's life.

  • Reference counting introduces unnecessary CPU overhead. If a user specifically needs the view to outlive the message (safety over performance), they should rely on the original methods (i.e., making a copy) rather than this zero-copy optimization.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by 'crash' do we mean segfault or would a python exception be raised?
if the latter then this is fine.
if the former, that may lead to a vulnerability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will cause undefined behaviour.
Now I agree that adding a reference is necessary to ensure user's memory safety and have updated the code.
Thanks for the great advice!

@BrianXu0623
Copy link
Contributor Author

retrieved the build artifact and tested with my downstream, appear to fix the bug. Thanks!

can also verify the newly added api appears to function as intended.

Thanks for testing! Yeah I have already added test cases to cover all the changes.

Copy link

@theunkn0wn1 theunkn0wn1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concern with raising raw Exception is not blocking. would be good if its fixed.
Otherwise LGTM.

# Return read-only memoryview
cdef Py_buffer buf
if PyBuffer_FillInfo(&buf, self, <void*>temp_data.begin(), temp_data.size(), 1, PyBUF_CONTIG_RO) < 0:
raise Exception("Failed to create buffer info")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we raise a more specific exception?
raw Exceptions are bad practice to catch in downstream code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out, yes I have updated.

Copy link

@theunkn0wn1 theunkn0wn1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants