-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Implement physical and logical codecs in FFI #19079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implement physical and logical codecs in FFI #19079
Conversation
| datafusion-catalog = { workspace = true } | ||
| datafusion-common = { workspace = true } | ||
| datafusion-datasource = { workspace = true } | ||
| datafusion-execution = { workspace = true } | ||
| datafusion-expr = { workspace = true } | ||
| datafusion-functions-aggregate-common = { workspace = true } | ||
| datafusion-physical-expr = { workspace = true } | ||
| datafusion-physical-expr-common = { workspace = true } | ||
| datafusion-physical-plan = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the last PR we are adding these crates in explicitly to reduce the PR size when we remove datafusion dependency.
|
@renato2099 @comphead @paleolimbot @ntjohnson1 Here is the next up in the list of PRs for the FFI epic. It's a bit long, but portions of it are near identical since the logical and physical codec traits have a lot of overlap. This one is basically pure addition and the next PRs start to tie it all together. |
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| pub(crate) mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the modifier change because some part of tests codebase reused elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so that I can reuse the EmptyExec in other unit tests, specifically in datafusion/ffi/src/proto/physical_extension_codec.rs
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @timsaucer I had a first glance and its good, can you please educate me on example how encoding/decoding works in this case?
I assume it is still protobuf encoding? but slightly confused of function encoding
Yes, this is used protobuf encoding. There is an example in |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool!
Mostly just optional suggestions on how to maybe make the tests easier to read 🙂 . This is great!
| _table_ref: &TableReference, | ||
| schema: SchemaRef, | ||
| _ctx: &TaskContext, | ||
| ) -> datafusion_common::Result<Arc<dyn TableProvider>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't matter really but I think you have Result already in scope (for these and all the tests)
| ) -> datafusion_common::Result<Arc<dyn TableProvider>> { | |
| ) -> Result<Arc<dyn TableProvider>> { |
| fn try_decode_file_format( | ||
| &self, | ||
| _buf: &[u8], | ||
| _ctx: &TaskContext, | ||
| ) -> Result<Arc<dyn FileFormatFactory>> { | ||
| not_impl_err!("FFI does not support decode_file_format") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR, but in case you're curious, the way I did this in SedonaDB was a sort of simplification of the file format rather than full-on ffiifying it.
| buf.push(Self::MAGIC_NUMBER); | ||
|
|
||
| if !node.as_any().is::<MemTable>() { | ||
| return exec_err!("TestExtensionCodec only expects Abs UDF"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this read MemTable and not Abs UDF?
| if buf[0] != Self::MAGIC_NUMBER { | ||
| return exec_err!( | ||
| "TestExtensionCodec input buffer does not start with magic number" | ||
| ); | ||
| } | ||
|
|
||
| if buf.len() != 2 || buf[1] != Self::MAGIC_NUMBER { | ||
| return exec_err!("TestExtensionCodec unable to decode udf"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these checks need to be separated (or alternatively, is there a reason that two identical 127s are better than a single 127 here)?
| _name: &str, | ||
| buf: &[u8], | ||
| ) -> datafusion_common::Result<Arc<ScalarUDF>> { | ||
| if buf[0] != Self::MAGIC_NUMBER { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be more likely to catch a typo in the code that mixed up udf/udaf/udwf if it used a different magic number for each of them.
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR overall is good to go, thanks @timsaucer and @paleolimbot for the review
Which issue does this PR close?
Addresses part of #18671 but does not close it.
Rationale for this change
We have a major restriction in our current FFI crates. They use protobuf encoding and decoding to pass around logical expressions. This encoding and decoding will fail when we encounter a user defined function because we only currently use the default extension codecs. This PR introduces the FFI variants for both logical and physical extension codecs. A follow on PR will enable these in our code base, on order to keep these PRs to a reasonable length for review.
In addition to the benefit to the FFI crate for passing expressions, this will also enable sending plans across FFI boundaries, which may be useful for
datafusion-distributed.What changes are included in this PR?
FFI_LogicalExtensionCodecFFI_PhysicalExtensionCodecThis PR does not use these structures in the current code. That is coming as part of a later PR in an effort to keep the size of the PRs small for effective code review.
Are these changes tested?
Unit tests are added. Coverage report:
Are there any user-facing changes?
No.