-
Notifications
You must be signed in to change notification settings - Fork 198
[AURON #1739] Support LIMIT with OFFSET #1740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
4d7845c to
4e4eab0
Compare
|
cc @richox |
c1b22ac to
75db692
Compare
--------- Co-authored-by: cxzl25 <3898450+cxzl25@users.noreply.github.com>
75db692 to
f8a0dbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for SQL LIMIT with OFFSET functionality to the Auron query engine, implementing the feature for Spark 3.4 and later versions.
Key Changes
- Extended limit operations to support offset parameter, allowing queries like
SELECT * FROM table LIMIT 5 OFFSET 2 - Changed parameter types from
LongtoIntfor consistency with Spark's internal representations - Updated protobuf schema to include offset field with reduced integer size (
uint64→uint32)
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| NativeTakeOrderedBase.scala | Added offset parameter; updated executeCollect to handle offset by dropping rows after sorted merge |
| NativeLocalLimitBase.scala | Changed limit parameter type from Long to Int |
| NativeGlobalLimitBase.scala | Added offset parameter and updated native execution to pass offset to protobuf |
| NativeCollectLimitBase.scala | Added offset parameter; updated executeCollect and native execution to handle offset |
| Shims.scala | Added getLimitAndOffset methods for Spark 3.4+ exec nodes; updated factory method signatures |
| AuronConverters.scala | Updated converters to extract and pass limit/offset pairs from Spark exec nodes |
| AuronExecSuite.scala | Added comprehensive tests for limit with offset across different exec types |
| NativeTakeOrderedExec.scala | Updated case class to include offset parameter |
| NativePartialTakeOrderedExec.scala | Changed limit parameter type from Long to Int |
| NativeLocalLimitExec.scala | Changed limit parameter type from Long to Int |
| NativeGlobalLimitExec.scala | Added offset parameter to case class |
| NativeCollectLimitExec.scala | Added offset parameter to case class |
| ShimsImpl.scala | Implemented getLimitAndOffset methods for Spark 3.4/3.5; added effectiveLimit helper |
| limit_exec.rs | Added skip field; implemented execute_limit_with_skip function; updated display and statistics |
| from_proto.rs | Updated deserialization to handle offset in limit and sort operations |
| auron.proto | Changed FetchLimit and LimitExecNode from uint64 to uint32; added offset field |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Arc::new(SortExec::new(input, exprs, limit_for_sort)); | ||
|
|
||
| if offset > 0 { | ||
| plan = Arc::new(LimitExec::new(plan, usize::MAX, offset)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this nested execution plan will break metric nodes tree. i suggest to add a new offset field to SortExec and handle offset logics in SortExec::execute()
0797cff to
4f4d163
Compare
Which issue does this PR close?
Closes #1739
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?
How was this patch tested?