@yew1eb yew1eb commented Dec 11, 2025

Which issue does this PR close?

Closes #1739

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

How was this patch tested?

@yew1eb yew1eb force-pushed the support_limit_with_offset branch 15 times, most recently from 4d7845c to 4e4eab0 Compare December 15, 2025 17:43
yew1eb commented Dec 16, 2025

cc @richox

@yew1eb yew1eb force-pushed the support_limit_with_offset branch 4 times, most recently from c1b22ac to 75db692 Compare December 17, 2025 14:48
---------

Co-authored-by: cxzl25 <3898450+cxzl25@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR adds support for SQL LIMIT with OFFSET functionality to the Auron query engine, implementing the feature for Spark 3.4 and later versions.

Key Changes

  • Extended limit operations to support offset parameter, allowing queries like SELECT * FROM table LIMIT 5 OFFSET 2
  • Changed parameter types from Long to Int for consistency with Spark's internal representations
  • Updated protobuf schema to include offset field with reduced integer size (uint64 → uint32)
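
To make the intended semantics concrete, here is a minimal sketch of what `LIMIT n OFFSET k` selects from an already ordered input. The helper name `limit_offset` is hypothetical and is not part of the PR; it only illustrates the row-level behavior, using `u32` parameters to mirror the narrowed protobuf fields.

```rust
/// Returns the rows selected by `LIMIT limit OFFSET offset` from an
/// already ordered input.
///
/// `limit_offset` is a hypothetical helper for illustration only; it is
/// not a function from the Auron code base.
fn limit_offset<T: Clone>(rows: &[T], limit: u32, offset: u32) -> Vec<T> {
    rows.iter()
        .skip(offset as usize) // drop the first `offset` rows
        .take(limit as usize) // then keep at most `limit` rows
        .cloned()
        .collect()
}
```

For ten rows numbered 1 through 10, `limit_offset(&rows, 5, 2)` returns rows 3 through 7, which matches `SELECT * FROM table LIMIT 5 OFFSET 2`. If fewer than `offset + limit` rows exist, the result is simply shorter, possibly empty.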

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| NativeTakeOrderedBase.scala | Added offset parameter; updated executeCollect to handle offset by dropping rows after sorted merge |
| NativeLocalLimitBase.scala | Changed limit parameter type from Long to Int |
| NativeGlobalLimitBase.scala | Added offset parameter and updated native execution to pass offset to protobuf |
| NativeCollectLimitBase.scala | Added offset parameter; updated executeCollect and native execution to handle offset |
| Shims.scala | Added getLimitAndOffset methods for Spark 3.4+ exec nodes; updated factory method signatures |
| AuronConverters.scala | Updated converters to extract and pass limit/offset pairs from Spark exec nodes |
| AuronExecSuite.scala | Added comprehensive tests for limit with offset across different exec types |
| NativeTakeOrderedExec.scala | Updated case class to include offset parameter |
| NativePartialTakeOrderedExec.scala | Changed limit parameter type from Long to Int |
| NativeLocalLimitExec.scala | Changed limit parameter type from Long to Int |
| NativeGlobalLimitExec.scala | Added offset parameter to case class |
| NativeCollectLimitExec.scala | Added offset parameter to case class |
| ShimsImpl.scala | Implemented getLimitAndOffset methods for Spark 3.4/3.5; added effectiveLimit helper |
| limit_exec.rs | Added skip field; implemented execute_limit_with_skip function; updated display and statistics |
| from_proto.rs | Updated deserialization to handle offset in limit and sort operations |
| auron.proto | Changed FetchLimit and LimitExecNode from uint64 to uint32; added offset field |
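
The skip handling described for limit_exec.rs has to work across batch boundaries, since the rows to skip may span several input batches. The following is a rough sketch of that idea only, not the PR's `execute_limit_with_skip`: the function name `limit_with_skip` and the modeling of a batch as `Vec<i32>` are assumptions made so the example is self-contained. A real engine would slice Arrow record batches instead.

```rust
/// Applies `skip` and then `limit` across a sequence of batches.
///
/// Hypothetical illustration: a batch is modeled as `Vec<i32>` so the
/// example needs no external crates.
fn limit_with_skip(batches: Vec<Vec<i32>>, limit: usize, skip: usize) -> Vec<Vec<i32>> {
    let mut remaining_skip = skip;
    let mut remaining_limit = limit;
    let mut out = Vec::new();

    for batch in batches {
        if remaining_limit == 0 {
            break; // limit satisfied, stop consuming input
        }
        if remaining_skip >= batch.len() {
            // The whole batch falls inside the skipped prefix.
            remaining_skip -= batch.len();
            continue;
        }
        // Skip the leading rows of this batch, then take up to the limit.
        let start = remaining_skip;
        remaining_skip = 0;
        // saturating_add keeps `limit == usize::MAX` from overflowing.
        let end = batch.len().min(start.saturating_add(remaining_limit));
        remaining_limit -= end - start;
        out.push(batch[start..end].to_vec());
    }
    out
}
```

With batches `[1, 2, 3]`, `[4, 5, 6]`, `[7, 8, 9]`, a limit of 4 and a skip of 2, the output batches are `[3]` and `[4, 5, 6]`; the third batch is never read. Passing `usize::MAX` as the limit gives a pure offset.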

Arc::new(SortExec::new(input, exprs, limit_for_sort));

if offset > 0 {
plan = Arc::new(LimitExec::new(plan, usize::MAX, offset));

This nested execution plan will break the metric node tree. I suggest adding a new offset field to SortExec and handling the offset logic in SortExec::execute().
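
One way to read this suggestion, sketched under the assumption that the sort keeps the first `limit + offset` rows and then drops the first `offset` of them itself, so no wrapping limit node is needed. The function `sort_with_offset` is hypothetical and works on plain `i64` values rather than on the real `SortExec` input.

```rust
/// Sorts ascending, keeps the first `limit + offset` rows, then drops
/// the first `offset` rows, all inside one operator.
///
/// Hypothetical illustration of folding the offset into the sort; it is
/// not the actual `SortExec::execute()` implementation.
fn sort_with_offset(mut rows: Vec<i64>, limit: usize, offset: usize) -> Vec<i64> {
    rows.sort_unstable();
    // Rows needed before the offset is applied; saturate to avoid overflow.
    let fetch = limit.saturating_add(offset);
    rows.truncate(fetch);
    rows.into_iter().skip(offset).collect()
}
```

For input `[5, 1, 4, 2, 3]` with a limit of 2 and an offset of 1, the result is `[2, 3]`. Because the offset is applied inside the same operator, the plan keeps a single node for the sort.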

@yew1eb yew1eb force-pushed the support_limit_with_offset branch from 0797cff to 4f4d163 Compare December 23, 2025 09:45
Development

Successfully merging this pull request may close these issues.

Bug: incorrectly ignores OFFSET clause (Spark 3.4+)
