Skip to content

Conversation

@yew1eb
Copy link
Contributor

@yew1eb yew1eb commented Dec 12, 2025

Which issue does this PR close?

Closes #1746

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

How was this patch tested?

@github-actions github-actions bot added the spark label Dec 12, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces NativeTakeOrderedAndProjectExec to fuse the TakeOrdered and Project operations into a single native operator, improving performance by eliminating an intermediate operator. Previously, TakeOrderedAndProjectExec was converted to separate NativeTakeOrderedExec and NativeProjectExec operators, but now they're combined into a single fused operator.

Key changes:

  • Renamed base classes from NativeTakeOrderedBase to NativeTakeOrderedAndProjectBase and added projectList parameter to support projection
  • Updated the converter to pass projectList directly to the native operator instead of wrapping with a separate ProjectExec
  • Modified both executeCollect() and doExecuteNative() methods to apply projection when needed

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
NativeTakeOrderedAndProjectBase.scala Renamed base classes and added projection support; applies projection in both executeCollect and doExecuteNative paths
Shims.scala Updated interface method signatures to include projectList parameter and return renamed types
AuronConverters.scala Simplified conversion logic by passing projectList directly instead of creating separate ProjectExec
NativeTakeOrderedAndProjectExec.scala Updated concrete implementation with new projectList parameter
NativePartialTakeOrderedExec.scala Updated to extend renamed base class
ShimsImpl.scala Updated implementation to match new interface signatures
AuronExecSuite.scala Added test coverage for both executeCollect and doExecuteNative paths
Comments suppressed due to low confidence (3)

spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeTakeOrderedAndProjectBase.scala:170

  • The class name NativePartialTakeOrderedAndProjectBase is misleading because this partial execution operator does not actually apply projection. The projection only happens in the final stage (in NativeTakeOrderedAndProjectBase). Consider keeping the original name NativePartialTakeOrderedBase or clarifying that this is just a partial step of the TakeOrderedAndProject operation.
    spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeTakeOrderedAndProjectBase.scala:218
  • The friendlyName "PartialTakeOrderedAndProject" is misleading because this partial execution step does not apply projection. The projection only happens in the final stage. Consider using "PartialTakeOrdered" to better reflect what this stage actually does.
    spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeTakeOrderedAndProjectBase.scala:131
  • The early return at line 131 does not apply the projection. When partial.outputPartitioning.numPartitions <= 1, the method returns the partial result directly without applying the project transformation. This means that if projectList != child.output, the returned data will have incorrect columns. The early return should also handle projection similar to how it's done later in the method (lines 158-164).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@yew1eb
Copy link
Contributor Author

yew1eb commented Dec 16, 2025

cc @richox

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove unnecessary NativeProjectExec in native take-ordered + projection

1 participant