
@xumingming
Contributor

Which issue does this PR close?

First PR for #1745; it establishes the skeleton for correctness testing.

Rationale for this change

N/A.

What changes are included in this PR?

We introduce a new submodule, auron-spark-tests, for correctness testing.
It has two child submodules: common and spark33.

  • common contains the helper classes shared across all correctness
    testing: setting up a Spark conf with the native engine enabled,
    enabling/disabling test cases, etc.
  • spark33 contains the test cases for Spark 3.3; in this PR it includes
    one test suite, AuronStringFunctionsSuite, for demo purposes.

Each sparkxx submodule contains an AuronSparkTestSettings class that
controls which test suites/test cases are enabled or disabled:

  enableSuite[AuronStringFunctionsSuite]
    // See https://github.com/apache/auron/issues/1724
    .exclude("string / binary substring function")

We actually did find a bug in the substring function, so the corresponding
test is disabled here.

Modules for different Spark versions may enable or disable different tests.
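
To make the mechanism concrete, here is a minimal sketch of what the enable/exclude API behind the snippet above could look like. Only the enableSuite/exclude names come from this PR; the class names, fields, and the shouldRun helper below are assumptions for illustration, not the actual implementation.

  import scala.collection.mutable
  import scala.reflect.ClassTag

  // Minimal sketch of the enable/exclude mechanism; not the actual
  // SparkTestSettings implementation from this PR.
  abstract class TestSettingsSketch {
    private val enabledSuites = mutable.Map.empty[String, SuiteSettings]

    // Register a test suite so it participates in correctness testing;
    // the returned SuiteSettings allows chained exclusions.
    def enableSuite[T](implicit tag: ClassTag[T]): SuiteSettings = {
      val settings = new SuiteSettings
      enabledSuites(tag.runtimeClass.getCanonicalName) = settings
      settings
    }

    // True if the suite is enabled and the test case is not excluded.
    def shouldRun(suiteName: String, testName: String): Boolean =
      enabledSuites.get(suiteName).exists(s => !s.isExcluded(testName))
  }

  class SuiteSettings {
    private val excludedTests = mutable.Set.empty[String]

    // Skip a single test case, e.g. because of a known bug like #1724.
    def exclude(testName: String): SuiteSettings = {
      excludedTests += testName
      this
    }

    def isExcluded(testName: String): Boolean = excludedTests.contains(testName)
  }

A version-specific AuronSparkTestSettings would then subclass this and register suites exactly as in the snippet above.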

Are there any user-facing changes?

No.

How was this patch tested?

Added UTs.

@xumingming
Contributor Author

@ShreyeshArangath @cxzl25 Can you take a look?


Copilot AI left a comment


Pull request overview

This PR introduces a new auron-spark-tests submodule to establish a comprehensive correctness testing framework for the Auron project. The framework enables systematic testing by extending Spark's native test suites while providing mechanisms to selectively enable/disable tests based on known issues or work-in-progress features. The implementation follows a modular design with a common module containing shared test infrastructure and version-specific modules (starting with spark33) for Spark version-specific test suites.

Key changes:

  • Created test infrastructure framework with enable/disable capabilities for test suites and individual test cases
  • Added AuronStringFunctionsSuite as the first correctness test suite for Spark 3.3, with one known issue (substring function) disabled
  • Integrated the new test module into the build system and CI pipeline with a new --sparktests option (example invocation below)
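
As a usage illustration (only the script name and the --sparktests flag come from this PR; any other options are omitted):

  # Hypothetical invocation: build with the Spark correctness tests enabled.
  ./auron-build.sh --sparktests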

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 17 comments.

Summary per file:

  • pom.xml: Added dependency management entries for test-related Spark artifacts and scalatestplus-scalacheck; added the spark-tests profile
  • auron-spark-tests/pom.xml: Parent POM for the spark tests, with common dependencies and a spark-3.3 profile
  • auron-spark-tests/common/pom.xml: Common test utilities module with test-jar packaging for reuse
  • auron-spark-tests/spark33/pom.xml: Spark 3.3-specific test module configuration
  • auron-spark-tests/common/src/test/scala/org/apache/auron/utils/SparkTestSettings.scala: Core framework for enabling/disabling test suites and individual tests
  • auron-spark-tests/common/src/test/scala/org/apache/auron/utils/SQLQueryTestSettings.scala: Interface for SQL query test configuration
  • auron-spark-tests/common/src/test/scala/org/apache/spark/sql/SparkTestsBase.scala: Base trait for Spark tests with configuration and test filtering
  • auron-spark-tests/common/src/test/scala/org/apache/spark/sql/SparkTestsSharedSessionBase.scala: Shared Spark session setup and execution plan utilities
  • auron-spark-tests/common/src/test/scala/org/apache/spark/sql/SparkQueryTestsBase.scala: Base trait for query tests with custom result-checking logic
  • auron-spark-tests/common/src/test/scala/org/apache/spark/sql/SparkExpressionTestsBase.scala: Base trait for expression tests with native backend validation
  • auron-spark-tests/common/src/test/scala/org/apache/spark/utils/DebuggableThreadUtils.scala: Utility for better error logging in parallel test execution
  • auron-spark-tests/spark33/src/test/scala/org/apache/auron/utils/AuronSparkTestSettings.scala: Spark 3.3-specific test settings, with the substring function test disabled
  • auron-spark-tests/spark33/src/test/scala/org/apache/spark/sql/AuronStringFunctionsSuite.scala: String functions correctness test suite extending Spark's suite
  • auron-spark-tests/common/src/test/resources/log4j.properties: Log4j configuration for test execution
  • auron-spark-tests/common/src/test/resources/log4j2.properties: Log4j2 configuration for test execution
  • auron-build.sh: Added --sparktests option to enable spark tests during the build
  • .github/workflows/tpcds.yml: Enabled spark tests for the Spark 3.3 test job
  • .github/workflows/tpcds-reusable.yml: Added sparktests parameter handling in the reusable workflow


 * @param actualAnswer
 *   the actual result in a [[Row]].
 * @param expectedAnswer
 *   the expected result in a[[Row]].
Copilot AI Dec 15, 2025

Missing space in scaladoc reference: There should be a space between "a" and the scaladoc link "[[Row]]" to improve readability.

Suggested change
- *   the expected result in a[[Row]].
+ *   the expected result in a [[Row]].

Comment on lines +182 to +203
  /**
   * Sort map data by key and return the sorted key array and value array.
   *
   * @param input
   *   input map data.
   * @param kt
   *   key type.
   * @param vt
   *   value type.
   * @return
   *   the sorted key array and value array.
   */
  private def getSortedArrays(
      input: MapData,
      kt: DataType,
      vt: DataType): (ArrayData, ArrayData) = {
    val keyArray = input.keyArray().toArray[Any](kt)
    val valueArray = input.valueArray().toArray[Any](vt)
    val newMap = (keyArray.zip(valueArray)).toMap
    val sortedMap = mutable.SortedMap(newMap.toSeq: _*)(TypeUtils.getInterpretedOrdering(kt))
    (new GenericArrayData(sortedMap.keys.toArray), new GenericArrayData(sortedMap.values.toArray))
  }
Copilot AI Dec 15, 2025

Missing scaladoc for parameters: The method getSortedArrays documents its parameters, but the descriptions for kt and vt are incomplete: they just say "key type" and "value type" without explaining which DataTypes are expected or what constraints apply.
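
For example, the descriptions could spell out the constraint implied by the TypeUtils.getInterpretedOrdering(kt) call (the orderability requirement below is inferred from that call, not stated in the PR):

  /**
   * @param kt
   *   the DataType of the map's keys; must be an orderable type, since an
   *   interpreted ordering over it is used to sort the entries.
   * @param vt
   *   the DataType of the map's values; values are reordered alongside their keys.
   */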

Comment on lines +32 to +38
  override def getSQLQueryTestSettings = new SQLQueryTestSettings {
    override def getResourceFilePath: String = ???

    override def getSupportedSQLQueryTests: Set[String] = ???

    override def getOverwriteSQLQueryTests: Set[String] = ???
  }
Copilot AI Dec 15, 2025

Unimplemented methods with ??? operator: The getSQLQueryTestSettings method returns a SQLQueryTestSettings instance with all methods throwing NotImplementedError (via ???). While the comment states "Will be implemented in the future", this creates a risk if any code tries to use these methods. Consider either returning None/null for optional functionality, or adding runtime checks to prevent accidental usage until implementation is complete.

Suggested change
-  override def getSQLQueryTestSettings = new SQLQueryTestSettings {
-    override def getResourceFilePath: String = ???
-    override def getSupportedSQLQueryTests: Set[String] = ???
-    override def getOverwriteSQLQueryTests: Set[String] = ???
-  }
+  override def getSQLQueryTestSettings = throw new NotImplementedError("getSQLQueryTestSettings is not implemented yet")

Comment on lines +245 to +249
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-catalyst_${scalaVersion}</artifactId>
        <version>${sparkVersion}</version>
      </dependency>
Copilot AI Dec 15, 2025

Duplicate dependency declaration: spark-catalyst dependency (without test-jar type) is being added here but likely already exists in the dependencyManagement section before this addition. This creates two entries for the same artifact with different configurations (one without type specification on lines 245-249, and one with type=test-jar on lines 250-256), which could lead to confusion about which version/scope is actually used.

Copilot uses AI. Check for mistakes.
@xumingming
Contributor Author
We need both.
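
For context, the two entries look roughly like this (reconstructed from the review comment above; exact placement in pom.xml is not verified here). Maven identifies an artifact by groupId/artifactId/type, so the plain jar and the test-jar are distinct artifacts rather than duplicates: the first provides Catalyst's main classes to compile against, the second provides Spark's packaged test utilities that these suites extend.

      <!-- Classes jar: Catalyst's main classes. -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-catalyst_${scalaVersion}</artifactId>
        <version>${sparkVersion}</version>
      </dependency>
      <!-- Test-jar: Spark's own test helpers; a distinct artifact, not a duplicate. -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-catalyst_${scalaVersion}</artifactId>
        <version>${sparkVersion}</version>
        <type>test-jar</type>
      </dependency>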

Comment on lines +37 to +44
    val basePathDir = new File(basePath)
    if (basePathDir.exists()) {
      FileUtils.forceDelete(basePathDir)
    }
    FileUtils.forceMkdir(basePathDir)
    FileUtils.forceMkdir(new File(warehouse))
    FileUtils.forceMkdir(new File(metaStorePathAbsolute))
  }
Copilot AI Dec 15, 2025

Missing scaladoc for public method: The prepareWorkDir method is public but lacks documentation explaining what it does, when it should be called, and any side effects (like deleting existing directories). This is especially important since it performs destructive operations like force-deleting directories.
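
A sketch of the kind of scaladoc being requested, based only on the behavior visible in the quoted snippet (the guidance on when to call it is an assumption):

  /**
   * Prepares a clean working directory tree for the test session.
   *
   * WARNING: destructive. If `basePath` already exists it is force-deleted;
   * `basePath`, the warehouse directory, and the metastore directory are then
   * recreated empty. Presumably intended to be called once, before the shared
   * Spark session is initialized.
   */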

Comment on lines 275 to 276
  def shouldNotFallback(): Unit = {
    // Do nothing for now since Auron has a lot of fallbacks.
Copilot AI Dec 15, 2025

Placeholder method needs documentation: The shouldNotFallback method is intentionally empty ("Do nothing for now since Auron has a lot of fallbacks"), but it's unclear what the intended behavior should be in the future or when this should be implemented. Consider adding a TODO comment or more detailed explanation about the future implementation plan.

Suggested change
-  def shouldNotFallback(): Unit = {
-    // Do nothing for now since Auron has a lot of fallbacks.
+  /**
+   * Placeholder for future fallback checks.
+   *
+   * TODO: Implement logic to verify that no unexpected fallbacks occur during
+   * expression evaluation. Currently, this method is intentionally left empty
+   * because the Auron engine has many legitimate fallback cases that are not
+   * yet fully handled. Once fallback handling is stabilized and the expected
+   * cases are well defined, implement assertions or checks here to ensure
+   * that only allowed fallbacks occur.
+   */
+  def shouldNotFallback(): Unit = {
+    // Intentionally left blank for now.
Comment on lines 300 to 338
  def convertInternalRowToDataFrame(inputRow: InternalRow): DataFrame = {
    val structFieldSeq = new ArrayBuffer[StructField]()
    val values = inputRow match {
      case genericInternalRow: GenericInternalRow =>
        genericInternalRow.values
      case _ => throw new UnsupportedOperationException("Unsupported InternalRow.")
    }

    values.foreach {
      case boolean: java.lang.Boolean =>
        structFieldSeq.append(StructField("bool", BooleanType, boolean == null))
      case byte: java.lang.Byte =>
        structFieldSeq.append(StructField("i8", ByteType, byte == null))
      case short: java.lang.Short =>
        structFieldSeq.append(StructField("i16", ShortType, short == null))
      case integer: java.lang.Integer =>
        structFieldSeq.append(StructField("i32", IntegerType, integer == null))
      case long: java.lang.Long =>
        structFieldSeq.append(StructField("i64", LongType, long == null))
      case float: java.lang.Float =>
        structFieldSeq.append(StructField("fp32", FloatType, float == null))
      case double: java.lang.Double =>
        structFieldSeq.append(StructField("fp64", DoubleType, double == null))
      case utf8String: UTF8String =>
        structFieldSeq.append(StructField("str", StringType, utf8String == null))
      case byteArr: Array[Byte] =>
        structFieldSeq.append(StructField("vbin", BinaryType, byteArr == null))
      case decimal: Decimal =>
        structFieldSeq.append(
          StructField("dec", DecimalType(decimal.precision, decimal.scale), decimal == null))
      case _ =>
        // for null
        structFieldSeq.append(StructField("n", IntegerType, nullable = true))
    }

    _spark.internalCreateDataFrame(
      _spark.sparkContext.parallelize(Seq(inputRow)),
      StructType(structFieldSeq.toSeq))
  }
Copilot AI Dec 15, 2025

Missing documentation for public method: The convertInternalRowToDataFrame method is public but lacks scaladoc explaining its purpose, the structure of the resulting DataFrame, and the limitations (as indicated by the UnsupportedOperationException for non-GenericInternalRow types).
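
A possible scaladoc, derived entirely from the quoted implementation:

  /**
   * Converts a single InternalRow into a one-row DataFrame, inferring the
   * schema from the runtime type of each value (columns are named by type:
   * "bool", "i8", "i16", "i32", "i64", "fp32", "fp64", "str", "vbin", "dec";
   * a null value maps to a nullable IntegerType column named "n").
   *
   * Only GenericInternalRow inputs are supported; any other InternalRow
   * implementation causes an UnsupportedOperationException.
   */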

        <artifactId>auron-spark-tests-common</artifactId>
        <version>${project.version}</version>
        <type>test-jar</type>
        <scope>compile</scope>
Copilot AI Dec 15, 2025

Unusual scope for test-jar dependency: The auron-spark-tests-common dependency uses type=test-jar but scope=compile instead of the typical scope=test. This is unusual because test-jars are typically used in test scope. While this might be intentional to make test utilities available at compile time, it should be documented why this non-standard configuration is needed.

Suggested change
-        <scope>compile</scope>
+        <scope>test</scope>

      <dependency>
        <groupId>org.scalatestplus</groupId>
        <artifactId>scalatestplus-scalacheck_${scalaVersion}</artifactId>
        <version>3.1.0.0-RC2</version>
Copilot AI Dec 15, 2025

Hardcoded version instead of dependency management: The scalatestplus-scalacheck dependency version 3.1.0.0-RC2 is hardcoded here, which duplicates the version already defined in the root pom.xml (lines 232-237). This should inherit from dependencyManagement without specifying the version to ensure consistency and easier version updates across the project.

Suggested change
-        <version>3.1.0.0-RC2</version>

      <dependency>
        <groupId>org.scalatestplus</groupId>
        <artifactId>scalatestplus-scalacheck_${scalaVersion}</artifactId>
        <version>3.1.0.0-RC2</version>
Copilot AI Dec 15, 2025

RC (Release Candidate) version in production: The dependency scalatestplus-scalacheck is using version 3.1.0.0-RC2, which is a release candidate version. For production code and test infrastructure, it's better to use stable release versions rather than RC versions, as they may have bugs or compatibility issues that were fixed in the final release.

Suggested change
-        <version>3.1.0.0-RC2</version>
+        <version>3.1.0.0</version>

@xumingming
Contributor Author

Update on this PR:

@cxzl25 Can you help review this PR? This is the base/first PR for this AIP.
