[AURON #1745] Introduce auron-spark-tests submodule for correctness testing #1752
Conversation
Force-pushed 17c2cf2 to f55d2f8
@ShreyeshArangath @cxzl25 Can you take a look?
Pull request overview
This PR introduces a new auron-spark-tests submodule to establish a comprehensive correctness testing framework for the Auron project. The framework enables systematic testing by extending Spark's native test suites while providing mechanisms to selectively enable/disable tests based on known issues or work-in-progress features. The implementation follows a modular design with a common module containing shared test infrastructure and version-specific modules (starting with spark33) for Spark version-specific test suites.
Key changes:
- Created test infrastructure framework with enable/disable capabilities for test suites and individual test cases
- Added AuronStringFunctionsSuite as the first correctness test suite for Spark 3.3, with one known issue (substring function) disabled
- Integrated the new test module into the build system and CI pipeline with a new --sparktests option (a sketch of the enable/disable mechanism follows this list)
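
To make the enable/disable mechanism concrete, here is a minimal sketch of how a base trait can route every test through a settings registry before handing it to ScalaTest. All names here (SparkTestSettings, isTestEnabled, SparkTestsBase) are illustrative assumptions, not necessarily the PR's exact API:

```scala
import org.scalactic.source.Position
import org.scalatest.Tag
import org.scalatest.funsuite.AnyFunSuiteLike

// Hypothetical registry; the PR's real SparkTestSettings is richer.
trait SparkTestSettings {
  def isTestEnabled(suiteName: String, testName: String): Boolean
}

trait SparkTestsBase extends AnyFunSuiteLike {
  def testSettings: SparkTestSettings

  // Route every registered test through the settings registry: disabled
  // tests are registered as ignored instead of silently dropped.
  override protected def test(testName: String, testTags: Tag*)(testFun: => Any)(implicit
      pos: Position): Unit = {
    if (testSettings.isTestEnabled(getClass.getName, testName)) {
      super.test(testName, testTags: _*)(testFun)
    } else {
      super.ignore(testName, testTags: _*)(testFun)
    }
  }
}
```

Registering disabled tests as ignored (rather than dropping them) keeps them visible in test reports, which makes it easy to audit what is currently switched off.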
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 17 comments.
Summary per file:
| File | Description |
|---|---|
| pom.xml | Added dependency management entries for test-related Spark artifacts and scalatestplus-scalacheck; added spark-tests profile |
| auron-spark-tests/pom.xml | Parent POM for spark tests with common dependencies and spark-3.3 profile |
| auron-spark-tests/common/pom.xml | Common test utilities module with test-jar packaging for reuse |
| auron-spark-tests/spark33/pom.xml | Spark 3.3-specific test module configuration |
| auron-spark-tests/common/src/test/scala/org/apache/auron/utils/SparkTestSettings.scala | Core framework for enabling/disabling test suites and individual tests |
| auron-spark-tests/common/src/test/scala/org/apache/auron/utils/SQLQueryTestSettings.scala | Interface for SQL query test configuration |
| auron-spark-tests/common/src/test/scala/org/apache/spark/sql/SparkTestsBase.scala | Base trait for Spark tests with configuration and test filtering |
| auron-spark-tests/common/src/test/scala/org/apache/spark/sql/SparkTestsSharedSessionBase.scala | Shared Spark session setup and execution plan utilities |
| auron-spark-tests/common/src/test/scala/org/apache/spark/sql/SparkQueryTestsBase.scala | Base trait for query tests with custom result checking logic |
| auron-spark-tests/common/src/test/scala/org/apache/spark/sql/SparkExpressionTestsBase.scala | Base trait for expression tests with native backend validation |
| auron-spark-tests/common/src/test/scala/org/apache/spark/utils/DebuggableThreadUtils.scala | Utility for better error logging in parallel test execution |
| auron-spark-tests/spark33/src/test/scala/org/apache/auron/utils/AuronSparkTestSettings.scala | Spark 3.3-specific test settings with substring function test disabled |
| auron-spark-tests/spark33/src/test/scala/org/apache/spark/sql/AuronStringFunctionsSuite.scala | String functions correctness test suite extending Spark's suite |
| auron-spark-tests/common/src/test/resources/log4j.properties | Log4j configuration for test execution |
| auron-spark-tests/common/src/test/resources/log4j2.properties | Log4j2 configuration for test execution |
| auron-build.sh | Added --sparktests option to enable spark tests during build |
| .github/workflows/tpcds.yml | Enabled spark tests for Spark 3.3 test job |
| .github/workflows/tpcds-reusable.yml | Added sparktests parameter handling in reusable workflow |
```scala
 * @param actualAnswer
 *   the actual result in a [[Row]].
 * @param expectedAnswer
 *   the expected result in a[[Row]].
```
Copilot AI (Dec 15, 2025):
Missing space in scaladoc reference: There should be a space between "a" and the scaladoc link "[[Row]]" to improve readability.
```diff
- * the expected result in a[[Row]].
+ * the expected result in a [[Row]].
```
```scala
/**
 * Sort map data by key and return the sorted key array and value array.
 *
 * @param input
 *   input map data.
 * @param kt
 *   key type.
 * @param vt
 *   value type.
 * @return
 *   the sorted key array and value array.
 */
private def getSortedArrays(
    input: MapData,
    kt: DataType,
    vt: DataType): (ArrayData, ArrayData) = {
  val keyArray = input.keyArray().toArray[Any](kt)
  val valueArray = input.valueArray().toArray[Any](vt)
  val newMap = (keyArray.zip(valueArray)).toMap
  val sortedMap = mutable.SortedMap(newMap.toSeq: _*)(TypeUtils.getInterpretedOrdering(kt))
  (new GenericArrayData(sortedMap.keys.toArray), new GenericArrayData(sortedMap.values.toArray))
}
```
Copilot AI (Dec 15, 2025):
Missing scaladoc for parameters: The method getSortedArrays has parameter documentation in the scaladoc, but the parameters kt and vt descriptions are incomplete - they just say "key type" and "value type" without explaining what DataType they should be or any constraints.
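
One way to flesh the docs out, inferring intent from the code above (wording illustrative, not the PR's actual scaladoc):

```scala
/**
 * Sort map data by key and return the sorted key array and value array.
 *
 * @param input
 *   the [[MapData]] whose entries are to be sorted by key.
 * @param kt
 *   the [[DataType]] of the map's keys; it must be an orderable type, since
 *   an interpreted ordering for this type drives the sort.
 * @param vt
 *   the [[DataType]] of the map's values, used to materialize the value array.
 * @return
 *   a pair of (sorted key array, value array aligned with the sorted keys).
 */
```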
```scala
override def getSQLQueryTestSettings = new SQLQueryTestSettings {
  override def getResourceFilePath: String = ???

  override def getSupportedSQLQueryTests: Set[String] = ???

  override def getOverwriteSQLQueryTests: Set[String] = ???
}
```
Copilot AI (Dec 15, 2025):
Unimplemented methods via ???: The getSQLQueryTestSettings method returns a SQLQueryTestSettings instance with all methods throwing NotImplementedError (via ???). While the comment states "Will be implemented in the future", this creates a risk if any code tries to use these methods. Consider either returning None/null for optional functionality, or adding runtime checks to prevent accidental usage until implementation is complete.
```diff
-override def getSQLQueryTestSettings = new SQLQueryTestSettings {
-  override def getResourceFilePath: String = ???
-  override def getSupportedSQLQueryTests: Set[String] = ???
-  override def getOverwriteSQLQueryTests: Set[String] = ???
-}
+override def getSQLQueryTestSettings = throw new NotImplementedError("getSQLQueryTestSettings is not implemented yet")
```
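
If the settings are meant to be optional rather than mandatory, another direction is to make absence explicit in the type. A sketch, assuming the declaring trait's signature were changed accordingly (not the PR's actual API):

```scala
// Sketch: expose the settings as an Option so callers must handle absence
// explicitly instead of tripping a NotImplementedError at runtime.
// Assumes the declaring trait is changed to:
//   def getSQLQueryTestSettings: Option[SQLQueryTestSettings]
override def getSQLQueryTestSettings: Option[SQLQueryTestSettings] = None
```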
```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-catalyst_${scalaVersion}</artifactId>
  <version>${sparkVersion}</version>
</dependency>
```
Copilot AI (Dec 15, 2025):
Duplicate dependency declaration: spark-catalyst dependency (without test-jar type) is being added here but likely already exists in the dependencyManagement section before this addition. This creates two entries for the same artifact with different configurations (one without type specification on lines 245-249, and one with type=test-jar on lines 250-256), which could lead to confusion about which version/scope is actually used.
Reply: We need both.
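
For context on why both can coexist, as the reply says: Maven keys a dependency on groupId, artifactId, and type (plus classifier), so the plain jar and the test-jar are distinct artifacts that may both appear in dependencyManagement. A sketch of the two entries side by side (structure illustrative):

```xml
<dependencyManagement>
  <dependencies>
    <!-- Regular jar: compile-time classes from spark-catalyst. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-catalyst_${scalaVersion}</artifactId>
      <version>${sparkVersion}</version>
    </dependency>
    <!-- test-jar: Spark's test utilities, a separate artifact keyed by type. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-catalyst_${scalaVersion}</artifactId>
      <version>${sparkVersion}</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```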
```scala
val basePathDir = new File(basePath)
if (basePathDir.exists()) {
  FileUtils.forceDelete(basePathDir)
}
FileUtils.forceMkdir(basePathDir)
FileUtils.forceMkdir(new File(warehouse))
FileUtils.forceMkdir(new File(metaStorePathAbsolute))
```
Copilot AI (Dec 15, 2025):
Missing scaladoc for public method: The prepareWorkDir method is public but lacks documentation explaining what it does, when it should be called, and any side effects (like deleting existing directories). This is especially important since it performs destructive operations like force-deleting directories.
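
A possible scaladoc along the lines the comment asks for, inferred from the hunk above (wording illustrative):

```scala
/**
 * Prepares a clean working directory for the test run.
 *
 * WARNING: destructive. If the base path already exists it is force-deleted;
 * the base path, warehouse, and metastore directories are then recreated
 * empty. Intended to be called once before the shared Spark session starts.
 */
def prepareWorkDir(): Unit = {
  // body as in the hunk above
}
```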
```scala
def shouldNotFallback(): Unit = {
  // Do nothing for now since Auron has a lot of fallbacks.
```
Copilot AI (Dec 15, 2025):
Placeholder method needs documentation: The shouldNotFallback method is intentionally empty ("Do nothing for now since Auron has a lot of fallbacks"), but it's unclear what the intended behavior should be in the future or when this should be implemented. Consider adding a TODO comment or more detailed explanation about the future implementation plan.
```diff
-def shouldNotFallback(): Unit = {
-  // Do nothing for now since Auron has a lot of fallbacks.
+/**
+ * Placeholder for future fallback checks.
+ *
+ * TODO: Implement logic to verify that no unexpected fallbacks occur during
+ * expression evaluation. Currently, this method is intentionally left empty
+ * because the Auron engine has many legitimate fallback cases that are not
+ * yet fully handled. Once fallback handling is stabilized and the expected
+ * cases are well defined, implement assertions or checks here to ensure that
+ * only allowed fallbacks occur.
+ */
+def shouldNotFallback(): Unit = {
+  // Intentionally left blank for now.
```
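
For a sense of what the future check could look like, here is a rough sketch; the plan-walking heuristic and the names are purely illustrative, not the PR's design:

```scala
import org.apache.spark.sql.DataFrame

// Sketch: walk the executed plan and fail if any operator outside an
// allowlist is not a native (Auron) node. The class-name heuristic below is
// illustrative only; a real check would match concrete Auron plan classes.
def assertNoUnexpectedFallback(df: DataFrame, allowed: Set[String] = Set.empty): Unit = {
  val fallbacks = df.queryExecution.executedPlan.collect {
    case p if !p.getClass.getName.toLowerCase.contains("auron") => p.nodeName
  }
  val unexpected = fallbacks.filterNot(allowed.contains)
  assert(unexpected.isEmpty, s"Unexpected fallback operators: ${unexpected.mkString(", ")}")
}
```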
```scala
def convertInternalRowToDataFrame(inputRow: InternalRow): DataFrame = {
  val structFieldSeq = new ArrayBuffer[StructField]()
  val values = inputRow match {
    case genericInternalRow: GenericInternalRow =>
      genericInternalRow.values
    case _ => throw new UnsupportedOperationException("Unsupported InternalRow.")
  }

  values.foreach {
    case boolean: java.lang.Boolean =>
      structFieldSeq.append(StructField("bool", BooleanType, boolean == null))
    case byte: java.lang.Byte =>
      structFieldSeq.append(StructField("i8", ByteType, byte == null))
    case short: java.lang.Short =>
      structFieldSeq.append(StructField("i16", ShortType, short == null))
    case integer: java.lang.Integer =>
      structFieldSeq.append(StructField("i32", IntegerType, integer == null))
    case long: java.lang.Long =>
      structFieldSeq.append(StructField("i64", LongType, long == null))
    case float: java.lang.Float =>
      structFieldSeq.append(StructField("fp32", FloatType, float == null))
    case double: java.lang.Double =>
      structFieldSeq.append(StructField("fp64", DoubleType, double == null))
    case utf8String: UTF8String =>
      structFieldSeq.append(StructField("str", StringType, utf8String == null))
    case byteArr: Array[Byte] =>
      structFieldSeq.append(StructField("vbin", BinaryType, byteArr == null))
    case decimal: Decimal =>
      structFieldSeq.append(
        StructField("dec", DecimalType(decimal.precision, decimal.scale), decimal == null))
    case _ =>
      // for null
      structFieldSeq.append(StructField("n", IntegerType, nullable = true))
  }

  _spark.internalCreateDataFrame(
    _spark.sparkContext.parallelize(Seq(inputRow)),
    StructType(structFieldSeq.toSeq))
}
```
Copilot AI (Dec 15, 2025):
Missing documentation for public method: The convertInternalRowToDataFrame method is public but lacks scaladoc explaining its purpose, the structure of the resulting DataFrame, and the limitations (as indicated by the UnsupportedOperationException for non-GenericInternalRow types).
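
A scaladoc along these lines would capture the behavior visible in the hunk (wording illustrative):

```scala
/**
 * Builds a single-row DataFrame whose schema is inferred from the runtime
 * types of the row's values (for example, java.lang.Integer becomes an
 * IntegerType column named "i32"); a null value maps to a nullable
 * IntegerType column named "n".
 *
 * Note: only [[GenericInternalRow]] inputs are supported; any other
 * [[InternalRow]] implementation throws [[UnsupportedOperationException]].
 */
```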
auron-spark-tests/spark33/pom.xml (outdated)
```xml
<artifactId>auron-spark-tests-common</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>compile</scope>
```
Copilot AI (Dec 15, 2025):
Unusual scope for test-jar dependency: The auron-spark-tests-common dependency uses type=test-jar but scope=compile instead of the typical scope=test. This is unusual because test-jars are typically used in test scope. While this might be intentional to make test utilities available at compile time, it should be documented why this non-standard configuration is needed.
```diff
-<scope>compile</scope>
+<scope>test</scope>
```
auron-spark-tests/spark33/pom.xml (outdated)
```xml
<dependency>
  <groupId>org.scalatestplus</groupId>
  <artifactId>scalatestplus-scalacheck_${scalaVersion}</artifactId>
  <version>3.1.0.0-RC2</version>
```
Copilot AI (Dec 15, 2025):
Hardcoded version instead of dependency management: The scalatestplus-scalacheck dependency version 3.1.0.0-RC2 is hardcoded here, which duplicates the version already defined in the root pom.xml (lines 232-237). This should inherit from dependencyManagement without specifying the version to ensure consistency and easier version updates across the project.
```diff
-<version>3.1.0.0-RC2</version>
```
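
With the version managed in the root pom's dependencyManagement, the child module's declaration would shrink to something like the following (the scope shown is an assumption):

```xml
<!-- Version omitted: inherited from the root pom's dependencyManagement. -->
<dependency>
  <groupId>org.scalatestplus</groupId>
  <artifactId>scalatestplus-scalacheck_${scalaVersion}</artifactId>
  <scope>test</scope>
</dependency>
```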
```xml
<dependency>
  <groupId>org.scalatestplus</groupId>
  <artifactId>scalatestplus-scalacheck_${scalaVersion}</artifactId>
  <version>3.1.0.0-RC2</version>
```
Copilot AI (Dec 15, 2025):
RC (Release Candidate) version in production: The dependency scalatestplus-scalacheck is using version 3.1.0.0-RC2, which is a release candidate version. For production code and test infrastructure, it's better to use stable release versions rather than RC versions, as they may have bugs or compatibility issues that were fixed in the final release.
```diff
-<version>3.1.0.0-RC2</version>
+<version>3.1.0.0</version>
```
Force-pushed c5675fc to 19eccaa
Update on this PR:
@cxzl25 Can you help review this PR? This is the base/first PR for this AIP.
Which issue does this PR close?
First PR for #1745; it establishes the skeleton for correctness testing.
Rationale for this change
N/A.
What changes are included in this PR?
We introduced the submodule auron-spark-tests for correctness testing. It has two submodules: common and spark33.

common contains the helper classes for the whole correctness testing effort: setting up a Spark conf with the native engine enabled, enabling/disabling specific test cases, etc.

spark33 is the submodule containing test cases for Spark 3.3; in this PR we included one test suite, AuronStringFunctionsSuite, for demo purposes.

Each sparkxx submodule contains an AuronSparkTestSettings which controls which test suites/test cases are enabled/disabled:
We actually did find a bug in the substring function, so I disabled the corresponding test here.

Different Spark versions' modules may have different enables/disables.
Are there any user-facing changes?
No.
How was this patch tested?
Added UTs.