Skip to content

Conversation

@GaryShen2008
Copy link
Collaborator

Add a parameter USE_PREBUILT_CUDF(default: ON) to extract the libcudf.so and others from rapids-4-spark jar file.
So we don't need to build native cuDF from the source code everytime, which requires about 1-2 hours.
You can still use -DUSE_PREBUILT_CUDF=OFF to build native cuDF.

Close #576

Extract the libcudf.so from rapids-4-spark jar to accelerate the
building

Signed-off-by: Gary Shen <gashen@nvidia.com>
Signed-off-by: Gary Shen <gashen@nvidia.com>
Signed-off-by: Gary Shen <gashen@nvidia.com>
Copilot AI review requested due to automatic review settings December 25, 2025 09:05
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 25, 2025

Greptile Summary

This PR significantly improves the UDF examples build process by extracting prebuilt libcudf.so from the rapids-4-spark jar instead of building cuDF from source, reducing build time by approximately 2 hours.

Key Changes:

  • Added USE_PREBUILT_CUDF parameter (default: ON) to control build mode
  • Created extract-cudf-libs.sh script to extract native libraries from jar
  • Created clone-cudf-repo.sh script to obtain cuDF headers via shallow clone
  • Updated CMakeLists.txt with intelligent auto-fallback logic when prebuilt libs unavailable
  • Enhanced pom.xml with Maven phases to handle jar extraction and dependency management
  • Added comprehensive documentation in README explaining both fast and source build options

Implementation Quality:

  • Proper error handling with detailed diagnostics throughout
  • Intelligent fallback to source build when rapids-4-spark jar is unavailable
  • Distinguishes between user-explicit settings vs auto-defaults to prevent surprising behavior
  • Cross-platform path handling in Maven Ant scripts
  • Multiple jar naming pattern support for both release and SNAPSHOT versions

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk - it adds an optional fast-build feature while preserving existing source-build functionality
  • The implementation shows good software engineering practices with proper error handling, auto-fallback mechanisms, and comprehensive documentation. The complex logic is well-structured and handles edge cases appropriately. Minor concerns exist around the complexity of the Ant script logic and potential platform-specific behavior, but the fallback mechanisms mitigate these risks.
  • The pom.xml file has the most complex logic with multiple jar lookup patterns - ensure thorough testing across SNAPSHOT and release versions

Important Files Changed

Filename Overview
examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh New bash script that extracts prebuilt libcudf.so from rapids-4-spark jar and clones cuDF headers. Good error handling with detailed diagnostics and fallback behavior.
examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt Adds USE_PREBUILT_CUDF option with auto-fallback logic. Properly distinguishes user-set vs default values, with comprehensive error messages. Complex logic but well-structured.
examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml Adds Maven build phases for extracting cuDF libraries. Complex Ant script logic handles multiple jar naming patterns. Properly creates USE_SOURCE_BUILD marker when jar not found.

Sequence Diagram

sequenceDiagram
    participant User
    participant Maven
    participant AntScript as Maven Ant Script
    participant CMake
    participant JarFile as rapids-4-spark.jar
    participant NativeDeps as target/native-deps
    participant CudfRepo as target/cudf-repo

    User->>Maven: mvn clean package -Pudf-native-examples
    
    alt USE_PREBUILT_CUDF=ON (default)
        Maven->>Maven: Try download rapids-4-spark jar<br/>(with/without classifier)
        
        Maven->>AntScript: generate-sources phase
        AntScript->>AntScript: Check USE_PREBUILT_CUDF flag
        
        alt rapids-4-spark jar found
            AntScript->>JarFile: Extract libcudf.so & libnvcomp.so
            JarFile-->>NativeDeps: Libraries extracted
            AntScript->>AntScript: Execute clone-cudf-repo.sh
            AntScript->>CudfRepo: Clone cuDF repo (shallow, headers only)
            CudfRepo-->>AntScript: Headers available
        else rapids-4-spark jar NOT found
            AntScript->>AntScript: Create USE_SOURCE_BUILD marker
            AntScript-->>Maven: Will fallback to source build
        end
        
        Maven->>CMake: compile phase with -DUSE_PREBUILT_CUDF=ON
        CMake->>CMake: Check USE_SOURCE_BUILD marker
        
        alt Prebuilt libraries available
            CMake->>NativeDeps: Find libcudf.so
            CMake->>CudfRepo: Get cuDF headers
            CMake->>CMake: Use rapids-cpm for CCCL & RMM
            CMake->>CMake: Create cudf_imported target
            CMake->>CMake: Build only UDF native code (fast!)
        else Prebuilt NOT available
            CMake->>CMake: Auto-fallback to source build
            CMake->>CMake: Build cuDF from source (30+ min)
        end
        
    else USE_PREBUILT_CUDF=OFF
        Maven->>CMake: compile phase with -DUSE_PREBUILT_CUDF=OFF
        CMake->>CMake: Skip prebuilt checks
        CMake->>CMake: Build cuDF from source (30+ min)
    end
    
    CMake-->>Maven: libudfexamplesjni.so built
    Maven->>Maven: Package into final jar
    Maven-->>User: Build complete!
Loading

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a fast build mode for UDF native examples by enabling the use of prebuilt cuDF libraries extracted from the rapids-4-spark jar, significantly reducing build time from approximately 2 hours to just a few minutes.

Key Changes:

  • Introduces USE_PREBUILT_CUDF option (default: ON) to toggle between prebuilt and source build modes
  • Implements automatic extraction of libcudf.so and related libraries from rapids-4-spark jar during Maven build
  • Adds automatic fallback to source build when prebuilt libraries are unavailable

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt Adds CMake logic to detect and use prebuilt cuDF libraries, with fallback to source build
examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml Implements Maven build phases for downloading rapids-4-spark jar, extracting native libraries, and cloning cuDF headers
examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh Provides standalone script for manually extracting cuDF dependencies from rapids-4-spark jar
examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md Updates documentation with instructions for both fast build (prebuilt) and traditional source build options

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@GaryShen2008 GaryShen2008 marked this pull request as draft December 25, 2025 11:40
Signed-off-by: Gary Shen <gashen@nvidia.com>
Signed-off-by: Gary Shen <gashen@nvidia.com>
@GaryShen2008 GaryShen2008 marked this pull request as ready for review December 25, 2025 14:04
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh, line 81 (link)

    logic: inconsistent library extraction: bash script extracts libnvcomp.so but not librmm.so, while pom.xml extracts librmm.so but not libnvcomp.so (line 333). should extract the same libraries in both places.

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (3)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 333-334 (link)

    style: Extracting librmm.so appears unnecessary since CMake fetches RMM via rapids_cpm_rmm() and only uses its headers, not the prebuilt library. The CMake code on line 231 (CMakeLists.txt:231) links against rmm::rmm target (headers only) rather than the extracted librmm.so.

  2. examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh, line 81 (link)

    style: Same as pom.xml: extracting librmm.so is unnecessary since it's not used by the build process.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  3. examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt, line 36-41 (link)

    style: The rapids-cmake-dir property is never set by include(rapids-cmake), so this always uses the hardcoded fallback path ${CMAKE_BINARY_DIR}/_deps/rapids-cmake-src. The get_property() call is ineffective.

4 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Signed-off-by: Gary Shen <gashen@nvidia.com>
Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 232 (link)

    logic: SKIP_EXTRACTION marker file is created but never checked or used. Only USE_SOURCE_BUILD marker is checked in CMakeLists.txt:103

  2. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 362 (link)

    syntax: The git clone command uses ${cudf.git.branch} in a bash string without proper quoting, which could cause issues if the branch name contains special shell characters

4 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (3)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 355-358 (link)

    style: git clone operation could fail silently in some edge cases. The set -e ensures early exit, but if the directory exists but git fetch/pull fails (e.g., network issues), the build continues with potentially stale headers.

  2. examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt, line 103-109 (link)

    style: Marker file check creates tight coupling between Maven and CMake. If Maven creates the marker but CMake cache already has USE_PREBUILT_CUDF=ON, the FORCE override could surprise users who explicitly set the option.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  3. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 238-243 (link)

    style: Check that Maven property paths work cross-platform. On Windows, ${settings.localRepository} may use backslashes while the rest uses forward slashes, potentially breaking path construction.

4 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Signed-off-by: Gary Shen <gashen@nvidia.com>
Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 369-413 (link)

    logic: inline bash script in Maven build could fail on Windows

    This <exec executable="bash"> block will fail on Windows systems that don't have bash installed. While the path normalization earlier in the file accounts for cross-platform compatibility, this bash script execution doesn't.

    Alternatively, check that this is only expected to run on Linux/Unix systems and document this requirement.

  2. examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt, line 115-137 (link)

    logic: user explicitly requesting prebuilt mode when jar is missing could be handled more gracefully

    When a user explicitly sets -DUSE_PREBUILT_CUDF=ON but the jar is missing, the warning says "The build will likely FAIL" but then continues anyway. This leads to confusing build failures later.

    Consider failing immediately when the user's explicit intent cannot be fulfilled:

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 380 (link)

    style: This extremely long single-line bash command (380+ characters) is very hard to maintain and debug. While the comment explains why it needs to be on one line, consider these alternatives:

    1. Move this git logic to extract-cudf-libs.sh and call that script from Maven
    2. Use a heredoc with proper escaping
    3. Create a separate shell script file

    The current approach makes the pom.xml harder to read and the bash script is difficult to validate or test independently.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 380 (link)

    style: Single-line bash script is hard to read and maintain. The complex conditional logic with multiple branches makes debugging difficult if git operations fail.

    Consider extracting to separate .sh file (similar to extract-cudf-libs.sh) or using multi-line XML CDATA section for better readability.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 205 (link)

    style: Bash command is extremely long (over 1000 chars) and complex. Consider extracting this logic to extract-cudf-libs.sh (which already exists) and calling that script instead for better maintainability.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/clone-cudf-repo.sh, line 1 (link)

    logic: Script lacks executable permissions in git (mode 100644 instead of 100755). Maven uses <chmod> to fix this at runtime (line 369 in pom.xml), but users running the script manually will hit permission errors.

    Run: git update-index --chmod=+x clone-cudf-repo.sh

  2. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 300-305 (link)

    style: The condition creates should.skip.extraction based on use.source.build, but still creates directories (mkdir) even when skipping extraction. These directories are created before the skip check.

    Lines 230-232 create the directories unconditionally:

    <mkdir dir="${project.build.directory}/native-deps"/>
    <mkdir dir="${project.build.directory}/cudf-repo"/>
    <mkdir dir="${project.build.directory}/temp-extract"/>

    Move the mkdir commands inside the conditional block (after line 305) to avoid unnecessary directory creation during source builds.

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Add executable to script
Move mkdir to correct place

Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt, line 116-132 (link)

    logic: When user explicitly sets USE_PREBUILT_CUDF=ON but jar is missing, the warning is shown but build continues, which will fail later. Consider using FATAL_ERROR instead of WARNING to fail fast and provide clearer feedback.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh, line 84 (link)

    style: potential issue if multiple .so* files have the same name in different subdirectories

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh, line 81 (link)

    logic: Unquoted variable allows path injection. If $JAR_PATH contains spaces or special characters, the unzip command will fail or behave unexpectedly.

  2. examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt, line 174-191 (link)

    style: Fallback logic in nested condition updates cache variable within conditional block. This creates state that may not persist correctly across CMake runs. The FORCE update should happen outside the conditional or use a different mechanism.

    Consider restructuring to make the fallback decision earlier and more explicit, or document that users should delete CMakeCache.txt if they encounter issues after a failed prebuilt attempt.

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@GaryShen2008
Copy link
Collaborator Author

@greptile full review

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh, line 28-31 (link)

    style: hardcoded defaults may become stale over time - consider reading these values from pom.xml or making them required arguments rather than optional environment variables with defaults

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 192 (link)

    style: verify that SNAPSHOT builds use the classifier pattern - using ${cuda.version} as classifier assumes all artifact naming follows this convention

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@GaryShen2008
Copy link
Collaborator Author

@greptileai Can you run a full review once and list up all the problems you have?

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll conduct a comprehensive review of this PR that adds support for using prebuilt cuDF libraries from the rapids-4-spark jar to accelerate UDF native builds. Let me analyze all the changes systematically.

Overall Assessment

This is a well-structured PR that addresses a real performance pain point - reducing cuDF build time from 1-2 hours to minutes by reusing prebuilt libraries. The implementation is comprehensive but has several issues that need attention.

Major Issues

1. Copyright Year Inconsistency

Files: clone-cudf-repo.sh, extract-cudf-libs.sh, pom.xml, CMakeLists.txt
Issue: Copyright year shows 2026 instead of 2025

The copyright headers use "2026" which is incorrect for current development.

2. Shell Script Permissions

The new shell scripts (clone-cudf-repo.sh, extract-cudf-libs.sh) may not have execute permissions in the repository, which could cause build failures.

3. Complex Maven Ant Logic

The Maven pom.xml contains very complex Ant scripting (200+ lines) that's hard to maintain and debug. This logic handles:

  • Multiple jar file naming patterns
  • Cross-platform path normalization
  • Complex conditional extraction logic

4. Inconsistent Property Naming

  • USE_PREBUILT_CUDF (CMake/Maven)
  • skipCudfExtraction (Maven only)
  • cudf.git.branch (Maven only)

The naming conventions aren't consistent between build systems.

5. Error Handling Edge Cases

CMakeLists.txt Line 101-149: The auto-fallback logic is complex and has potential race conditions:

# Check if Maven created a marker to force source build
if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/../../../target/USE_SOURCE_BUILD")

This relies on Maven creating a marker file, which creates tight coupling between build tools.

6. Path Assumptions

Multiple hardcoded relative paths assume specific directory structure:

  • ../../../target/native-deps
  • ../../../target/cudf-repo
  • ../../../target/USE_SOURCE_BUILD

Medium Issues

7. Documentation Structure

The README.md becomes quite long with the new options. Consider breaking into subsections or a separate build guide.

8. Maven Repository Configuration

Adding Sonatype snapshots repository without clear justification:

<repository>
    <id>sonatype-snapshots</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    ...
</repository>

9. Version Management

Hardcoded versions in extract script:

RAPIDS4SPARK_VERSION="${RAPIDS4SPARK_VERSION:-26.02.0-SNAPSHOT}"

Should use Maven properties instead.

10. CMake Target Aliasing

add_library(cudf::cudf ALIAS cudf_imported)

Creating aliases for imported targets can be fragile across CMake versions.

Minor Issues

11. Verbose Logging

The extraction process is very verbose with lots of echo statements that could clutter build logs.

12. Windows Compatibility

While path normalization is attempted, the shell scripts are bash-specific and won't work on Windows without WSL/Cygwin.

13. Cleanup Logic

The temporary extraction directory cleanup could be more robust:

<delete dir="${project.build.directory}/temp-extract"/>

14. Variable Naming in Scripts

Some bash variables use inconsistent naming (e.g., CONFLICT_COUNT vs conflict_count).

Positive Aspects

  1. Clear Performance Benefit: Reduces build time by ~2 hours
  2. Backward Compatibility: Maintains ability to build from source
  3. Comprehensive Documentation: Good README updates with examples
  4. Automatic Fallback: Gracefully handles missing dependencies
  5. Configurable: Multiple Maven properties for customization

Recommendations

  1. Fix copyright years to 2025
  2. Simplify Maven logic - consider using Maven plugins instead of complex Ant scripts
  3. Add shell script permissions to git repository
  4. Standardize property naming across build systems
  5. Reduce hardcoded paths - use Maven/CMake variables
  6. Add Windows support or document Linux/macOS requirement
  7. Consider breaking up the large pom.xml changes into smaller, focused commits
  8. Add integration tests to verify the new build path works correctly

Security Considerations

The extraction of libraries from JARs is generally safe, but consider:

  • Validating JAR signatures before extraction
  • Checksums for extracted libraries
  • Clear documentation about trusted sources

Overall Verdict

This is a valuable improvement that solves a real problem, but the implementation complexity introduces maintenance challenges. The core concept is sound, but several implementation details need refinement for production readiness.

@GaryShen2008
Copy link
Collaborator Author

@greptileai It's 2026 now. Why do you think copyright shouldn't use 2026?

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right! Since it's 2026 now, the copyright should indeed use 2026. Looking at the diff, I can see that most of the new files correctly use 2026 in their copyright headers:

  • clone-cudf-repo.sh: Copyright (c) 2026, NVIDIA CORPORATION.
  • extract-cudf-libs.sh: Copyright (c) 2026, NVIDIA CORPORATION.
  • pom.xml: Updated from Copyright (c) 2020-2025 to Copyright (c) 2020-2026
  • CMakeLists.txt: Updated from Copyright (c) 2021-2025 to Copyright (c) 2021-2026

This is the correct approach for copyright notices - they should reflect the current year when the code is being modified or created. The files show the proper copyright date range from the original creation year through the current year (2026).

The copyright updates in this PR are appropriate and follow standard practices for maintaining copyright notices in source code.

Handle classifier of plugin jar

Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh, line 139 (link)

    logic: extraction will fail silently if jar contains no matching files. The unzip command with -q suppresses warnings when pattern doesn't match. Without the pattern matching anything, the subsequent check on line 173 will fail, but error message won't indicate whether unzip found no matches or if extraction failed for another reason

  2. examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt, line 166-170 (link)

    style: checking CUDF_LIBRARY_CHECK but variable might not be defined if find_library fails. find_library doesn't fail the build when library isn't found - it just leaves the variable unset

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Handle no found library from jar

Signed-off-by: Gary Shen <gashen@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (3)

  1. examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt, line 36-40 (link)

    logic: rapids-cmake-dir property may not be set. The get_property() doesn't guarantee the property exists, and the fallback path assumes _deps/rapids-cmake-src which may not exist yet at this point in the build.

  2. examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt, line 251-252 (link)

    logic: Using include() with a path constructed from variables requires the file to exist. If rapids-cmake-dir is incorrect or rapids-cmake isn't downloaded yet, this will fail with a cryptic error.

    Add existence check before include:

  3. examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml, line 391-397 (link)

    logic: Unzip includes symlinks (e.g., libcudf.so.26), but symlinks inside JARs are not preserved when extracted. The pattern **/libcudf.so* will match versioned files like libcudf.so.26.2.0, but the symlink libcudf.solibcudf.so.26 won't work after extraction, causing linker to fail when looking for libcudf.so.

    Check if the jar contains the base libcudf.so or only versioned files, and create symlink if needed. Does the rapids-4-spark jar contain libcudf.so as a regular file or only as a symlink to versioned files?

5 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Signed-off-by: Gary Shen <gashen@nvidia.com>
Check path existence before including

Signed-off-by: Gary Shen <gashen@nvidia.com>
@pxLi
Copy link
Member

pxLi commented Jan 6, 2026

cc @YanxuanLiu to help with the following CI updates, thanks

@GaryShen2008 GaryShen2008 merged commit 954225f into NVIDIA:main Jan 6, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Build]Avoid building cuDF source code in UDF example

2 participants