
@TheReaperJay

Summary

This PR upgrades MAGE's cuGraph integration from the legacy RAPIDS 22.02 / CUDA 11.5 stack to modern RAPIDS 25.12 / CUDA 13.1, bringing GPU-accelerated graph algorithms up to date with current NVIDIA tooling.

Key Changes

Infrastructure Upgrade:

  • CUDA 11.5.2 → 13.1.0
  • RAPIDS/cuGraph 22.02 → 25.12
  • Ubuntu 20.04 → 24.04
  • Python 3.8 → 3.12

API Migration:
All 9 cuGraph algorithms were updated for RAPIDS 25.x; most now use the modern cuGraph API:

  • cugraph::pagerank → cugraph::pagerank() with explicit graph view
  • cugraph::betweenness_centrality → normalized output handling
  • cugraph::hits → proper hub/authority vector management
  • cugraph::katz_centrality → updated alpha/beta parameter handling
  • cugraph::louvain / cugraph::leiden → new clustering return types
  • cugraph::personalized_pagerank → vertex list handling

Legacy API Preserved:
Two algorithms remain on the legacy cugraph::ext_raft:: API, as they have not been migrated in RAPIDS 25.x:

  • balanced_cut_clustering
  • spectral_clustering

E2E Tests Added

Comprehensive end-to-end tests for all 9 algorithms following MAGE's existing test framework:

e2e/pagerank_test/test_cugraph_networkx_validation/
e2e/betweenness_centrality_test/test_cugraph_networkx_validation/
e2e/hits_test/test_cugraph_networkx_validation/
e2e/katz_test/test_cugraph_networkx_validation/
e2e/louvain_test/test_cugraph_networkx_validation/
e2e/leiden_cugraph_test/test_cugraph_networkx_validation/
e2e/personalized_pagerank_test/test_cugraph_networkx_validation/
e2e/balanced_cut_clustering_test/test_cugraph_networkx_validation/
e2e/spectral_clustering_test/test_cugraph_networkx_validation/

Each test uses a 9-node two-community graph topology with expected values validated against NetworkX ground truth (5% tolerance for GPU floating-point variance).

Validation Script

Added scripts/validate_cugraph_algorithms.py, a standalone debugging tool that:

  1. Builds an identical graph in NetworkX (ground truth)
  2. Spins up a Memgraph container with cuGraph modules
  3. Runs each algorithm and compares results against NetworkX
  4. Reports pass/fail with detailed value comparisons

This is for developer debugging, not CI.

Test Plan

  • All 9 cuGraph algorithms pass validation against NetworkX ground truth
  • Docker image builds successfully with Dockerfile.cugraph
  • E2E tests follow existing MAGE test conventions
  • CI pipeline runs (pending merge)

Breaking Changes

None. All algorithm signatures and return types preserved.

…atibility

Upgrades the cuGraph module from RAPIDS 22.02/CUDA 11.5 to RAPIDS 25.12/CUDA 13.1,
bringing nearly four years of performance improvements and modern GPU support.

## Motivation

The current implementation uses:
- CUDA 11.5.2 (EOL, no RTX 40xx/50xx or H100 support)
- cuGraph 22.02 (deprecated APIs)
- Ubuntu 20.04 (EOL since April 2025)
- Python 3.8 (EOL since October 2024)

## Changes

**Modern API (8 algorithms):**
- pagerank, betweenness_centrality, hits, katz_centrality
- louvain, leiden, personalized_pagerank, graph_generator

Uses `cugraph::create_graph_from_edgelist` with edge property views.
Returns allocated results via structured bindings.
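A minimal sketch of that construction path, based on the description above. The
template arguments, tuple arity, and optional parameters are illustrative and
should be checked against the 25.x headers:

```cpp
#include <cugraph/algorithms.hpp>
#include <cugraph/graph_functions.hpp>
#include <raft/core/handle.hpp>
#include <rmm/device_uvector.hpp>

#include <optional>
#include <utility>

// Sketch only: builds a single-GPU graph from a device-resident edge list.
// store_transposed=true because PageRank-style algorithms consume the transpose.
void build_graph_sketch(raft::handle_t const& handle,
                        rmm::device_uvector<int32_t>&& srcs,
                        rmm::device_uvector<int32_t>&& dsts,
                        std::optional<rmm::device_uvector<float>>&& weights) {
  // renumber=false: the wrapper already supplies 0-based contiguous vertex IDs.
  auto [graph, edge_weights, edge_ids, edge_types, renumber_map] =
      cugraph::create_graph_from_edgelist<int32_t, int32_t, float, int32_t,
                                          true /*store_transposed*/,
                                          false /*multi_gpu*/>(
          handle,
          std::nullopt,  // no explicit vertex list
          std::move(srcs),
          std::move(dsts),
          std::move(weights),
          std::nullopt,  // no edge IDs
          std::nullopt,  // no edge types
          cugraph::graph_properties_t{false /*is_symmetric*/, false /*is_multigraph*/},
          false /*renumber*/);

  auto graph_view = graph.view();  // non-owning view handed to each algorithm
  std::optional<decltype(edge_weights->view())> weight_view{};
  if (edge_weights) { weight_view = edge_weights->view(); }
  // graph_view + weight_view are what calls like cugraph::pagerank() consume.
}
```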

**Legacy API (2 algorithms):**
- balanced_cut_clustering, spectral_clustering

These use the `cugraph::ext_raft::` namespace, which only supports the legacy
`GraphCSRView`; no modern API equivalent exists in cuGraph 25.x.
Added the required `raft::random::RngState` parameter for 25.x compatibility.
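A sketch of the resulting legacy call shape, assuming (per the note above about
the added RngState) that 25.x prepends the handle and `raft::random::RngState`
to the otherwise unchanged legacy parameters; parameter order and values are
illustrative, not the wrapper's actual code:

```cpp
#include <cugraph/algorithms.hpp>
#include <raft/core/handle.hpp>
#include <raft/random/rng_state.hpp>

// Illustrative only: check the exact parameter order against the cuGraph 25.x
// headers. The graph still has to arrive as a legacy CSR view.
void balanced_cut_sketch(raft::handle_t const& handle,
                         cugraph::legacy::GraphCSRView<int32_t, int32_t, float> const& csr,
                         int32_t num_clusters,
                         int32_t* d_clustering) {  // device buffer, one label per vertex
  raft::random::RngState rng_state{0};  // the parameter newly required in 25.x
  cugraph::ext_raft::balancedCutClustering(handle, rng_state, csr,
                                           num_clusters,
                                           2 /*num_eigen_vects*/,
                                           0.001f /*evs_tolerance*/,
                                           100 /*evs_max_iter*/,
                                           0.001f /*kmean_tolerance*/,
                                           100 /*kmean_max_iter*/,
                                           d_clustering);
}
```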

**Key implementation notes:**
- renumber=false: GraphView provides 0-based contiguous indices
- Edge properties use variant type (arithmetic_device_uvector_t)
- Build requires -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE

## Validation

All 9 algorithms validated against NetworkX ground truth:
- PageRank, Betweenness, HITS, Katz: exact/near-exact match
- Louvain, Leiden: correct community detection
- Balanced Cut, Spectral: correct clustering

## Hardware Support Added

- NVIDIA RTX 40xx (Ada Lovelace)
- NVIDIA RTX 50xx (Blackwell)
- NVIDIA H100/H200 (Hopper)
…kX validation

This commit introduces comprehensive end-to-end tests for all cuGraph GPU-accelerated
graph algorithms, integrated into MAGE's existing e2e testing framework.

## What Was Added

### E2E Tests (e2e/**/test_cugraph_networkx_validation/)

Each algorithm now has a dedicated test case following MAGE's e2e conventions:

  e2e/pagerank_test/test_cugraph_networkx_validation/
  e2e/betweenness_centrality_test/test_cugraph_networkx_validation/
  e2e/hits_test/test_cugraph_networkx_validation/
  e2e/katz_test/test_cugraph_networkx_validation/
  e2e/louvain_test/test_cugraph_networkx_validation/
  e2e/leiden_cugraph_test/test_cugraph_networkx_validation/
  e2e/personalized_pagerank_test/test_cugraph_networkx_validation/
  e2e/balanced_cut_clustering_test/test_cugraph_networkx_validation/  (new)
  e2e/spectral_clustering_test/test_cugraph_networkx_validation/      (new)

Each test directory contains:
- input.cyp: A 9-node test graph with two communities (A1-A4, B1-B4) connected
  via a HUB node, providing a consistent topology for validating algorithm behavior
- test.yml: Expected results with pytest.approx tolerances (rel=0.05, abs=1e-6)

### Standalone Validation Script (scripts/validate_cugraph_algorithms.py)

A debugging and validation tool that:
1. Builds the identical 9-node graph in NetworkX (ground truth)
2. Computes expected values using NetworkX's reference implementations
3. Spins up a Memgraph container with cuGraph modules
4. Runs each cuGraph algorithm and compares against NetworkX
5. Reports pass/fail with detailed value comparisons

This script is NOT part of the CI pipeline - it exists for developers to:
- Validate cuGraph results against known-correct NetworkX implementations
- Debug algorithm discrepancies during development
- Verify GPU acceleration produces mathematically equivalent results

## Why This Approach

1. **E2E Framework Integration**: Tests use MAGE's existing pytest-based e2e
   infrastructure, ensuring they run alongside other module tests in CI.

2. **NetworkX as Ground Truth**: NetworkX is the de-facto standard for graph
   algorithms in Python. Validating cuGraph against NetworkX proves mathematical
   correctness, not just "it runs without crashing."

3. **Tolerance-Based Comparison**: GPU floating-point operations may produce
   slightly different results than CPU. Using pytest.approx with 5% relative
   tolerance accounts for this while still catching algorithmic errors.

4. **Consistent Test Graph**: The 9-node two-community topology was chosen because:
   - Small enough for fast execution
   - Complex enough to exercise algorithm behavior (communities, hub node)
   - Produces deterministic, verifiable results

## Algorithms Tested

Centrality Measures:
- cugraph.pagerank
- cugraph.betweenness_centrality
- cugraph.hits
- cugraph.katz_centrality
- cugraph.personalized_pagerank

Community Detection:
- cugraph.louvain
- cugraph.leiden

Clustering (Legacy ext_raft API):
- cugraph.balanced_cut_clustering
- cugraph.spectral_clustering

Note: balanced_cut and spectral use the legacy cugraph::ext_raft:: API, as these
algorithms have not been migrated to the modern cuGraph API in RAPIDS 25.x.

CLAassistant commented Dec 28, 2025

CLA assistant check
All committers have signed the CLA.

TheReaperJay mentioned this pull request on Dec 28, 2025.

- Upgrade PyTorch to cu130 (CUDA 13.0 support via pytorch.org/whl/cu130)
- Upgrade DGL to torch-2.9/cu130 wheels (removes torchdata dependency)
- Add torch_geometric with PyG extensions built from source for CUDA 13
- Add unixodbc-dev for pyodbc module support
- Upgrade numpy and gensim for binary compatibility

These changes ensure all Python ML modules load without errors on CUDA 13.1,
fixing issues with nvToolsExt, torchdata.datapipes, and torch_geometric imports.

The cuGraph C++ library supports sampling via the 'vertices' parameter,
which limits betweenness computation to k random source vertices instead
of all V vertices. This reduces complexity from O(V*E) to O(k*E).

The MAGE wrapper did not expose this parameter - it always passed
std::nullopt (use all vertices). This change adds the k parameter.

Note: Other cuGraph parameters (initial_pageranks, precomputed caches,
warm-start hints) are intentionally not exposed because MAGE procedures
are stateless - there is no way to persist or pass state between calls.
The k parameter is different: it is not about state, it is about avoiding
memory explosion on large graphs by sampling source vertices.

Backward compatible: default k=0 preserves existing behavior (all vertices).

Changes:
- Add optional 'k' parameter (default=0 means use all vertices)
- When k>0 and k<V: randomly sample k vertices as sources
- Pass sampled vertices to cuGraph via device_span (see the sketch below)
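
A minimal sketch of the sampling step (the helper name is hypothetical; the real
wrapper code may differ): k distinct sources are chosen on the host, copied to
the device, and exposed to cuGraph as a device_span.

```cpp
#include <raft/core/device_span.hpp>
#include <raft/core/handle.hpp>
#include <raft/util/cudart_utils.hpp>  // raft::update_device
#include <rmm/device_uvector.hpp>

#include <algorithm>
#include <numeric>
#include <optional>
#include <random>
#include <vector>

// Hypothetical helper: returns nullopt when k=0 or k>=V (old behavior: use all
// vertices), otherwise a span over k distinct randomly chosen source vertices.
std::optional<raft::device_span<int32_t const>> maybe_sample_sources(
    raft::handle_t const& handle,
    rmm::device_uvector<int32_t>& sampled,  // owns device memory; must outlive the call
    int64_t num_vertices, int64_t k) {
  if (k <= 0 || k >= num_vertices) return std::nullopt;

  std::vector<int32_t> ids(num_vertices);
  std::iota(ids.begin(), ids.end(), 0);
  std::mt19937 gen{std::random_device{}()};
  std::shuffle(ids.begin(), ids.end(), gen);  // first k entries = k distinct sources

  sampled.resize(k, handle.get_stream());
  raft::update_device(sampled.data(), ids.data(), k, handle.get_stream());
  return raft::device_span<int32_t const>{sampled.data(), static_cast<std::size_t>(k)};
}
```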

Usage: CALL cugraph.betweenness_centrality.get(true, true, 1000)
       (normalized=true, directed=true, k=1000)
…y management

Problem:
Betweenness centrality and other memory-intensive algorithms on large graphs
were failing with CUDA out-of-memory errors even when sufficient VRAM was
available.

Root Cause:
RMM (RAPIDS Memory Manager) was using the default device allocator which
allocates memory on-demand without pooling. This caused memory fragmentation
across PageRank, Louvain, and other algorithms. When subsequent algorithms
attempted to allocate large contiguous blocks, CUDA could not find one
despite having enough total free memory.

Solution:
Initialize CUDA's built-in async memory resource (cudaMallocAsync) as the
default RMM device resource at module load time. This provides:

1. Automatic memory pooling managed by CUDA driver
2. Defragmentation handled transparently by the driver
3. Contiguous memory blocks available for large allocations
4. No manual pool size configuration required
5. Optimal memory reuse across algorithm invocations

The static initializer in mg_cugraph_utility.hpp runs once when each cuGraph
module is loaded, before any algorithm execution. All existing code that calls
rmm::mr::get_current_device_resource() automatically uses the pooled allocator
with zero code changes to individual algorithms.
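
A minimal sketch of such an initializer (names are illustrative, not necessarily
the exact code in mg_cugraph_utility.hpp):

```cpp
#include <rmm/mr/device/cuda_async_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

namespace {

// Runs once per loaded module, before any algorithm executes. After this,
// rmm::mr::get_current_device_resource() hands out cudaMallocAsync-backed,
// driver-pooled allocations everywhere.
inline bool init_rmm_async_resource() {
  static rmm::mr::cuda_async_memory_resource async_mr{};  // wraps cudaMallocAsync
  rmm::mr::set_current_device_resource(&async_mr);
  return true;
}

const bool kRmmAsyncInitialized = init_rmm_async_resource();  // runs at module load

}  // namespace
```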

This is part of the RAPIDS 25.x / CUDA 13 upgrade (PR memgraph#710).
