Make hash by the value allocated, not by pointer #22129

cpjulia · 2025-11-25T04:13:23Z

Scope & Purpose

In AqlValue.cpp, there is a hash function and a comparator. An AqlValue can hold its value in several forms: inline, slice pointer, managed slice and managed string (and, in the future, supervised slice). Inline values are not heap-allocated; they are stored directly in the AqlValue's 16 bytes of available space. Slice pointers point to data that the AqlValue does not own. Managed slices and managed strings are dynamically allocated payloads that the AqlValue does own. Even if multiple AqlValues point to the same managed slice, it is still owned by them, and whichever AqlValue pointing to it is destroyed last has to free the allocated memory. The hash function and comparator are used by AqlItemBlock for deduplication: when an AqlValue is inserted into a block, the block has to check whether two AqlValues are equal.
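To make the ownership distinction concrete, here is a heavily simplified, hypothetical model of the storage kinds. It is not the real AqlValue layout (which packs everything into 16 bytes); `ValueModel` and `StorageKind` are invented names, and `std::shared_ptr` is only a stand-in for "the last holder frees the payload". Managed slices would behave like the managed string case.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified model of the storage kinds described above.
// The real AqlValue packs these into 16 bytes; this sketch does not try to.
enum class StorageKind { kInline, kSlicePointer, kManagedString };

struct ValueModel {
  StorageKind kind = StorageKind::kInline;
  // kInline: the value lives inside the object itself, no allocation.
  std::int64_t inlineNumber = 0;
  // kSlicePointer: points at data owned by someone else; never freed here.
  std::string const* borrowed = nullptr;
  // kManagedString: heap payload owned by the value(s). shared_ptr is a
  // stand-in so that the last holder to be destroyed frees the memory.
  std::shared_ptr<std::string> owned;

  static ValueModel makeInline(std::int64_t v) {
    ValueModel m;
    m.kind = StorageKind::kInline;
    m.inlineNumber = v;
    return m;
  }
  static ValueModel makeBorrowed(std::string const* p) {
    ValueModel m;
    m.kind = StorageKind::kSlicePointer;
    m.borrowed = p;
    return m;
  }
  static ValueModel makeManaged(std::string payload) {
    ValueModel m;
    m.kind = StorageKind::kManagedString;
    m.owned = std::make_shared<std::string>(std::move(payload));
    return m;
  }
};
```

Copying a managed `ValueModel` shares the payload, and it is released when the last copy goes away, which mirrors the "last one to be destroyed frees the memory" rule.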

Now onto the hashing. Previously, the hash was computed from the pointers, not from the data they point to. A pointer only holds the address of its payload, so two AqlValues that own separately allocated memory would never be considered equal under this approach, even though their payloads may be semantically identical. As a result, the comparison used in AqlItemBlock did not work properly and wrongly distinguished values that should be treated as the same.
The new approach hashes the value stored at the address whenever the payload is dynamically allocated.
When the value is inline, it is safe to keep the original approach.
When the value is a slice pointer, the AqlValue does not own the data, so the pointer itself is hashed: two slice pointers to the same address are considered the same.
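A minimal sketch of the difference between hashing by address and hashing by content, using `std::string` as a stand-in for a VelocyPack payload (the real code hashes slices, not strings; the function names here are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hash by address: two separately allocated copies of "abc" will, in
// practice, produce different hashes even though their content is equal.
std::size_t hashByPointer(std::string const* p) {
  return std::hash<std::string const*>{}(p);
}

// Hash by content: the same two copies always hash identically,
// regardless of where each payload was allocated.
std::size_t hashByContent(std::string const* p) {
  return std::hash<std::string>{}(*p);
}
```

Two distinct objects holding "abc" get the same `hashByContent` value, which is the property the deduplication in AqlItemBlock needs.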

💩 Bugfix
🍕 New feature
🔥 Performance improvement
🔨 Refactoring/simplification

Checklist

Related Information

(Please reference tickets / specification / other PRs etc)

Docs PR:
Enterprise PR:
GitHub issue / Jira ticket:
Design document:

Note

Switches AqlValue hash/equality to normalized content-based semantics (incl. ranges) and updates AqlItemBlock cloning/serialization accordingly, adding comprehensive tests.

AQL Core:
- AqlValue:
  - Implement normalized, content-based hash() (incl. RANGE), add kDefaultSeed, and remap zero hash.
  - Rework std::hash<AqlValue> and std::equal_to<AqlValue> to use VelocyPack normalized comparison across storage types; compare inlines by value; RANGE by bounds.
  - Minor safety asserts and RANGE compare cleanup.
- AqlItemBlock:
  - toVelocyPack(...): uses FlatHashMap<AqlValue, size_t> with value-based keys; minor cleanup.
  - cloneDataAndMoveShadow(...): simplify shadow-row handling to set values and transfer ownership; cache by pointer only for cloning optimization.
Tests:
- Add AqlValueHashAlgorithmCorrectnessTest, AqlValueHashEqualTest, and AqlValueHashTest validating content-based hashing/equality and block (de)serialization behavior.
- Register new tests in tests/CMakeLists.txt.
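The toVelocyPack(...) change keys a map by value rather than by address. A hypothetical sketch of the idea with standard containers: the first occurrence of a payload is written literally and later equal payloads become back-references to that position. The `Cell` encoding is invented for illustration and is not the actual AqlItemBlock serialization format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical output cell: either a literal payload or a back-reference
// to the position where an equal payload was first written.
struct Cell {
  bool isRef;
  std::size_t refPos;
  std::string payload;
};

std::vector<Cell> serializeDeduplicated(std::vector<std::string> const& values) {
  // Keyed by value, not by address, so equal payloads in different
  // allocations map to the same entry.
  std::unordered_map<std::string, std::size_t> firstSeen;
  std::vector<Cell> out;
  out.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    auto [it, inserted] = firstSeen.try_emplace(values[i], i);
    if (inserted) {
      out.push_back(Cell{false, 0, values[i]});  // first occurrence: literal
    } else {
      out.push_back(Cell{true, it->second, {}});  // duplicate: back-reference
    }
  }
  return out;
}
```

For the input `{"a", "b", "a"}`, the third cell is a reference to position 0 instead of a second copy of "a".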

Written by Cursor Bugbot for commit d2def76. This will update automatically on new commits.

mchacki

I have some clarifying questions

mchacki · 2025-11-25T11:02:48Z

arangod/Aql/AqlValue.cpp

-      return std::hash<void const*>()(x._data.rangeMeta.range);
+  using T = AqlValue::AqlValueType;
+  auto aqlValueType = x.type();
+  // as this is non owning, we hash by the pointer


What difference does ownership make?

Hashing is a quick check of whether the content is equal.
In my opinion it should not take into account if the value is owned or not.

mchacki · 2025-11-25T11:06:56Z

arangod/Aql/AqlValue.cpp

+  if (h == 0) {  // fallback to avoid collision with the marker that uses h ==
+                 // 0, very unlikely to happen
+    h = 1;
+  }


Huh, what is this?
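The diff comment says some marker uses h == 0. Which container uses it is not stated in this thread; the sketch below simply assumes a table that stores a hash of 0 to mean "empty slot". Under that assumption a genuine hash of 0 would be misread as empty, so it is remapped to 1. `remapZeroHash` and `markerSafeHash` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Assumption: some container reserves a stored hash of 0 as its "empty"
// marker. A real hash of 0 is therefore nudged to 1. The only cost is one
// extra collision between 0 and 1, which the equality check still resolves.
std::size_t remapZeroHash(std::size_t h) { return h == 0 ? 1 : h; }

std::size_t markerSafeHash(std::string const& s) {
  return remapZeroHash(std::hash<std::string>{}(s));
}
```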

mchacki · 2025-11-25T11:09:04Z

arangod/Aql/AqlValue.cpp

+        return a._data.slicePointerMeta.pointer ==
+               b._data.slicePointerMeta.pointer;


Why is this a pointer comparison,
and not a real equality comparison like the default case?
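The concern can be illustrated with a hypothetical hash/equality pair for a set of pointers. Both functors look at the pointee, so whenever `ContentEq` says two keys are equal, `ContentHash` returns the same value. If the equality compared addresses instead, two pointers to equal content would land in the same bucket and still be stored twice. `std::string` again stands in for a VelocyPack payload.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hash and equality both inspect the pointee, so they stay consistent.
struct ContentHash {
  std::size_t operator()(std::string const* p) const {
    return std::hash<std::string>{}(*p);
  }
};

struct ContentEq {
  bool operator()(std::string const* a, std::string const* b) const {
    return *a == *b;  // compare content, not addresses
  }
};

using ContentSet = std::unordered_set<std::string const*, ContentHash, ContentEq>;
```

Inserting two different addresses that both hold "abc" leaves exactly one entry in a `ContentSet`.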


cursor

This PR is being reviewed by Cursor Bugbot


tests/Aql/AqlValueHashTest.cpp


arangod/Aql/AqlValue.cpp


cursor · 2025-12-02T08:55:23Z

arangod/Aql/AqlItemBlock.cpp

+            cache.emplace(a.data());
+            res->setValue(row, col, a);
+            // Transfer ownership to res - guard won't destroy it
+            guard.steal();


Bug: Shadow row deduplication removed while cache remains populated

The shadow row handling now calls cache.emplace(a.data()) but ignores the return value and always transfers ownership via guard.steal(). The old code used the emplace result to deduplicate: when a pointer was already cached, it would destroy the duplicate value and reuse the cached one. The new code removes this deduplication but still populates the cache, suggesting incomplete refactoring. If multiple shadow row cells share the same underlying data pointer, this could lead to multiple AqlValue objects referencing the same memory, potentially causing memory management issues.
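A hypothetical model of deduplication driven by the `emplace` result that the bot describes: when a source pointer has already been cached, the existing clone is reused instead of creating a second independent copy. This is not the actual cloneDataAndMoveShadow code; plain `int` payloads and a `unique_ptr` owner list stand in for AqlValue data and its lifetime management.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Each distinct source pointer is cloned exactly once; repeated pointers
// reuse the clone made for their first occurrence.
std::vector<int const*> cloneWithPointerCache(
    std::vector<int const*> const& src,
    std::vector<std::unique_ptr<int>>& owner) {
  std::unordered_map<int const*, int const*> cache;  // source -> clone
  std::vector<int const*> out;
  out.reserve(src.size());
  for (int const* p : src) {
    auto [it, inserted] = cache.try_emplace(p, nullptr);
    if (inserted) {
      owner.push_back(std::make_unique<int>(*p));  // first time: real copy
      it->second = owner.back().get();
    }
    out.push_back(it->second);  // duplicates share the existing clone
  }
  return out;
}
```

Ignoring `inserted`, as the reviewed code does, would still populate the cache but lose this sharing, which is the inconsistency the bot points out.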


cursor · 2025-12-03T15:35:53Z

arangod/Aql/AqlValue.cpp

+      case T::VPACK_INLINE_DOUBLE:
+        // long numbers are stored in the same form, so we can compare raw bits
+        return a._data.longNumberMeta.data.intLittleEndian.val ==
+               b._data.longNumberMeta.data.intLittleEndian.val;


Bug: Inline double equality uses raw bits incorrectly

The equal_to<AqlValue> comparison for VPACK_INLINE_INT64, VPACK_INLINE_UINT64, and VPACK_INLINE_DOUBLE uses raw bit comparison (intLittleEndian.val == intLittleEndian.val). This is problematic for doubles because -0.0 and +0.0 have different bit patterns but are semantically equal in IEEE 754 (-0.0 == +0.0 is true). The hash function uses normalizedHash(), which likely maps these values to the same hash. Equality would then be stricter than the hash: the two zeros could land in the same bucket but still compare unequal, so semantically equal values would not be deduplicated by the hash maps/sets that rely on this comparator.
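A small sketch showing why a raw-bit compare and a value compare disagree on signed zeros. `rawBits` reproduces what comparing the stored 64-bit integer sees; `doublesEqualByValue` uses IEEE 754 `==`. Both names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a double's bytes as a 64-bit integer, which is what a
// raw-bit comparison of the stored representation effectively compares.
std::uint64_t rawBits(double d) {
  static_assert(sizeof(std::uint64_t) == sizeof(double), "double must be 64 bits");
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Value comparison: IEEE 754 treats -0.0 and +0.0 as equal.
bool doublesEqualByValue(double a, double b) { return a == b; }
```

Note that plain `==` also makes NaN unequal to itself, so if NaN can reach this comparator, a real implementation would need an explicit policy for it to keep equality reflexive.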

arangod/Aql/AqlValue.cpp


Make hash by the value allocated, not by pointer

446b3bb

cla-bot bot added the cla-signed label Nov 25, 2025

cpjulia added this to the devel milestone Nov 25, 2025

cpjulia added 2 commits November 25, 2025 01:37

Removed duplicate

1977cd9

Fix hash function

e0dafb5

mchacki reviewed Nov 25, 2025


cpjulia added 15 commits November 25, 2025 08:56

Removed special treatment for SLICE_POINTER in hash

4572970

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

b8ef3df

Added missing brace

28eafae

Added comparison for different types

ad56c03

Change comparison to use normalized equal instead of binaryEquals for semantically equal values

f3aa424

Remove unused variable

0eb6df5

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

9c201b3

Fix hash comparison

6f766d1

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

36eccf0

Fix comment

12f1c17

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

b44b774

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

fb50172

Simplified AqlValue equal_to, added new tests

2650bd8

Added test to check if the new implementation makes more sense in the algorithm

4bce568

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

66e216b

cursor bot reviewed Nov 28, 2025


tests/Aql/AqlValueHashTest.cpp Outdated

tests/Aql/AqlValueHashTest.cpp

tests/Aql/AqlValueHashTest.cpp

cpjulia added 7 commits November 28, 2025 13:06

Fixed usage of velocypack helper in AqlValue equal_to and added tests

8a33cdd

Added new tests

8d4f533

Pass seed to avoid collisions and fix tests

20708d0

Added nullptr check to hash

3892a23

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

d2165d1

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

e6e3225

Attempt to fix circle ci errors

d631a69

cursor bot reviewed Dec 2, 2025


arangod/Aql/AqlValue.cpp Outdated

cpjulia added 2 commits December 2, 2025 05:14

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

d4d92e3

Simplified logic in cloneDataAndMoveShadow

9dcf5f1

cursor bot reviewed Dec 2, 2025


cpjulia added 3 commits December 2, 2025 10:49

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

2e7a7ba

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

4c83b53

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

2f72dcc

cursor bot reviewed Dec 3, 2025


cpjulia added 5 commits December 4, 2025 12:12

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

51699d9

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

2455622

Simplified hash code, removed verbose comments

122e129

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

2f5a6ba

Merge branch 'devel' of github.com:arangodb/arangodb into bug-fix/aql-value-hash

d2def76

3 participants
