Fix pivot_table corruption with large datasets in Python 3.14 #63316
Closes #63314
Description
This PR fixes a critical bug where `pivot_table()` produces corrupted output with duplicate index values when processing large datasets under Python 3.14.

Problem
When pivoting ~100,000 rows in Python 3.14, the result contained only ~33,334 unique index values instead of 100,000, with duplicate index entries.
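A minimal reproduction along the lines described above (the column names and aggregation are illustrative assumptions, not taken from the original report):

```python
import numpy as np
import pandas as pd

n = 100_000
df = pd.DataFrame({
    "key": np.arange(n),            # 100,000 distinct index keys
    "col": np.zeros(n, dtype=int),  # single pivot column
    "val": np.random.rand(n),
})

result = df.pivot_table(index="key", columns="col", values="val")

# With the bug, result.index held only ~33,334 unique values with
# duplicates; a correct pivot preserves all 100,000 keys exactly once.
assert result.index.nunique() == n
assert not result.index.duplicated().any()
```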
Root Cause
The
compress_group_indexfunction inpandas/core/sorting.pywas usingInt64HashTable.get_labels_groupby()which produces incorrect results in Python 3.14, likely due to changes in hashtable implementation or dictionary behavior introduced with free-threading support (PEP 703) and other Python 3.14 improvements.Solution
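For context, `compress_group_index` collapses an array of raw group ids into dense consecutive labels plus the list of observed ids. A simplified NumPy model of that behavior (an assumption for illustration, not the actual pandas implementation):

```python
import numpy as np

def compress_group_index_sketch(group_index: np.ndarray):
    """Map raw group ids onto dense labels 0..k-1.

    Simplified model of pandas' compress_group_index; the real
    implementation uses a hashtable rather than np.unique.
    """
    # return_inverse yields, for each row, the position of its id
    # in the sorted unique-id array -- i.e. a dense group label.
    obs_ids, comp_labels = np.unique(group_index, return_inverse=True)
    return comp_labels, obs_ids

labels, obs = compress_group_index_sketch(np.array([10, 10, 7, 10, 3]))
# obs    -> [3, 7, 10]      (unique ids observed)
# labels -> [2, 2, 1, 2, 0] (dense label per row)
```

If this compression step returns wrong labels, distinct groups get merged, which is consistent with the duplicate-index symptom described above.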
Modified `compress_group_index` to handle Python 3.14+ correctly.

Changes

- Updated the `compress_group_index()` function in `pandas/core/sorting.py` to handle Python 3.14+
- Added a regression test, `test_pivot_table_large_dataset_no_duplicates()`

Testing
Added `test_pivot_table_large_dataset_no_duplicates()`, which pivots a large dataset and verifies that the result contains no duplicate index values. The fix has been tested to ensure backward compatibility with Python <3.14.
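A sketch of what the regression test might look like (the original test body is not shown in this description, so the data construction and assertions here are assumptions based on the reported symptom):

```python
import numpy as np
import pandas as pd

def test_pivot_table_large_dataset_no_duplicates():
    # Build a frame large enough to exercise the reported failure mode.
    n = 100_000
    df = pd.DataFrame({
        "idx": np.arange(n),
        "col": ["a"] * n,
        "val": np.ones(n),
    })
    result = df.pivot_table(index="idx", columns="col", values="val")
    # The bug yielded only ~33,334 unique index values with duplicates.
    assert len(result.index) == n
    assert result.index.is_unique

test_pivot_table_large_dataset_no_duplicates()
```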