Skip to content

Conversation

@SoulSniper1212
Copy link

Overview

This pull request fixes an issue (#63078) where pandas DataFrames with datetime64[ns] in MultiIndex would fail when processed using joblib or other multiprocessing libraries. The error occurred in the NDArrayBacked.__setstate__ method during unpickling, where unexpected state formats from multiprocessing contexts would trigger a NotImplementedError.

Checklist

  • Code changes: Modified pandas/_libs/arrays.pyx to handle additional state formats in NDArrayBacked.__setstate__
  • Tests: Added comprehensive tests in test_fix.py to verify the fix works for the reported scenario
  • Documentation: Not required as this is a bug fix that maintains existing functionality

Proof

The fix addresses the issue by adding handling for:

  1. 2-element states that may have different tuple structures in multiprocessing
  2. 3-element states where the third element is a (dtype, array) tuple instead of an attributes dict
  3. Other unexpected state formats that previously raised NotImplementedError

The test script test_fix.py demonstrates that the fix resolves the issue by:

  1. Testing the specific problematic state format directly
  2. Reproducing the original scenario with datetime64[ns] MultiIndex
  3. Confirming that pickle/unpickle operations work correctly after the fix

The changes maintain backward compatibility while adding robustness to handle multiprocessing-related pickling variations.

Closes #63078

…e64[ns] MultiIndex

Signed-off-by: SoulSniper1212 <warush23@gmail.com>
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding a lot of cases and it is not clear to me why pandas must handle those. In addition, please adhere to the pandas development standards: https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html

Comment on lines +109 to +110
# Handle case where (array, dtype) is passed instead of (data, dtype)
dtype, data = data, dtype
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to understand more about what the conditions are and why this case occurs.

Comment on lines +127 to +128
# This can occur when pickle/unpickle happens in multiprocessing contexts like joblib
# where additional pickling/unpickling steps might create unexpected state formats
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here. Why are unexpected state formats occurring?

@jbrockmendel jbrockmendel added the AI Slop Suspected of being AI-generated, which is not welcome. label Nov 16, 2025
@jbrockmendel
Copy link
Member

Did a human write this?

@mroeschke
Copy link
Member

Based on the contributor's history, I believe this is an AI generated PR so closing.

@mroeschke mroeschke closed this Nov 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Slop Suspected of being AI-generated, which is not welcome.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: when np.datetime64[ns] is a type in a MultiIndex, "NotImplementedError" when trying to return the df from a joblib.delayed

4 participants