ajpotts commented Dec 24, 2025

Improve astype semantics across Arkouda pandas ExtensionArrays

Summary

This PR implements fully-featured, pandas-compatible astype behavior for all Arkouda-backed pandas ExtensionArrays:

  • ArkoudaArray
  • ArkoudaCategoricalArray
  • ArkoudaStringArray

The implementation aligns with pandas’ ExtensionArray.astype contract, avoids unnecessary NumPy fallbacks, and consistently returns Arkouda-backed ExtensionArrays whenever possible.

In addition, this PR:

  • Adds comprehensive type hints and overloads to satisfy mypy
  • Expands string dtype normalization to accept the "string" dtype alias
  • Introduces extensive unit tests covering numeric, string, categorical, object, and ExtensionDtype casting
  • Makes doctests resilient to platform-dependent string-width differences
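
The "string" normalization above amounts to treating several spellings of a string dtype as one target. The following is a minimal sketch, assuming a hypothetical helper is_string_target and an assumed alias set ("str", "str_", "string"). It is not Arkouda's actual normalization code:

```python
from typing import Any

# Aliases this sketch treats as "string" targets (an assumed set, not Arkouda's).
STRING_ALIASES = frozenset({"str", "str_", "string"})


def is_string_target(dtype: Any) -> bool:
    """Hypothetical helper: True if `dtype` should normalize to the string dtype."""
    if dtype is str:
        # The builtin type itself, as in arr.astype(str).
        return True
    if isinstance(dtype, str):
        # Plain spellings such as "str" or "string" (case-insensitive here).
        return dtype.lower() in STRING_ALIASES
    # Dtype objects such as pandas.StringDtype() expose .name == "string".
    return getattr(dtype, "name", None) in STRING_ALIASES
```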

Key Changes

Correct, consistent astype behavior

  • object → always returns a NumPy array
  • Same dtype + copy=False → returns self
  • Numeric ↔ numeric → server-side Arkouda cast
  • Categorical
    • category → stays categorical
    • string → ArkoudaStringArray
    • other dtypes → labels cast via Arkouda, returned as ExtensionArray
  • Strings
    • string targets → stay ArkoudaStringArray
    • numeric/bool targets → server-side cast, return ExtensionArray
    • strings that cannot be parsed as the numeric target raise RuntimeError (documented and tested)

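The rules above read naturally as a dispatch table. The sketch below models them in plain Python, assuming a hypothetical function plan_astype that returns the name of the result type instead of performing a cast. The dtype spellings, the use of ArkoudaArray as the ExtensionArray for categorical and string casts to numeric, and the treatment of bool as a numeric target are assumptions. Combinations the list does not cover raise NotImplementedError:

```python
# Assumed dtype spellings; Arkouda's real implementation may differ.
NUMERIC_OR_BOOL = frozenset({"int64", "uint64", "float64", "bool"})
STRING_DTYPES = frozenset({"str", "string"})


def plan_astype(source_dtype: str, target: str, copy: bool = True) -> str:
    """Hypothetical: name the type astype would return under the listed rules."""
    if target == "object":
        return "numpy.ndarray"  # object always materializes a NumPy array
    if target == source_dtype and not copy:
        return "self"  # no-op cast with copy=False hands back the same array
    if source_dtype == "category":
        if target == "category":
            return "ArkoudaCategoricalArray"  # stays categorical
        if target in STRING_DTYPES:
            return "ArkoudaStringArray"
        return "ArkoudaArray"  # labels cast server-side, still an ExtensionArray
    if source_dtype in STRING_DTYPES:
        if target in STRING_DTYPES:
            return "ArkoudaStringArray"
        if target in NUMERIC_OR_BOOL:
            return "ArkoudaArray"  # server-side parse; bad input raises RuntimeError
    if source_dtype in NUMERIC_OR_BOOL and target in NUMERIC_OR_BOOL:
        return "ArkoudaArray"  # numeric <-> numeric server-side cast
    raise NotImplementedError(f"{source_dtype} -> {target} is outside this sketch")
```

Checking "same dtype + copy=False" right after the object case means a no-op cast returns self before any conversion logic runs, while copy=True still produces a fresh Arkouda-backed array.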
Pandas compatibility & typing

  • Adds explicit @overload signatures matching pandas ExtensionArray.astype
  • Fixes mypy override and return-type errors
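
pandas declares ExtensionArray.astype with overloads of the general shape "NumPy dtype → ndarray, ExtensionDtype → ExtensionArray", so a type checker can infer the return type from the dtype argument. The sketch below illustrates that typing.overload technique with hypothetical stand-in classes (NpDtype, ExtDtype, NpArray, ToyExtensionArray); it is not Arkouda's actual signature:

```python
from typing import List, Union, overload


class NpDtype:
    """Hypothetical stand-in for a NumPy dtype."""

    def __init__(self, name: str) -> None:
        self.name = name


class ExtDtype:
    """Hypothetical stand-in for a pandas ExtensionDtype."""

    def __init__(self, name: str) -> None:
        self.name = name


class NpArray(list):
    """Hypothetical stand-in for numpy.ndarray."""


class ToyExtensionArray:
    def __init__(self, values: List[object]) -> None:
        self.values = list(values)

    # Overload stubs: seen only by the type checker.
    @overload
    def astype(self, dtype: NpDtype, copy: bool = ...) -> NpArray: ...

    @overload
    def astype(self, dtype: ExtDtype, copy: bool = ...) -> "ToyExtensionArray": ...

    def astype(
        self, dtype: Union[NpDtype, ExtDtype], copy: bool = True
    ) -> Union[NpArray, "ToyExtensionArray"]:
        # Runtime dispatch mirrors the declared overloads, so the return type
        # mypy infers matches what the call actually returns.
        if isinstance(dtype, ExtDtype):
            return ToyExtensionArray(self.values) if copy else self
        return NpArray(self.values)
```

With these overloads, mypy types arr.astype(ExtDtype("x")) as ToyExtensionArray rather than a Union, so callers need no casts or isinstance checks.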

Tests

New test coverage includes:

  • Numeric → numeric casts
  • ExtensionDtype targets (pd.Int64Dtype, pd.StringDtype, etc.)
  • Categorical → string / numeric
  • String → numeric / object
  • Invalid string-to-numeric casts
  • Copy vs no-copy semantics
  • Doctests that use ellipses to avoid brittle assertions on Unicode string widths
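
The ellipsis technique relies on the standard library's doctest ELLIPSIS option, under which "..." in the expected output matches any substring. A minimal illustration calls doctest.OutputChecker directly. The array repr strings are made up to show a string width that could differ between platforms:

```python
import doctest

checker = doctest.OutputChecker()

# Expected output as written in a docstring, with the string width elided.
want = "array(['a', 'bb'], dtype='<U...')\n"
# Output one platform might actually print.
got = "array(['a', 'bb'], dtype='<U2')\n"

strict = checker.check_output(want, got, 0)  # exact comparison fails
relaxed = checker.check_output(want, got, doctest.ELLIPSIS)  # "..." matches "2"
print(strict, relaxed)  # False True
```

Inside a docstring, the same effect comes from adding the directive # doctest: +ELLIPSIS to the example line.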

Why this matters

  • Enables Arkouda ExtensionArrays to behave predictably inside pandas pipelines
  • Avoids silent NumPy fallbacks that break distributed execution
  • Clarifies and documents Arkouda’s server-side casting semantics
  • Unblocks downstream pandas operations that rely on astype

Closes #5219: improve astype in ak.pandas.extension

ajpotts force-pushed the 5219_improve_astype_in_ak.pandas.extension branch from f7d1dfc to 1935bed (December 24, 2025 22:12)
ajpotts marked this pull request as ready for review (December 29, 2025 18:04)
ajpotts force-pushed the 5219_improve_astype_in_ak.pandas.extension branch from 1935bed to f378f3d (January 16, 2026 18:36)
codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@03aa4df).

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #5222   +/-   ##
========================================
  Coverage        ?   100.00%           
========================================
  Files           ?         4           
  Lines           ?        63           
  Branches        ?         0           
========================================
  Hits            ?        63           
  Misses          ?         0           
  Partials        ?         0           

ajpotts force-pushed the 5219_improve_astype_in_ak.pandas.extension branch from f378f3d to f023951 (January 16, 2026 19:15)
ajpotts added the "blocking" label (This is blocking a developer from completing a task they are actively working.) on Jan 16, 2026
ajpotts force-pushed the 5219_improve_astype_in_ak.pandas.extension branch from f023951 to 8731d2d (January 16, 2026 21:22)
ajpotts force-pushed the 5219_improve_astype_in_ak.pandas.extension branch from 8731d2d to 28111d6 (January 16, 2026 22:38)