Closes #5303: Pandas ExtensionArray: allow dtype=ak for generic Arkouda-backed arrays #5304
+149
−36
Summary
This PR introduces a generic Arkouda pandas dtype, dtype="ak", allowing users to construct Arkouda-backed pandas arrays without specifying a concrete Arkouda dtype (e.g. ak_int64, ak_string) up front.
The new generic dtype improves ergonomics and aligns Arkouda’s pandas integration with standard pandas patterns such as dtype="string" or dtype="category".
Motivation
Prior to this change, users had to explicitly specify a concrete Arkouda dtype (for example dtype="ak_int64") when constructing pandas objects.
This is unnecessarily verbose and diverges from typical pandas usage, where users usually select a backend or family and allow the system to infer the concrete dtype.
With this PR, users can instead pass dtype="ak" and rely on Arkouda to infer the appropriate concrete dtype.
What’s in this PR
1. Generic ArkoudaDtype
A new pandas ExtensionDtype, ArkoudaDtype, is introduced and registered under the name "ak".
Key properties:
- dtype="ak" resolves to ArkoudaDtype
- construct_array_type() returns ArkoudaExtensionArray
2. Factory-style dispatch in _from_sequence
ArkoudaExtensionArray._from_sequence has been refactored into a true factory:
- It treats "ak" / ArkoudaDtype as a request for backend inference
- pdarray → ArkoudaArray
- Strings → ArkoudaStringArray
- Categorical → ArkoudaCategoricalArray
This makes _from_sequence the single construction choke point used by pandas when dtype="ak" is specified.
3. Updated documentation
The _from_sequence docstring was updated to accurately reflect:
- dtype="ak" vs concrete Arkouda dtypes
- the pd.array(..., dtype="ak") construction pattern
4. Comprehensive tests
New tests verify that dtype="ak" correctly dispatches for:
- numeric dtypes (int64, float64)
- Categorical
- pd.array and pd.Series construction paths
Tests also document the required construction pattern for categoricals (pd.array(..., dtype="ak") prior to Series) to avoid pandas eager iteration.
Non-goals / Follow-ups
- astype("ak") behavior is intentionally out of scope for this PR
These can be addressed in follow-up PRs if desired.
Impact
Example
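The factory-style dispatch in _from_sequence can be illustrated with a minimal, server-free sketch. This is not the code from the PR. PdArray, Strings, and Categorical below are hypothetical stand-ins for Arkouda's server-backed types, the wrapper class names are taken from the PR description but their bodies are placeholders, and the _DISPATCH table is an assumption made for illustration. The sketch only shows the idea that a generic "ak" request is resolved to a concrete wrapper based on the type of the input.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins for Arkouda's pdarray, Strings, and Categorical.
# The real classes need a running Arkouda server; these only carry values.
@dataclass
class PdArray:
    values: list


@dataclass
class Strings:
    values: list


@dataclass
class Categorical:
    values: list


@dataclass
class ArkoudaExtensionArray:
    """Placeholder base class; only the factory entry point is sketched."""

    data: Any

    @classmethod
    def _from_sequence(cls, scalars, *, dtype="ak"):
        # Simplified signature. Only the generic "ak" request is handled
        # here; concrete dtypes are outside the scope of this sketch.
        if dtype != "ak":
            raise NotImplementedError("sketch only covers dtype='ak'")
        # Pick the concrete wrapper from the type of the backing object.
        for backend_type, wrapper in _DISPATCH:
            if isinstance(scalars, backend_type):
                return wrapper(scalars)
        raise TypeError(f"no Arkouda wrapper for {type(scalars).__name__}")


class ArkoudaArray(ArkoudaExtensionArray):
    """Placeholder wrapper for numeric data."""


class ArkoudaStringArray(ArkoudaExtensionArray):
    """Placeholder wrapper for string data."""


class ArkoudaCategoricalArray(ArkoudaExtensionArray):
    """Placeholder wrapper for categorical data."""


# Assumed dispatch table mirroring the mapping listed in the PR description.
_DISPATCH = (
    (PdArray, ArkoudaArray),
    (Strings, ArkoudaStringArray),
    (Categorical, ArkoudaCategoricalArray),
)

result = ArkoudaExtensionArray._from_sequence(PdArray([1, 2, 3]), dtype="ak")
print(type(result).__name__)  # ArkoudaArray
```

Keeping the mapping in one table means a new backend type would need only one extra entry. How the actual implementation handles plain Python sequences or concrete dtypes is not shown here.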