
ajpotts commented Jan 12, 2026

Summary

This PR introduces a generic Arkouda pandas dtype, dtype="ak", allowing users to construct Arkouda-backed pandas arrays without specifying a concrete Arkouda dtype (e.g. ak_int64, ak_string) up front.

The new generic dtype improves ergonomics and aligns Arkouda’s pandas integration with standard pandas patterns such as dtype="string" or dtype="category".

Motivation

Prior to this change, users had to explicitly specify a concrete Arkouda dtype when constructing pandas objects:

pd.array(data, dtype="ak_int64")
pd.Series(data, dtype="ak_float64")

This is unnecessarily verbose and diverges from typical pandas usage, where users select a backend or dtype family and let the library infer the concrete dtype.

With this PR, users can now write:

pd.array(data, dtype="ak")
pd.Series(data, dtype="ak")

and rely on Arkouda to infer the appropriate concrete dtype.

What’s in this PR

1. Generic ArkoudaDtype

A new pandas ExtensionDtype, ArkoudaDtype, is introduced and registered under the name "ak".

Key properties:

  • dtype="ak" resolves to ArkoudaDtype
  • construct_array_type() returns ArkoudaExtensionArray
  • Acts as a dispatcher, not a concrete storage dtype
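The registration mechanics can be sketched with the public pandas extension API. This is a minimal illustration, not the PR's implementation: GenericDtypeSketch, GenericArraySketch, and the dtype string "ak_sketch" are hypothetical names (the string is chosen so it cannot collide with the real "ak" registration). The sketch relies on the default ExtensionDtype.construct_from_string, which matches a requested dtype string against the class's name attribute.

```python
import pandas as pd
from pandas.api.extensions import (
    ExtensionArray,
    ExtensionDtype,
    register_extension_dtype,
)


class GenericArraySketch(ExtensionArray):
    """Hypothetical placeholder for the array class the dtype points at.

    In the PR this role is played by ArkoudaExtensionArray, whose
    _from_sequence performs the real backend dispatch.
    """


@register_extension_dtype
class GenericDtypeSketch(ExtensionDtype):
    # The string accepted as dtype=...; the default construct_from_string
    # compares the requested string against this attribute.
    name = "ak_sketch"
    # No fixed scalar type: the dtype defers the storage decision.
    type = object

    @classmethod
    def construct_array_type(cls):
        # pandas asks the dtype for this class when it needs to build data.
        return GenericArraySketch


resolved = pd.api.types.pandas_dtype("ak_sketch")
print(type(resolved).__name__)                   # GenericDtypeSketch
print(resolved.construct_array_type().__name__)  # GenericArraySketch
```

Because a dtype like this fixes no storage type, the concrete choice is deferred entirely to the array class that construct_array_type returns.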

2. Factory-style dispatch in _from_sequence

ArkoudaExtensionArray._from_sequence has been refactored into a true factory:

  • Normalizes pandas-provided dtypes, treating "ak" / ArkoudaDtype as a request for backend inference
  • Converts Python / NumPy inputs to Arkouda objects once
  • Dispatches based on the resulting Arkouda type:
    • pdarray → ArkoudaArray
    • Strings → ArkoudaStringArray
    • pandas-style Categorical → ArkoudaCategoricalArray

This makes _from_sequence the single construction choke point used by pandas when dtype="ak" is specified.
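The convert-once-then-dispatch pattern described above can be sketched in plain Python. This is a hypothetical illustration under stated assumptions, not ArkoudaExtensionArray._from_sequence itself: FakePdarray, FakeStrings, and FakeCategorical stand in for Arkouda's pdarray, Strings, and Categorical; ArrayWrapper, StringArrayWrapper, and CategoricalArrayWrapper stand in for ArkoudaArray, ArkoudaStringArray, and ArkoudaCategoricalArray; and the inference rules in to_backend are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins for Arkouda's server-side objects.
@dataclass
class FakePdarray:
    values: list
    dtype: str


@dataclass
class FakeStrings:
    values: list


@dataclass
class FakeCategorical:
    codes: list
    categories: list


# Hypothetical stand-ins for the pandas ExtensionArray wrappers.
@dataclass
class ArrayWrapper:
    data: FakePdarray


@dataclass
class StringArrayWrapper:
    data: FakeStrings


@dataclass
class CategoricalArrayWrapper:
    data: FakeCategorical


def to_backend(scalars: Any):
    """Convert the input exactly once, using simplified (assumed) inference rules."""
    if isinstance(scalars, (FakePdarray, FakeStrings, FakeCategorical)):
        return scalars  # already a backend object: nothing to convert
    items = list(scalars)
    if items and all(isinstance(x, str) for x in items):
        return FakeStrings(items)
    if all(isinstance(x, int) for x in items):
        return FakePdarray(items, "int64")
    return FakePdarray([float(x) for x in items], "float64")


# Dispatch keyed on the *converted* type, not on the raw Python input.
_DISPATCH = {
    FakePdarray: ArrayWrapper,
    FakeStrings: StringArrayWrapper,
    FakeCategorical: CategoricalArrayWrapper,
}


def from_sequence(scalars: Any, dtype: Any = None):
    """Hypothetical factory: treat None or "ak" as a request for inference."""
    if dtype not in (None, "ak"):
        # Concrete dtypes (e.g. "ak_int64") are out of scope for this sketch.
        raise NotImplementedError(f"dtype {dtype!r} is not handled here")
    converted = to_backend(scalars)
    return _DISPATCH[type(converted)](converted)


print(type(from_sequence([1, 2, 3], dtype="ak")).__name__)   # ArrayWrapper
print(from_sequence([1.0, 2.0], dtype="ak").data.dtype)      # float64
print(type(from_sequence(["a", "b"], dtype="ak")).__name__)  # StringArrayWrapper
```

Keying the dispatch table on the converted type, rather than on the raw input, keeps type inference in one place (to_backend) and lets inputs that are already backend objects, such as a categorical, pass straight through to the matching wrapper.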

3. Updated documentation

The _from_sequence docstring was updated to accurately reflect:

  • Factory/dispatcher behavior
  • Post-conversion dispatch
  • The semantics of dtype="ak" vs concrete Arkouda dtypes
  • pandas construction context (pd.array(..., dtype="ak"))

4. Comprehensive tests

New tests verify that dtype="ak" correctly dispatches for:

  • Numeric data (int64, float64)
  • Strings
  • Arkouda pandas Categorical
  • Both pd.array and pd.Series construction paths

Tests also document the construction pattern required for categoricals: build the array with pd.array(..., dtype="ak") first and then pass it to pd.Series, which avoids pandas iterating eagerly over the data.

Non-goals / Follow-ups

  • astype("ak") behavior is intentionally out of scope for this PR
  • No changes to existing concrete Arkouda dtype strings
  • No changes to pandas materialization or iteration semantics

These can be addressed in follow-up PRs if desired.

Impact

  • Improves usability of Arkouda’s pandas ExtensionArray API by removing the need to name a concrete dtype at construction time
  • Reduces boilerplate for exploratory and backend-agnostic code
  • Provides a clean foundation for future dtype-related enhancements

Example

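import arkouda as ak  # assumption: importing arkouda registers the "ak" dtype with pandas
import pandas as pd

ak.connect()  # requires a running arkouda_server (defaults to localhost:5555)

pd.array([1, 2, 3], dtype="ak")        # ArkoudaArray (int64)
pd.array(["a", "b"], dtype="ak")       # ArkoudaStringArray
pd.Series([1.0, 2.0], dtype="ak")      # Series backed by ArkoudaArray (float64)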

Closes #5303: Pandas ExtensionArray: allow dtype="ak" for generic Arkouda-backed arrays

ajpotts force-pushed the 5303_Pandas_ExtensionArray_allow_dtype=ak branch 3 times, most recently from 76f3b24 to cfc9e7e on January 12, 2026 20:56

codecov-commenter commented Jan 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@be6b999).

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #5304   +/-   ##
========================================
  Coverage        ?   100.00%           
========================================
  Files           ?         4           
  Lines           ?        63           
  Branches        ?         0           
========================================
  Hits            ?        63           
  Misses          ?         0           
  Partials        ?         0           


ajpotts marked this pull request as ready for review on January 12, 2026 21:58
ajpotts force-pushed the 5303_Pandas_ExtensionArray_allow_dtype=ak branch from cfc9e7e to 11fcad3 on January 16, 2026 19:23
