Skip to content

Conversation

@ajpotts
Copy link
Contributor

@ajpotts ajpotts commented Jan 12, 2026

Summary

This PR modernizes the ArkoudaArray._reduce implementation to better align with
pandas ExtensionArray semantics, and significantly expands unit test coverage for reduction behavior.

Key goals of this change:

  • Make _reduce string-driven, matching pandas’ calling conventions
  • Support a broader, well-defined set of reductions
  • Improve error reporting and dtype validation
  • Add comprehensive tests covering correctness, edge cases, and failure modes

Changes

ArkoudaArray._reduce

  • Replaced ad-hoc conditional logic with a structured dispatch table
  • Added support for:
    • Numeric reductions: sum, prod, min, max, mean, var, std
    • Index reductions: argmin, argmax, first
    • Boolean reductions: any, all, or, and (with dtype enforcement)
    • Count via array size / non-NaN count for floats
    • Mode (array-returning reduction)
  • Improved error handling:
    • Unknown reduction names raise ValueError
    • Unsupported reductions or dtype mismatches raise TypeError
  • Added full type hints and a darglint-compliant docstring

GroupByReductionType

  • Fixed a typo: NUNUNIQUENUNIQUE

Tests

  • Replaced minimal smoke tests with a dedicated TestArkoudaArrayReduce suite
  • Added tests for:
    • Numeric reductions vs NumPy
    • Boolean reductions and dtype validation
    • count, argmin, argmax, and first
    • Empty-array edge cases
    • Unknown reduction names
    • Acceptance of the skipna argument
  • Removed outdated tests that expected "mean" to be unsupported

Rationale

_reduce is part of the pandas ExtensionArray protocol and is invoked directly
by pandas using string operation names.

The expanded test coverage ensures correctness, documents intended semantics,
and provides a stable foundation for adding future reductions (e.g., median,
unique, nunique) as backend support evolves.


Notes for Reviewers

  • This PR does not change public pandas-facing APIs, only internal behavior
  • Reduction semantics intentionally mirror NumPy/pandas where possible
  • Follow-up PRs may add capability-gated reductions as Arkouda support expands

Closes #5308: Improve ArkoudaExtensionArray._reduce

@ajpotts ajpotts force-pushed the 5308_Improve_ArkoudaExtensionArray._reduce branch from ccea726 to 6bb3eff Compare January 12, 2026 23:29
@ajpotts ajpotts force-pushed the 5308_Improve_ArkoudaExtensionArray._reduce branch 2 times, most recently from a899478 to 955716b Compare January 13, 2026 14:40
@ajpotts ajpotts force-pushed the 5308_Improve_ArkoudaExtensionArray._reduce branch from 955716b to 38e739c Compare January 13, 2026 18:32
@ajpotts ajpotts marked this pull request as ready for review January 13, 2026 19:29
@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@be6b999). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #5309   +/-   ##
========================================
  Coverage        ?   100.00%           
========================================
  Files           ?         4           
  Lines           ?        63           
  Branches        ?         0           
========================================
  Hits            ?        63           
  Misses          ?         0           
  Partials        ?         0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ajpotts ajpotts force-pushed the 5308_Improve_ArkoudaExtensionArray._reduce branch from 38e739c to e630a71 Compare January 16, 2026 15:20
@ajpotts ajpotts marked this pull request as draft January 16, 2026 15:22
@ajpotts ajpotts force-pushed the 5308_Improve_ArkoudaExtensionArray._reduce branch from e630a71 to fa3c74c Compare January 16, 2026 16:19
@ajpotts ajpotts force-pushed the 5308_Improve_ArkoudaExtensionArray._reduce branch from fa3c74c to 91564d8 Compare January 16, 2026 16:20
@ajpotts ajpotts marked this pull request as ready for review January 16, 2026 16:20
Copy link
Collaborator

@1RyanK 1RyanK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

)
def test_reduce_count(self, data, exp_count, exp_nunique):
arr = ArkoudaArray(ak.array(data))
assert arr._reduce("count") == exp_count
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also use exp_nunique in an assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't support nunique yet. I removed it from the test so as not to confuse the developer.

Default is True.
Arkouda-backed arrays currently do not maintain a separate missing-value mask;
behavior for NaN values is determined by the underlying Arkouda operation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it actually does skipna=True semantics for float reductions.
I did _reduce("sum") on [1.0, NaN, 2.0] and got NaN, whereas pandas with skipna=True does 3.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a clearer note to the docstring pointing out that this is not fully implemented, but it is required by pandas to be in the signature. I also created a ticket so we remember to revisit this: #5319

"max": data.max,
"mean": data.mean,
"var": data.var,
"std": data.std,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something with the default ddof is going on here.
Example: _reduce("std") on [1,2,3,2] returns 0.7071 (population), but Series.std() returns 0.8165 (sample).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@ajpotts ajpotts added the blocking This is blocking a developer from completing a task they are actively working. label Jan 16, 2026
@ajpotts ajpotts force-pushed the 5308_Improve_ArkoudaExtensionArray._reduce branch from 9c51836 to 4d84262 Compare January 16, 2026 21:29
@ajpotts ajpotts force-pushed the 5308_Improve_ArkoudaExtensionArray._reduce branch from 4d84262 to f74f6c6 Compare January 16, 2026 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocking This is blocking a developer from completing a task they are actively working.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve ArkoudaExtensionArray._reduce

3 participants