-
Notifications
You must be signed in to change notification settings - Fork 97
Closes #5308: Improve ArkoudaExtensionArray._reduce #5309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Closes #5308: Improve ArkoudaExtensionArray._reduce #5309
Conversation
ccea726 to
6bb3eff
Compare
a899478 to
955716b
Compare
955716b to
38e739c
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5309 +/- ##
========================================
Coverage ? 100.00%
========================================
Files ? 4
Lines ? 63
Branches ? 0
========================================
Hits ? 63
Misses ? 0
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
38e739c to
e630a71
Compare
e630a71 to
fa3c74c
Compare
fa3c74c to
91564d8
Compare
1RyanK
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
| ) | ||
| def test_reduce_count(self, data, exp_count, exp_nunique): | ||
| arr = ArkoudaArray(ak.array(data)) | ||
| assert arr._reduce("count") == exp_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also use exp_nunique in an assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we don't support nunique yet. I removed it from the test so as not to confuse the developer.
| Default is True. | ||
| Arkouda-backed arrays currently do not maintain a separate missing-value mask; | ||
| behavior for NaN values is determined by the underlying Arkouda operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it actually does skipna=True semantics for float reductions.
I did _reduce("sum") on [1.0, NaN, 2.0] and got NaN, whereas pandas with skipna=True does 3.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a clearer note to the docstring pointing out that this is not fully implemented, but it is required by pandas to be in the signature. I also created a ticket so we remember to revisit this: #5319
| "max": data.max, | ||
| "mean": data.mean, | ||
| "var": data.var, | ||
| "std": data.std, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something with the default ddof is going on here.
Example: _reduce("std") on [1,2,3,2] returns 0.7071 (population), but Series.std() returns 0.8165 (sample).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
9c51836 to
4d84262
Compare
4d84262 to
f74f6c6
Compare
Summary
This PR modernizes the
ArkoudaArray._reduceimplementation to better align withpandas ExtensionArray semantics, and significantly expands unit test coverage for reduction behavior.
Key goals of this change:
_reducestring-driven, matching pandas’ calling conventionsChanges
ArkoudaArray._reduce
sum,prod,min,max,mean,var,stdargmin,argmax,firstany,all,or,and(with dtype enforcement)ValueErrorTypeErrorGroupByReductionType
NUNUNIQUE→NUNIQUETests
TestArkoudaArrayReducesuitecount,argmin,argmax, andfirstskipnaargument"mean"to be unsupportedRationale
_reduceis part of the pandas ExtensionArray protocol and is invoked directlyby pandas using string operation names.
The expanded test coverage ensures correctness, documents intended semantics,
and provides a stable foundation for adding future reductions (e.g.,
median,unique,nunique) as backend support evolves.Notes for Reviewers
Closes #5308: Improve ArkoudaExtensionArray._reduce