Commit 3f60750

TST/ENH: raise TypeError in Series.searchsorted for incomparable object-dtype values; add test
1 parent 944c527 commit 3f60750

3 files changed: +98 -0 lines changed

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
Title: TST/ENH: Raise TypeError in Series.searchsorted for incomparable object-dtype values

Summary
-------
This small change makes Series.searchsorted raise a TypeError when the underlying
values are a NumPy ndarray with dtype=object containing elements that are not
mutually comparable with the provided search value (for example, mixing int and
str). This aligns the behavior of `searchsorted` with `sort_values` and
reduces surprising cases where NumPy's `searchsorted` can return an index even
though comparisons between the types would fail.

Files changed
-------------
- pandas/core/algorithms.py
  - Import `is_scalar` and add a lightweight runtime comparability check for
    object-dtype ndarrays in `searchsorted`. If a single sample comparison
    between an array element and the search value raises TypeError, that
    TypeError propagates to the caller.

- pandas/tests/series/methods/test_searchsorted.py
  - Add `test_searchsorted_incomparable_object_raises`, which asserts that
    `Series([1, 2, "1"]).searchsorted("1")` raises TypeError.

Rationale
---------
pandas delegates `searchsorted` to NumPy for ndarray-backed data. NumPy's
behavior on mixed-type object arrays can be surprising: it sometimes returns an
insertion index even when Python comparisons between the element types would
raise TypeError (e.g. `1 < "1"`). Other pandas operations (like `sort_values`)
raise in that situation, so this change makes `searchsorted` consistent with
the rest of pandas.

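To make the inconsistency concrete, the sketch below uses only plain Python (no pandas or NumPy) to show the comparison failure that sorting runs into. The helper name `is_comparable` is hypothetical and is not part of pandas.

```python
def is_comparable(left, right):
    """Return True if `left < right` can be evaluated without a TypeError."""
    try:
        _ = left < right
    except TypeError:
        return False
    return True


mixed = [1, 2, "1"]

# Python 3 defines no ordering between int and str, so sorting the
# mixed list fails as soon as the two types are compared.
sort_error = None
try:
    sorted(mixed)
except TypeError as err:
    sort_error = str(err)

print(is_comparable(1, 2))    # True
print(is_comparable(1, "1"))  # False
print(sort_error)
```

A search that returns an index for `"1"` in such data reports a position in an ordering that Python itself cannot establish, which is the surprise this change aims to remove.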
Behavior and trade-offs
-----------------------
- The comparability check is deliberately lightweight: it attempts a single
  `<` comparison between the first non-NA array element and a sample of the
  search value (the value itself if it is a scalar, otherwise its first
  element). If that comparison raises TypeError, the error propagates.
- This heuristic catches the common case (mixed ints/strings) without scanning
  the whole array, which would be expensive. It does not detect every
  mixed-type array: if the first element is comparable with the search value
  but later ones are not, the check passes. A stricter rule could sample more
  elements or check types across the array, at some performance cost.

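The blind spot described above can be reproduced without pandas. The following is a minimal sketch, assuming plain Python lists stand in for the object-dtype array; `_is_na`, `first_element_check`, and `raises_type_error` are hypothetical names, and `_is_na` is only a rough stand-in for pandas' `isna`.

```python
import math


def _is_na(x):
    """Rough stand-in for pandas' isna: None and float NaN count as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def first_element_check(values, probe):
    """Compare the first non-missing element with `probe`.

    Any TypeError raised by the comparison propagates to the caller.
    """
    first = next((x for x in values if not _is_na(x)), None)
    if first is not None:
        _ = first < probe


def raises_type_error(values, probe):
    """Report whether the first-element check rejects this search."""
    try:
        first_element_check(values, probe)
    except TypeError:
        return True
    return False


values = [None, 1, 2, "1"]

print(raises_type_error(values, "1"))  # True: 1 < "1" is undefined
print(raises_type_error(values, 3))    # False: 1 < 3 succeeds, "1" is never examined
```

The second call shows the trade-off: the array still mixes int and str, but the check only ever looks at the first non-missing element.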
Testing
-------
- New test added (see above). To run locally:

    # install in editable mode if importing from source
    python -m pip install -ve .

    # run the single test
    pytest -q pandas/tests/series/methods/test_searchsorted.py::test_searchsorted_incomparable_object_raises

Compatibility
-------------
- Backwards compatible for numeric, datetime, and other non-object arrays:
  behavior is unchanged.
- For object-dtype arrays with mixed types, a TypeError is now raised where
  NumPy might previously have silently returned an index. This is intentional,
  to make the behavior consistent with sorting.

Follow-ups
----------
- If desired, we can strengthen the comparability check (sample multiple
  elements or inspect the set of Python types) and add tests for those
  conditions.

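As a rough illustration of what a stronger check might look like, here is a sketch in plain Python. Both helpers (`sampled_check`, `distinct_types`) and the `max_samples` parameter are hypothetical; neither exists in pandas, and only `None` is treated as missing to keep the example short.

```python
def sampled_check(values, probe, max_samples=8):
    """Compare `probe` with at most `max_samples` evenly spaced elements.

    Missing entries (None) are skipped. Any TypeError raised by a
    comparison propagates to the caller.
    """
    n = len(values)
    if n == 0:
        return
    # Ceiling division keeps the number of visited positions <= max_samples.
    step = max(1, -(-n // max_samples))
    for i in range(0, n, step):
        x = values[i]
        if x is not None:
            _ = x < probe


def distinct_types(values):
    """Collect the Python types present, ignoring None (scans every element)."""
    return {type(x) for x in values if x is not None}


values = [1, 2, "1"]

try:
    sampled_check(values, 3)
    caught = False
except TypeError:
    caught = True

print(caught)                        # True: "1" < 3 is undefined
print(len(distinct_types(values)))   # 2: int and str
```

Sampling is still a heuristic, since an incomparable element can sit between sampled positions. The type-set approach scans the whole array, and distinct types alone do not prove incomparability (int and float differ but compare fine), so it would need an allow-list of compatible types.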
PR checklist
------------
- [ ] Add release note if desired (small change to searchsorted semantics)
- [ ] Add/adjust tests for stronger heuristics if implemented

pandas/core/algorithms.py

Lines changed: 22 additions & 0 deletions
@@ -60,6 +60,7 @@
     is_integer_dtype,
     is_list_like,
     is_object_dtype,
+    is_scalar,
     is_signed_integer_dtype,
     needs_i8_conversion,
 )
@@ -1326,6 +1327,27 @@ def searchsorted(
 
     # Argument 1 to "searchsorted" of "ndarray" has incompatible type
     # "Union[NumpyValueArrayLike, ExtensionArray]"; expected "NumpyValueArrayLike"
+    # If `arr` is an object-dtype ndarray that mixes incomparable Python
+    # types (e.g. ints and strs), NumPy may still return an insertion
+    # index while direct Python comparisons raise TypeError. To make
+    # behavior consistent with pandas operations that rely on comparisons
+    # (e.g. sort_values), attempt a lightweight comparability check and
+    # propagate a TypeError if comparisons fail.
+    if isinstance(arr, np.ndarray) and arr.dtype == object:
+        # find first non-NA element to use as a sample
+        try:
+            first = next(x for x in arr if not isna(x))
+        except StopIteration:
+            first = None
+
+        if first is not None:
+            sample = value[0] if is_list_like(value) and not is_scalar(value) else value
+            try:
+                _ = first < sample
+            except TypeError:
+                # bubble up the TypeError (message comes from Python)
+                raise
+
     return arr.searchsorted(value, side=side, sorter=sorter)  # type: ignore[arg-type]
 

pandas/tests/series/methods/test_searchsorted.py

Lines changed: 6 additions & 0 deletions
@@ -75,3 +75,9 @@ def test_searchsorted_dataframe_fail(self):
         msg = "Value must be 1-D array-like or scalar, DataFrame is not supported"
         with pytest.raises(ValueError, match=msg):
             ser.searchsorted(vals)
+
+def test_searchsorted_incomparable_object_raises():
+    # mixed int/str in object-dtype Series should raise, mirroring sort_values
+    ser = Series([1, 2, "1"])
+    with pytest.raises(TypeError):
+        ser.searchsorted("1")
