python -c `'import pandas as pd; pd.test(extra_args=[`\"--no-strict-data-files`\", `\"-m not clipboard and not single_cpu and not slow and not network and not db`\"])`';
183
189
"@
184
190
# add rc to the end of the image name if the Python version is unreleased
With object dtype, using ``.values`` on a Series will return the underlying NumPy array.
334
+
335
+
.. code-block:: python
336
+
337
+
>>> ser = pd.Series(["a", "b", np.nan], dtype="object")
338
+
>>> type(ser.values)
339
+
<class 'numpy.ndarray'>
340
+
341
+
However, with the new string dtype, the underlying ExtensionArray is returned instead.
342
+
343
+
.. code-block:: python
344
+
345
+
>>> ser = pd.Series(["a", "b", pd.NA], dtype="str")
346
+
>>> ser.values
347
+
<ArrowStringArray>
348
+
['a', 'b', nan]
349
+
Length: 3, dtype: str
350
+
351
+
If your code requires a NumPy array, you should use :meth:`Series.to_numpy`.
352
+
353
+
.. code-block:: python
354
+
355
+
>>> ser = pd.Series(["a", "b", pd.NA], dtype="str")
356
+
>>> ser.to_numpy()
357
+
array(['a', 'b', nan], dtype=object)
358
+
359
+
In general, you should always prefer :meth:`Series.to_numpy` to get a NumPy array, or :attr:`Series.array` to get an ExtensionArray, over using :attr:`Series.values`.
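As a minimal sketch of that recommendation (the helper name ``to_numpy_and_extension`` is hypothetical and not part of pandas), the following returns both representations of a Series without going through ``.values``. It uses an object-dtype Series so that it also runs on pandas 2.x; the same two accessors are assumed to apply unchanged to the new string dtype.

```python
import numpy as np
import pandas as pd


def to_numpy_and_extension(ser):
    """Hypothetical helper: return (NumPy array, ExtensionArray) for a Series."""
    # Series.to_numpy() returns a numpy.ndarray regardless of the dtype.
    np_values = ser.to_numpy()
    # Series.array returns an ExtensionArray; NumPy-backed data is
    # wrapped in a thin pandas ExtensionArray wrapper.
    ext_values = ser.array
    return np_values, ext_values


ser = pd.Series(["a", "b", np.nan], dtype="object")
np_values, ext_values = to_numpy_and_extension(ser)
print(type(np_values).__name__)
print(isinstance(ext_values, pd.api.extensions.ExtensionArray))
```

Code that asks for the representation it needs explicitly does not depend on which type ``.values`` happens to return for a given dtype.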
360
+
310
361
Notable bug fixes
311
362
~~~~~~~~~~~~~~~~~
312
363
364
+
.. _string_migration_guide-astype_str:
365
+
313
366
``astype(str)`` preserving missing values
314
367
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
315
368
316
-
This is a long standing "bug" or misfeature, as discussed in https://github.com/pandas-dev/pandas/issues/25353.
369
+
The stringifying of missing values is a long-standing "bug" or misfeature, as
370
+
discussed in https://github.com/pandas-dev/pandas/issues/25353, but fixing it
371
+
introduces a significant behaviour change.
317
372
318
-
With pandas < 3, when using ``astype(str)`` (using the built-in :func:`str`, not
319
-
``astype("str")``!), the operation would convert every element to a string,
320
-
including the missing values:
373
+
With pandas < 3, when using ``astype(str)`` or ``astype("str")``, the operation
374
+
would convert every element to a string, including the missing values:
321
375
322
376
.. code-block:: python
323
377
324
378
# OLD behavior in pandas < 3
325
-
>>> ser = pd.Series(["a", np.nan], dtype=object)
379
+
>>> ser = pd.Series([1.5, np.nan])
326
380
>>> ser
327
-
0 a
381
+
0    1.5
328
382
1    NaN
329
-
dtype: object
330
-
>>> ser.astype(str)
331
-
0 a
383
+
dtype: float64
384
+
>>> ser.astype("str")
385
+
0    1.5
332
386
1    nan
333
387
dtype: object
334
-
>>> ser.astype(str).to_numpy()
335
-
array(['a', 'nan'], dtype=object)
388
+
>>> ser.astype("str").to_numpy()
389
+
array(['1.5', 'nan'], dtype=object)
336
390
337
391
Note how ``NaN`` (``np.nan``) was converted to the string ``"nan"``. This was
338
392
not the intended behavior, and it was inconsistent with how other dtypes handled
339
393
missing values.
340
394
341
-
With pandas 3, this behavior has been fixed, and now ``astype(str)`` is an alias
342
-
for ``astype("str")``, i.e. casting to the new string dtype, which will preserve
343
-
the missing values:
395
+
With pandas 3, this behavior has been fixed, and now ``astype("str")`` will cast
396
+
to the new string dtype, which preserves the missing values:
344
397
345
398
.. code-block:: python
346
399
347
400
# NEW behavior in pandas 3
348
401
>>> pd.options.future.infer_string = True
349
-
>>> ser = pd.Series(["a", np.nan], dtype=object)
350
-
>>> ser.astype(str)
351
-
0 a
402
+
>>> ser = pd.Series([1.5, np.nan])
403
+
>>> ser.astype("str")
404
+
0    1.5
352
405
1    NaN
353
406
dtype: str
354
-
>>> ser.astype(str).values
355
-
array(['a', nan], dtype=object)
407
+
>>> ser.astype("str").to_numpy()
408
+
array(['1.5', nan], dtype=object)
356
409
357
410
If you want to preserve the old behaviour of converting every object to a
358
-
string, you can use ``ser.map(str)`` instead.
411
+
string, you can use ``ser.map(str)`` instead. If you want to do such a conversion
412
+
while preserving the missing values in a way that works with both pandas 2.x and
413
+
3.x, you can use ``ser.map(str, na_action="ignore")`` (for pandas 3.x only, you
414
+
can do ``ser.astype("str")``).
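A minimal sketch of the two ``map`` variants described above, assuming a float Series like the one in the earlier examples (the helper name ``stringify_keep_missing`` is hypothetical and only used for illustration):

```python
import numpy as np
import pandas as pd


def stringify_keep_missing(ser):
    """Hypothetical helper: stringify elements but leave missing values alone."""
    # na_action="ignore" makes Series.map skip missing values, so they are
    # propagated instead of being passed to str().
    return ser.map(str, na_action="ignore")


ser = pd.Series([1.5, np.nan])

# Old-style behaviour: every element is stringified, including NaN.
stringified_all = ser.map(str)

# Version-independent variant: missing values stay missing.
stringified_keep_na = stringify_keep_missing(ser)

print(stringified_all.tolist())
print(stringified_keep_na.isna().tolist())
```

The dtype of the result may differ between pandas versions (object in 2.x, possibly the new string dtype in 3.x), so the sketch only relies on the element values and on ``isna()``.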
415
+
416
+
If you want to convert to object or string dtype for pandas 2.x and 3.x,
417
+
respectively, without needing to stringify each individual element, you will
418
+
have to use a conditional check on the pandas version.
419
+
For example, to convert a categorical Series with string categories to its
420
+
dense non-categorical version with object or string dtype:
421
+
422
+
.. code-block:: python
423
+
424
+
>>> import pandas as pd
425
+
>>> ser = pd.Series(["a", np.nan], dtype="category")
Upsampling and calling ``.ohlc()`` previously returned a ``Series``, basically identical to calling ``.asfreq()``. OHLC upsampling now returns a DataFrame with columns ``open``, ``high``, ``low`` and ``close`` (:issue:`13083`). This is consistent with downsampling and ``DatetimeIndex`` behavior.
doc/source/whatsnew/v2.2.0.rst (1 addition, 1 deletion)
@@ -662,7 +662,7 @@ Other Deprecations
662
662
- Deprecated :meth:`DatetimeArray.__init__` and :meth:`TimedeltaArray.__init__`, use :func:`array` instead (:issue:`55623`)
663
663
- Deprecated :meth:`Index.format`, use ``index.astype(str)`` or ``index.map(formatter)`` instead (:issue:`55413`)
664
664
- Deprecated :meth:`Series.ravel`, the underlying array is already 1D, so ravel is not necessary (:issue:`52511`)
665
-
- Deprecated :meth:`Series.resample` and :meth:`DataFrame.resample` with a :class:`PeriodIndex` (and the 'convention' keyword), convert to :class:`DatetimeIndex` (with ``.to_timestamp()``) before resampling instead (:issue:`53481`)
665
+
- Deprecated :meth:`Series.resample` and :meth:`DataFrame.resample` with a :class:`PeriodIndex` (and the 'convention' keyword), convert to :class:`DatetimeIndex` (with ``.to_timestamp()``) before resampling instead (:issue:`53481`). Note: this deprecation was later undone in pandas 2.3.3 (:issue:`57033`)
666
666
- Deprecated :meth:`Series.view`, use :meth:`Series.astype` instead to change the dtype (:issue:`20251`)
667
667
- Deprecated :meth:`offsets.Tick.is_anchored`, use ``False`` instead (:issue:`55388`)
668
668
- Deprecated ``core.internals`` members ``Block``, ``ExtensionBlock``, and ``DatetimeTZBlock``, use public APIs instead (:issue:`55139`)
doc/source/whatsnew/v2.3.2.rst (8 additions, 2 deletions)
@@ -1,6 +1,6 @@
1
1
.. _whatsnew_232:
2
2
3
-
What's new in 2.3.2 (August XX, 2025)
3
+
What's new in 2.3.2 (August 21, 2025)
4
4
-------------------------------------
5
5
6
6
These are the changes in pandas 2.3.2. See :ref:`release` for a full changelog
@@ -25,10 +25,16 @@ Bug fixes
25
25
- Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
26
26
"string" type in the JSON Table Schema for :class:`StringDtype` columns
27
27
(:issue:`61889`)
28
-
28
+
- Boolean operations (``|``, ``&``, ``^``) with bool-dtype objects on the left and :class:`StringDtype` objects on the right now cast the string to bool, with a deprecation warning (:issue:`60234`)
29
+
- Fixed :meth:`~Series.str.match`, :meth:`~Series.str.fullmatch` and :meth:`~Series.str.contains`
30
+
string methods with compiled regex for the Arrow-backed string dtype (:issue:`61964`, :issue:`61942`)
31
+
- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` inconsistently
32
+
replacing matching values when missing values are present for string dtypes (:issue:`56599`)