
Commit 7edc7cc: Merge branch '2.3.x' of https://github.com/pandas-dev/pandas into 2.3.x

2 parents b1c7e02 + 472bae0

File tree: 80 files changed, +1712 −427 lines


.github/workflows/unit-tests.yml

Lines changed: 1 addition & 1 deletion

@@ -392,7 +392,7 @@ jobs:
       - name: Set up Python Dev Version
         uses: actions/setup-python@v5
         with:
-          python-version: '3.13-dev'
+          python-version: '3.14-dev'

       - name: Build Environment
         run: |

.github/workflows/wheels.yml

Lines changed: 48 additions & 4 deletions

@@ -13,6 +13,8 @@
 name: Wheel builder

 on:
+  release:
+    types: [published]
   schedule:
     # 3:27 UTC every day
     - cron: "27 3 * * *"
@@ -37,6 +39,7 @@ jobs:
     if: >-
       (github.event_name == 'schedule') ||
       github.event_name == 'workflow_dispatch' ||
+      github.event_name == 'release' ||
       (github.event_name == 'pull_request' &&
       contains(github.event.pull_request.labels.*.name, 'Build')) ||
       (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') && ( ! endsWith(github.ref, 'dev0')))
@@ -82,6 +85,7 @@ jobs:
     if: >-
       (github.event_name == 'schedule') ||
       github.event_name == 'workflow_dispatch' ||
+      github.event_name == 'release' ||
       (github.event_name == 'pull_request' &&
       contains(github.event.pull_request.labels.*.name, 'Build')) ||
       (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') && ( ! endsWith(github.ref, 'dev0')))
@@ -101,11 +105,13 @@ jobs:
           - [macos-14, macosx_arm64]
           - [windows-2022, win_amd64]
         # TODO: support PyPy?
-        python: [["cp39", "3.9"], ["cp310", "3.10"], ["cp311", "3.11"], ["cp312", "3.12"], ["cp313", "3.13"], ["cp313t", "3.13"]]
+        python: [["cp39", "3.9"], ["cp310", "3.10"], ["cp311", "3.11"], ["cp312", "3.12"], ["cp313", "3.13"], ["cp313t", "3.13"], ["cp314", "3.14"], ["cp314t", "3.14"]]
         # TODO: Build free-threaded wheels for Windows
         exclude:
           - buildplat: [windows-2022, win_amd64]
             python: ["cp313t", "3.13"]
+          - buildplat: [windows-2022, win_amd64]
+            python: ["cp314t", "3.14"]

     env:
       IS_PUSH: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') }}
@@ -147,7 +153,7 @@ jobs:
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.23.3
+        uses: pypa/cibuildwheel@v3.1.4
         with:
           package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:
@@ -182,8 +188,8 @@ jobs:
           python -c `'import pandas as pd; pd.test(extra_args=[`\"--no-strict-data-files`\", `\"-m not clipboard and not single_cpu and not slow and not network and not db`\"])`';
           "@
           # add rc to the end of the image name if the Python version is unreleased
-          docker pull python:${{ matrix.python[1] == '3.13' && '3.13-rc' || format('{0}-windowsservercore', matrix.python[1]) }}
-          docker run --env PANDAS_CI='1' -v ${PWD}:C:\pandas python:${{ matrix.python[1] == '3.13' && '3.13-rc' || format('{0}-windowsservercore', matrix.python[1]) }} powershell -Command $TST_CMD
+          docker pull python:${{ matrix.python[1] == '3.14' && '3.14-rc' || format('{0}-windowsservercore', matrix.python[1]) }}
+          docker run --env PANDAS_CI='1' -v ${PWD}:C:\pandas python:${{ matrix.python[1] == '3.14' && '3.14-rc' || format('{0}-windowsservercore', matrix.python[1]) }} powershell -Command $TST_CMD

       - uses: actions/upload-artifact@v4
         with:
@@ -206,3 +212,41 @@ jobs:
           source ci/upload_wheels.sh
           set_upload_vars
           upload_wheels
+
+  publish:
+    if: >
+      github.repository == 'pandas-dev/pandas' &&
+      github.event_name == 'release' &&
+      startsWith(github.ref, 'refs/tags/v')
+
+    needs:
+      - build_sdist
+      - build_wheels
+
+    runs-on: ubuntu-latest
+
+    environment:
+      name: pypi
+    permissions:
+      id-token: write  # OIDC for Trusted Publishing
+      contents: read
+
+    steps:
+      - name: Download all artefacts
+        uses: actions/download-artifact@v4
+        with:
+          path: dist  # everything lands in ./dist/**
+
+      - name: Collect files
+        run: |
+          mkdir -p upload
+          # skip any wheel that contains 'pyodide'
+          find dist -name '*pyodide*.whl' -prune -o \
+               -name '*.whl' -exec mv {} upload/ \;
+          find dist -name '*.tar.gz' -exec mv {} upload/ \;
+
+      - name: Publish to **PyPI** (Trusted Publishing)
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          packages-dir: upload
+          skip-existing: true
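The "Collect files" step added above relies on `find`'s `-prune` to skip Pyodide wheels while moving everything else. The same filtering can be exercised standalone on a scratch directory; all filenames below are illustrative, not the actual artifact names:

```shell
# Recreate the layout the workflow sees after downloading artifacts.
mkdir -p dist upload
touch dist/pandas-3.0.0-cp313-cp313-manylinux_2_17_x86_64.whl
touch dist/pandas-3.0.0-cp313-cp313-pyodide_2024_0_wasm32.whl
touch dist/pandas-3.0.0.tar.gz

# -prune short-circuits on the pyodide match, so the -exec branch never
# runs for it; every other wheel and the sdist get moved into upload/.
find dist -name '*pyodide*.whl' -prune -o \
     -name '*.whl' -exec mv {} upload/ \;
find dist -name '*.tar.gz' -exec mv {} upload/ \;

ls upload
```

The `-prune -o` idiom is what lets a single `find` invocation express "everything matching X except Y" without a second filtering pass.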

doc/source/development/maintaining.rst

Lines changed: 4 additions & 3 deletions

@@ -467,9 +467,10 @@ which will be triggered when the tag is pushed.
 - Set as the latest release: Leave checked, unless releasing a patch release for an older version
   (e.g. releasing 1.4.5 after 1.5 has been released)

-5. Upload wheels to PyPI::
-
-       twine upload pandas/dist/pandas-<version>*.{whl,tar.gz} --skip-existing
+5. Verify wheels are uploaded automatically by GitHub Actions
+   via `**Trusted Publishing** <https://docs.pypi.org/trusted-publishers/>`__
+   when the GitHub `*Release* <https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases>`__
+   is published. Do not run ``twine upload`` manually.

 6. The GitHub release will after some hours trigger an
    `automated conda-forge PR <https://github.com/conda-forge/pandas-feedstock/pulls>`_.

doc/source/user_guide/migration-3-strings.rst

Lines changed: 88 additions & 20 deletions

@@ -188,6 +188,26 @@ let pandas do the inference. But if you want to be specific, you can specify the
 This is actually compatible with pandas 2.x as well, since in pandas < 3,
 ``dtype="str"`` was essentially treated as an alias for object dtype.

+.. attention::
+
+   While using ``dtype="str"`` in constructors is compatible with pandas 2.x,
+   specifying it as the dtype in :meth:`~Series.astype` runs into the issue
+   of also stringifying missing values in pandas 2.x. See the section
+   :ref:`string_migration_guide-astype_str` for more details.
+
+   For selecting string columns with :meth:`~DataFrame.select_dtypes` in a pandas
+   2.x and 3.x compatible way, it is not possible to use ``"str"``. While this
+   works for pandas 3.x, it raises an error in pandas 2.x.
+   As an alternative, you can select both ``object`` (for pandas 2.x) and
+   ``"string"`` (for pandas 3.x; which will also select the default ``str`` dtype
+   and does not error on pandas 2.x):
+
+   .. code-block:: python
+
+      # can use ``include=["str"]`` for pandas >= 3
+      >>> df.select_dtypes(include=["object", "string"])
+
 The missing value sentinel is now always NaN
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
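The version-compatible `select_dtypes` pattern added in this hunk can be exercised as a small runnable sketch; the DataFrame contents here are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob"],   # text column: object dtype on 2.x, str on 3.x
    "score": [1.5, 2.5],        # numeric column, should not be selected
})

# Requesting both "object" and "string" works on pandas 2.x and 3.x:
# on 2.x the text column has object dtype, and on 3.x the "string"
# selector also matches the default str dtype.
text_cols = df.select_dtypes(include=["object", "string"])
print(list(text_cols.columns))  # only the text column remains
```

Using `include=["str"]` would be shorter but, as the hunk notes, raises on pandas 2.x, so the two-selector form is the safe cross-version choice.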

@@ -307,55 +327,103 @@ the :meth:`~pandas.Series.str.decode` method now has a ``dtype`` parameter to be
 able to specify object dtype instead of the default of string dtype for this use
 case.

+:meth:`Series.values` now returns an :class:`~pandas.api.extensions.ExtensionArray`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With object dtype, using ``.values`` on a Series will return the underlying NumPy array.
+
+.. code-block:: python
+
+   >>> ser = pd.Series(["a", "b", np.nan], dtype="object")
+   >>> type(ser.values)
+   <class 'numpy.ndarray'>
+
+However with the new string dtype, the underlying ExtensionArray is returned instead.
+
+.. code-block:: python
+
+   >>> ser = pd.Series(["a", "b", pd.NA], dtype="str")
+   >>> ser.values
+   <ArrowStringArray>
+   ['a', 'b', nan]
+   Length: 3, dtype: str
+
+If your code requires a NumPy array, you should use :meth:`Series.to_numpy`.
+
+.. code-block:: python
+
+   >>> ser = pd.Series(["a", "b", pd.NA], dtype="str")
+   >>> ser.to_numpy()
+   ['a' 'b' nan]
+
+In general, you should always prefer :meth:`Series.to_numpy` to get a NumPy array or :meth:`Series.array` to get an ExtensionArray over using :meth:`Series.values`.
+
 Notable bug fixes
 ~~~~~~~~~~~~~~~~~

+.. _string_migration_guide-astype_str:
+
 ``astype(str)`` preserving missing values
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-This is a long standing "bug" or misfeature, as discussed in https://github.com/pandas-dev/pandas/issues/25353.
+The stringifying of missing values is a long standing "bug" or misfeature, as
+discussed in https://github.com/pandas-dev/pandas/issues/25353, but fixing it
+introduces a significant behaviour change.

-With pandas < 3, when using ``astype(str)`` (using the built-in :func:`str`, not
-``astype("str")``!), the operation would convert every element to a string,
-including the missing values:
+With pandas < 3, when using ``astype(str)`` or ``astype("str")``, the operation
+would convert every element to a string, including the missing values:

 .. code-block:: python

    # OLD behavior in pandas < 3
-   >>> ser = pd.Series(["a", np.nan], dtype=object)
+   >>> ser = pd.Series([1.5, np.nan])
    >>> ser
-   0      a
+   0    1.5
    1    NaN
-   dtype: object
-   >>> ser.astype(str)
-   0      a
+   dtype: float64
+   >>> ser.astype("str")
+   0    1.5
    1    nan
    dtype: object
-   >>> ser.astype(str).to_numpy()
-   array(['a', 'nan'], dtype=object)
+   >>> ser.astype("str").to_numpy()
+   array(['1.5', 'nan'], dtype=object)

 Note how ``NaN`` (``np.nan``) was converted to the string ``"nan"``. This was
 not the intended behavior, and it was inconsistent with how other dtypes handled
 missing values.

-With pandas 3, this behavior has been fixed, and now ``astype(str)`` is an alias
-for ``astype("str")``, i.e. casting to the new string dtype, which will preserve
-the missing values:
+With pandas 3, this behavior has been fixed, and now ``astype("str")`` will cast
+to the new string dtype, which preserves the missing values:

 .. code-block:: python

    # NEW behavior in pandas 3
    >>> pd.options.future.infer_string = True
-   >>> ser = pd.Series(["a", np.nan], dtype=object)
-   >>> ser.astype(str)
-   0      a
+   >>> ser = pd.Series([1.5, np.nan])
+   >>> ser.astype("str")
+   0    1.5
    1    NaN
    dtype: str
-   >>> ser.astype(str).values
-   array(['a', nan], dtype=object)
+   >>> ser.astype("str").to_numpy()
+   array(['1.5', nan], dtype=object)

 If you want to preserve the old behaviour of converting every object to a
-string, you can use ``ser.map(str)`` instead.
+string, you can use ``ser.map(str)`` instead. If you want to do such a conversion
+while preserving the missing values in a way that works with both pandas 2.x and
+3.x, you can use ``ser.map(str, na_action="ignore")`` (for pandas 3.x only, you
+can do ``ser.astype("str")``).
+
+If you want to convert to object or string dtype for pandas 2.x and 3.x,
+respectively, without needing to stringify each individual element, you will
+have to use a conditional check on the pandas version.
+For example, to convert a categorical Series with string categories to its
+dense non-categorical version with object or string dtype:
+
+.. code-block:: python
+
+   >>> import pandas as pd
+   >>> ser = pd.Series(["a", np.nan], dtype="category")
+   >>> ser.astype(object if pd.__version__ < "3" else "str")


 ``prod()`` raising for string data
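The NaN-preserving conversion this hunk recommends for cross-version code (`map` with `na_action="ignore"`) can be verified directly; a minimal sketch:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.5, np.nan])

# Works on both pandas 2.x and 3.x: stringify the real values while
# leaving the missing value untouched, instead of producing "nan".
converted = ser.map(str, na_action="ignore")
print(converted[0])           # "1.5"
print(pd.isna(converted[1]))  # True: the NaN survived the conversion
```

On pandas 3.x alone, `ser.astype("str")` gives the same NaN-preserving result, as the updated guide shows.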

doc/source/whatsnew/index.rst

Lines changed: 1 addition & 0 deletions

@@ -16,6 +16,7 @@ Version 2.3
 .. toctree::
    :maxdepth: 2

+   v2.3.3
    v2.3.2
    v2.3.1
    v2.3.0

doc/source/whatsnew/v0.21.0.rst

Lines changed: 6 additions & 11 deletions

@@ -635,22 +635,17 @@ Previous behavior:

 New behavior:

-.. code-block:: ipython
+.. ipython:: python

-   In [1]: pi = pd.period_range('2017-01', periods=12, freq='M')
+   pi = pd.period_range('2017-01', periods=12, freq='M')

-   In [2]: s = pd.Series(np.arange(12), index=pi)
+   s = pd.Series(np.arange(12), index=pi)

-   In [3]: resampled = s.resample('2Q').mean()
+   resampled = s.resample('2Q').mean()

-   In [4]: resampled
-   Out[4]:
-   2017Q1    2.5
-   2017Q3    8.5
-   Freq: 2Q-DEC, dtype: float64
+   resampled

-   In [5]: resampled.index
-   Out[5]: PeriodIndex(['2017Q1', '2017Q3'], dtype='period[2Q-DEC]')
+   resampled.index

 Upsampling and calling ``.ohlc()`` previously returned a ``Series``, basically identical to calling ``.asfreq()``. OHLC upsampling now returns a DataFrame with columns ``open``, ``high``, ``low`` and ``close`` (:issue:`13083`). This is consistent with downsampling and ``DatetimeIndex`` behavior.

doc/source/whatsnew/v2.2.0.rst

Lines changed: 1 addition & 1 deletion

@@ -662,7 +662,7 @@ Other Deprecations
 - Deprecated :meth:`DatetimeArray.__init__` and :meth:`TimedeltaArray.__init__`, use :func:`array` instead (:issue:`55623`)
 - Deprecated :meth:`Index.format`, use ``index.astype(str)`` or ``index.map(formatter)`` instead (:issue:`55413`)
 - Deprecated :meth:`Series.ravel`, the underlying array is already 1D, so ravel is not necessary (:issue:`52511`)
-- Deprecated :meth:`Series.resample` and :meth:`DataFrame.resample` with a :class:`PeriodIndex` (and the 'convention' keyword), convert to :class:`DatetimeIndex` (with ``.to_timestamp()``) before resampling instead (:issue:`53481`)
+- Deprecated :meth:`Series.resample` and :meth:`DataFrame.resample` with a :class:`PeriodIndex` (and the 'convention' keyword), convert to :class:`DatetimeIndex` (with ``.to_timestamp()``) before resampling instead (:issue:`53481`). Note: this deprecation was later undone in pandas 2.3.3 (:issue:`57033`)
 - Deprecated :meth:`Series.view`, use :meth:`Series.astype` instead to change the dtype (:issue:`20251`)
 - Deprecated :meth:`offsets.Tick.is_anchored`, use ``False`` instead (:issue:`55388`)
 - Deprecated ``core.internals`` members ``Block``, ``ExtensionBlock``, and ``DatetimeTZBlock``, use public APIs instead (:issue:`55139`)
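The migration the resample deprecation note recommends (convert the PeriodIndex with `.to_timestamp()` before resampling) looks like this in practice; the data and the half-year bin frequency are illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(12),
              index=pd.period_range("2017-01", periods=12, freq="M"))

# Instead of resampling on the PeriodIndex directly, convert to a
# DatetimeIndex first, then resample (here into two half-year bins).
resampled = s.to_timestamp().resample("6MS").mean()
print(resampled.tolist())  # [2.5, 8.5]
```

Each bin averages six consecutive integers (0..5 and 6..11), hence 2.5 and 8.5. Note the deprecation itself was undone in 2.3.3, but the `.to_timestamp()` route keeps working either way.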

doc/source/whatsnew/v2.3.1.rst

Lines changed: 1 addition & 1 deletion

@@ -73,4 +73,4 @@ Bug fixes
 Contributors
 ~~~~~~~~~~~~

-.. contributors:: v2.3.0..v2.3.1|HEAD
+.. contributors:: v2.3.0..v2.3.1

doc/source/whatsnew/v2.3.2.rst

Lines changed: 8 additions & 2 deletions

@@ -1,6 +1,6 @@
 .. _whatsnew_232:

-What's new in 2.3.2 (August XX, 2025)
+What's new in 2.3.2 (August 21, 2025)
 -------------------------------------

 These are the changes in pandas 2.3.2. See :ref:`release` for a full changelog
@@ -25,10 +25,16 @@ Bug fixes
 - Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
   "string" type in the JSON Table Schema for :class:`StringDtype` columns
   (:issue:`61889`)
-
+- Boolean operations (``|``, ``&``, ``^``) with bool-dtype objects on the left and :class:`StringDtype` objects on the right now cast the string to bool, with a deprecation warning (:issue:`60234`)
+- Fixed :meth:`~Series.str.match`, :meth:`~Series.str.fullmatch` and :meth:`~Series.str.contains`
+  string methods with compiled regex for the Arrow-backed string dtype (:issue:`61964`, :issue:`61942`)
+- Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` inconsistently
+  replacing matching values when missing values are present for string dtypes (:issue:`56599`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_232.contributors:

 Contributors
 ~~~~~~~~~~~~
+
+.. contributors:: v2.3.1..v2.3.2
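One of the 2.3.2 fixes above restores compiled-regex support in the string methods. A quick sketch of the call shape that was affected; the data is illustrative, and this example uses the python-backed ``"string"`` dtype (the fix itself targeted the Arrow-backed variant, which accepts the same call):

```python
import re
import pandas as pd

ser = pd.Series(["apple", "banana", None], dtype="string")
pat = re.compile("a.*e")  # pre-compiled pattern, not a plain string

# fullmatch with a compiled regex: True where the whole string matches,
# False where it does not, and <NA> propagated for the missing value.
matched = ser.str.fullmatch(pat)
print(matched.tolist())
```

Passing a pre-compiled `re.Pattern` is convenient when the same pattern is reused across several `.str` calls.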
