25 changes: 10 additions & 15 deletions pandas/core/generic.py
@@ -6542,23 +6542,18 @@ def copy(self, deep: bool = True) -> Self:

When ``deep=False``, a new object will be created without copying
the calling object's data or index (only references to the data
and index are copied). Any changes to the data of the original
will be reflected in the shallow copy (and vice versa).
and index are copied). With Copy-on-Write enabled by default,
Member

Since CoW can't be disabled, I think we should remove the "enabled by default" bit.

Suggested change
and index are copied). With Copy-on-Write enabled by default,
and index are copied). With Copy-on-Write,

Contributor Author

I've removed the phrase "enabled by default".

changes to the data of the original will *not* be reflected in the
shallow copy (and vice versa). The shallow copy uses a lazy (deferred)
Member

This is true for mutations at, e.g., the Series level, but not at the array level.

import pandas as pd

ser = pd.Series([1, 2, 3])
ser2 = ser.copy(deep=False)
ser.array[1] = 100  # mutate at the array level, not the Series level
print(ser2)
# 0      1
# 1    100
# 2      3
# dtype: int64

I'm not sure if we want to highlight this, cc @jorisvandenbossche

Member

Also not sure if we want to go into the details here, but in any case it would be good to slightly rephrase the "changes to the data of the original will not be reflected .." to "changes to the original will not be reflected .." (i.e. leave out the "the data of").

The above example currently also only works with .array, and not with eg .values (given the current state of #63099)
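
For contrast with the array-level example in this thread, here is a minimal sketch of the Series-level case that the docstring wording describes. The helper names `shallow_copy_sees_mutation` and `deep_copy_sees_mutation` are hypothetical, introduced only for illustration. The shallow-copy result depends on whether Copy-on-Write is active in the installed pandas version, so the sketch reports the behaviour instead of assuming it.

```python
import pandas as pd


def shallow_copy_sees_mutation() -> bool:
    """Return True if a Series-level write to the original shows up in a shallow copy."""
    ser = pd.Series([1, 2, 3])
    shallow = ser.copy(deep=False)
    ser.iloc[1] = 100  # Series-level mutation of the original
    return bool(shallow.iloc[1] == 100)


def deep_copy_sees_mutation() -> bool:
    """Return True if a Series-level write to the original shows up in a deep copy."""
    ser = pd.Series([1, 2, 3])
    deep = ser.copy(deep=True)
    ser.iloc[1] = 100
    return bool(deep.iloc[1] == 100)


if __name__ == "__main__":
    print("pandas", pd.__version__)
    # Expected to be False when Copy-on-Write is active; may be True without it.
    print("shallow copy sees mutation:", shallow_copy_sees_mutation())
    # A deep copy owns its data, so this should be False in any version.
    print("deep copy sees mutation:", deep_copy_sees_mutation())
```

Reporting the result instead of hard-coding it keeps the sketch valid across pandas versions with and without Copy-on-Write.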

Contributor Author

I'm actually done with the suggested changes! Please let me know if further improvement is needed...

copy mechanism that copies the data only when any changes to the
original or shallow copy are made, ensuring memory efficiency while
maintaining data integrity.

.. note::
The ``deep=False`` behaviour as described above will change
in pandas 3.0. `Copy-on-Write
<https://pandas.pydata.org/docs/dev/user_guide/copy_on_write.html>`__
will be enabled by default, which means that the "shallow" copy
is that is returned with ``deep=False`` will still avoid making
an eager copy, but changes to the data of the original will *no*
longer be reflected in the shallow copy (or vice versa). Instead,
it makes use of a lazy (deferred) copy mechanism that will copy
the data only when any changes to the original or shallow copy is
made.

You can already get the future behavior and improvements through
enabling copy on write ``pd.options.mode.copy_on_write = True``
In pandas versions prior to 3.0, the default behavior without
Copy-on-Write was different: changes to the data of the original
*were* reflected in the shallow copy (and vice versa). See the
:ref:`Copy-on-Write user guide <copy_on_write>` for more information.
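
As an aside for readers of this diff, the lazy (deferred) copy described in the docstring can be observed by checking whether the original and the copy share a buffer. This is only a sketch: `memory_sharing_report` is a hypothetical helper, it assumes an integer Series whose `to_numpy()` returns a view of the underlying data, and the value reported after the write depends on whether Copy-on-Write is active in the installed pandas version.

```python
import numpy as np
import pandas as pd


def memory_sharing_report() -> dict:
    """Report whether copies share memory with the original before and after a write."""
    ser = pd.Series([1, 2, 3])
    shallow = ser.copy(deep=False)
    deep = ser.copy(deep=True)

    report = {
        # A shallow copy references the same data, so no eager copy is made.
        "shallow_shares_before_write": bool(
            np.shares_memory(ser.to_numpy(), shallow.to_numpy())
        ),
        # A deep copy owns its own data from the start.
        "deep_shares_before_write": bool(
            np.shares_memory(ser.to_numpy(), deep.to_numpy())
        ),
    }

    ser.iloc[0] = 10  # Series-level write to the original

    # With Copy-on-Write active, the write is expected to trigger the deferred
    # copy, so the two objects should no longer share memory at this point.
    report["shallow_shares_after_write"] = bool(
        np.shares_memory(ser.to_numpy(), shallow.to_numpy())
    )
    return report


if __name__ == "__main__":
    for key, value in memory_sharing_report().items():
        print(f"{key}: {value}")
```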

Parameters
----------