Merged
Changes from 4 commits
pandas/core/groupby/groupby.py: 83 changes (64 additions, 19 deletions)
@@ -64,7 +64,6 @@ class providing the base-class of operations.
Pandas4Warning,
)
from pandas.util._decorators import (
Appender,
Substitution,
cache_readonly,
doc,
@@ -738,13 +737,67 @@ def pipe(
**kwargs: Any,
) -> T: ...

@Substitution(
klass="GroupBy",
examples=dedent(
"""\
>>> df = pd.DataFrame({'A': 'a b a b'.split(), 'B': [1, 2, 3, 4]})
def pipe(
self,
func: Callable[Concatenate[Self, P], T] | tuple[Callable[..., T], str],
*args: Any,
**kwargs: Any,
) -> T:
"""
Apply a ``func`` with arguments to this GroupBy object and return its result.

Use `.pipe` when you want to improve readability by chaining together
functions that expect Series, DataFrames, GroupBy or Resampler objects.
Instead of writing

>>> h = lambda x, arg2, arg3: x + 1 - arg2 * arg3
>>> g = lambda x, arg1: x * 5 / arg1
>>> f = lambda x: x**4
>>> df = pd.DataFrame([["a", 4], ["b", 5]], columns=["group", "value"])
>>> h(g(f(df.groupby("group")), arg1=1), arg2=2, arg3=3) # doctest: +SKIP

You can write

>>> (
... df.groupby("group").pipe(f).pipe(g, arg1=1).pipe(h, arg2=2, arg3=3)
... ) # doctest: +SKIP

which is much more readable.

Parameters
----------
func : callable or tuple of (callable, str)
Function to apply to this GroupBy object or, alternatively,
a `(callable, data_keyword)` tuple where `data_keyword` is a
string indicating the keyword of `callable` that expects the
GroupBy object.
*args : iterable, optional
Positional arguments passed into `func`.
**kwargs : dict, optional
A dictionary of keyword arguments passed into `func`.

Returns
-------
GroupBy
The return type of `func`.

See Also
--------
Series.pipe : Apply a function with arguments to a series.
DataFrame.pipe : Apply a function with arguments to a dataframe.
apply : Apply function to each group instead of to the
full GroupBy object.

Notes
-----
See more `here
<https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#piping-function-calls>`_

Examples
--------
>>> df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})
>>> df
A B
A B
0 a 1
1 b 2
2 a 3
Comment on lines 799 to 803
Member

This change is wrong; the columns should be adjusted. Is pre-commit doing this with the ruff formatter?

Contributor Author

Fixed! I think this is a holdover from the contents of the generated docstring I printed to the console.
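Doctest compares what the example actually prints with the expected-output lines in the docstring, so the most reliable way to get the column alignment right is to paste the object's real output. Whether a misaligned block is caught automatically depends on the doctest option flags in use; pandas' own doctest settings are not shown here and may differ. The sketch below uses only the standard-library `doctest.OutputChecker` and the docstring's example frame. It shows that an exact comparison rejects collapsed spacing, while `NORMALIZE_WHITESPACE` accepts it.

```python
import doctest

import pandas as pd

# The example frame used in the ``pipe`` docstring.
df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})

# What ``>>> df`` actually prints; doctest sees this text plus a newline.
got = repr(df) + "\n"
print(got, end="")

# The same rows with the column alignment collapsed to single spaces.
collapsed = "A B\n0 a 1\n1 b 2\n2 a 3\n3 b 4\n"

checker = doctest.OutputChecker()
# With no option flags, doctest requires an exact match, so alignment matters.
assert not checker.check_output(collapsed, got, 0)
# With NORMALIZE_WHITESPACE, runs of blanks and newlines compare equal,
# so the misaligned block would still pass.
assert checker.check_output(collapsed, got, doctest.NORMALIZE_WHITESPACE)
```

If the checks in use normalize whitespace, misaligned columns have to be spotted by eye in review, as happened in this thread.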

@@ -753,20 +806,12 @@ def pipe(
To get the difference between each group's maximum and minimum value in one
pass, you can do

>>> df.groupby('A').pipe(lambda x: x.max() - x.min())
B
>>> df.groupby("A").pipe(lambda x: x.max() - x.min())
B
Member

Ditto.

Contributor Author

josquinlarsen, Dec 5, 2025

I misunderstood the example. Fixed the alignment of A B to match the columns.

Thank you for the feedback!

Contributor Author

I tried to combine lines 810/811 (i.e. A B), but that caused Code Checks/Docstrings to fail. I reverted to the two lines and it's passing now, but it doesn't look right in the file. I haven't been able to find other examples that follow a pattern like that to see what others have done.

Member

A B would be the right thing if they were both columns. But the output has an index named A and a column named B, so what you have currently is correct.

df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})
print(df.groupby("A").pipe(lambda x: x.max() - x.min()))
#    B
# A   
# a  2
# b  2
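The distinction drawn in the reply above can be checked directly. The piped result keeps `A` as its named index, which is why the index name gets its own header line. Moving `A` back into the columns with `reset_index()` gives two real columns, and that is the case where a single `A  B` header line would be correct. This is a minimal sketch reusing the reviewer's example:

```python
import pandas as pd

df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})

# Per-group max minus min, computed on the whole GroupBy object in one pass.
result = df.groupby("A").pipe(lambda x: x.max() - x.min())

# "A" is the name of the index, not a column; only "B" is a column.
print(result.index.name)     # A
print(list(result.columns))  # ['B']
print(result)

# Turning the index back into a column gives two real columns,
# which is the case where a single "A  B" header line is correct.
flat = result.reset_index()
print(list(flat.columns))    # ['A', 'B']
print(flat)
```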

A
a 2
b 2"""
),
)
@Appender(_pipe_template)
def pipe(
self,
func: Callable[Concatenate[Self, P], T] | tuple[Callable[..., T], str],
*args: Any,
**kwargs: Any,
) -> T:
b 2
"""
return com.pipe(self, func, *args, **kwargs)

@final
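The new `pipe` docstring describes two usage patterns: chaining `.pipe` calls instead of nesting function calls, and passing `func` as a `(callable, data_keyword)` tuple. Both can be exercised with a small frame. This is a minimal sketch. `group_range`, `scale`, and `summarize` are hypothetical helpers written only for this illustration.

```python
import pandas as pd

df = pd.DataFrame([["a", 4], ["b", 5], ["a", 7]], columns=["group", "value"])


def group_range(gb):
    # Hypothetical helper: receives the whole GroupBy object and
    # returns max - min per group.
    return gb.max() - gb.min()


def scale(frame, factor):
    # Hypothetical helper: receives the DataFrame produced by the previous step.
    return frame * factor


# Nested calls read inside-out ...
nested = scale(group_range(df.groupby("group")), factor=10)
# ... while the piped chain reads in the order the steps run.
chained = df.groupby("group").pipe(group_range).pipe(scale, factor=10)
assert nested.equals(chained)


def summarize(label, data):
    # Hypothetical helper that expects the GroupBy object as the keyword
    # argument ``data`` rather than as its first positional argument.
    return data["value"].sum().rename(label)


# Tuple form: (callable, data_keyword) tells pipe which keyword receives
# the GroupBy object; the remaining arguments are passed through unchanged.
summary = df.groupby("group").pipe((summarize, "data"), "totals")
print(summary)
```

The tuple form is useful when the function you want to pipe into does not take the GroupBy object as its first argument.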