Python: Use Pandas groupby() + apply() with arguments
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/43483365/
Asked by beta
I would like to use df.groupby() in combination with apply() to apply a function to each row per group.
I normally use the following code, which usually works (note that this is without groupby()):
df.apply(myFunction, args=(arg1,))
With groupby(), I tried the following:
df.groupby('columnName').apply(myFunction, args=(arg1,))
However, I get the following error:
TypeError: myFunction() got an unexpected keyword argument 'args'
Hence, my question is: How can I use groupby() and apply() with a function that needs arguments?
Accepted answer by MaxU
pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does have it.
So try this:
df.groupby('columnName').apply(lambda x: myFunction(x, arg1))
or as suggested by @Zero:
df.groupby('columnName').apply(myFunction, arg1)
Demo:
In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))
In [83]: df
Out[83]:
a b c
0 0 3 1
1 0 3 4
2 3 0 4
3 4 2 3
4 3 4 1
In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:
In [85]: df.apply(f, args=(10,))
Out[85]:
a 40
b 40
c 40
dtype: int64
When using GroupBy.apply, you can pass either named arguments:
In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
a b c
a
0 0 30 40
3 30 40 40
4 40 20 30
or positional arguments (note that (10) is just the integer 10, not a one-element tuple; it is forwarded to f as n):
In [87]: df.groupby('a').apply(f, (10))
Out[87]:
a b c
a
0 0 30 40
3 30 40 40
4 40 20 30
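The three calling styles discussed in this answer can be compared side by side in a small, deterministic sketch. The data below is made up for illustration, and the column b is selected explicitly (an assumption on my part, not something the original answer does) so that the result does not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Illustrative data, not the random frame from the demo above
df = pd.DataFrame({"a": [0, 0, 3, 4, 3],
                   "b": [3, 3, 0, 2, 4]})

def f(ser, n):
    # Largest value in the group, scaled by n
    return ser.max() * n

grouped = df.groupby("a")["b"]

# Extra keyword arguments are forwarded to f
by_keyword = grouped.apply(f, n=10)

# Extra positional arguments are forwarded to f as well
by_position = grouped.apply(f, 10)

# A lambda binds the argument itself, so nothing needs forwarding
by_lambda = grouped.apply(lambda s: f(s, 10))

print(by_keyword)
```

All three calls should give the same Series, indexed by the values of a.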
Answer by Brad Solomon
Some confusion here over why using an args parameter throws an error might stem from the fact that pandas.DataFrame.apply does have an args parameter (a tuple), while pandas.core.groupby.GroupBy.apply does not.
So, when you call .apply on a DataFrame itself, you can use this argument; when you call .apply on a groupby object, you cannot.
In @MaxU's answer, the expression lambda x: myFunction(x, arg1) is passed to func (the first parameter); there is no need to specify additional *args/**kwargs because arg1 is specified in the lambda.
An example:
import numpy as np
import pandas as pd
# Example frame (assumed here; the original snippet does not define df)
df = pd.DataFrame({'col1': [1, 1, 2], 'col2': [3, 4, 5]})
# Called on DataFrame - `axis` is DataFrame.apply's own parameter;
# extra arguments for the function itself would go in the `args` tuple
df.apply(np.sum, axis=0) # equiv to df.sum(0)
df.apply(np.sum, axis=1) # equiv to df.sum(1)
# Called on groupby object of the DataFrame - will throw TypeError
print(df.groupby('col1').apply(np.sum, args=(0,)))
# TypeError: sum() got an unexpected keyword argument 'args'
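The difference between the two apply methods can also be checked with a user-defined function, which keeps the error message independent of NumPy. This is only a sketch: scaled_sum and the sample frame are hypothetical names and data introduced for illustration, and functools.partial is shown as one alternative to the lambda, not something the answers above use.

```python
from functools import partial

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"col1": ["x", "x", "y"], "col2": [1, 2, 5]})

def scaled_sum(ser, factor):
    # Sum of the values, multiplied by factor
    return ser.sum() * factor

# DataFrame.apply has an `args` parameter, so this works
col_result = df[["col2"]].apply(scaled_sum, args=(2,))

# On a groupby object, `args=` is forwarded to scaled_sum as a keyword
try:
    df.groupby("col1")["col2"].apply(scaled_sum, args=(2,))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Binding the argument up front avoids the problem
fixed = df.groupby("col1")["col2"].apply(partial(scaled_sum, factor=2))

print(error_message)
print(fixed)
```

Here partial(scaled_sum, factor=2) plays the same role as lambda s: scaled_sum(s, 2).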
Answer by Hitesh Somani
For me,
df2 = df.groupby('columnName').apply(lambda x: my_function(x, arg1, arg2))
worked.