Python: Using Pandas groupby() + apply() with arguments
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and distribute it under the same license.
Original source: http://stackoverflow.com/questions/43483365/
Use Pandas groupby() + apply() with arguments
Asked by beta
I would like to use df.groupby() in combination with apply() to apply a function to each row per group.
I normally use the following code, which usually works (note that this is without groupby()):
df.apply(myFunction, args=(arg1,))
With groupby() I tried the following:
df.groupby('columnName').apply(myFunction, args=(arg1,))
However, I get the following error:
TypeError: myFunction() got an unexpected keyword argument 'args'
Hence, my question is: how can I use groupby() and apply() with a function that needs arguments?
Accepted answer by MaxU
pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does. GroupBy.apply instead forwards any extra positional and keyword arguments to the function, so args=(arg1,) reaches myFunction as a keyword argument named args.
So try this:
df.groupby('columnName').apply(lambda x: myFunction(x, arg1))
or, as suggested by @Zero, pass the argument positionally:
df.groupby('columnName').apply(myFunction, arg1)
Demo:
In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))
In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2  3
4  3  4  1
In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:
In [85]: df.apply(f, args=(10,))
Out[85]:
a    40
b    40
c    40
dtype: int64
When using GroupBy.apply you can pass either named arguments:
In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
or positional arguments (here (10) is just the integer 10, not a one-element tuple):
In [87]: df.groupby('a').apply(f, (10))
Out[87]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
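The demo above uses unseeded random data, so its numbers cannot be reproduced exactly. The following is a minimal sketch of the same three call styles as a standalone script. The fixed seed, the frame size, and the explicit selection of columns b and c are assumptions added here so that the result does not depend on how a given pandas version treats the grouping column.

```python
import numpy as np
import pandas as pd

# Fixed seed for reproducibility (an assumption; the original demo is unseeded).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, size=(6, 3)), columns=list("abc"))


def f(frame, n):
    """Column-wise maximum of the group, scaled by n."""
    return frame.max() * n


# Select the value columns explicitly so the grouping column "a"
# is never passed to f, whatever the pandas version.
grouped = df.groupby("a")[["b", "c"]]

by_keyword = grouped.apply(f, n=10)            # forwarded to f as a keyword argument
by_position = grouped.apply(f, 10)             # forwarded to f as a positional argument
by_lambda = grouped.apply(lambda g: f(g, 10))  # argument bound inside the lambda

# All three should match the built-in group-wise maximum times 10.
expected = grouped.max() * 10
print(by_keyword)
```

All three calls should produce the same frame, so the choice between them is mostly a matter of readability.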
Answer by Brad Solomon
Some confusion here over why using an args parameter throws an error might stem from the fact that pandas.DataFrame.apply does have an args parameter (a tuple), while pandas.core.groupby.GroupBy.apply does not.
So, when you call .apply on a DataFrame itself, you can use this argument; when you call .apply on a groupby object, you cannot.
In @MaxU's answer, the expression lambda x: myFunction(x, arg1) is passed to func (the first parameter); there is no need to specify additional *args/**kwargs because arg1 is specified in the lambda.
An example:
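import numpy as np
import pandas as pd
# Example data (not in the original answer) so the snippet runs on its own
df = pd.DataFrame({'col1': [1, 1, 2], 'col2': [3, 4, 5]})
# Called on a DataFrame - `axis` is DataFrame.apply's own parameter
# (extra arguments for the function itself would go in the `args` tuple)
df.apply(np.sum, axis=0) # equiv to df.sum(0)
df.apply(np.sum, axis=1) # equiv to df.sum(1)
# Called on groupby object of the DataFrame - will throw TypeError,
# because `args=(0,)` is forwarded to np.sum as a keyword argument
print(df.groupby('col1').apply(np.sum, args=(0,)))
# TypeError: sum() got an unexpected keyword argument 'args'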
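As a counterpart to the failing call, here is a hedged sketch of how the axis could be handed to np.sum through a groupby object instead. The sample frame and the column names col2 and col3 are made up for illustration, and the value columns are selected explicitly so the grouping column stays out of the sums.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed, not from the original answer).
df = pd.DataFrame({"col1": [1, 1, 2], "col2": [3, 4, 5], "col3": [6, 7, 8]})
grouped = df.groupby("col1")[["col2", "col3"]]

# args=(0,) is forwarded to np.sum as an unknown keyword, so this fails.
try:
    grouped.apply(np.sum, args=(0,))
    raised = False
except TypeError:
    raised = True

# Forwarding the axis positionally or by keyword works.
sum_positional = grouped.apply(np.sum, 0)
sum_keyword = grouped.apply(np.sum, axis=0)

print(raised)
print(sum_positional)
```

Both working calls sum each group column-wise, which should match grouped.sum() for this data.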
Answer by Hitesh Somani
For me,
df2 = df.groupby('columnName').apply(lambda x: my_function(x, arg1, arg2))
worked.
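For completeness, a small runnable sketch of the lambda form with two extra arguments. The body of my_function, the sample data, and the value column are hypothetical, since the answer does not show them. functools.partial is included as an alternative that is not part of the original answer; it binds the arguments up front in the same way.

```python
from functools import partial

import pandas as pd

# Illustrative data (assumed); only the column name 'columnName' comes from the answer.
df = pd.DataFrame({"columnName": ["x", "x", "y"], "value": [1, 2, 4]})


def my_function(group, arg1, arg2):
    """Hypothetical helper: scale the group's total by arg1, then add arg2."""
    return group["value"].sum() * arg1 + arg2


grouped = df.groupby("columnName")[["value"]]

# Arguments bound inside a lambda, as in the answer above.
with_lambda = grouped.apply(lambda x: my_function(x, 10, 1))

# Same binding via functools.partial (an alternative, not from the source).
with_partial = grouped.apply(partial(my_function, arg1=10, arg2=1))

print(with_lambda)
```

Because my_function returns a scalar here, each call yields a Series with one value per group.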

