将函数或 Lambda 应用于 Pandas GROUPBY
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/48183635/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Apply Function or Lambda to Pandas GROUPBY
提问by Kyle
I would like to apply a specific function (in this case a logit model) to a dataframe which can be grouped (by the variable "model"). I know the task can be performed through a loop, however I believe this to be inefficient at best. Example code below:
我想将特定函数(在本例中为 logit 模型)应用于可以分组的数据帧(通过变量“模型”)。我知道可以通过循环执行任务,但是我认为这充其量是低效的。示例代码如下:
import pandas as pd
import numpy as np
import statsmodels.api as sm
df1=pd.DataFrame(np.random.randint(0,100,size=(100,10)),columns=list('abcdefghij'))
df2=pd.DataFrame(np.random.randint(0,100,size=(100,10)),columns=list('abcdefghij'))
df1['model']=1
df1['target']=np.random.randint(2,size=100)
df2['model']=2
df2['target']=np.random.randint(2,size=100)
data=pd.concat([df1,df2])
### Clunky, but works...
for i in range(1,2+1):
lm=sm.Logit(data[data['model']==i]['target'],
sm.add_constant(data[data['model']==i].drop(['target'],axis=1))).fit(disp=0)
print(lm.summary2())
### Can this work?
def elegant(self):
lm=sm.Logit(data['target'],
sm.add_constant(data.drop(['target'],axis=1))).fit(disp=0)
better=data.groupby(['model']).apply(elegant)
If the above groupby can work, is this a more efficient way to perform than looping?
如果上述 groupby 可以工作,这是比循环更有效的执行方式吗?
回答by Prikers
This could work:
这可以工作:
def elegant(df):
lm = sm.Logit(df['target'],
sm.add_constant(df.drop(['target'],axis=1))).fit(disp=0)
return lm
better = data.groupby('model').apply(elegant)
Using .apply
you passe the dataframe groups to the function elegant
so elegant
has to take a dataframe as the first argument here. Also your function needs to return the result of your calculation lm
.
使用.apply
您将数据帧组传递给函数,elegant
因此elegant
必须将数据帧作为此处的第一个参数。此外,您的函数需要返回计算结果lm
。
For more complexe functions the following structure can be used:
对于更复杂的功能,可以使用以下结构:
def some_fun(df, kw_param=1):
# some calculations to df using kw_param
return df
better = data.groupby('model').apply(lambda group: some_func(group, kw_param=99))