Original source: http://stackoverflow.com/questions/32460593/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Including the group name in the apply function pandas python
Asked by user1129988
Is there a way to tell the groupby() call to pass the group name to the apply() lambda function?
When I iterate over the groups, I can get the group key through tuple unpacking:
for group_name, subdf in temp_dataframe.groupby(level=0, axis=0):
    print(group_name)
...is there a way to also get the group name in the apply function, such as:
temp_dataframe.groupby(level=0,axis=0).apply(lambda group_name, subdf: foo(group_name, subdf))
How can I get the group name as an argument for the apply lambda function?
Answered by EdChum
I think you should be able to use the name attribute:
temp_dataframe.groupby(level=0,axis=0).apply(lambda x: foo(x.name, x))
This should work. For example:
In [132]:
df = pd.DataFrame({'a':list('aabccc'), 'b':np.arange(6)})
df
Out[132]:
   a  b
0  a  0
1  a  1
2  b  2
3  c  3
4  c  4
5  c  5
In [134]:
df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:',x))
name: a
subdf:    a  b
0  a  0
1  a  1
name: b
subdf:    a  b
2  b  2
name: c
subdf:    a  b
3  c  3
4  c  4
5  c  5
Out[134]:
Empty DataFrame
Columns: []
Index: []
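As a minimal sketch of the x.name approach, the snippet below passes the group key to a helper and collects one value per group. The helper foo is a hypothetical stand-in for the asker's function. Selecting [['b']] before apply is an assumption made here so that the grouping column stays out of each sub-frame regardless of pandas version.

```python
import numpy as np
import pandas as pd


def foo(group_name, subdf):
    # Hypothetical helper: label the sum of column 'b' with the group key.
    return f"{group_name}:{subdf['b'].sum()}"


df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})

# x is the sub-frame for one group; x.name holds that group's key.
result = df.groupby('a')[['b']].apply(lambda x: foo(x.name, x))
print(result)
```

Because foo returns a scalar per group, the result here is a Series indexed by the group keys.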
Answered by rapture
For those who came looking for an answer to the question:
Including the group name in the transform function pandas python
and ended up in this thread, please read on.
Given the following input:
df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})
Data:
  col1  col2  col3
0    a     0     0
1    a     1     1
2    b     2     2
3    c     3     3
4    c     4     4
5    c     5     5
We can access the group name (which is visible from the scope of the calling apply function) like this:
df.groupby('col1') \
  .apply(lambda frame: frame \
         .transform(lambda col: col + 3 if frame.name == 'a' and col.name == 'col2' else col))
Output:
  col1  col2  col3
0    a     3     0
1    a     4     1
2    b     2     2
3    c     3     3
4    c     4     4
5    c     5     5
Note that the call to apply is needed in order to obtain a reference to the sub pandas.core.frame.DataFrame (i.e. frame) which holds the name attribute of the corresponding sub group. The name attribute of the argument of transform (i.e. col) refers to the column/series name.
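To illustrate that the name attribute of each sub-frame carries the group key, here is a small sketch that writes the key into a new column. The column name group and the variable tagged are illustrative choices, not part of the original answer. The column selection [['col2', 'col3']] is an assumption made to avoid depending on how a given pandas version treats the grouping column inside apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# frame.name is the key of the current group ('a', 'b' or 'c');
# assign() broadcasts it to every row of that sub-frame.
tagged = (df.groupby('col1', group_keys=False)[['col2', 'col3']]
            .apply(lambda frame: frame.assign(group=frame.name)))
print(tagged)
```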
Alternatively, one could also loop over the groups and then, within each group, over the columns:
for grp_name, sub_df in df.groupby('col1'):
    for col in sub_df:
        if grp_name == 'a' and col == 'col2':
            df.loc[df.col1 == grp_name, col] = sub_df[col] + 3
My use case is quite rare and this was the only way to achieve my goal (as of pandas v0.24.2). However, I'd recommend exploring the pandas documentation thoroughly because there most likely is an easier vectorised solution to what you may need this construct for.
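For the specific example in this answer (add 3 to col2 for group 'a' only), one possible vectorised alternative is a boolean mask with .loc. This is only a sketch for this toy case, and the variable name mask is illustrative. It may not cover the rarer use case the answer was written for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# Select the rows that belong to group 'a' ...
mask = df['col1'] == 'a'
# ... and update only col2 on those rows, without groupby or apply.
df.loc[mask, 'col2'] += 3
print(df)
```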

