Including the group name in the apply function (pandas, Python)

Question

Asked by user1129988

Is there a way to tell the groupby() call to pass the group name to the apply() lambda function?

When I iterate over the groups, I can get the group key by unpacking each tuple:

for group_name, subdf in temp_dataframe.groupby(level=0, axis=0):
    print(group_name)

...is there a way to also get the group name in the apply function, such as:

temp_dataframe.groupby(level=0, axis=0).apply(lambda group_name, subdf: foo(group_name, subdf))

How can I get the group name as an argument for the apply lambda function?

Answer 1

Answered by EdChum

I think you should be able to use the name attribute:

temp_dataframe.groupby(level=0,axis=0).apply(lambda x: foo(x.name, x))

This should work. For example:

In [132]:
df = pd.DataFrame({'a':list('aabccc'), 'b':np.arange(6)})
df

Out[132]:
   a  b
0  a  0
1  a  1
2  b  2
3  c  3
4  c  4
5  c  5

In [134]:
df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:',x))

name: a 
subdf:    a  b
0  a  0
1  a  1
name: b 
subdf:    a  b
2  b  2
name: c 
subdf:    a  b
3  c  3
4  c  4
5  c  5
Out[134]:
Empty DataFrame
Columns: []
Index: []
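
As a minimal sketch of the same idea with a function that returns a value instead of printing, the following passes x.name to a helper. The helper foo here is hypothetical (the question never defines it); it is assumed to take the group name and the sub-frame and return a string. Selecting the 'b' column before apply is a choice made for this sketch so that the group key column is not passed to the function:

```python
import numpy as np
import pandas as pd


def foo(group_name, subdf):
    # Hypothetical helper: label the sum of column 'b' with the group name.
    return f"{group_name}:{int(subdf['b'].sum())}"


df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})

# Inside apply, each sub-frame exposes its group key as x.name.
result = df.groupby('a')[['b']].apply(lambda x: foo(x.name, x))
print(result)
```

Because foo returns a scalar for each group, the result should be a Series indexed by the group keys.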

Answer 2

Answered by rapture

For those who came looking for an answer to the question:

Including the group name in the transform function (pandas, Python)

and ended up in this thread, please read on.

Given the following input:

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

Data:

    col1    col2    col3
0   a       0       0
1   a       1       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

We can access the group name (which is visible from the scope of the calling apply function) like this:

df.groupby('col1') \
.apply(lambda frame: frame \
       .transform(lambda col: col + 3 if frame.name == 'a' and col.name == 'col2' else col))

Output:

    col1    col2    col3
0   a       3       0
1   a       4       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

Note that the call to apply is needed in order to obtain a reference to the sub pandas.core.frame.DataFrame (i.e. frame) which holds the name attribute of the corresponding sub group. The name attribute of the argument of transform (i.e. col) refers to the column/series name.

Alternatively, one could also loop over the groups and then, within each group, over the columns:

for grp_name, sub_df in df.groupby('col1'):
    for col in sub_df:
        if grp_name == 'a' and col == 'col2':
            df.loc[df.col1 == grp_name, col] = sub_df[col] + 3

My use case is quite rare and this was the only way to achieve my goal (as of pandas v0.24.2). However, I'd recommend exploring the pandas documentation thoroughly because there most likely is an easier vectorised solution to what you may need this construct for.
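
For the specific example above (add 3 to col2 in group 'a'), one possible vectorised form is a boolean mask with .loc. This is only a sketch for this particular condition, not a general replacement for the apply/transform construct; the variable name mask is chosen here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# Select the rows belonging to group 'a' and update only col2 for them.
mask = df['col1'] == 'a'
df.loc[mask, 'col2'] += 3
print(df)
```

This avoids groupby entirely, which works here because the condition depends only on the value of the grouping column.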
