What does the group_keys argument to pandas.groupby actually do?

Source: Stack Overflow, http://stackoverflow.com/questions/38856583/ (licensed under CC BY-SA 4.0; reuse must follow the same license and attribute the original authors).

Tags: python, pandas

Asked by Paul

In pandas.DataFrame.groupby, there is an argument group_keys, which I gather is supposed to do something relating to how group keys are included in the dataframe subsets. According to the documentation:

group_keys: boolean, default True

When calling apply, add group keys to index to identify pieces

However, I can't really find any examples where group_keys makes an actual difference:

import pandas as pd

df = pd.DataFrame([[0, 1, 3],
                   [3, 1, 1],
                   [3, 0, 0],
                   [2, 3, 3],
                   [2, 1, 0]], columns=list('xyz'))

gby = df.groupby('x')
gby_k = df.groupby('x', group_keys=False)

It doesn't make a difference in the output of apply:

ap = gby.apply(pd.DataFrame.sum)
#    x  y  z
# x         
# 0  0  1  3
# 2  4  4  3
# 3  6  1  1

ap_k = gby_k.apply(pd.DataFrame.sum)
#    x  y  z
# x         
# 0  0  1  3
# 2  4  4  3
# 3  6  1  1

And even if you print out the grouped subsets as you go, the results are still identical:

def printer_func(x):
    print(x)
    return x

print('gby')
print('--------------')
gby.apply(printer_func)
print('--------------')

print('gby_k')
print('--------------')
gby_k.apply(printer_func)
print('--------------')

# gby
# --------------
#    x  y  z
# 0  0  1  3
#    x  y  z
# 0  0  1  3
#    x  y  z
# 3  2  3  3
# 4  2  1  0
#    x  y  z
# 1  3  1  1
# 2  3  0  0
# --------------
# gby_k
# --------------
#    x  y  z
# 0  0  1  3
#    x  y  z
# 0  0  1  3
#    x  y  z
# 3  2  3  3
# 4  2  1  0
#    x  y  z
# 1  3  1  1
# 2  3  0  0
# --------------

I considered the possibility that the default argument is actually True, but switching group_keys to explicitly False doesn't make a difference either. What exactly is this argument for?

(Run on pandas version 0.18.1)

Edit: I did find a case where group_keys changes behavior, based on this answer:

import pandas as pd
import numpy as np

row_idx = pd.MultiIndex.from_product(((0, 1), (2, 3, 4)))
d = pd.DataFrame([[4, 3], [1, 3], [1, 1], [2, 4], [0, 1], [4, 2]], index=row_idx)

df_n = d.groupby(level=0).apply(lambda x: x.nlargest(2, [0]))
#        0  1
# 0 0 2  4  3
#     3  1  3
# 1 1 4  4  2
#     2  2  4

df_k = d.groupby(level=0, group_keys=False).apply(lambda x: x.nlargest(2, [0]))

#      0  1
# 0 2  4  3
#   3  1  3
# 1 4  4  2
#   2  2  4
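
The difference is easier to see by inspecting the index itself instead of the printed frame. The following is a minimal sketch of the same comparison; the names top2, keyed and flat are illustrative only, and the behaviour noted in the comments is what I would expect from pandas, not something verified in the original post.

```python
import pandas as pd

row_idx = pd.MultiIndex.from_product(((0, 1), (2, 3, 4)))
d = pd.DataFrame([[4, 3], [1, 3], [1, 1], [2, 4], [0, 1], [4, 2]], index=row_idx)


def top2(g):
    # Keep the two rows with the largest values in column 0.
    return g.nlargest(2, [0])


keyed = d.groupby(level=0, group_keys=True).apply(top2)
flat = d.groupby(level=0, group_keys=False).apply(top2)

# With group_keys=True the group key is prepended as an extra index level;
# with group_keys=False only the original two levels remain.
print(keyed.index.nlevels, flat.index.nlevels)
print(flat.index.tolist())
```

The selected rows are the same in both results; only the number of index levels differs.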

However, I'm still not clear on the intelligible principle behind what group_keys is supposed to do. This behavior does not seem intuitive based on @piRSquared's answer.

Answered by Nickil Maveli

The group_keys parameter in groupby comes in handy during apply operations. With group_keys=True, the result gets an additional index level corresponding to the grouped columns; with group_keys=False, that level is omitted. The effect shows up especially when performing operations on individual columns.

One such instance:

In [21]: gby = df.groupby('x',group_keys=True).apply(lambda row: row['x'])

In [22]: gby
Out[22]: 
x   
0  0    0
2  3    2
   4    2
3  1    3
   2    3
Name: x, dtype: int64

In [23]: gby_k = df.groupby('x', group_keys=False).apply(lambda row: row['x'])

In [24]: gby_k
Out[24]: 
0    0
3    2
4    2
1    3
2    3
Name: x, dtype: int64

One of its intended applications could be grouping by one of the levels of the hierarchy, since the keyed result carries a MultiIndex:

In [27]: gby.groupby(level='x').sum()
Out[27]: 
x
0    0
2    4
3    6
Name: x, dtype: int64

Answered by piRSquared

If you are passing a function that preserves an index, pandas tries to keep that information. But if you pass a function that removes all semblance of index information, group_keys=True allows you to keep that information.

Use this instead

f = lambda df: df.reset_index(drop=True)

Then compare the two groupby objects:

gby.apply(lambda df: df.reset_index(drop=True))

gby_k.apply(lambda df: df.reset_index(drop=True))
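
The two calls above can be compared without screenshots by printing the resulting indexes. This is a rough sketch, not the original answer's code: the helper name renumber and the explicit selection of the 'y' and 'z' columns are my own assumptions, the latter made so the example does not depend on how a given pandas version treats the grouping column inside apply.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 3],
                   [3, 1, 1],
                   [3, 0, 0],
                   [2, 3, 3],
                   [2, 1, 0]], columns=list('xyz'))


def renumber(g):
    # Discard the original row labels within each group.
    return g.reset_index(drop=True)


cols = ['y', 'z']  # assumption: select non-key columns explicitly
keyed = df.groupby('x', group_keys=True)[cols].apply(renumber)
flat = df.groupby('x', group_keys=False)[cols].apply(renumber)

# group_keys=True: each row is still labelled with its group key.
print(keyed.index.tolist())
# group_keys=False: only the per-group positions survive, so labels repeat.
print(flat.index.tolist())
```

With group_keys=False there is no longer any way to tell from the index which group a row came from, which is the information group_keys=True preserves.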
