Original question: http://stackoverflow.com/questions/26456125/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Python Pandas: Is Order Preserved When Using groupby() and agg()?
Question by BringMyCakeBack
I've frequently used pandas' agg() function to run summary statistics on every column of a DataFrame. For example, here's how you would produce the mean and standard deviation:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})
>>> df
[output]
        A   B    C
0  group1  10  100
1  group1  12  102
2  group2  10  100
3  group2  25  250
4  group3  10  100
5  group3  12  102
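A minimal sketch of the mean-and-standard-deviation summary described above, assuming pandas' built-in 'mean' and 'std' aggregation names (note that pandas' 'std' is the sample standard deviation, ddof=1):

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})

# Mean and sample standard deviation of every non-key column, per group.
# The result has two-level columns: ('B', 'mean'), ('B', 'std'), ('C', 'mean'), ('C', 'std').
summary = df.groupby('A').agg(['mean', 'std'])
print(summary)
```

Neither statistic depends on the order in which rows reach the aggregation.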
For summary statistics like these, the order in which individual rows are sent to the agg function does not matter. But consider the following example:
df.groupby('A').agg([np.mean, lambda x: x.iloc[1] ])
[output]
            B                C
         mean <lambda>  mean <lambda>
A
group1   11.0       12   101      102
group2   17.5       25   175      250
group3   11.0       12   101      102
In this case the lambda works as intended, returning the second row of each group. However, I have not been able to find anything in the pandas documentation saying this is guaranteed in all cases. I want to use agg() with a weighted average function, so I need to be sure that rows reach the function in the same order as they appear in the original DataFrame.
Does anyone know, ideally with a pointer to the docs or the pandas source code, whether this is guaranteed?
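When the weights live in another column, a weighted average can be computed without relying on row order at all. The sketch below is one such approach, not the asker's function: it hypothetically averages column C weighted by column B, multiplying row by row before grouping so each weight stays attached to its own value.

```python
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102]})

# Hypothetical choice: average column C, weighting each row by column B.
# The product is formed row by row (aligned on the index) before grouping,
# so the result does not depend on the order in which rows reach the aggregation.
weighted_sum = (df['C'] * df['B']).groupby(df['A']).sum()
weight_total = df['B'].groupby(df['A']).sum()
weighted_mean = weighted_sum / weight_total
print(weighted_mean)
```

Shuffling the rows of df before running this should leave weighted_mean unchanged.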
Accepted answer by Jeff
See this enhancement issue
The short answer is yes: groupby preserves the order of rows as they are passed in. You can demonstrate this with your example by reversing the frame first:
In [20]: df.sort_index(ascending=False).groupby('A').agg([np.mean, lambda x: x.iloc[1]])
Out[20]:
            B                C
         mean <lambda>  mean <lambda>
A
group1   11.0       10   101      100
group2   17.5       10   175      100
group3   11.0       10   101      100
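Another way to see the same thing is to collect each group's values into a list, which shows the exact order in which rows arrive. A minimal sketch, using agg(list) as the aggregation:

```python
import pandas as pd

df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'B': [10, 12, 10, 25, 10, 12]})

# Collect each group's values as a list to see the order in which rows arrive.
forward = df.groupby('A')['B'].agg(list)
backward = df.sort_index(ascending=False).groupby('A')['B'].agg(list)

print(forward)   # group1: [10, 12], group2: [10, 25], group3: [10, 12]
print(backward)  # group1: [12, 10], group2: [25, 10], group3: [12, 10]
```

The groups themselves still come out sorted by key; only the order inside each group follows the input.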
This is NOT true for resample, however, because it requires a monotonic index (it WILL work with a non-monotonic index, but will sort it first).
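A small sketch of the resample caveat, using hypothetical out-of-order timestamps: the output comes back in time order, not in the original row order.

```python
import pandas as pd

# Deliberately out-of-order timestamps (hypothetical data).
idx = pd.to_datetime(['2024-01-01 00:01', '2024-01-01 00:00', '2024-01-01 00:02'])
s = pd.Series([20, 10, 30], index=idx)

# resample() accepts the non-monotonic index, but its result is laid out
# in time order rather than in the original row order.
per_minute = s.resample('1min').sum()
print(per_minute)
```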
There is a sort= flag to groupby, but it relates to the sorting of the group keys themselves, not the observations within a group.
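A minimal sketch of what sort= does and does not change, using a hypothetical frame whose keys appear in non-alphabetical order:

```python
import pandas as pd

# Hypothetical frame whose group keys appear in non-alphabetical order.
df = pd.DataFrame({'key': ['b', 'a', 'b', 'a'],
                   'val': [1, 2, 3, 4]})

sorted_keys = df.groupby('key', sort=True)['val'].agg(list)
first_seen = df.groupby('key', sort=False)['val'].agg(list)

# sort= changes the order of the groups (the rows of the result)...
print(list(sorted_keys.index))  # ['a', 'b']
print(list(first_seen.index))   # ['b', 'a']
# ...but not the order of the values inside each group: 'b' holds 1 then 3 either way.
print(sorted_keys['b'], first_seen['b'])
```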
FYI: df.groupby('A').nth(1) is a safe way to get the second value of each group (your method above will fail if a group has fewer than 2 elements).
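A sketch comparing nth(1) with a guarded version of the lambda, on a hypothetical frame where one group has a single row. The guarded lambda is an illustrative alternative, not something from the answer. Note that pandas 2.0 changed nth to act as a filter, so the exact index of its result depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which group2 has only one row.
df = pd.DataFrame({'A': ['group1', 'group1', 'group2'],
                   'B': [10, 12, 10]})

# nth(1) returns the second row of each group and skips groups that are too small.
# (pandas >= 2.0 keeps the original row index; older versions index by group key.)
second = df.groupby('A')['B'].nth(1)

# Plain x.iloc[1] would raise for group2; guarding on len(x) keeps one result per group.
second_or_nan = df.groupby('A')['B'].agg(lambda x: x.iloc[1] if len(x) > 1 else np.nan)
print(second)
print(second_or_nan)
```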
Answer by Uwe Mayer
The pandas 0.19.1 documentation says "groupby preserves the order of rows within each group", so this is guaranteed behavior.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
Answer by Dima Lituiev
To preserve the order in which the groups first appear, you'll need to pass .groupby(..., sort=False). In your case the grouping column is already sorted, so it does not make a difference, but in general you should use the sort=False flag:
df.groupby('A', sort=False).agg([np.mean, lambda x: x.iloc[1] ])
Answer by Jigidi Sarnath
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
The API accepts sort as an argument.
The description of the sort argument reads:
sort : bool, default True
    Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
Thus, groupby does preserve the order of rows within each group.
Answer by TinaW
Even easier:
import numpy as np
import pandas as pd

pd.pivot_table(df, index='A', aggfunc=np.mean)
output:
           B    C
A
group1  11.0  101
group2  17.5  175
group3  11.0  101

