Pandas dataframe: Group by two columns and then average over another column
Original URL: http://stackoverflow.com/questions/35587459/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by ahajib
Assuming that I have a dataframe with the following values:
df:
col1  col2  value
1     2     3
1     2     1
2     3     1
I want to first group my dataframe by the first two columns (col1 and col2) and then average the values of the third column (value). So the desired output would look like this:
col1  col2  avg-value
1     2     2
2     3     1
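For reference, here is a minimal sketch that builds the sample frame above and produces that table directly. It assumes the frame is constructed from a dict (so the columns are numeric) rather than row by row; it groups on both key columns, averages only 'value', and uses Series.reset_index(name=...) to turn the group keys back into ordinary columns named as in the desired output.

```python
import pandas as pd

# The question's sample data, built from a dict so the columns are numeric (int64)
df = pd.DataFrame({"col1": [1, 1, 2], "col2": [2, 2, 3], "value": [3, 1, 1]})

# Group on both key columns, average only the 'value' column, then move the
# (col1, col2) index levels back into regular columns and name the mean column
result = (
    df.groupby(["col1", "col2"])["value"]
    .mean()
    .reset_index(name="avg-value")
)
print(result)
```

Note that mean() returns floats, so the averages come out as 2.0 and 1.0 rather than 2 and 1.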
I am using the following code:
import pandas as pd

columns = ['col1','col2','avg']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
print(df[['col1','col2','avg']].groupby('col1','col2').mean())
which raises the following error:
ValueError: No axis named col2 for object type <class 'pandas.core.frame.DataFrame'>
Any help would be much appreciated.
Accepted answer by EdChum
You need to pass a list of the columns to groupby. The second positional argument you passed ('col2') was interpreted as the axis parameter, which is why it raised an error:
import pandas as pd

columns = ['col1','col2','avg']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
# Pass the grouping columns together as a single list
print(df[['col1','col2','avg']].groupby(['col1','col2']).mean())
           avg
col1 col2
1    2       3
     3       3
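Since the error came from a second positional argument being read as a different parameter, one way to make the intent unambiguous is to pass the key list through the by= keyword. The sketch below (assuming the question's sample rows, built from a dict) also shows that the result is a Series indexed by a two-level MultiIndex, so an individual group is looked up with a tuple of key values.

```python
import pandas as pd

# The question's sample data, built from a dict so the columns are numeric
df = pd.DataFrame({"col1": [1, 1, 2], "col2": [2, 2, 3], "value": [3, 1, 1]})

# Naming the parameter makes it explicit that the whole list is the grouping key;
# selecting ['value'] averages only that column
means = df.groupby(by=["col1", "col2"])["value"].mean()

# The index has two levels (col1, col2), so one group is addressed by a tuple
print(means)
print(means.loc[(1, 2)])
```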
Answer by jkokorian
If you want to group by multiple columns, you should put them in a list:
import pandas as pd

columns = ['col1','col2','value']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
df.loc[2] = [2,3,1]
print(df.groupby(['col1','col2']).mean())
Or, slightly more verbose, to get the name 'avg' into your aggregated dataframe:
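import pandas as pd

columns = ['col1','col2','value']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
df.loc[2] = [2,3,1]
# The nested-dict form agg({'value': {'avg': np.mean}}) was deprecated in pandas 0.20
# and removed in 1.0; named aggregation (pandas >= 0.25) names the output column 'avg'
print(df.groupby(['col1','col2']).agg(avg=('value', 'mean')))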
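If you also want col1 and col2 back as ordinary columns (closer to the layout in the question), a minimal sketch is to combine named aggregation, available since pandas 0.25, with as_index=False. It assumes the same three rows as above, built from a dict so the columns are numeric; the keyword avg becomes the output column name and ('value', 'mean') says which column to aggregate and how.

```python
import pandas as pd

# Same three rows as above, built from a dict so the columns are numeric
df = pd.DataFrame({"col1": [1, 1, 2], "col2": [2, 3, 3], "value": [3, 3, 1]})

# Named aggregation: output column 'avg' = mean of 'value' within each group.
# as_index=False keeps col1/col2 as regular columns instead of index levels.
out = df.groupby(["col1", "col2"], as_index=False).agg(avg=("value", "mean"))
print(out)
```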