pandas: return average of multiple columns

Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/49560809/

Tags: python, pandas, group-by

Asked by Karma

How do you output the average of multiple columns?

Gender   Age     Salary     Yr_exp   cup_coffee_daily
  Male    28    45000.0        6.0                2.0
Female    40    70000.0       15.0               10.0
Female    23    40000.0        1.0                0.0
  Male    35    55000.0       12.0                6.0

I have df.groupby('Gender', as_index=False)['Age', 'Salary', 'Yr_exp'].mean(), but it only returned the average of the first column, Age. How do I return the averages of several specific columns, each in its own column? Desired output:

Gender   Age     Salary   Yr_exp
  Male  31.5    50000.0      9.0
Female  31.5    55000.0      8.0

Thanks.

Answer by Jonathan Dayton

Given this dataframe:

import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [28, 40, 23, 35],
    "Salary": [45000, 70000, 40000, 55000],
    "Yr_exp": [6, 15, 1, 12]
})

df
   Age  Gender  Salary  Yr_exp
0   28    Male   45000       6
1   40  Female   70000      15
2   23  Female   40000       1
3   35    Male   55000      12

Group by Gender and use the mean() function:

df.groupby("Gender").mean()
         Age   Salary  Yr_exp
Gender                       
Female  31.5  55000.0     8.0
Male    31.5  50000.0     9.0

Edit: you may need to change how you index after groupby(). df['Age', 'Salary'] raises a KeyError, but df[['Age', 'Salary']] returns the expected result:

   Age  Salary
0   28   45000
1   40   70000
2   23   40000
3   35   55000

Try changing

df.groupby("Gender", as_index=True)['Age', 'Salary', 'Yr_exp'].mean() 

to

df.groupby("Gender", as_index=True)[['Age', 'Salary', 'Yr_exp']].mean()
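
As a minimal sketch of the fix above, the following rebuilds the sample frame from this answer and selects the columns with a list. It uses as_index=False, as in the question, so that Gender stays a regular column, and adds sort=False (an assumption made here, not part of the original answer) so the groups appear in the same order as the desired output. The name result is only illustrative. Newer pandas releases deprecated the tuple-style selection and now reject it, so the list form is the one to rely on.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [28, 40, 23, 35],
    "Salary": [45000, 70000, 40000, 55000],
    "Yr_exp": [6, 15, 1, 12],
})

# A list inside the brackets selects several columns at once.
# sort=False keeps the groups in order of first appearance (Male, then Female).
result = df.groupby("Gender", as_index=False, sort=False)[["Age", "Salary", "Yr_exp"]].mean()
print(result)
```

With as_index=False the grouping key comes back as an ordinary Gender column next to the three averaged columns, which is the layout the question asks for.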

Answer by VnC

You can also call .agg() on the grouped object:

df.groupby("Gender").agg({'Age' : 'mean', 'Salary' : 'mean', 'Yr_exp': 'mean'})

This results in:

         Age   Salary  Yr_exp
Gender                       
Female  31.5  55000.0     8.0
Male    31.5  50000.0     9.0

Using .agg() gives you the chance to apply different functions to different columns of a grouped object, for example:

df.groupby("Gender").agg({'Age' : 'mean', 'Salary' : ['min', 'max'], 'Yr_exp': 'sum'})

Outputs:

         Age Salary        Yr_exp
        mean    min    max    sum
Gender                           
Female  31.5  40000  70000     16
Male    31.5  45000  55000     18
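
The dict form above produces two-level column labels (for example Salary/min). If flat names are easier to work with, pandas 0.25 and later also accept named aggregation. A minimal sketch, where the output names age_mean, salary_min, salary_max and yr_exp_sum are arbitrary choices made for this illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [28, 40, 23, 35],
    "Salary": [45000, 70000, 40000, 55000],
    "Yr_exp": [6, 15, 1, 12],
})

# Named aggregation: each keyword becomes an output column name,
# and each value is a (source column, function) pair.
summary = df.groupby("Gender").agg(
    age_mean=("Age", "mean"),
    salary_min=("Salary", "min"),
    salary_max=("Salary", "max"),
    yr_exp_sum=("Yr_exp", "sum"),
)
print(summary)
```

The values are the same as in the dict version; only the column labels differ, which can make later column access simpler.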