Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/30482071/

Date: 2020-08-19 08:29:27  Source: igfitidea

How to calculate mean values grouped on another column in Pandas

Tags: python, pandas, dataframe

Asked by Rafa

For the following dataframe:

StationID  HoursAhead    BiasTemp  
SS0279           0          10
SS0279           1          20
KEOPS            0          0
KEOPS            1          5
BB               0          5
BB               1          5

I'd like to get something like:

StationID  BiasTemp  
SS0279     15
KEOPS      2.5
BB         5

I know I can script something like this to get the desired result:

import pandas

def transform_DF(old_df, col):
    # Unique station IDs (set order is arbitrary)
    list_stations = list(set(old_df['StationID'].values.tolist()))
    # Output columns: every input column except the one being dropped
    header_new = list(old_df.columns.values)
    header_new.remove(col)
    new_df = pandas.DataFrame(columns=header_new)
    for i, station in enumerate(list_stations):
        # Summary statistics for the rows belonging to this station
        general_results = old_df[old_df['StationID'] == station].describe()
        new_row = []
        for column in header_new:
            if column == 'StationID':
                new_row.append(station)
                continue
            new_row.append(general_results[column]['mean'])
        new_df.loc[i] = new_row
    return new_df

But I wonder if there is something more straightforward in pandas.

Accepted answer by Zero

You could groupby on StationID and then take mean() on BiasTemp. To output a DataFrame, use as_index=False:

In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean()
Out[4]:
  StationID  BiasTemp
0        BB       5.0
1     KEOPS       2.5
2    SS0279      15.0
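
To try the call above outside an interactive session, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question; the variable names df and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# as_index=False keeps StationID as a regular column,
# so the result is a DataFrame with one row per station
result = df.groupby("StationID", as_index=False)["BiasTemp"].mean()
print(result)
```

By default the groups are sorted by key, which is why BB comes first in the output.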

Without as_index=False, it returns a Series instead:

In [5]: df.groupby('StationID')['BiasTemp'].mean()
Out[5]:
StationID
BB            5.0
KEOPS         2.5
SS0279       15.0
Name: BiasTemp, dtype: float64
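
If you already have the Series form, one way to get back to a flat table is reset_index(). The sketch below assumes the same sample data as the question; means, as_frame and keops_mean are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# Series whose index holds the station IDs
means = df.groupby("StationID")["BiasTemp"].mean()

# The index makes per-station lookups direct
keops_mean = means.loc["KEOPS"]

# reset_index() moves StationID out of the index into a column,
# giving a two-column DataFrame
as_frame = means.reset_index()
print(as_frame)
```

The Series form is convenient for lookups by station, while the DataFrame form is usually easier to merge or export.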

Read more about groupby in this pydata tutorial.

Answer by EdChum

This is what groupby is for:

In [117]:
df.groupby('StationID')['BiasTemp'].mean()

Out[117]:
StationID
BB         5.0
KEOPS      2.5
SS0279    15.0
Name: BiasTemp, dtype: float64

Here we group by the 'StationID' column, then access the 'BiasTemp' column and call mean on it.
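
The three steps can also be written out one at a time. This is a sketch on the question's sample data, with illustrative variable names. It passes sort=False, which is not in the original answer; it keeps the stations in order of first appearance, matching the row order of the desired output in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# Step 1: split the rows into groups by station.
# sort=False keeps groups in order of first appearance.
grouped = df.groupby("StationID", sort=False)

# Step 2: select the column to aggregate within each group
bias = grouped["BiasTemp"]

# Step 3: reduce each group to its mean
means = bias.mean()
print(means)
```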

There is a section in the docs on this functionality.
