How to calculate mean values grouped on another column in Pandas
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/30482071/
Question by Rafa
For the following dataframe:
```
StationID  HoursAhead  BiasTemp
SS0279     0           10
SS0279     1           20
KEOPS      0           0
KEOPS      1           5
BB         0           5
BB         1           5
```
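To follow along, the input table above can be rebuilt as a DataFrame. This is a minimal sketch; the variable name df is an assumption that simply matches the name the answers use.

```python
import pandas as pd

# Sample data from the question, one list per column
df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

print(df)
```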
I'd like to get something like:
```
StationID  BiasTemp
SS0279     15
KEOPS      2.5
BB         5
```
I know I can script something like this to get the desired result:
```python
import pandas

def transform_DF(old_df, col):
    # Unique station IDs (set() does not preserve their original order)
    list_stations = list(set(old_df['StationID'].values.tolist()))
    # Output columns: every input column except the one being dropped
    header = list(old_df.columns.values)
    header.remove(col)
    header_new = header
    new_df = pandas.DataFrame(columns=header_new)
    for i, station in enumerate(list_stations):
        # describe() returns count, mean, std, ... for each numeric column
        general_results = old_df[(old_df['StationID'] == station)].describe()
        new_row = []
        for column in header_new:
            if column in ['StationID']:
                new_row.append(station)
                continue
            new_row.append(general_results[column]['mean'])
        new_df.loc[i] = new_row
    return new_df
```
But I wonder if there is something more straightforward in pandas.
Accepted answer by Zero
You could groupby on StationID and then take mean() on BiasTemp. To output a DataFrame, use as_index=False:
```
In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean()
Out[4]:
  StationID  BiasTemp
0        BB       5.0
1     KEOPS       2.5
2    SS0279      15.0
```
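The desired output in the question lists stations in order of first appearance (SS0279, KEOPS, BB), whereas groupby sorts the group keys by default, which is why the result above comes out alphabetically. If the original order matters, passing sort=False keeps the keys in order of first appearance. A minimal sketch, rebuilding the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# sort=False: groups appear in the order their keys first occur in the data
result = df.groupby("StationID", as_index=False, sort=False)["BiasTemp"].mean()
print(result)
```

With the default sort=True the means are identical; only the row order changes.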
Without as_index=False, it returns a Series instead:
```
In [5]: df.groupby('StationID')['BiasTemp'].mean()
Out[5]:
StationID
BB         5.0
KEOPS      2.5
SS0279    15.0
Name: BiasTemp, dtype: float64
```
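If you already have the Series form, calling reset_index() on it moves StationID from the index back into a column, which should give the same DataFrame as the as_index=False version. A small sketch, assuming the question's sample data, that checks the two agree:

```python
import pandas as pd

df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# Series named BiasTemp, indexed by StationID
means = df.groupby("StationID")["BiasTemp"].mean()

# reset_index() turns the StationID index into an ordinary column
as_frame = means.reset_index()

# The as_index=False route from the answer
same = df.groupby("StationID", as_index=False)["BiasTemp"].mean()

print(as_frame.equals(same))
```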
Read more about groupby in this pydata tutorial.
Answer by EdChum
This is what groupby is for:
```
In [117]: df.groupby('StationID')['BiasTemp'].mean()
Out[117]:
StationID
BB         5.0
KEOPS      2.5
SS0279    15.0
Name: BiasTemp, dtype: float64
```
Here we group by the 'StationID' column, then access the 'BiasTemp' column and call mean() on it.
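The one-liner can be split into the three steps just described, which makes it easier to see what each piece returns. A sketch on the question's sample data; the intermediate names grouped, bias and means are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# Step 1: split the rows into one group per StationID (a DataFrameGroupBy)
grouped = df.groupby("StationID")

# Step 2: keep only the BiasTemp column of each group (a SeriesGroupBy)
bias = grouped["BiasTemp"]

# Step 3: reduce each group to its mean; the result is indexed by StationID
means = bias.mean()
print(means)
```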
There is a section in the docs on this functionality.