Source: Stack Overflow, http://stackoverflow.com/questions/26205922/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Calculate weighted average using a pandas/dataframe
Asked by mike01010
I have the following table. I want to calculate a weighted average grouped by each date based on the formula below. I can do this using some standard conventional code, but assuming that this data is in a pandas dataframe, is there any easier way to achieve this rather than through iteration?
Date        ID   wt    value  w_avg
01/01/2012  100  0.50  60     0.791666667
01/01/2012  101  0.75  80
01/01/2012  102  1.00  100
01/02/2012  201  0.50  100    0.722222222
01/02/2012  202  1.00  80
01/01/2012 w_avg = 0.5 * ( 60/ sum(60,80,100)) + .75 * (80/ sum(60,80,100)) + 1.0 * (100/sum(60,80,100))
01/02/2012 w_avg = 0.5 * ( 100/ sum(100,80)) + 1.0 * ( 80/ sum(100,80))
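For reference, the formula can be written as a plain loop. This is only a sketch of the kind of iteration the question wants to avoid; the table is hard-coded as tuples and the names `rows`, `value_sum`, `weighted_sum` and `w_avg` are illustrative, not part of the original post.

```python
from collections import defaultdict

# The question's table as (Date, ID, wt, value) tuples (hard-coded for illustration).
rows = [
    ("01/01/2012", 100, 0.50, 60),
    ("01/01/2012", 101, 0.75, 80),
    ("01/01/2012", 102, 1.00, 100),
    ("01/02/2012", 201, 0.50, 100),
    ("01/02/2012", 202, 1.00, 80),
]

value_sum = defaultdict(float)     # sum of 'value' per date
weighted_sum = defaultdict(float)  # sum of wt * value per date

for date, _id, wt, value in rows:
    value_sum[date] += value
    weighted_sum[date] += wt * value

# w_avg = sum(wt * value) / sum(value), which is the formula above rearranged.
w_avg = {date: weighted_sum[date] / value_sum[date] for date in value_sum}
print(w_avg)
```

Dividing the per-date sum of `wt * value` by the per-date sum of `value` is algebraically the same as summing `wt * (value / sum(value))` term by term.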
Accepted answer by Andy Hayden
I think I would do this with two groupbys.
First, calculate each row's contribution to the "weighted average":
In [11]: g = df.groupby('Date')
In [12]: df.value / g.value.transform("sum") * df.wt
Out[12]:
0 0.125000
1 0.250000
2 0.416667
3 0.277778
4 0.444444
dtype: float64
If you store this as a column, you can group over it:
In [13]: df['wa'] = df.value / g.value.transform("sum") * df.wt
Now the per-date sum of this column is the desired result:
In [14]: g.wa.sum()
Out[14]:
Date
01/01/2012 0.791667
01/02/2012 0.722222
Name: wa, dtype: float64
Or, to broadcast each group's sum back onto every row:
In [15]: g.wa.transform("sum")
Out[15]:
0 0.791667
1 0.791667
2 0.791667
3 0.722222
4 0.722222
Name: wa, dtype: float64
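The steps above can be collected into one self-contained script. This is a minimal sketch, assuming the question's table is built by hand with `Date` as an ordinary column; the intermediate name `value_share` is introduced here for readability and does not appear in the answer.

```python
import pandas as pd

# Rebuild the question's table (assumed layout: 'Date' is a regular column).
df = pd.DataFrame({
    "Date": ["01/01/2012", "01/01/2012", "01/01/2012", "01/02/2012", "01/02/2012"],
    "ID": [100, 101, 102, 201, 202],
    "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
    "value": [60, 80, 100, 100, 80],
})

# First groupby: each row's share of its date's total 'value'.
value_share = df["value"] / df.groupby("Date")["value"].transform("sum")

# Row-wise contribution to the weighted average.
df["wa"] = value_share * df["wt"]

# Second groupby: sum the contributions per date.
w_avg = df.groupby("Date")["wa"].sum()
print(w_avg)
```

Using `transform("sum")` keeps the result aligned with the original rows, which is what allows the element-wise division before the second groupby.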
Answer by kadee
Let's first create the example pandas dataframe:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: index = pd.Index(['01/01/2012','01/01/2012','01/01/2012','01/02/2012','01/02/2012'], name='Date')
In [4]: df = pd.DataFrame({'ID':[100,101,102,201,202],'wt':[.5,.75,1,.5,1],'value':[60,80,100,100,80]},index=index)
Then, the average of 'wt' weighted by 'value' and grouped by the index is obtained as:
In [5]: df.groupby(df.index).apply(lambda x: np.average(x.wt, weights=x.value))
Out[5]:
Date
01/01/2012 0.791667
01/02/2012 0.722222
dtype: float64
Alternatively, one can also define a function:
In [6]: def grouped_weighted_avg(values, weights, by):
   ...:     return (values * weights).groupby(by).sum() / weights.groupby(by).sum()
In [7]: grouped_weighted_avg(values=df.wt, weights=df.value, by=df.index)
Out[7]:
Date
01/01/2012 0.791667
01/02/2012 0.722222
dtype: float64
Answer by Anish Sugathan
I feel the following is an elegant solution to this problem, taken from "Pandas DataFrame aggregate function using multiple columns":
grouped = df.groupby('Date')
def wavg(group):
    # weighted average of 'value', using 'wt' as the weights
    d = group['value']
    w = group['wt']
    return (d * w).sum() / w.sum()
grouped.apply(wavg)
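Note that averaging `value` weighted by `wt` is not the same quantity as the question's formula, which averages `wt` weighted by `value`. The sketch below computes both directions on the question's data so the difference is visible. It assumes `Date` is a regular column, and the names `value_weighted_by_wt` and `wt_weighted_by_value` are illustrative. The two columns are selected before `apply` so the function only ever sees `value` and `wt`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2012", "01/01/2012", "01/01/2012", "01/02/2012", "01/02/2012"],
    "ID": [100, 101, 102, 201, 202],
    "wt": [0.50, 0.75, 1.00, 0.50, 1.00],
    "value": [60, 80, 100, 100, 80],
})

# Select only the columns the function needs before applying it.
grouped = df.groupby("Date")[["value", "wt"]]

# Direction used by wavg above: average of 'value', weights are 'wt'.
value_weighted_by_wt = grouped.apply(
    lambda g: np.average(g["value"], weights=g["wt"])
)

# Direction used by the question's formula: average of 'wt', weights are 'value'.
wt_weighted_by_value = grouped.apply(
    lambda g: np.average(g["wt"], weights=g["value"])
)

print(value_weighted_by_wt)
print(wt_weighted_by_value)
```

Only the second result reproduces the `w_avg` column from the question (about 0.7917 and 0.7222); the first gives roughly 84.44 and 86.67.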
Answer by user15051990
I saved the table in a .csv file and loaded it from there:
import numpy as np
import pandas as pd
df = pd.read_csv('book1.csv')
grouped = df.groupby('Date')
# average of 'wt' weighted by 'value' within each date
g_wavg = lambda x: np.average(x.wt, weights=x.value)
grouped.apply(g_wavg)
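The contents of `book1.csv` are not shown in the answer. As a sketch that runs without the file, the version below assumes a comma-separated layout with a header row `Date,ID,wt,value` and reads it from an in-memory string. Both the layout and the names `csv_text` and `result` are assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

# Assumed file contents; the real book1.csv may be laid out differently.
csv_text = """Date,ID,wt,value
01/01/2012,100,0.50,60
01/01/2012,101,0.75,80
01/01/2012,102,1.00,100
01/02/2012,201,0.50,100
01/02/2012,202,1.00,80
"""

# Dates are left as plain strings because parse_dates is not requested.
df = pd.read_csv(io.StringIO(csv_text))

# Average of 'wt' weighted by 'value' within each date.
result = df.groupby("Date")[["wt", "value"]].apply(
    lambda x: np.average(x["wt"], weights=x["value"])
)
print(result)
```

Replacing `io.StringIO(csv_text)` with a file path gives the same result for a file with this layout.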

