Original source: http://stackoverflow.com/questions/35905335/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Aggregation over Partition - pandas Dataframe
Asked by Ivan KR
I am looking for the best way to aggregate values over a particular partition, the equivalent of
SUM(TotalCost) OVER (PARTITION BY ShopName) AS Earnings (SQL Server)
I am able to do this with the following steps in pandas, but I am looking for a native approach, which I am sure should exist:
# Sum TotalCost per shop, then join the totals back onto the original rows
TempDF = DF.groupby(by=['ShopName'])['TotalCost'].sum()
TempDF = TempDF.reset_index()
NewDF = pd.merge(DF, TempDF, how='inner', on='ShopName')
Thanks a lot for reading through!
Answered by Anton Kargapolov
You can use the pandas transform() method for within-group aggregations, like "OVER (PARTITION BY ...)" in SQL:
import pandas as pd

# Create a DataFrame with sample data
df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'], 'value': [1, 2, 3, 4, 5, 6]})

# Calculate AVG(value) OVER (PARTITION BY group);
# the string 'mean' selects the built-in group mean
df['mean_value'] = df.groupby('group')['value'].transform('mean')
df:
   group  value  mean_value
0      A      1         2.0
1      A      2         2.0
2      A      3         2.0
3      B      4         5.0
4      B      5         5.0
5      B      6         5.0
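
Applied to the columns from the question, the same idea might look like the sketch below. Only the names DF, ShopName, TotalCost and Earnings come from the question; the shop names and cost values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
DF = pd.DataFrame({
    'ShopName': ['North', 'North', 'South', 'South', 'South'],
    'TotalCost': [10, 20, 5, 15, 30],
})

# SUM(TotalCost) OVER (PARTITION BY ShopName) AS Earnings
# transform('sum') returns one value per original row, aligned on DF's index,
# so no reset_index() or merge step is needed.
DF['Earnings'] = DF.groupby('ShopName')['TotalCost'].transform('sum')

print(DF)
```

Because transform() keeps the original row count and order, the result can be assigned straight back as a new column, which replaces the groupby, reset_index and merge steps from the question.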