Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/48347497/

Date: 2020-09-14 05:04:35  Source: igfitidea

Pandas groupby diff

Tags: python, pandas, group-by

Asked by Craig

So my dataframe looks like this:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in current pandas

d = StringIO('''
date,site,country,score
2018-01-01,google,us,100
2018-01-01,google,ch,50
2018-01-02,google,us,70
2018-01-03,google,us,60
2018-01-02,google,ch,10
2018-01-01,fb,us,50
2018-01-02,fb,us,55
2018-01-03,fb,us,100
2018-01-01,fb,es,100
2018-01-02,fb,gb,100
''')

df = pd.read_csv(d, sep=",")

Each site has a different score depending on the country. I'm trying to find the 1/3/5 day difference of scores for each site/country combination.

Output should be:

date,site,country,score,1_day_diff
2018-01-01,google,ch,50,0
2018-01-02,google,ch,10,-40
2018-01-01,google,us,100,0
2018-01-02,google,us,70,-30
2018-01-03,google,us,60,-10
2018-01-01,fb,es,100,0
2018-01-02,fb,gb,100,0
2018-01-01,fb,us,50,0
2018-01-02,fb,us,55,5
2018-01-03,fb,us,100,45

I first tried sorting by site/country/date, then grouping by site and country but I'm not able to wrap my head around getting a difference from a grouped object.

Answered by ayhan

First, sort the DataFrame and then all you need is groupby.diff():

df = df.sort_values(by=['site', 'country', 'date'])

df['diff'] = df.groupby(['site', 'country'])['score'].diff().fillna(0)

df
Out: 
         date    site country  score  diff
8  2018-01-01      fb      es    100   0.0
9  2018-01-02      fb      gb    100   0.0
5  2018-01-01      fb      us     50   0.0
6  2018-01-02      fb      us     55   5.0
7  2018-01-03      fb      us    100  45.0
1  2018-01-01  google      ch     50   0.0
4  2018-01-02  google      ch     10 -40.0
0  2018-01-01  google      us    100   0.0
2  2018-01-02  google      us     70 -30.0
3  2018-01-03  google      us     60 -10.0
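
The question also asks for 3- and 5-day differences. A minimal sketch of one way to get them with the `periods` argument of `diff()`, assuming every site/country group has exactly one row per consecutive day, so that a row offset equals a day offset. The column names `1_day_diff`, `3_day_diff` and `5_day_diff` are chosen here for illustration. If dates can be missing within a group, a row-based diff would not be a true day-based diff and a date-aware approach would be needed instead.

```python
import pandas as pd
from io import StringIO

csv = StringIO("""date,site,country,score
2018-01-01,google,us,100
2018-01-01,google,ch,50
2018-01-02,google,us,70
2018-01-03,google,us,60
2018-01-02,google,ch,10
2018-01-01,fb,us,50
2018-01-02,fb,us,55
2018-01-03,fb,us,100
2018-01-01,fb,es,100
2018-01-02,fb,gb,100
""")

df = pd.read_csv(csv, parse_dates=["date"])
df = df.sort_values(by=["site", "country", "date"])

for n in (1, 3, 5):
    # diff(periods=n) subtracts the score found n rows earlier in the same group.
    # Rows with no earlier value n rows back come out as NaN and are filled with 0.
    df[f"{n}_day_diff"] = (
        df.groupby(["site", "country"])["score"].diff(periods=n).fillna(0)
    )

print(df)
```

With the sample data no group has more than three rows, so the 3- and 5-day columns contain only the fill value.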

sort_values doesn't support arbitrary orderings. If you need a custom order (google before fb, for example), store the desired order in a collection and convert the column to a categorical that uses it. sort_values will then respect the ordering you provided there.
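
The categorical approach could look roughly like this. It is a sketch on a small made-up frame, not the asker's full data; the list name `site_order` is hypothetical, and `observed=True` is passed so that only category combinations present in the data are grouped.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"],
    "site": ["fb", "google", "fb", "google"],
    "country": ["us", "us", "us", "us"],
    "score": [50, 100, 55, 70],
})

# Desired custom order: google before fb
site_order = ["google", "fb"]
df["site"] = pd.Categorical(df["site"], categories=site_order, ordered=True)

# sort_values now follows the category order instead of alphabetical order
df = df.sort_values(by=["site", "country", "date"])
df["diff"] = (
    df.groupby(["site", "country"], observed=True)["score"].diff().fillna(0)
)

print(df)
```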