Groupby Pandas levels
Notice: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/34710267/
Asked by noblerthanoedipus
Similar to my previous question, I want to split a dataframe by groupby and apply a calculation.
Now I want to introduce a new column to split the calculation over the dataframe. Here is the code:
import pandas as pd
import numpy as np

d = {'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001],
     'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}
df = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df.index = range(1, len(df) + 1)
df.index.name = 'game'
# Stack 'home' and 'away' into one row per (game, team).
df = (df.set_index(['hw', 'aw'], append=True)
        .stack()
        .reset_index()
        .rename(columns={'level_3': 'role', 0: 'team'})
        .loc[:, ['game', 'team', 'role', 'hw', 'aw']])

def wins(row):
    # A team's result is in 'hw' when it played at home, otherwise in 'aw'.
    if row['role'] == 'home':
        return row['hw']
    else:
        return row['aw']

df['wins'] = df.apply(wins, axis=1)
# Expanding mean of each team's earlier games; shift() excludes the current game.
# The original post called pd.expanding_mean(x), which later pandas releases removed.
df['expanding_mean'] = df.groupby('team')['wins'].transform(lambda x: x.expanding().mean().shift())
print(df)
Running the above gives the expanding mean over the entire dataframe. But how do I restart the calculation for each new year?
I have tried adding year to columns= in the df declaration, but it then ends up in role, which is not desired. My gap in understanding is in the index levels, so any enlightenment is appreciated.
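A small sketch of why this happens, using a hypothetical two-game frame rather than the full data: stack() moves every column that is not already in the index into the new inner level, so year lands beside home and away unless it is moved into the index first. The variable names here are illustrative only.

```python
import pandas as pd

# Two games only; the values are illustrative, not the question's full data.
df = pd.DataFrame(
    {"home": ["A", "B"], "away": ["B", "A"], "year": [2000, 2000]},
    index=pd.Index([1, 2], name="game"),
)

# stack() turns every remaining column label into a value of the new inner
# index level, so 'year' shows up next to 'home' and 'away' as if it were a role.
stacked_all = df.stack()
roles_all = sorted(stacked_all.index.get_level_values(-1).unique())

# Moving 'year' into the index first keeps it out of the stacked labels.
stacked_teams = df.set_index("year", append=True).stack()
roles_teams = sorted(stacked_teams.index.get_level_values(-1).unique())

print(roles_all)
print(roles_teams)
```

In the second case year travels along as an index level and comes back as an ordinary column after reset_index().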
Edit: desired result below
    game  team  role   hw   aw  wins  expanding_mean  year
0      1     A  home    0    1     0             NaN  2000
1      1     B  away    0    1     1             NaN  2000
2      2     B  home    1    0     1        1.000000  2000
3      2     A  away    1    0     0        0.000000  2000
4      3     B  home    0    0     0        1.000000  2000
5      3     A  away    0    0     0        0.000000  2000
6      4     A  home    1    0     1        0.000000  2000
7      4     B  away    1    0     0        0.666667  2000
8      5     B  home    0    1     0             NaN  2001
9      5     A  away    0    1     1             NaN  2001
10     6     A  home    1    0     1        0.000000  2001
11     6     B  away    1    0     0        1.000000  2001
12     7     A  home  NaN  NaN   NaN        0.500000  2001
13     7     B  away  NaN  NaN   NaN        0.500000  2001
Answered by jezrael
You can group by both columns with df.groupby(['team', 'year']). To do that, also add the year column in the code above the groupby, and change level_3 to level_4 in rename, because year is now added to the index:
import pandas as pd
import numpy as np

d = {'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001],
     'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}
df = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw', 'year'])
df.index = range(1, len(df) + 1)
df.index.name = 'game'
# 'year' joins the index, so the stacked level is now level_4 instead of level_3.
df = (df.set_index(['hw', 'aw', 'year'], append=True)
        .stack()
        .reset_index()
        .rename(columns={'level_4': 'role', 0: 'team'})
        .loc[:, ['game', 'team', 'role', 'hw', 'aw', 'year']])

def wins(row):
    # A team's result is in 'hw' when it played at home, otherwise in 'aw'.
    if row['role'] == 'home':
        return row['hw']
    else:
        return row['aw']

df['wins'] = df.apply(wins, axis=1)
# Grouping by team and year restarts the expanding mean every year.
# The original post called pd.expanding_mean(x), which later pandas releases removed.
df['expanding_mean'] = df.groupby(['team', 'year'])['wins'].transform(lambda x: x.expanding().mean().shift())
print(df)
    game  team  role   hw   aw  year  wins  expanding_mean
0      1     A  home    0    1  2000     0             NaN
1      1     B  away    0    1  2000     1             NaN
2      2     B  home    1    0  2000     1        1.000000
3      2     A  away    1    0  2000     0        0.000000
4      3     B  home    0    0  2000     0        1.000000
5      3     A  away    0    0  2000     0        0.000000
6      4     A  home    1    0  2000     1        0.000000
7      4     B  away    1    0  2000     0        0.666667
8      5     B  home    0    1  2001     0             NaN
9      5     A  away    0    1  2001     1             NaN
10     6     A  home    1    0  2001     1        1.000000
11     6     B  away    1    0  2001     0        0.000000
12     7     A  home  NaN  NaN  2001   NaN        1.000000
13     7     B  away  NaN  NaN  2001   NaN        0.000000
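As a hedged alternative to tracking positional level_N names, the same long table can be sketched with melt(), which names the new columns directly. This is a swapped-in technique, not the approach used in the answer; the names games and long_df are hypothetical, and the sketch assumes a pandas version that provides Series.expanding().

```python
import numpy as np
import pandas as pd

d = {'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001],
     'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
     'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
     'aw': [1, 0, 0, 0, 1, 0, np.nan],
     'hw': [0, 1, 0, 1, 0, 1, np.nan]}
games = pd.DataFrame(d)
games['game'] = range(1, len(games) + 1)

# melt() names the new columns directly ('role', 'team'), so there is no
# positional 'level_N' label to keep in sync with the number of index levels.
long_df = games.melt(id_vars=['game', 'year', 'hw', 'aw'],
                     value_vars=['home', 'away'],
                     var_name='role', value_name='team')
# One row per (game, team), home row first, as in the output above.
long_df = (long_df.sort_values(['game', 'role'], ascending=[True, False])
                  .reset_index(drop=True))

# Pick 'hw' for the home side and 'aw' for the away side.
long_df['wins'] = np.where(long_df['role'] == 'home', long_df['hw'], long_df['aw'])

# Expanding mean of earlier games, restarted for every (team, year) group.
long_df['expanding_mean'] = (
    long_df.groupby(['team', 'year'])['wins']
           .transform(lambda s: s.expanding().mean().shift())
)
print(long_df[['game', 'team', 'role', 'hw', 'aw', 'year', 'wins', 'expanding_mean']])
```

Adding another identifier column later only means extending id_vars; no rename key has to change.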
Answered by MaxNoe
Group by both year and team, and use transform:
import pandas as pd
import numpy as np

d = {
    'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001],
    'team': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'value': [1, 0, 0, 1, 2, 3, 3],
}
df = pd.DataFrame(d)
# Select 'value' explicitly so transform returns a single Series aligned to df.
df['mean_per_team_and_year'] = df.groupby(['team', 'year'])['value'].transform('mean')
print(df)
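The snippet above computes a plain group mean, while the question asks for an expanding mean of earlier rows. A minimal sketch of how transform could cover both on the same data, assuming a pandas version with Series.expanding(); the column name prior_expanding_mean is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001],
    'team': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'value': [1, 0, 0, 1, 2, 3, 3],
})

grouped = df.groupby(['team', 'year'])['value']

# Group mean broadcast back to every row of the group.
df['mean_per_team_and_year'] = grouped.transform('mean')

# Expanding mean of the rows seen so far in the group, shifted by one so the
# current row is excluded; the first row of each group is therefore NaN.
df['prior_expanding_mean'] = grouped.transform(lambda s: s.expanding().mean().shift())

print(df)
```

transform always returns a result indexed like the original frame, which is why it can be assigned straight back as a new column.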