Groupby Pandas levels

Disclaimer: this page is a side-by-side Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/34710267/

Time: 2020-09-14 00:29:25  Source: igfitidea

Groupby Pandas levels

python, pandas

Asked by noblerthanoedipus

Similar to my previous question, I want to split a dataframe by groupby and apply a calculation.

Now I want to introduce a new column to split the calculation over the dataframe. Here is the code:

import pandas as pd
import numpy as np

d = {'year' : [2000, 2000, 2000, 2000, 2001, 2001, 2001],
 'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
 'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
 'aw': [1, 0, 0, 0, 1, 0, np.nan],
 'hw': [0, 1, 0, 1, 0, 1, np.nan]}

df = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw'])
df.index = range(1, len(df) + 1)
df.index.name = 'game'

df = df.set_index(['hw', 'aw'], append=True).stack().reset_index().rename(columns={'level_3': 'role', 0: 'team'}).loc[:,
 ['game', 'team', 'role', 'hw', 'aw']]

def wins(row):
    if row['role'] == 'home':
        return row['hw']
    else:
        return row['aw']
df['wins'] = df.apply(wins, axis=1)

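# pd.expanding_mean was removed from pandas; Series.expanding().mean() replaces it.
# transform keeps the result aligned with the original row index.
df['expanding_mean'] = df.groupby('team')['wins'].transform(lambda x: x.expanding().mean().shift())

print(df)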

Running the above will give the expanding mean over the entire dataframe. But how do I re-start the calculation for each new year?

I have tried adding year to columns= in the df declaration, but it then ends up in role, which is not desired. My gap in understanding is in the levels, so any enlightenment is appreciated.

Edit: desired result below

    game team  role  hw  aw  wins  expanding_mean    year
0      1    A  home   0   1     0             NaN    2000
1      1    B  away   0   1     1             NaN    2000
2      2    B  home   1   0     1        1.000000    2000
3      2    A  away   1   0     0        0.000000    2000
4      3    B  home   0   0     0        1.000000    2000
5      3    A  away   0   0     0        0.000000    2000
6      4    A  home   1   0     1        0.000000    2000
7      4    B  away   1   0     0        0.666667    2000
8      5    B  home   0   1     0             NaN    2001
9      5    A  away   0   1     1             NaN    2001
10     6    A  home   1   0     1        0.000000    2001
11     6    B  away   1   0     0        1.000000    2001
12     7    A  home NaN NaN   NaN        0.500000    2001
13     7    B  away NaN NaN   NaN        0.500000    2001

Answered by jezrael

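You can add year to the groupby, i.e. df.groupby(['team', 'year']), and add the column year in the code above the groupby. This requires changing level_3 to level_4 in the rename call, because the column year was added to the index, which shifts the stacked level by one position: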

import pandas as pd
import numpy as np

d = {'year' : [2000, 2000, 2000, 2000, 2001, 2001, 2001],
 'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
 'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
 'aw': [1, 0, 0, 0, 1, 0, np.nan],
 'hw': [0, 1, 0, 1, 0, 1, np.nan]}

df = pd.DataFrame(d, columns=['home', 'away', 'hw', 'aw', 'year'])
df.index = range(1, len(df) + 1)
df.index.name = 'game'

df = df.set_index(['hw', 'aw', 'year'], append=True).stack().reset_index().rename(columns={'level_4': 'role', 0: 'team'}).loc[:,
 ['game', 'team', 'role', 'hw', 'aw', 'year']]

def wins(row):
    if row['role'] == 'home':
        return row['hw']
    else:
        return row['aw']
df['wins'] = df.apply(wins, axis=1)

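# pd.expanding_mean was removed from pandas; Series.expanding().mean() replaces it.
# transform keeps the result aligned with the original row index.
df['expanding_mean'] = df.groupby(['team', 'year'])['wins'].transform(lambda x: x.expanding().mean().shift())
print(df)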

    game team  role  hw  aw  year  wins  expanding_mean
0      1    A  home   0   1  2000     0             NaN
1      1    B  away   0   1  2000     1             NaN
2      2    B  home   1   0  2000     1        1.000000
3      2    A  away   1   0  2000     0        0.000000
4      3    B  home   0   0  2000     0        1.000000
5      3    A  away   0   0  2000     0        0.000000
6      4    A  home   1   0  2000     1        0.000000
7      4    B  away   1   0  2000     0        0.666667
8      5    B  home   0   1  2001     0             NaN
9      5    A  away   0   1  2001     1             NaN
10     6    A  home   1   0  2001     1        1.000000
11     6    B  away   1   0  2001     0        0.000000
12     7    A  home NaN NaN  2001   NaN        1.000000
13     7    B  away NaN NaN  2001   NaN        0.000000
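
For comparison, here is a minimal sketch of the same per-team, per-year calculation that swaps the set_index/stack step for DataFrame.melt, so no level_N column has to be renamed. This is not the approach from the answer above, only an alternative under the assumption of a reasonably recent pandas; the names games and long are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as above; one row per game.
games = pd.DataFrame({
    'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001],
    'home': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'away': ['B', 'A', 'A', 'B', 'A', 'B', 'B'],
    'aw': [1, 0, 0, 0, 1, 0, np.nan],
    'hw': [0, 1, 0, 1, 0, 1, np.nan],
})
games['game'] = range(1, len(games) + 1)

# melt turns the home/away columns into (role, team) pairs, while the
# id_vars (including year) are simply repeated on every row.
long = games.melt(
    id_vars=['game', 'year', 'hw', 'aw'],
    value_vars=['home', 'away'],
    var_name='role',
    value_name='team',
)
# melt lists all home rows first, then all away rows; mergesort is a
# stable sort, so sorting by game keeps home before away within a game.
long = long.sort_values('game', kind='mergesort').reset_index(drop=True)

# A team's result is hw when it played at home and aw when it played away.
long['wins'] = np.where(long['role'] == 'home', long['hw'], long['aw'])

# Expanding mean of earlier games only, restarted for each (team, year) pair.
long['expanding_mean'] = (
    long.groupby(['team', 'year'])['wins']
        .transform(lambda s: s.expanding().mean().shift())
)

print(long[['game', 'team', 'role', 'year', 'wins', 'expanding_mean']])
```

Because year is passed as an id variable, it stays an ordinary column throughout, and the expanding mean starts again from NaN at the first game of each year.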

Answered by MaxNoe

Group by both year and team, and use transform:

import pandas as pd
import numpy as np


d = {
    'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001],
    'team': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'value': [1, 0, 0, 1, 2, 3, 3],
}

df = pd.DataFrame(d)

# Select the value column explicitly so transform returns a Series, not a DataFrame.
df['mean_per_team_and_year'] = df.groupby(['team', 'year'])['value'].transform('mean')
print(df)
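
To make the role of transform more concrete, the following sketch contrasts it with agg on the same sample data. The variable names grouped and per_group are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2000, 2000, 2000, 2000, 2001, 2001, 2001],
    'team': ['A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'value': [1, 0, 0, 1, 2, 3, 3],
})

grouped = df.groupby(['team', 'year'])['value']

# agg collapses each group to a single row, indexed by (team, year).
per_group = grouped.agg('mean')

# transform broadcasts each group's result back onto the original rows,
# so it can be assigned directly as a new column.
df['mean_per_team_and_year'] = grouped.transform('mean')

print(per_group)
print(df)
```

Here agg yields four rows (one per team and year combination), while transform yields seven values, one for each row of df, which is why it fits calculations that must line up with the original frame.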