Python: split a pandas dataframe based on groupby

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/23691133/

Date: 2020-08-19 03:20:03  Source: igfitidea

Split pandas dataframe based on groupby

Tags: python, pandas

Asked by user308827

I want to split the following dataframe based on column ZZ

df = 
        N0_YLDF  ZZ        MAT
    0  6.286333   2  11.669069
    1  6.317000   6  11.669069
    2  6.324889   6  11.516454
    3  6.320667   5  11.516454
    4  6.325556   5  11.516454
    5  6.359000   6  11.516454
    6  6.359000   6  11.516454
    7  6.361111   7  11.516454
    8  6.360778   7  11.516454
    9  6.361111   6  11.516454

As output, I want a new dataframe with the 'N0_YLDF' column split into 4, one new column for each unique value of ZZ. How do I go about this? I can do groupby, but do not know what to do with the grouped object.

Accepted answer by qwwqwwq

# Group the rows by the values in column ZZ.
gb = df.groupby('ZZ')
# Build a list with one sub-DataFrame per group key.
[gb.get_group(x) for x in gb.groups]
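
For a self-contained run of the approach above, here is a minimal sketch. The `pd.DataFrame` constructor call is my reconstruction of the question's table and the names `frames` and `key` are illustrative; none of this is part of the original answer.

```python
import pandas as pd

# Rebuild the question's frame (values copied from the table in the question).
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

gb = df.groupby("ZZ")
# gb.groups maps each group key to the row labels that belong to it.
frames = [gb.get_group(key) for key in gb.groups]

# One sub-DataFrame per unique ZZ value, each keeping its original row labels.
for frame in frames:
    print(frame, end="\n\n")
```

Each element of `frames` is an ordinary DataFrame with all three columns, so the usual column selection (for example `frame["N0_YLDF"]`) still applies.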

Answer by Jeff Mandell

In R there is a dataframe method called split. This is for all the R users out there:

def split(df, group):
    gb = df.groupby(group)
    return [gb.get_group(x) for x in gb.groups]

Answer by Anton vBR

There is another alternative: iterating over the groupby object yields (key, frame) tuples, so we can simply use a list comprehension to keep the second value (the frame).

dfs = [x for _, x in df.groupby('ZZ')]
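
To make the (key, frame) pairs visible, the sketch below unrolls the comprehension into an explicit loop. It is only an illustration: the frame is rebuilt from two columns of the question's data, and the names `keys`, `key`, and `frame` are my own.

```python
import pandas as pd

# Two columns of the question's data are enough to show the iteration.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
})

# Each iteration step yields a (group key, sub-DataFrame) tuple.
keys = []
dfs = []
for key, frame in df.groupby("ZZ"):
    keys.append(key)
    dfs.append(frame)

print(keys)                   # the group keys, in sorted order by default
print([len(f) for f in dfs])  # number of rows in each group
```

Discarding the key with `_`, as in the one-liner, is fine when only the frames matter; keeping it is useful when each frame has to be labeled later.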

Answer by ALollz

Store them in a dict, which lets you access each group's DataFrame by its group key.

d = dict(tuple(df.groupby('ZZ')))
d[6]

#    N0_YLDF  ZZ        MAT
#1  6.317000   6  11.669069
#2  6.324889   6  11.516454
#5  6.359000   6  11.516454
#6  6.359000   6  11.516454
#9  6.361111   6  11.516454

If you need only a subset of the DataFrame, in this case just the 'N0_YLDF' Series, you can select that column while building the dict.

d = dict((idx, gp['N0_YLDF']) for idx, gp in df.groupby('ZZ'))
d[6]
#1    6.317000
#2    6.324889
#5    6.359000
#6    6.359000
#9    6.361111
#Name: N0_YLDF, dtype: float64
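
If the goal is the single frame described in the question, with one 'N0_YLDF' column per unique ZZ value, one possible way to get there from such a dict is `pd.concat` along the columns. This is a sketch and not part of the original answer; it assumes a NaN-padded layout is acceptable, and the name `wide` is illustrative.

```python
import pandas as pd

# Rebuild the two relevant columns of the question's data.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
})

# Same idea as the dict above, written as a dict comprehension.
d = {key: group["N0_YLDF"] for key, group in df.groupby("ZZ")}

# The dict keys become the column labels. Rows are aligned on the original
# index, so cells for rows that belong to a different group are NaN.
# sort_index() puts the rows back in their original order.
wide = pd.concat(d, axis=1).sort_index()
print(wide)
```

Every row of `wide` holds exactly one non-missing value, in the column of the group that row belonged to.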