Python: Split pandas dataframe based on groupby
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/23691133/
Split pandas dataframe based on groupby
Asked by user308827
I want to split the following dataframe based on column ZZ:
df =
        N0_YLDF  ZZ        MAT
    0  6.286333   2  11.669069
    1  6.317000   6  11.669069
    2  6.324889   6  11.516454
    3  6.320667   5  11.516454
    4  6.325556   5  11.516454
    5  6.359000   6  11.516454
    6  6.359000   6  11.516454
    7  6.361111   7  11.516454
    8  6.360778   7  11.516454
    9  6.361111   6  11.516454
As output, I want a new dataframe with the 'N0_YLDF' column split into 4 columns, one for each unique value of ZZ. How do I go about this? I can call groupby, but I do not know what to do with the grouped object.
Accepted answer by qwwqwwq
    # Group the rows by ZZ, then pull out one sub-DataFrame per group key.
    gb = df.groupby('ZZ')
    [gb.get_group(x) for x in gb.groups]
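To try the accepted answer end to end, here is a minimal sketch. The values are copied from the question, but building `df` with `pd.DataFrame` from a dict is my assumption about how the frame was created, and the name `frames` is only for illustration.

```python
import pandas as pd

# Values copied from the question; the construction itself is an assumption.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

gb = df.groupby("ZZ")

# gb.groups maps each group key to the row labels in that group,
# so iterating over it yields the keys.
frames = [gb.get_group(key) for key in gb.groups]

for key, frame in zip(gb.groups, frames):
    print(f"ZZ={key}: {len(frame)} row(s)")
```

Each element of `frames` keeps all three columns and the original row labels, so nothing is lost by the split.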
Answer by Jeff Mandell
R has a function called split that works on data frames. For all the R users out there, here is an equivalent:
    def split(df, group):
        # Return a list with one sub-DataFrame per distinct value of `group`.
        gb = df.groupby(group)
        return [gb.get_group(x) for x in gb.groups]
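A quick usage sketch of the helper above. The `toy` frame and its column names are hypothetical and not taken from the question; they only show that the helper returns one frame per distinct value.

```python
import pandas as pd

def split(df, group):
    """Return one sub-DataFrame per distinct value of `group` (helper from the answer)."""
    gb = df.groupby(group)
    return [gb.get_group(x) for x in gb.groups]

# Hypothetical toy data, not from the question.
toy = pd.DataFrame({"kind": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})

parts = split(toy, "kind")
print(len(parts))  # number of distinct values in "kind"
```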
Answer by Anton vBR
Here is another alternative. Iterating over a groupby object yields (key, frame) pairs, so a list comprehension can simply collect the second element of each pair (the frame).
    dfs = [x for _, x in df.groupby('ZZ')]
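To make the (key, frame) pairs visible, the sketch below materialises the iteration into a list first. The small frame is hypothetical sample data with rounded values, and the names `pairs` and `keys` are mine.

```python
import pandas as pd

# Hypothetical toy data loosely modelled on the question's columns.
df = pd.DataFrame({"ZZ": [2, 6, 6, 5], "N0_YLDF": [6.29, 6.32, 6.32, 6.32]})

# Each element produced by iterating a groupby is a (key, sub-DataFrame) tuple.
pairs = list(df.groupby("ZZ"))

keys = [key for key, _ in pairs]     # the group keys
dfs = [frame for _, frame in pairs]  # the frames, as in the answer

for key, frame in pairs:
    print(key, frame.shape)
```

With the default `sort=True`, the groups come back ordered by key rather than by first appearance in the data.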
Answer by ALollz
Store them in a dict, which lets you access each group's DataFrame by its group key.
    d = dict(tuple(df.groupby('ZZ')))
    d[6]
    #     N0_YLDF  ZZ        MAT
    # 1  6.317000   6  11.669069
    # 2  6.324889   6  11.516454
    # 5  6.359000   6  11.516454
    # 6  6.359000   6  11.516454
    # 9  6.361111   6  11.516454
If you need only a subset of the DataFrame, in this case just the 'N0_YLDF' Series, you can select it while building the dict.
    d = dict((idx, gp['N0_YLDF']) for idx, gp in df.groupby('ZZ'))
    d[6]
    # 1    6.317000
    # 2    6.324889
    # 5    6.359000
    # 6    6.359000
    # 9    6.361111
    # Name: N0_YLDF, dtype: float64
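The question asks for a single dataframe with one column per ZZ value rather than separate objects. One possible way to get there from a dict of per-group Series is `pd.concat` along `axis=1`. This is a sketch of my own, not part of the answers above; the name `wide` and the `pd.DataFrame` construction are assumptions.

```python
import pandas as pd

# Values copied from the question; the construction itself is an assumption.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

# Dict of Series keyed by ZZ value, as in the answer.
d = {key: group["N0_YLDF"] for key, group in df.groupby("ZZ")}

# Concatenating a dict along axis=1 uses the dict keys as column labels.
# Rows that do not belong to a group are filled with NaN.
wide = pd.concat(d, axis=1).sort_index()
print(wide)
```

Each row has exactly one non-missing value, because every original row belongs to exactly one ZZ group.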