Original source: http://stackoverflow.com/questions/20122521/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Is there an "ungroup by" operation opposite to .groupby in pandas?
Asked by mkln
Suppose we take a pandas dataframe...
    name  age  family
0   john    1       1
1  jason   36       1
2   jane   32       1
3  Hyman   26       2
4  james   30       2
Then do a groupby()...
group_df = df.groupby('family')
group_df = group_df.aggregate({'name': name_join, 'age': 'mean'})
Then do some aggregate/summarize operation (in my example, my function name_join aggregates the names):
def name_join(list_names, concat='-'):
    return concat.join(list_names)
The grouped summarized output is thus:
        age             name
family
1        23  john-jason-jane
2        28      Hyman-james
Question:
Is there a quick, efficient way to get to the following from the aggregated table?
    name  age  family
0   john   23       1
1  jason   23       1
2   jane   23       1
3  Hyman   28       2
4  james   28       2
(Note: the age column values are just examples; I don't care about the information lost by averaging in this specific example.)
The way I thought I could do it does not look too efficient:
- create an empty dataframe
- from every line in group_df, separate the names
- return a dataframe with as many rows as there are names in the starting row
- append the output to the empty dataframe
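The steps listed above could be sketched roughly as follows. This is only an illustration of the row-by-row idea, not a recommended solution: the aggregated table is rebuilt by hand from the example output, the '-' separator is assumed never to occur inside a name, and the pieces are collected in a list and concatenated once because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Aggregated table, rebuilt by hand from the example output in the question.
group_df = pd.DataFrame(
    {"age": [23, 28], "name": ["john-jason-jane", "Hyman-james"]},
    index=pd.Index([1, 2], name="family"),
)

pieces = []
for family, row in group_df.iterrows():
    # Separate the joined names (assumes '-' never appears inside a name).
    names = row["name"].split("-")
    # One small dataframe per aggregated row, with one row per name.
    pieces.append(pd.DataFrame({"name": names, "age": row["age"], "family": family}))

# Concatenate all pieces once instead of appending to an empty dataframe.
ungrouped = pd.concat(pieces, ignore_index=True)
print(ungrouped)
```

Building one small dataframe per row is what makes this slow on large tables, which is the inefficiency the question is worried about.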
Accepted answer by Dan Allan
The rough equivalent is .reset_index(), but it may not be helpful to think of it as the "opposite" of groupby().
You are splitting a string into pieces, and maintaining each piece's association with 'family'. This old answer of mine does the job.
Just set 'family' as the index column first, refer to the link above, and then reset_index() at the end to get your desired result.
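A minimal sketch of that idea, under a few assumptions: the linked answer is not reproduced here, so this swaps in Series.str.split followed by DataFrame.explode (available from pandas 0.25) as one way to split each string while keeping its 'family' label. The aggregated table is rebuilt by hand from the question's output, where 'family' is already the index.

```python
import pandas as pd

# Aggregated table from the question; 'family' is already the index here.
group_df = pd.DataFrame(
    {"age": [23, 28], "name": ["john-jason-jane", "Hyman-james"]},
    index=pd.Index([1, 2], name="family"),
)

# Turn each joined string into a list of names.
split_df = group_df.assign(name=group_df["name"].str.split("-"))

# explode() emits one row per list element and repeats the index (family) value.
exploded = split_df.explode("name")

# reset_index() turns 'family' back into an ordinary column.
ungrouped = exploded.reset_index()[["name", "age", "family"]]
print(ungrouped)
```

If 'family' were an ordinary column instead, calling set_index('family') first would give the same starting point.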
Answer by xuancong84
There are a few ways to undo DataFrame.groupby. One is to call .filter(lambda x: True) on the grouped object, which gets back the original DataFrame.
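A short sketch of the filter approach, using the example data from the question. Note that this works on the grouped object (before any aggregation), so it recovers the original rows rather than expanding the aggregated table.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "jason", "jane", "Hyman", "james"],
    "age": [1, 36, 32, 26, 30],
    "family": [1, 1, 1, 2, 2],
})

grouped = df.groupby("family")

# A filter that keeps every group returns all rows of the original frame.
restored = grouped.filter(lambda group: True)
print(restored)
```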
Answer by Skysail
Here's a complete example that recovers the original dataframe from the grouped object:
import pandas

def name_join(list_names, concat='-'):
    # Join all names in a group into one delimited string.
    return concat.join(list_names)

print('create dataframe\n')
df = pandas.DataFrame({'name': ['john', 'jason', 'jane', 'Hyman', 'james'], 'age': [1, 36, 32, 26, 30], 'family': [1, 1, 1, 2, 2]})
df.index.name = 'indexer'
print(df)
print('create group_by object')
group_obj_df = df.groupby('family')
print(group_obj_df)
print('\nrecover grouped df')
group_joined_df = group_obj_df.aggregate({'name': name_join, 'age': 'mean'})
print(group_joined_df)
create dataframe

          name  age  family
indexer
0         john    1       1
1        jason   36       1
2         jane   32       1
3        Hyman   26       2
4        james   30       2
create group_by object
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fbfdd9dd048>

recover grouped df
                   name  age
family
1       john-jason-jane   23
2           Hyman-james   28
print('\nRecover the original dataframe')
print(pandas.concat([group_obj_df.get_group(key) for key in group_obj_df.groups]))

Recover the original dataframe
          name  age  family
indexer
0         john    1       1
1        jason   36       1
2         jane   32       1
3        Hyman   26       2
4        james   30       2

