Python: Is there an "ungroup by" operation opposite to .groupby in pandas?

Question

Asked by mkln

Suppose we take a pandas dataframe...

    name  age  family
0   john    1       1
1  jason   36       1
2   jane   32       1
3  Hyman   26       2
4  james   30       2

Then do a groupby()...

group_df = df.groupby('family')
# 'mean' replaces pd.np.mean, which was removed in pandas 2.0
group_df = group_df.aggregate({'name': name_join, 'age': 'mean'})

Then do some aggregate/summarize operation (in my example, my function name_join aggregates the names):

def name_join(list_names, concat='-'):
    return concat.join(list_names)

The grouped summarized output is thus:

        age             name
family                      
1        23  john-jason-jane
2        28      Hyman-james

Question:

Is there a quick, efficient way to get to the following from the aggregated table?

    name  age  family
0   john   23       1
1  jason   23       1
2   jane   23       1
3  Hyman   28       2
4  james   28       2

(Note: the age column values are just examples; I don't care about the information lost by averaging in this specific example.)

The way I thought I could do it does not look too efficient:

1. Create an empty dataframe.
2. For every row in group_df, split the names.
3. Build a dataframe with as many rows as there are names in that row.
4. Append the result to the empty dataframe.
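
For reference, the row-by-row approach described above might look roughly like the sketch below. The aggregated table is rebuilt by hand from the example output, and the names `pieces` and `ungrouped` are illustrative. The pieces are collected in a list and concatenated once, since `DataFrame.append` no longer exists in pandas 2.0 and later.

```python
import pandas as pd

# Aggregated table, rebuilt by hand from the example output in the question.
group_df = pd.DataFrame(
    {"age": [23, 28], "name": ["john-jason-jane", "Hyman-james"]},
    index=pd.Index([1, 2], name="family"),
)

pieces = []
for family, row in group_df.iterrows():
    # Split the joined string back into individual names.
    names = row["name"].split("-")
    # One small dataframe per aggregated row; the scalars are broadcast.
    pieces.append(pd.DataFrame({"name": names, "age": row["age"], "family": family}))

# Concatenate once at the end instead of appending inside the loop.
ungrouped = pd.concat(pieces, ignore_index=True)
print(ungrouped)
```

This works, but it builds one dataframe per group in a Python loop, which is the inefficiency the question is worried about.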

Answer 1

Accepted answer by Dan Allan

The rough equivalent is .reset_index(), but it may not be helpful to think of it as the "opposite" of groupby().

You are splitting a string into pieces while maintaining each piece's association with 'family'. This old answer of mine does the job.

Just set 'family' as the index column first, refer to the link above, and then call reset_index() at the end to get your desired result.
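
The linked answer is not reproduced here. As a stand-in, the following is a minimal sketch of the same idea (split the string, keep 'family' as the index, then reset the index), assuming pandas 0.25 or later, where `Series.str.split` and `DataFrame.explode` are available. It is not necessarily the method the linked answer used, and the aggregated table is rebuilt by hand from the question.

```python
import pandas as pd

# Aggregated table from the question, with 'family' already as the index.
group_df = pd.DataFrame(
    {"age": [23, 28], "name": ["john-jason-jane", "Hyman-james"]},
    index=pd.Index([1, 2], name="family"),
)

# Turn each joined string into a list, then give every list element its own row.
# The 'family' index value is repeated for each exploded row.
exploded = group_df.assign(name=group_df["name"].str.split("-")).explode("name")

# Move 'family' back into a regular column and restore the original column order.
ungrouped = exploded.reset_index()
ungrouped = ungrouped[["name", "age", "family"]]
print(ungrouped)
```

Because 'family' stays in the index while the rows are exploded, each name keeps its association with its group without any explicit loop.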

Answer 2

Answered by xuancong84

There are a few ways to undo DataFrame.groupby. One is to call .filter(lambda x: True) on the grouped object, for example df.groupby('family').filter(lambda x: True), which returns the original DataFrame.
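
A small sketch of that idea, using the dataframe from the question (the variable names `grouped` and `restored` are illustrative). Note that this recovers the rows from the groupby object itself, not from an already aggregated table.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "jason", "jane", "Hyman", "james"],
    "age": [1, 36, 32, 26, 30],
    "family": [1, 1, 1, 2, 2],
})

grouped = df.groupby("family")

# A filter that keeps every group returns all of the original rows.
restored = grouped.filter(lambda g: True)
print(restored.equals(df))
```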

Answer 3

Answered by Skysail

Here's a complete example that recovers the original dataframe from the grouped object:

import pandas

def name_join(list_names, concat='-'):
    # Join all names in a group into a single delimited string.
    return concat.join(list_names)

print('create dataframe\n')
df = pandas.DataFrame({'name':['john', 'jason', 'jane', 'Hyman', 'james'], 'age':[1,36,32,26,30], 'family':[1,1,1,2,2]})
df.index.name='indexer'
print(df)
print('create group_by object')
group_obj_df = df.groupby('family')
print(group_obj_df)

print('\nrecover grouped df')
group_joined_df = group_obj_df.aggregate({'name': name_join, 'age': 'mean'})
print(group_joined_df)


create dataframe

          name  age  family
indexer                    
0         john    1       1
1        jason   36       1
2         jane   32       1
3        Hyman   26       2
4        james   30       2
create group_by object
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fbfdd9dd048>

recover grouped df 
                   name  age
family                      
1       john-jason-jane   23
2           Hyman-james   28

print('\nRecover the original dataframe')
print(pandas.concat([group_obj_df.get_group(key) for key in group_obj_df.groups]))

Recover the original dataframe
          name  age  family
indexer                    
0         john    1       1
1        jason   36       1
2         jane   32       1
3        Hyman   26       2
4        james   30       2
