Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39441484/

Date: 2020-09-14 01:59:30  Source: igfitidea

pandas: groupby and aggregate without losing the column which was grouped

Tags: python, pandas, dataframe, group-by

Asked by Fizi

I have a pandas dataframe as below. For each Id I can have multiple Names and Sub-ids.

Id      NAME   SUB_ID
276956  A      5933
276956  B      5934
276956  C      5935
287266  D      1589

I want to condense the dataframe so that there is only one row for each Id, with all the names and sub-ids under that Id appearing as a single set on that row:

Id      NAME           SUB_ID
276956  set(A,B,C)     set(5933,5934,5935)
287266  set(D)         set(1589) 

I tried to group by Id and then aggregate over all the other columns:

df.groupby('Id').agg(lambda x: set(x))

But the resulting dataframe does not have an Id column. When you iterate over a groupby, the id is returned as the first value of each tuple, but I guess it is lost when you aggregate. Is there a way to get the dataframe that I am looking for, that is, to group by and aggregate without losing the column which was grouped?

Answered by Boud

If you don't want the grouped column to become the index, groupby takes an as_index argument that avoids a later reset_index:

df.groupby('Id', as_index=False).agg(lambda x: set(x))
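
For anyone who wants to try this, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (the variable names df and flat are just illustrative choices) and checks that Id stays an ordinary column when as_index=False is passed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# as_index=False keeps the grouping key as a regular column
# instead of moving it into the index.
flat = df.groupby("Id", as_index=False).agg(lambda x: set(x))

print(flat.columns.tolist())
print(flat)
```

The order of elements inside each printed set may vary, since sets are unordered.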

Answered by chrisaycock

The groupby column becomes the index. You can simply reset the index to get it back:

In [4]: df.groupby('Id').agg(lambda x: set(x)).reset_index()
Out[4]: 
       Id       NAME              SUB_ID
0  276956  {A, C, B}  {5933, 5934, 5935}
1  287266        {D}              {1589}
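
A small runnable sketch of the same idea, assuming the sample data from the question (the names grouped and restored are illustrative only). It shows that Id is not lost after aggregation: it sits in the index, and reset_index() moves it back into the columns.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# After a plain groupby/agg, the grouping key lives in the index.
grouped = df.groupby("Id").agg(lambda x: set(x))
print(grouped.index.name)        # the key is the index name
print(grouped.columns.tolist())  # only the aggregated columns

# reset_index() turns the index back into a regular column.
restored = grouped.reset_index()
print(restored.columns.tolist())
```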