Original question: http://stackoverflow.com/questions/39441484/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas: groupby and aggregate without losing the column which was grouped
Asked by Fizi
I have a pandas dataframe as below. For each Id I can have multiple Names and Sub-ids.
Id      NAME  SUB_ID
276956  A     5933
276956  B     5934
276956  C     5935
287266  D     1589
I want to condense the dataframe so that there is only one row for each Id, with all the names and sub-ids under that Id collected into a single set on that row:
Id      NAME        SUB_ID
276956  set(A,B,C)  set(5933,5934,5935)
287266  set(D)      set(1589)
I tried to group by Id and then aggregate over all the other columns:
df.groupby('Id').agg(lambda x: set(x))
But the resulting dataframe does not have an Id column. When you iterate over a groupby, the Id is returned as the first element of each tuple, but I guess it is lost when you aggregate. Is there a way to get the dataframe I am looking for, that is, to group and aggregate without losing the column that was grouped?
Answered by Boud
If you don't want the grouped column to become the index, groupby has an argument for that, as_index=False, which saves you from resetting the index afterwards:
df.groupby('Id', as_index=False).agg(lambda x: set(x))
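As a minimal, self-contained sketch of the as_index=False approach, the snippet below rebuilds the sample data from the question (the dataframe construction is an assumption about how the data was loaded) and checks that Id comes back as an ordinary column:

```python
import pandas as pd

# Sample data reconstructed from the question; how the original
# dataframe was created is an assumption.
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# as_index=False keeps the grouping key as a regular column
# instead of moving it into the index.
result = df.groupby("Id", as_index=False).agg(lambda x: set(x))

print(result.columns.tolist())
print(result)
```

Sets are unordered, so the order of elements shown inside each set may differ between runs.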
Answered by chrisaycock
The groupby column becomes the index. You can simply reset the index to get it back:
In [4]: df.groupby('Id').agg(lambda x: set(x)).reset_index()
Out[4]:
       Id       NAME              SUB_ID
0  276956  {A, C, B}  {5933, 5934, 5935}
1  287266        {D}              {1589}
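To make it visible where the Id goes, here is a small sketch that inspects the intermediate result before and after reset_index(). The sample dataframe is rebuilt from the question, and the variable names grouped and restored are only illustrative:

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption
# about how the original dataframe was built).
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# After aggregation the grouping key is the index, not a column.
grouped = df.groupby("Id").agg(lambda x: set(x))
print(grouped.index.name)          # the key is stored as the index name
print("Id" in grouped.columns)     # so it is absent from the columns

# reset_index() moves the index back into a regular column.
restored = grouped.reset_index()
print(restored.columns.tolist())
```

The Id values are never lost by the aggregation; they are stored as the row labels, which is why a plain reset_index() is enough to recover them.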