pandas: group by and aggregate without losing the column that was grouped

Question

Asked by Fizi

I have a pandas DataFrame as below. Each Id can have multiple NAME and SUB_ID values.

Id      NAME   SUB_ID
276956  A      5933
276956  B      5934
276956  C      5935
287266  D      1589

I want to condense the DataFrame so that there is only one row for each Id, with all the names and sub-ids under that Id appearing as a single set on that row:

Id      NAME           SUB_ID
276956  set(A,B,C)     set(5933,5934,5935)
287266  set(D)         set(1589)

I tried to group by Id and then aggregate over all the other columns:

df.groupby('Id').agg(lambda x: set(x))

But the resulting DataFrame does not have Id as a column. When you iterate over a groupby, the Id is returned as the first value of each tuple, but I guess it is lost when you aggregate. Is there a way to get the DataFrame I am looking for, that is, to group by and aggregate without losing the column that was grouped?

Answer 1

Answered by Boud

If you don't want the grouping column to become the index, pass as_index=False to groupby. That avoids having to reset the index afterwards:

df.groupby('Id', as_index=False).agg(lambda x: set(x))
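
For anyone who wants to try this, here is a minimal sketch that rebuilds the sample data from the question and applies as_index=False. The variable names (df, result) are just for illustration, and the way sets are displayed may differ between pandas versions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# as_index=False keeps Id as a regular column instead of moving it to the index.
result = df.groupby("Id", as_index=False).agg(lambda x: set(x))

print(list(result.columns))
print(result)
```

With this, Id should appear as an ordinary column next to NAME and SUB_ID, so no reset_index() call is needed.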

Answer 2

Answered by chrisaycock

The groupby column becomes the index of the result, so it is not lost. You can simply reset the index to get it back as a column:

In [4]: df.groupby('Id').agg(lambda x: set(x)).reset_index()
Out[4]: 
       Id       NAME              SUB_ID
0  276956  {A, C, B}  {5933, 5934, 5935}
1  287266        {D}              {1589}
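
To see where the Id goes, the following sketch (again using the question's sample data, with illustrative variable names grouped and restored) checks the index before and after reset_index(). Note that sets are unordered, so the order of elements in the printed output is not guaranteed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# Default groupby: the grouping key is moved into the index of the result.
grouped = df.groupby("Id").agg(lambda x: set(x))
print(grouped.index.name)      # name of the index after aggregation
print(list(grouped.columns))   # only the aggregated columns

# reset_index() turns the index back into an ordinary column.
restored = grouped.reset_index()
print(list(restored.columns))
```

Either approach should give the same frame; as_index=False just skips the intermediate step of having Id in the index.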
