Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/15072626/

Time: 2020-09-13 20:40:32  Source: igfitidea

Get group id back into pandas dataframe

Tags: python, pandas, group-by

Asked by beardc

For the dataframe

In [2]: df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
   ...:                    'Rank': np.random.randint(0,3,6),
   ...:                    'Val': np.random.rand(6)})
   ...: df
Out[2]: 
  Name  Rank       Val
0  foo     0  0.299397
1  bar     0  0.909228
2  foo     0  0.517700
3  bar     0  0.929863
4  foo     1  0.209324
5  bar     2  0.381515

I'm interested in grouping by Name and Rank and possibly getting aggregate values:

In [3]: group = df.groupby(['Name', 'Rank'])
In [4]: agg = group.agg(sum)
In [5]: agg
Out[5]: 
                Val
Name Rank          
bar  0     1.839091
     2     0.381515
foo  0     0.817097
     1     0.209324

But I would like to get a field in the original df that contains the group number for that row, like:

In [13]: df['Group_id'] = [2, 0, 2, 0, 3, 1]
In [14]: df
Out[14]: 
  Name  Rank       Val  Group_id
0  foo     0  0.299397         2
1  bar     0  0.909228         0
2  foo     0  0.517700         2
3  bar     0  0.929863         0
4  foo     1  0.209324         3
5  bar     2  0.381515         1

Is there a good way to do this in pandas?

I can get it with plain Python,

In [16]: from itertools import count
In [17]: c = count()
In [22]: group.transform(lambda x: next(c))
Out[22]: 
   Val
0    2
1    0
2    2
3    0
4    3
5    1

but it's pretty slow on a large dataframe, so I figured there may be a better built-in pandas way to do this.
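
To make the target numbering concrete, here is a minimal plain-Python sketch that reproduces the Group_id column from the example. It assumes the ids should follow the sorted order of the distinct (Name, Rank) pairs, which is consistent with the example output. The fixed Rank values and the name key_to_id are illustrative and not part of the original post.

```python
import pandas as pd

# Fixed values taken from the example output (Val is omitted; it does not affect grouping).
df = pd.DataFrame({"Name": ["foo", "bar"] * 3,
                   "Rank": [0, 0, 0, 0, 1, 2]})

# One (Name, Rank) tuple per row.
keys = list(zip(df["Name"], df["Rank"]))

# Number the distinct keys in sorted order (hypothetical helper mapping).
key_to_id = {key: i for i, key in enumerate(sorted(set(keys)))}

# Look up each row's key to get its group id.
df["Group_id"] = [key_to_id[key] for key in keys]
print(df)
```

This still loops over every row in Python, so it is a statement of intent rather than a fast solution.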

Answered by DSM

A lot of handy things are stored in the DataFrameGroupBy.grouper object. For example:

>>> df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
                   'Rank': np.random.randint(0,3,6),
                   'Val': np.random.rand(6)})
>>> grouped = df.groupby(["Name", "Rank"])
>>> grouped.grouper.
grouped.grouper.agg_series        grouped.grouper.indices
grouped.grouper.aggregate         grouped.grouper.labels
grouped.grouper.apply             grouped.grouper.levels
grouped.grouper.axis              grouped.grouper.names
grouped.grouper.compressed        grouped.grouper.ngroups
grouped.grouper.get_group_levels  grouped.grouper.nkeys
grouped.grouper.get_iterator      grouped.grouper.result_index
grouped.grouper.group_info        grouped.grouper.shape
grouped.grouper.group_keys        grouped.grouper.size
grouped.grouper.groupings         grouped.grouper.sort
grouped.grouper.groups            

and so:

>>> df["GroupId"] = df.groupby(["Name", "Rank"]).grouper.group_info[0]
>>> df
  Name  Rank       Val  GroupId
0  foo     0  0.302482        2
1  bar     0  0.375193        0
2  foo     2  0.965763        4
3  bar     2  0.166417        1
4  foo     1  0.495124        3
5  bar     2  0.728776        1

There may be a nicer alias for grouper.group_info[0] lurking around somewhere, but this should work, anyway.
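
One reason integer group ids are handy is that they can index directly into an aggregated result whose rows are in the same sorted group order. The sketch below is an illustration under assumptions: it gets the ids from the public GroupBy.ngroup() method rather than grouper.group_info[0] (assuming both give the same sorted-order numbering), and it uses fixed values instead of random ones so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Fixed illustrative data (assumption: not the random values from the post).
df = pd.DataFrame({"Name": ["foo", "bar"] * 3,
                   "Rank": [0, 0, 0, 0, 1, 2],
                   "Val": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]})

grouped = df.groupby(["Name", "Rank"])
ids = grouped.ngroup().to_numpy()            # one integer id per row
sums = grouped["Val"].sum()                  # one row per group, in sorted key order
expected = grouped["Val"].transform("sum")   # built-in broadcast, used as a cross-check

# Positional lookup: each row receives the sum of the group it belongs to.
df["GroupSum"] = sums.to_numpy()[ids]

assert np.allclose(df["GroupSum"], expected)
print(df)
```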

Answered by jezrael

Use GroupBy.ngroup, available from pandas 0.20.2+:

df["GroupId"] = df.groupby(["Name", "Rank"]).ngroup()
print(df)
  Name  Rank       Val  GroupId
0  foo     2  0.451724        4
1  bar     0  0.944676        0
2  foo     0  0.822390        2
3  bar     2  0.063603        1
4  foo     1  0.938892        3
5  bar     2  0.332454        1
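
A reproducible sketch of the same call, using the fixed Rank values from the question's example (an assumption made here so the result is deterministic). It also shows groupby's sort=False option, which should number groups in order of first appearance rather than sorted key order. The column name GroupIdByAppearance is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["foo", "bar"] * 3,
                   "Rank": [0, 0, 0, 0, 1, 2]})

# Default (sort=True): ids follow the sorted order of the (Name, Rank) keys.
df["GroupId"] = df.groupby(["Name", "Rank"]).ngroup()

# sort=False: ids follow the order in which each key first appears.
df["GroupIdByAppearance"] = df.groupby(["Name", "Rank"], sort=False).ngroup()

print(df)
```

With the default sorting, the GroupId column matches the Group_id values requested in the question.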

Answered by Luca Pappalardo

The correct solution is to use grouper.label_info:

df["GroupId"] = df.groupby(["Name", "Rank"]).grouper.label_info

It automatically associates each row in the df dataframe with the corresponding group label.