Getting group IDs back into a pandas DataFrame

Question

Asked by beardc

For the dataframe

In [2]: df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
   ...:                    'Rank': np.random.randint(0,3,6),
   ...:                    'Val': np.random.rand(6)})
   ...: df
Out[2]: 
  Name  Rank       Val
0  foo     0  0.299397
1  bar     0  0.909228
2  foo     0  0.517700
3  bar     0  0.929863
4  foo     1  0.209324
5  bar     2  0.381515

I'm interested in grouping by Name and Rank and possibly getting aggregate values:

In [3]: group = df.groupby(['Name', 'Rank'])
In [4]: agg = group.agg(sum)
In [5]: agg
Out[5]: 
                Val
Name Rank          
bar  0     1.839091
     2     0.381515
foo  0     0.817097
     1     0.209324

But I would like to get a field in the original df that contains the group number for that row, like

In [13]: df['Group_id'] = [2, 0, 2, 0, 3, 1]
In [14]: df
Out[14]: 
  Name  Rank       Val  Group_id
0  foo     0  0.299397         2
1  bar     0  0.909228         0
2  foo     0  0.517700         2
3  bar     0  0.929863         0
4  foo     1  0.209324         3
5  bar     2  0.381515         1

Is there a good way to do this in pandas?

I can get it with plain Python,

In [16]: from itertools import count
In [17]: c = count()
In [22]: group.transform(lambda x: next(c))
Out[22]: 
   Val
0    2
1    0
2    2
3    0
4    3
5    1

but it's pretty slow on a large dataframe, so I figured there may be a better built-in pandas way to do this.

Answer 1

Answered by DSM

A lot of handy things are stored in the DataFrameGroupBy.grouper object. For example:

>>> df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
                   'Rank': np.random.randint(0,3,6),
                   'Val': np.random.rand(6)})
>>> grouped = df.groupby(["Name", "Rank"])
>>> grouped.grouper.
grouped.grouper.agg_series        grouped.grouper.indices
grouped.grouper.aggregate         grouped.grouper.labels
grouped.grouper.apply             grouped.grouper.levels
grouped.grouper.axis              grouped.grouper.names
grouped.grouper.compressed        grouped.grouper.ngroups
grouped.grouper.get_group_levels  grouped.grouper.nkeys
grouped.grouper.get_iterator      grouped.grouper.result_index
grouped.grouper.group_info        grouped.grouper.shape
grouped.grouper.group_keys        grouped.grouper.size
grouped.grouper.groupings         grouped.grouper.sort
grouped.grouper.groups

and so:

>>> df["GroupId"] = df.groupby(["Name", "Rank"]).grouper.group_info[0]
>>> df
  Name  Rank       Val  GroupId
0  foo     0  0.302482        2
1  bar     0  0.375193        0
2  foo     2  0.965763        4
3  bar     2  0.166417        1
4  foo     1  0.495124        3
5  bar     2  0.728776        1

There may be a nicer alias for grouper.group_info[0] lurking around somewhere, but this should work, anyway.
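To make it clearer what the integer codes above represent, here is a minimal sketch that rebuilds the same numbering using only public calls (drop_duplicates, sort_values, merge) rather than grouper internals. It assumes the default groupby behaviour of numbering groups in sorted key order, and it uses the fixed Rank values from the question instead of random ones. The names keys and out are illustrative only.

```python
import pandas as pd

# Fixed data taken from the question, so the result is reproducible.
df = pd.DataFrame({"Name": ["foo", "bar"] * 3,
                   "Rank": [0, 0, 0, 0, 1, 2]})

# One row per distinct (Name, Rank) pair, in sorted key order.
keys = (df[["Name", "Rank"]]
        .drop_duplicates()
        .sort_values(["Name", "Rank"])
        .reset_index(drop=True))

# The position in that sorted table serves as the group id.
keys["GroupId"] = range(len(keys))

# A left merge keeps the original row order of df.
out = df.merge(keys, on=["Name", "Rank"], how="left")
print(out)
```

With this data the GroupId column comes out as 2, 0, 2, 0, 3, 1, which matches the Group_id column the question asks for. This is slower than reading the codes directly and is meant only to show where the numbers come from.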

Answer 2

Answered by jezrael

Use GroupBy.ngroup, available from pandas 0.20.2+:

df["GroupId"] = df.groupby(["Name", "Rank"]).ngroup()
print(df)
  Name  Rank       Val  GroupId
0  foo     2  0.451724        4
1  bar     0  0.944676        0
2  foo     0  0.822390        2
3  bar     2  0.063603        1
4  foo     1  0.938892        3
5  bar     2  0.332454        1
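Because the output above depends on random Rank values, the following sketch repeats the ngroup call on the fixed data from the question so the ids can be checked by hand. It also shows the sort=False variant, which numbers groups in order of first appearance rather than sorted key order. The column names GroupId and FirstSeenId are just illustrative choices.

```python
import pandas as pd

# Fixed data copied from the question.
df = pd.DataFrame({"Name": ["foo", "bar"] * 3,
                   "Rank": [0, 0, 0, 0, 1, 2],
                   "Val": [0.299397, 0.909228, 0.517700,
                           0.929863, 0.209324, 0.381515]})

# Default (sort=True): ids follow the sorted order of the group keys:
# (bar, 0) -> 0, (bar, 2) -> 1, (foo, 0) -> 2, (foo, 1) -> 3.
df["GroupId"] = df.groupby(["Name", "Rank"]).ngroup()

# sort=False: ids follow the order in which each key first appears:
# (foo, 0) -> 0, (bar, 0) -> 1, (foo, 1) -> 2, (bar, 2) -> 3.
df["FirstSeenId"] = df.groupby(["Name", "Rank"], sort=False).ngroup()

print(df)
```

Here GroupId is 2, 0, 2, 0, 3, 1 (the numbering requested in the question) and FirstSeenId is 0, 1, 0, 1, 2, 3.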

Answer 3

Answered by Luca Pappalardo

The correct solution is to use grouper.label_info:

df["GroupId"] = df.groupby(["Name", "Rank"]).grouper.label_info

It automatically associates each row in the df dataframe with the corresponding group label.
