Pandas GroupBy: take only the first N groups

Question

Asked by Christian Sauer

I have a DataFrame that I want to group by ID, e.g.:

import pandas as pd
df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'], 'user_id': [1,2,1,1,3,1,5]})
print(df)

Which generates:

  item_id  user_id
0       a        1
1       a        2
2       b        1
3       b        1
4       b        3
5       c        1
6       d        5

[7 rows x 2 columns]

I can easily group by the id:

grouped = df.groupby("item_id")

But how can I return only the first N group-by objects? E.g., I want only the first 3 unique item_ids.

Answer 1

Answered by Jianxun Li

Here is one way, using list(grouped). Iterating over a GroupBy yields (key, sub-DataFrame) pairs, so g[1] below is the sub-DataFrame of each group:

result = [g[1] for g in list(grouped)[:3]]

# 1st
result[0]

  item_id  user_id
0       a        1
1       a        2

# 2nd
result[1]

  item_id  user_id
2       b        1
3       b        1
4       b        3
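
Building list(grouped) creates every group before slicing. As a minimal sketch of an alternative, assuming itertools.islice from the standard library is acceptable here, iteration can stop after the first N groups instead. The names n, first_groups and first_keys are illustrative and not part of the original answer.

```python
from itertools import islice

import pandas as pd

df = pd.DataFrame({
    'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
    'user_id': [1, 2, 1, 1, 3, 1, 5],
})
grouped = df.groupby('item_id')

n = 3  # illustrative: number of groups to keep

# Iterating a GroupBy yields (key, sub-DataFrame) pairs;
# islice stops after the first n pairs.
first_groups = [group for _, group in islice(grouped, n)]
first_keys = [key for key, _ in islice(grouped, n)]
```

With the default sort=True, groups come out in sorted key order, so "first" here means the first N keys after sorting, not the first N keys to appear in the frame.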

Answer 2

Answered by Alexander

One method is to use Counter to get the top 3 unique items (the three most frequent ones) from the column, filter your DataFrame based on those items, and then perform a groupby operation on this filtered DataFrame.

from collections import Counter

c = Counter(df.item_id)
most_common = [item for item, _ in c.most_common(3)]

>>> df[df.item_id.isin(most_common)].groupby('item_id').sum()
         user_id
item_id         
a              3
b              5
c              1
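
Counter.most_common ranks items by frequency. If "first 3" instead means the first three distinct item_ids in order of appearance, a sketch along the following lines may be closer to the question. It relies on Series.unique, which returns values in order of appearance, and the names first_ids, subset and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
    'user_id': [1, 2, 1, 1, 3, 1, 5],
})

# First three distinct item_ids, in the order they appear in the column.
first_ids = df['item_id'].unique()[:3]

# Keep only rows belonging to those ids, then group as usual.
subset = df[df['item_id'].isin(first_ids)]
result = subset.groupby('item_id')['user_id'].sum()
```

For the sample data both approaches select a, b and c, but they can differ when a frequent item first appears late in the column.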
