Pandas GroupBy: take only the first N groups
Original question: http://stackoverflow.com/questions/31655634/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Christian Sauer
I have a DataFrame that I want to group by ID, e.g.:
import pandas as pd
df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'], 'user_id': [1,2,1,1,3,1,5]})
print(df)
Which generates:
  item_id  user_id
0       a        1
1       a        2
2       b        1
3       b        1
4       b        3
5       c        1
6       d        5
[7 rows x 2 columns]
I can easily group by the ID:
grouped = df.groupby("item_id")
But how can I return only the first N groups? For example, I want only the first 3 unique item_ids.
Answered by Jianxun Li
Here is one way, using list(grouped). Each element of that list is a (key, DataFrame) tuple, so g[1] selects the group's DataFrame.
result = [g[1] for g in list(grouped)[:3]]
# 1st
result[0]
  item_id  user_id
0       a        1
1       a        2
# 2nd
result[1]
  item_id  user_id
2       b        1
3       b        1
4       b        3
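The list-based approach above builds a list containing every group before slicing it. As a minimal sketch of an alternative (not part of the original answer), itertools.islice can stop iteration after the first N groups, which should avoid holding a DataFrame for every group in a list. The names n and first_groups are illustrative, and the sketch assumes the default groupby behaviour of sorting keys.

```python
from itertools import islice

import pandas as pd

df = pd.DataFrame({
    "item_id": ["a", "a", "b", "b", "b", "c", "d"],
    "user_id": [1, 2, 1, 1, 3, 1, 5],
})
grouped = df.groupby("item_id")

n = 3  # illustrative: number of groups to keep

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs; with the
# default sort=True the keys come out in sorted order. islice stops
# consuming the iterator after the first n pairs.
first_groups = {key: group for key, group in islice(grouped, n)}

for key, group in first_groups.items():
    print(key, len(group))
```

Keeping the result in a dict keyed by item_id makes it possible to look up a group by name, e.g. first_groups["b"], instead of by position.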
Answered by Alexander
One method is to use Counter to get the 3 most common unique items from the column, filter your DataFrame based on those items, and then perform a groupby operation on this filtered DataFrame.
from collections import Counter
c = Counter(df.item_id)
most_common = [item for item, _ in c.most_common(3)]
>>> df[df.item_id.isin(most_common)].groupby('item_id').sum()
         user_id
item_id
a              3
b              5
c              1
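Counter.most_common selects items by frequency, so it returns the most frequent item_ids rather than the first ones encountered. If the goal is the first 3 unique item_ids in order of appearance, a hedged variant of the same filter-then-group idea (an assumption about the intent, not the original answer's code) is to take them from Series.unique(), which preserves order of first appearance. The names first_ids, filtered and totals are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "item_id": ["a", "a", "b", "b", "b", "c", "d"],
    "user_id": [1, 2, 1, 1, 3, 1, 5],
})

# unique() returns values in order of first appearance, so slicing
# gives the first 3 distinct item_ids seen in the column.
first_ids = df["item_id"].unique()[:3]

# Keep only rows belonging to those ids, then group as usual.
filtered = df[df["item_id"].isin(first_ids)]
totals = filtered.groupby("item_id")["user_id"].sum()

print(totals)
```

For this sample data both approaches happen to select a, b and c, but they can differ whenever a frequent item_id first appears late in the column.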

