Original question: http://stackoverflow.com/questions/38933071/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Group by two columns and count the occurrences of each combination in pandas
Asked by chessosapiens
I have the following data frame:
import pandas as pd
data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
product_id user_id
p1 a1
p1 a1
p2 a1
p1 a2
p1 a2
p1 a2
p2 a3
p2 a3
p3 a3
In the real case there might be some other columns as well, but what I need to do is group the data frame by the product_id and user_id columns, count the occurrences of each combination, and add that count as a new column in a new data frame.
The output should look something like this:
user_id product_id count
a1 p1 2
a1 p2 1
a2 p1 3
a3 p2 2
a3 p3 1
I have tried the following code:
grouped=data.groupby(['user_id','product_id']).count()
But the result is:
user_id  product_id
a1       p1
         p2
a2       p1
a3       p2
         p3
Actually, the most important thing for me is to have a column named count that holds the number of occurrences; I need to use that column later.
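A note on why the attempt above shows no numbers: count() reports the non-null values in each column that is not a grouping key, and here both columns are grouping keys, so nothing is left to count. The sketch below illustrates this; the price column is a hypothetical extra column added only to show how count() and size() differ when other columns exist.

import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# Both columns are grouping keys, so count() has no columns left to report.
counted = data.groupby(['user_id', 'product_id']).count()
print(counted.shape)  # (5, 0): five groups, zero value columns

# size() counts rows per group and does not depend on other columns.
sizes = data.groupby(['user_id', 'product_id']).size()

# Hypothetical extra column with one missing value (not in the question).
data_extra = data.assign(price=[None, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
counted_extra = data_extra.groupby(['user_id', 'product_id']).count()
# count() skips the missing price, so group (a1, p1) shows 1 while size() shows 2.
print(counted_extra)

So with extra columns count() would return one count per remaining column, and those counts can be lower than the group size when values are missing.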
Answered by Nehal J Wani
Maybe this is what you want?
>>> import pandas as pd
>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
user_id product_id size
0 a1 p1 2
1 a1 p2 1
2 a2 p1 3
3 a3 p2 2
4 a3 p3 1
>>> new_df['size']
0 2
1 1
2 3
3 2
4 1
Name: size, dtype: int64
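Since the question asks for a column named count, a slightly shorter variant of the same idea may be handy. This is only a sketch: it relies on Series.reset_index accepting a name argument for the values column, and the variable name counts is my own choice.

import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# size() gives one row count per (user_id, product_id) pair;
# reset_index(name='count') turns the index levels back into columns
# and names the values column 'count' in a single step.
counts = (
    data.groupby(['user_id', 'product_id'])
        .size()
        .reset_index(name='count')
)
print(counts)

The column can then be used later as counts['count'].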
Answered by sparrow
Building a little on @Nehal's answer: if you want to create another column with the results combined, you can concatenate the product_id and size columns:
new_df['combo'] = new_df['product_id'].map(str) + '(' + new_df['size'].map(str) +')'
...and then use groupby again to organize that column by 'user_id':
new_df['combo'].astype(str).groupby(new_df['user_id']).agg(['size', ', '.join])
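For reference, here is the whole chain from this answer as one self-contained sketch. The output column names n_products and combos are my own, chosen via named aggregation instead of the list form; everything else follows the steps described above.

import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# Step 1: count each (user_id, product_id) pair, as in the first answer.
new_df = data.groupby(['user_id', 'product_id']).size().to_frame(name='size').reset_index()

# Step 2: build a label such as 'p1(2)' for every pair.
new_df['combo'] = new_df['product_id'].map(str) + '(' + new_df['size'].map(str) + ')'

# Step 3: collect the labels per user; n_products is the number of
# distinct products for that user, combos is the joined label string.
summary = new_df.groupby('user_id')['combo'].agg(n_products='size', combos=', '.join)
print(summary)

Each row of summary then holds one user, for example a1 with the string 'p1(2), p2(1)'.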