pandas: group by two columns and count the occurrences of each combination

Question

Asked by chessosapiens

I have the following data frame:

import pandas as pd
data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})

product_id  user_id
    p1       a1
    p1       a1
    p2       a1
    p1       a2
    p1       a2
    p1       a2
    p2       a3
    p2       a3
    p3       a3

In the real case there may be other columns as well, but what I need is to group the data frame by the product_id and user_id columns, count the occurrences of each combination, and add that count as a new column in a new data frame.

The output should look something like this:

user_id product_id  count
a1       p1            2
a1       p2            1
a2       p1            3
a3       p2            2
a3       p3            1

I have tried the following code:

grouped=data.groupby(['user_id','product_id']).count()

But the result is:

user_id product_id
 a1       p1
          p2
 a2       p1
 a3       p2
          p3

Actually, the most important thing for me is to have a column named count that holds the number of occurrences; I need to use that column later.

Answer 1

Answered by Nehal J Wani

Maybe this is what you want?

>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64
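
Since the question asks for a column named count, the same idea can probably be written in one step by naming the column while resetting the index. This is a minimal sketch using the sample data from the question; the variable name `counts` is only illustrative and not part of the original answer:

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# size() counts the rows in each (user_id, product_id) group and returns a Series
# indexed by the two keys; reset_index(name=...) turns that Series into a
# DataFrame and names the value column in the same call.
# `counts` is an illustrative name, not from the original answer.
counts = data.groupby(['user_id', 'product_id']).size().reset_index(name='count')
print(counts)
```

As far as I can tell, the attempt in the question came back without a count column because `count()` counts non-null values in the columns that are not grouping keys, and the sample frame has no such columns. `size()` counts rows, so it does not depend on any other column being present.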

Answer 2

Answered by sparrow

Building a little on @Nehal's response: if you want another column with the results combined, you can concatenate the product_id and size columns:

new_df['combo'] = new_df['product_id'].map(str) + '(' + new_df['size'].map(str) +')'

...and then use groupby again to organize that column by 'user_id':

new_df['combo'].astype(str).groupby(new_df['user_id']).agg(['size', ', '.join])
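
Put together, the two steps might look like the sketch below. It assumes `new_df` has the columns produced in Answer 1 and rebuilds that frame by hand so the example runs on its own. The names `summary`, `n_products` and `products` are illustrative. It also uses named aggregation (pandas 0.25 or later) instead of the list form, purely to get explicit column names.

```python
import pandas as pd

# Stand-in for the new_df built in Answer 1 (one row per user/product pair).
new_df = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a2', 'a3', 'a3'],
    'product_id': ['p1', 'p2', 'p1', 'p2', 'p3'],
    'size': [2, 1, 3, 2, 1],
})

# Build a "product(count)" label for every row.
new_df['combo'] = new_df['product_id'].astype(str) + '(' + new_df['size'].astype(str) + ')'

# Group by user: count the distinct products and join their labels into one string.
# `summary`, `n_products` and `products` are illustrative names.
summary = (
    new_df.groupby('user_id')
          .agg(n_products=('combo', 'size'), products=('combo', ', '.join))
          .reset_index()
)
print(summary)
```

The result has one row per user, which is convenient for reporting. The per-combination frame from Answer 1 remains the better shape if the counts are needed for further computation.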
