pandas: I applied sum() on a groupby and I want to sort the values of the last column
Original question: http://stackoverflow.com/questions/44742945/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by KawtarZZ
Given the following DataFrame:
user_ID product_id amount
1 456 1
1 87 1
1 788 3
1 456 5
1 87 2
... ... ...
The first column is the ID of the customer, the second is the ID of the product he bought, and 'amount' is the quantity of that product purchased on a given day (the date is also taken into consideration). A customer can buy as many products as he wants each day.
I want to calculate the total number of times each product is bought by each customer, so I applied a groupby:
df.groupby(['user_ID','product_id'], sort=True).sum()
Now I want to sort the summed amount within each group. Any help?
Answered by student
Suppose df is:
user_ID product_id amount
0 1 456 1
1 1 87 1
2 1 788 3
3 1 456 5
4 1 87 2
5 2 456 1
6 2 788 3
7 2 456 5
Then you can use groupby and sum as before. In addition, you can sort the values by the two columns [user_ID, amount]; ascending=[True,False] means ascending order of user and, for each user, descending order of amount:
new_df = df.groupby(['user_ID','product_id'], sort=True).sum().reset_index()
new_df = new_df.sort_values(by = ['user_ID', 'amount'], ascending=[True,False])
print(new_df)
Output:
user_ID product_id amount
1 1 456 6
0 1 87 3
2 1 788 3
3 2 456 6
4 2 788 3
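As a self-contained variation on the answer above, the following sketch rebuilds the sample data and performs the same group-then-sort step. It rests on a few assumptions that are not in the original answer: as_index=False is used instead of reset_index(), product_id is added as a third sort key purely to make the order of tied amounts explicit, and the names totals and ranked are only illustrative.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "user_ID":    [1, 1, 1, 1, 1, 2, 2, 2],
    "product_id": [456, 87, 788, 456, 87, 456, 788, 456],
    "amount":     [1, 1, 3, 5, 2, 1, 3, 5],
})

# as_index=False keeps the group keys as ordinary columns,
# so no reset_index() call is needed afterwards.
totals = df.groupby(["user_ID", "product_id"], as_index=False)["amount"].sum()

# Users ascending; within each user, largest total first.
# product_id is an added tie-breaker (an assumption, not part of the answer).
ranked = totals.sort_values(
    by=["user_ID", "amount", "product_id"],
    ascending=[True, False, True],
)

# Optionally restore the (user_ID, product_id) hierarchy after sorting.
ranked = ranked.set_index(["user_ID", "product_id"])
print(ranked)
```

Sorting on flat columns first and setting the index afterwards keeps the per-user ordering intact while still giving a grouped-looking result.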
Answered by FAMG
You could also use aggregate() in R:
# Make up some example data
df = data.frame (user_ID = as.factor(rep(1:5, each = 5)),
product_id = as.factor(sample(seq(1:10),size = 25, replace = TRUE)),
amount = sample(1:5, size = 25, replace = TRUE))
# Use aggregate with function sum to calculate the amount of products bought by product and customer
aggregate(amount ~ product_id * user_ID , data = df, FUN = sum)
Output:
product_id user_ID amount
1 2 1 3
2 4 1 2
3 6 1 1
4 9 1 5
5 1 2 5
6 3 2 9
7 8 2 1
8 10 2 5
9 2 3 5
10 3 3 5
11 4 3 5
12 5 3 3
13 8 3 5
14 3 4 3
15 4 4 9
16 5 4 2
17 10 4 1
18 2 5 1
19 4 5 4
20 5 5 2
21 10 5 2
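Since the question is about pandas, here is a rough pandas counterpart of the R aggregate() call above, followed by the sort the question asks for. This is only a sketch: the random data is generated with NumPy under an arbitrarily chosen seed, so the values will not match the R output, and the name totals is illustrative.

```python
import numpy as np
import pandas as pd

# Make up some example data (seed chosen arbitrarily for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "user_ID": np.repeat(np.arange(1, 6), 5),      # 5 users, 5 rows each
    "product_id": rng.integers(1, 11, size=25),    # product ids 1..10
    "amount": rng.integers(1, 6, size=25),         # amounts 1..5
})

# Sum the amount per (user, product), then sort: users ascending,
# and within each user the largest total first.
totals = (
    df.groupby(["user_ID", "product_id"], as_index=False)["amount"]
      .sum()
      .sort_values(["user_ID", "amount"], ascending=[True, False])
)
print(totals)
```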