Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors on Stack Overflow (not to this page).
Original question: http://stackoverflow.com/questions/46622869/
Pandas: groupby column A and make lists of tuples from other columns?
Asked by MrCartoonology
I would like to aggregate user transactions into lists in pandas. I can't figure out how to make a list comprised of more than one field. For example,
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})
which looks like
   amount  time  user
0   10.99    20     1
1    4.99    10     1
2    2.99    11     2
3    1.99    18     2
4   10.99    15     3
If I do
print(df.groupby('user')['time'].apply(list))
I get
user
1 [20, 10]
2 [11, 18]
3 [15]
but if I do
df.groupby('user')[['time', 'amount']].apply(list)
I get
user
1 [time, amount]
2 [time, amount]
3 [time, amount]
Thanks to an answer below, I learned I can do this
df.groupby('user').agg(lambda x: x.tolist())
to get
amount time
user
1 [10.99, 4.99] [20, 10]
2 [2.99, 1.99] [11, 18]
3 [10.99] [15]
but I want time and amount sorted in the same order, so that I can go through each user's transactions in order.
I was looking for a way to produce this:
amount-time-tuple
user
1 [(20, 10.99), (10, 4.99)]
2 [(11, 2.99), (18, 1.99)]
3 [(15, 10.99)]
but maybe there is a way to do the sort without "tupling" the two columns?
Accepted answer by Bharath
apply(list) on a multi-column selection iterates over the column labels of each group, not over the values. I think you are looking for
df.groupby('user')[['time', 'amount']].apply(lambda x: x.values.tolist())
user
1    [[23.0, 2.99], [50.0, 1.99]]
2    [[12.0, 1.99]]
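A minimal sketch, using the DataFrame from the question, of why apply(list) returns the column names and how the same groupby pattern can yield tuples rather than nested lists. The itertuples call is an illustration added here and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

# Iterating over a DataFrame yields its column labels, which is why
# apply(list) on a two-column selection gives ['time', 'amount'].
labels = list(df[['time', 'amount']])
print(labels)

# Build one plain tuple per row and collect them in a list for each user.
pairs = (df.groupby('user')[['time', 'amount']]
           .apply(lambda g: list(g.itertuples(index=False, name=None))))
print(pairs)
```

The result is a Series indexed by user, with each value a list of (time, amount) tuples in the original row order.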
Answer by MaxU
IIUC (if I understand correctly):
In [101]: df.groupby('user').agg(lambda x: x.tolist())
Out[101]:
time amount
user
1 [23, 50] [2.99, 1.99]
2 [12] [1.99]
Answer by cml
Make a new column, atpair, that holds an (amount, time) tuple for each row
df['atpair'] = list(zip(df.amount, df.time))
The data frame looks like
user time amount atpair
0 1 20 10.99 (10.99, 20)
1 1 10 4.99 (4.99, 10)
2 2 11 2.99 (2.99, 11)
3 2 18 1.99 (1.99, 18)
4 3 15 10.99 (10.99, 15)
Now group by user and collect the atpair values of each group into a list
df = df.groupby('user')['atpair'].apply(lambda x : x.values.tolist())
The result is a Series that looks like
user
1 [(10.99, 20), (4.99, 10)]
2 [(2.99, 11), (1.99, 18)]
3 [(10.99, 15)]
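None of the answers above sorts the transactions by time, which the question also asks for. A minimal sketch of one way to do it, assuming it is acceptable to sort the whole frame first: groupby keeps the row order within each group, so sorting by time before grouping gives time-ordered lists. The variable names ordered, sorted_pairs, and sorted_lists are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

# Sort by time first; groupby preserves row order within each group.
ordered = df.sort_values('time')

# Option 1: a list of (time, amount) tuples per user, in time order.
sorted_pairs = (ordered.groupby('user')[['time', 'amount']]
                  .apply(lambda g: list(zip(g['time'], g['amount']))))
print(sorted_pairs)

# Option 2: no tuples; the two list columns stay aligned with each other
# because both are built from the same sorted rows.
sorted_lists = ordered.groupby('user')[['time', 'amount']].agg(lambda x: x.tolist())
print(sorted_lists)
```

Option 2 addresses the last part of the question: because both columns come from the same sorted rows, the time and amount lists line up position by position without pairing them into tuples.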