Original question: http://stackoverflow.com/questions/46622869/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas: groupby column A and make lists of tuples from other columns?
Asked by MrCartoonology
I would like to aggregate user transactions into lists in pandas. I can't figure out how to make a list comprised of more than one field. For example,
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})
which looks like
   amount  time  user
0   10.99    20     1
1    4.99    10     1
2    2.99    11     2
3    1.99    18     2
4   10.99    15     3
If I do
print(df.groupby('user')['time'].apply(list))
I get
user
1    [20, 10]
2    [11, 18]
3        [15]
but if I do
df.groupby('user')[['time', 'amount']].apply(list)
I get
user
1    [time, amount]
2    [time, amount]
3    [time, amount]
Thanks to an answer below, I learned I can do this
df.groupby('user').agg(lambda x: x.tolist())
to get
             amount      time
user
1     [10.99, 4.99]  [20, 10]
2      [2.99, 1.99]  [11, 18]
3           [10.99]      [15]
but I want time and amount sorted in the same order, so that I can go through each user's transactions in order.
I was looking for a way to produce this:
              amount-time-tuple
user
1     [(20, 10.99), (10, 4.99)]
2      [(11, 2.99), (18, 1.99)]
3                 [(15, 10.99)]
but maybe there is a way to do the sort without "tupling" the two columns?
Accepted answer by Bharath
When each group is a DataFrame, apply(list) iterates over its column labels rather than its values, which is why every row shows [time, amount]. I think you are looking for
df.groupby('user')[['time', 'amount']].apply(lambda x: x.values.tolist())
user
1    [[23.0, 2.99], [50.0, 1.99]]
2                  [[12.0, 1.99]]
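If tuples are wanted instead of nested lists, one possible variation (a sketch run against the question's example frame, not part of the original answer) is to build them with itertuples. Note that .values upcasts the integer time column to float when it is mixed with a float column, which is why 23.0 appears above; itertuples keeps each column's own type. The names labels and pairs are just illustrative.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

# list() on a DataFrame yields its column labels, not its rows.
labels = list(df[['time', 'amount']])

# Build one list of (time, amount) tuples per user.
# name=None makes itertuples return plain tuples instead of namedtuples.
pairs = (
    df.groupby('user')[['time', 'amount']]
      .apply(lambda g: list(g.itertuples(index=False, name=None)))
)
print(pairs)
```

The tuples keep the original row order within each user; sorting is a separate step.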
Answer by MaxU
IIUC:
In [101]: df.groupby('user').agg(lambda x: x.tolist())
Out[101]:
          time        amount
user
1     [23, 50]  [2.99, 1.99]
2         [12]        [1.99]
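On the question's follow-up about sorting without "tupling": since groupby keeps rows in their original order within each group, sorting the frame by time before grouping should leave the time and amount lists aligned and in chronological order. The following is a minimal sketch on the question's example data, assuming that is the ordering wanted; sorted_lists is an illustrative name.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

# Sort first, then aggregate: both lists inherit the sorted row order,
# so position i in 'time' matches position i in 'amount'.
sorted_lists = (
    df.sort_values(['user', 'time'])
      .groupby('user')[['time', 'amount']]
      .agg(lambda x: x.tolist())
)
print(sorted_lists)
```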
Answer by cml
Make a new column, atpair, that holds an (amount, time) tuple for each row
df['atpair'] = list(zip(df.amount, df.time))
The data frame looks like
   user  time  amount       atpair
0     1    20   10.99  (10.99, 20)
1     1    10    4.99   (4.99, 10)
2     2    11    2.99   (2.99, 11)
3     2    18    1.99   (1.99, 18)
4     3    15   10.99  (10.99, 15)
Now group by user and collect each group's atpair values into a list
df = df.groupby('user')['atpair'].apply(lambda x : x.values.tolist())
The result, now a Series indexed by user, looks like
user
1    [(10.99, 20), (4.99, 10)]
2     [(2.99, 11), (1.99, 18)]
3                [(10.99, 15)]
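To walk through each user's transactions in time order, the same zip idea can be combined with a sort beforehand. This is only a sketch on the question's example data: it puts time first in each tuple, as in the output the question asked for, and the names ordered, ta_pair and by_user are made up for illustration.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

# sort_values returns a new frame, so df itself is left untouched.
ordered = df.sort_values('time')

# Pair up time and amount row by row, time first.
ordered['ta_pair'] = list(zip(ordered['time'], ordered['amount']))

# Collect the pairs per user; order within each user follows the sort.
by_user = ordered.groupby('user')['ta_pair'].apply(list)

for user, transactions in by_user.items():
    for time, amount in transactions:
        print(user, time, amount)
```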

