Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/46622869/

Date: 2020-09-14 04:35:55  Source: igfitidea

Pandas: groupby column A and make lists of tuples from other columns?

Tags: python, pandas, dataframe, pandas-groupby

Asked by MrCartoonology

I would like to aggregate user transactions into lists in pandas. I can't figure out how to make a list composed of more than one field. For example,

import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

which looks like

    amount  time  user
0   10.99    20     1
1    4.99    10     1
2    2.99    11     2
3    1.99    18     2
4   10.99    15     3

If I do

print(df.groupby('user')['time'].apply(list))

I get

user
1    [20, 10]
2    [11, 18]
3        [15]

but if I do

df.groupby('user')[['time', 'amount']].apply(list)

I get

user
1    [time, amount]
2    [time, amount]
3    [time, amount]

Thanks to an answer below, I learned I can do this

df.groupby('user').agg(lambda x: x.tolist())

to get

             amount      time
user                         
1     [10.99, 4.99]  [20, 10]
2      [2.99, 1.99]  [11, 18]
3           [10.99]      [15]

but I want the time and amount lists sorted in the same order, so I can go through each user's transactions in order.

I was looking for a way to produce this:

             amount-time-tuple
user                         
1     [(20, 10.99), (10, 4.99)]
2     [(11,  2.99), (18, 1.99)]
3     [(15, 10.99)]

but maybe there is a way to do the sort without "tupling" the two columns?
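One way to keep the two lists aligned without building tuples is to sort once by user and time before grouping: groupby keeps the original row order inside each group, so position i of the time list and position i of the amount list always come from the same row. A minimal sketch, assuming the question's sample data (the name `ordered` is just illustrative):

```python
import pandas as pd

# The question's sample data.
df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

# Sort by user, then by time. groupby preserves the row order inside each
# group, so the i-th entry of every list comes from the same transaction.
ordered = (
    df.sort_values(['user', 'time'])
      .groupby('user')
      .agg(lambda x: x.tolist())
)
print(ordered)
```

For user 1 this gives time [10, 20] and amount [4.99, 10.99], i.e. both lists follow the same time order.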

Accepted answer by Bharath

apply(list) calls list() on each group's DataFrame, and iterating over a DataFrame yields its column labels, not its values. I think you are looking for:

df.groupby('user')[['time', 'amount']].apply(lambda x: x.values.tolist())

user
1    [[23.0, 2.99], [50.0, 1.99]]
2                  [[12.0, 1.99]]
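The claim that list() sees only column labels can be checked directly on a single group. The sketch below, assuming the question's sample data and using illustrative names (`group`, `labels`, `rows`), takes user 1's rows and compares list() with .values.tolist(). It also shows why the times in the output above come back as floats (23.0): .values stacks the integer and float columns into one float64 array.

```python
import pandas as pd

# The question's sample data.
df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

# The rows groupby would hand to the applied function for user 1.
group = df.loc[df['user'] == 1, ['time', 'amount']]

# list() on a DataFrame iterates over its column labels, not its rows.
labels = list(group)
print(labels)  # ['time', 'amount']

# .values combines the int and float columns into a single float64 array,
# so the integer times are upcast to floats.
print(group.values.dtype)  # float64
rows = group.values.tolist()
print(rows)  # [[20.0, 10.99], [10.0, 4.99]]
```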

Answer by MaxU

If I understand correctly:

In [101]: df.groupby('user').agg(lambda x: x.tolist())
Out[101]:
          time        amount
user
1     [23, 50]  [2.99, 1.99]
2         [12]        [1.99]

Answer by cml

Make a new column, atpair, holding (amount, time) tuples:

 df['atpair'] = list(zip(df.amount, df.time))

The data frame looks like

        user  time  amount       atpair
    0     1    20   10.99  (10.99, 20)
    1     1    10    4.99   (4.99, 10)
    2     2    11    2.99   (2.99, 11)
    3     2    18    1.99   (1.99, 18)
    4     3    15   10.99  (10.99, 15)

Now group by user and collect each group's atpair values into a list:

 df = df.groupby('user')['atpair'].apply(lambda x : x.values.tolist())

The result (a Series indexed by user) looks like

user
1    [(10.99, 20), (4.99, 10)]
2     [(2.99, 11), (1.99, 18)]
3                [(10.99, 15)]
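The same tuple approach can produce exactly the shape the question asks for, with (time, amount) tuples in time order, by zipping in that order and sorting before grouping. A minimal sketch, assuming the question's sample data; the column name `time_amount` and the variable `pairs` are illustrative choices, not part of the answer above:

```python
import pandas as pd

# The question's sample data.
df = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                   'time': [20, 10, 11, 18, 15],
                   'amount': [10.99, 4.99, 2.99, 1.99, 10.99]})

# Pair the columns as (time, amount), the order shown in the question.
df['time_amount'] = list(zip(df['time'], df['amount']))

# Sort before grouping so each user's tuples come out in time order,
# then collect each group's tuples into a plain Python list.
pairs = (
    df.sort_values(['user', 'time'])
      .groupby('user')['time_amount']
      .apply(list)
)
print(pairs)
```

Sorting first means the tuples are already in transaction order, so no per-group sort is needed afterwards.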