Pandas: groupby to list

Notice: this page is a side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/53037888/

Time: 2020-09-14 06:06:57  Source: igfitidea

Pandas: groupby to list

python pandas

Asked by Jadu Sen

I have data like below:

id  value   time

1   5   2000
1   6   2000
1   7   2000
1   5   2001
2   3   2000
2   3   2001
2   4   2005
2   5   2005
3   3   2000
3   6   2005

My final goal is to have data in a list like below:

[[5,6,7],[5]] (this is for id 1 grouped by the id and year)
[[3],[3],[4,5]] (this is for id 2 grouped by the id and year)
[[3],[6]] (same logic as above)

I have grouped the data using df.groupby(['id', 'year']). But after that, I am not able to access the groups and get the data in the above format.

Answered by sacuL

You can use apply(list):

>>> df.groupby(['id', 'time'])['value'].apply(list)

id  time
1   2000    [5, 6, 7]
    2001          [5]
2   2000          [3]
    2001          [3]
    2005       [4, 5]
3   2000          [3]
    2005          [6]
Name: value, dtype: object
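For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. It assumes the DataFrame is built from the table in the question, with the third column named time as shown there. It also shows one way to look up individual groups afterwards, which is the part the question was stuck on.

```python
import pandas as pd

# Sample data copied from the question; the third column is called "time" there.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "value": [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
    "time":  [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
})

# One list of values per (id, time) pair, indexed by a two-level MultiIndex.
per_group = df.groupby(["id", "time"])["value"].apply(list)

# A single group is addressed by its (id, time) key.
print(per_group.loc[(1, 2000)])

# Selecting only the first level returns every list for that id.
print(per_group.loc[2].tolist())
```

Keeping the result as a Series with a MultiIndex means the id and time labels stay attached to each list.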

If you really want it in the exact format you displayed, you can then group by id and apply list again, but this is not efficient, and that format is arguably harder to work with...

>>> df.groupby(['id','time'])['value'].apply(list).groupby('id').apply(list).tolist()
[[[5, 6, 7], [5]], [[3], [3], [4, 5]], [[3], [6]]]
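One drawback of the fully nested list is that it no longer records which id each inner list belongs to. If that matters, a possible alternative (not part of the original answer) is to collect the per-group lists into a plain dict keyed by id. The sketch below assumes the same sample data as the question; the names per_group and by_id are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "value": [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
    "time":  [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
})

# First step from the answer: one list per (id, time) pair.
per_group = df.groupby(["id", "time"])["value"].apply(list)

# Walk the MultiIndex once and append each list under its id.
by_id = {}
for (group_id, _time), values in per_group.items():
    by_id.setdefault(int(group_id), []).append(values)

print(by_id[2])
```

Calling list(by_id.values()) gives back the nested list format from the question when it is really needed.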

Answered by Dani Mesejo

You could do the following:

import pandas as pd

data = [[1, 5, 2000],
        [1, 6, 2000],
        [1, 7, 2000],
        [1, 5, 2001],
        [2, 3, 2000],
        [2, 3, 2001],
        [2, 4, 2005],
        [2, 5, 2005],
        [3, 3, 2000],
        [3, 6, 2005]]

df = pd.DataFrame(data=data, columns=['id', 'value', 'year'])

result = []
# Group by id first, then split each id's rows by year.
for _, group in df.groupby('id'):
    result.append([g['value'].tolist() for _, g in group.groupby('year')])

for e in result:
    print(e)

Output

[[5, 6, 7], [5]]
[[3], [3], [4, 5]]
[[3], [6]]
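The same nested loop can also be written as a nested list comprehension and wrapped in a small helper. This is only a sketch: the function name nested_values and its parameters are made up for illustration, and the data uses the year column name from this answer.

```python
import pandas as pd


def nested_values(df, outer, inner, value):
    """Return one list per `outer` group, each holding one list of
    `value` entries per `inner` group."""
    return [
        [g[value].tolist() for _, g in group.groupby(inner)]
        for _, group in df.groupby(outer)
    ]


df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "value": [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
    "year":  [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
})

for row in nested_values(df, "id", "year", "value"):
    print(row)
```

Because groupby sorts group keys by default, the outer lists come out ordered by id and the inner lists by year.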

Answered by toto_tico

If you want to calculate the lists for multiple columns, you can do the following:

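import pandas as pd

df = pd.DataFrame(
    {'A': [1, 1, 2, 2, 2, 2, 3],
     'B': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
     'C': ['x', 'y', 'z', 'x', 'y', 'z', 'x']})

# Collect each group's values into a list, one aggregation per column.
df.groupby('A').agg({'B': lambda x: list(x), 'C': lambda x: list(x)})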

This computes the lists for B and C at the same time:

              B             C
A                            
1        [a, b]        [x, y]
2  [c, d, e, f]  [z, x, y, z]
3           [g]           [x]
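When every remaining column should become a list, it should also be possible to pass list directly to agg instead of spelling out one lambda per column. The sketch below assumes a reasonably recent pandas version and reuses the sample frame from this answer; the variable name lists is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 2, 3],
    "B": ["a", "b", "c", "d", "e", "f", "g"],
    "C": ["x", "y", "z", "x", "y", "z", "x"],
})

# Apply the built-in list to every non-grouping column in one call.
lists = df.groupby("A").agg(list)

# Each cell now holds a plain Python list.
print(lists.loc[2, "B"])
```

The dict form from the answer is still the better fit when different columns need different aggregations.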