Pandas: groupby to list

Question

Asked by Jadu Sen

I have data like below:

id  value  time
1   5      2000
1   6      2000
1   7      2000
1   5      2001
2   3      2000
2   3      2001
2   4      2005
2   5      2005
3   3      2000
3   6      2005

My final goal is to have data in a list like below:

[[5,6,7],[5]] (this is for id 1 grouped by the id and year)
[[3],[3],[4,5]] (this is for id 2 grouped by the id and year)
[[3],[6]] (same logic as above)

I have grouped the data using df.groupby(['id', 'time']) (the time column holds the year). After that, however, I am not able to access the groups and get the data in the above format.

Answer 1

Answered by sacuL

You can use apply(list):

>>> df.groupby(['id', 'time'])['value'].apply(list)

id  time
1   2000    [5, 6, 7]
    2001          [5]
2   2000          [3]
    2001          [3]
    2005       [4, 5]
3   2000          [3]
    2005          [6]
Name: value, dtype: object

If you really want the exact format you displayed, you can then group by id and apply list again, but this is not efficient, and that format is arguably harder to work with:

>>> df.groupby(['id','time'])['value'].apply(list).groupby('id').apply(list).tolist()
[[[5, 6, 7], [5]], [[3], [3], [4, 5]], [[3], [6]]]
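
If the nested lists need to stay associated with their id, one option is to stop after the first groupby and collect the per-year lists into a dict. The following is only a sketch under that assumption: the DataFrame is rebuilt from the question's data, and the names per_pair and nested_by_id are made up for illustration.

```python
import pandas as pd

# Rebuild the question's data so the example is self-contained.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "value": [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
    "time":  [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
})

# One list of values per (id, time) pair; the result has a two-level index.
per_pair = df.groupby(["id", "time"])["value"].apply(list)

# Gather the per-year lists under each id (nested_by_id is a hypothetical name).
nested_by_id = {
    id_: lists.tolist()
    for id_, lists in per_pair.groupby(level="id")
}

print(nested_by_id[2])
```

Keeping the ids as dict keys avoids relying on list position to know which id a nested list belongs to.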

Answer 2

Answered by Dani Mesejo

You could do the following:

import pandas as pd

data = [[1, 5, 2000],
        [1, 6, 2000],
        [1, 7, 2000],
        [1, 5, 2001],
        [2, 3, 2000],
        [2, 3, 2001],
        [2, 4, 2005],
        [2, 5, 2005],
        [3, 3, 2000],
        [3, 6, 2005]]

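# the question's 'time' column is named 'year' here
df = pd.DataFrame(data=data, columns=['id', 'value', 'year'])

# outer loop: one entry per id; inner comprehension: one list of values per year
result = []
for _, group in df.groupby('id'):
    result.append([g['value'].tolist() for _, g in group.groupby('year')])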

for e in result:
    print(e)

Output

[[5, 6, 7], [5]]
[[3], [3], [4, 5]]
[[3], [6]]

Answer 3

Answered by toto_tico

If you want to calculate the lists for multiple columns, you can do the following:

import pandas as pd

df = pd.DataFrame(
    {'A': [1, 1, 2, 2, 2, 2, 3],
     'B': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
     'C': ['x', 'y', 'z', 'x', 'y', 'z', 'x']})

# map each column to the function that collects its values per group
df.groupby('A').agg({'B': list, 'C': list})

Which will calculate lists of B and C at the same time:

              B             C
A                            
1        [a, b]        [x, y]
2  [c, d, e, f]  [z, x, y, z]
3           [g]           [x]
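
When every selected column should be collected the same way, the per-column dict can probably be dropped by selecting the columns first and passing list to agg once. This is a minimal sketch using the same example data; the name lists_per_group is an assumption introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, 2, 2, 3],
    "B": ["a", "b", "c", "d", "e", "f", "g"],
    "C": ["x", "y", "z", "x", "y", "z", "x"],
})

# Select the columns to collect, then apply list to each of them per group.
lists_per_group = df.groupby("A")[["B", "C"]].agg(list)

# Look up the collected lists for a single group key.
print(lists_per_group.loc[2, "B"])
```

The dict form remains the better fit when different columns need different aggregations.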
