Pandas: groupby to list
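Notice: This page is a translation of a popular StackOverflow question and its answers, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow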
Original URL: http://stackoverflow.com/questions/53037888/
Question by Jadu Sen
I have data like below:
id  value  time
1   5      2000
1   6      2000
1   7      2000
1   5      2001
2   3      2000
2   3      2001
2   4      2005
2   5      2005
3   3      2000
3   6      2005
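For readers who want to run the answers below, here is a minimal sketch that builds this sample table as a DataFrame. It assumes the column names id, value and time exactly as in the header row. The question text later refers to a year column, so rename it if your data uses that name.

```python
import pandas as pd

# Sample data from the table above; column names follow its header row
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
        "value": [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
        "time": [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
    }
)
print(df)
```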
My final goal is to have data in a list like below:
[[5,6,7],[5]] (this is for id 1, grouped by id and year)
[[3],[3],[4,5]] (this is for id 2, grouped by id and year)
[[3],[6]] (this is for id 3, same logic as above)
I have grouped the data using df.groupby(['id', 'year']), but after that I am not able to access the groups and get the data in the above format.
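Accessing the groups after calling df.groupby(...) can be sketched as follows. The sketch assumes the column is named time as in the sample table; substitute year if that is your column's name. Iterating over the groupby object yields (key, sub-DataFrame) pairs, and get_group fetches a single group by its key. With two grouping columns, that key is a tuple.

```python
import pandas as pd

# Sample data, assuming the column names from the table above
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
        "value": [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
        "time": [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
    }
)

grouped = df.groupby(["id", "time"])

# Each iteration gives the (id, time) key and the rows belonging to it
for (id_, time_), group in grouped:
    print(id_, time_, group["value"].tolist())

# Fetch one group directly by its key tuple
first = grouped.get_group((1, 2000))
print(first["value"].tolist())
```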
Answer by sacuL
You can use apply(list):
>>> df.groupby(['id', 'time'])['value'].apply(list)
id  time
1   2000    [5, 6, 7]
    2001          [5]
2   2000          [3]
    2001          [3]
    2005       [4, 5]
3   2000          [3]
    2005          [6]
Name: value, dtype: object
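Here is a minimal sketch of the same step, assuming the sample data and the time column name. Here agg(list) should give the same result as apply(list). The result is a Series indexed by an (id, time) MultiIndex, which can be queried directly with .loc instead of being converted to nested lists.

```python
import pandas as pd

# Sample data, assuming the column names from the table above
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
        "value": [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
        "time": [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
    }
)

# Collect each (id, time) group's values into a Python list
s = df.groupby(["id", "time"])["value"].agg(list)

# Look up one (id, time) pair, or every time for one id
print(s.loc[(1, 2000)])   # [5, 6, 7]
print(s.loc[2].tolist())  # [[3], [3], [4, 5]]
```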
If you really want it in the exact format you displayed, you can then group by id and apply list again, but this is not efficient, and that format is arguably harder to work with.
>>> df.groupby(['id','time'])['value'].apply(list).groupby('id').apply(list).tolist()
[[[5, 6, 7], [5]], [[3], [3], [4, 5]], [[3], [6]]]
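If the nested structure is needed but the id should not be lost, you can group the intermediate Series by its id index level and build a dictionary instead of a bare list. This is a minimal sketch, assuming the sample data and the time column name. The name nested is hypothetical.

```python
import pandas as pd

# Sample data, assuming the column names from the table above
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
        "value": [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
        "time": [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
    }
)

# One list of values per (id, time) pair
per_time = df.groupby(["id", "time"])["value"].apply(list)

# Regroup by the 'id' index level and keep the id as the dictionary key
nested = {id_: lists.tolist() for id_, lists in per_time.groupby(level="id")}
print(nested)
```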
Answer by Dani Mesejo
You could do the following:
import pandas as pd
data = [[1, 5, 2000],
        [1, 6, 2000],
        [1, 7, 2000],
        [1, 5, 2001],
        [2, 3, 2000],
        [2, 3, 2001],
        [2, 4, 2005],
        [2, 5, 2005],
        [3, 3, 2000],
        [3, 6, 2005]]
df = pd.DataFrame(data=data, columns=['id', 'value', 'year'])
result = []
# Outer loop: one group per id; inner comprehension: one list of values per year
for name, group in df.groupby('id'):
    result.append([g['value'].tolist() for _, g in group.groupby('year')])
for e in result:
    print(e)
Output
[[5, 6, 7], [5]]
[[3], [3], [4, 5]]
[[3], [6]]
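The same nested-loop idea can also be expressed without pandas. Here is a minimal sketch using the standard library's itertools.groupby, which is a swapped-in technique and not part of this answer. It assumes the rows are plain (id, value, year) tuples. itertools.groupby only merges adjacent equal keys, so the rows are sorted by (id, year) first.

```python
from itertools import groupby
from operator import itemgetter

# Rows as (id, value, year) tuples, taken from the sample data
rows = [(1, 5, 2000), (1, 6, 2000), (1, 7, 2000), (1, 5, 2001),
        (2, 3, 2000), (2, 3, 2001), (2, 4, 2005), (2, 5, 2005),
        (3, 3, 2000), (3, 6, 2005)]

# Sort by (id, year) so equal keys are adjacent; the sort is stable,
# so values keep their original order within each group
rows.sort(key=itemgetter(0, 2))

result = []
for _, id_rows in groupby(rows, key=itemgetter(0)):
    # Within one id, split again by year and keep only the value column
    result.append([[row[1] for row in year_rows]
                   for _, year_rows in groupby(id_rows, key=itemgetter(2))])

for e in result:
    print(e)
```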
Answer by toto_tico
If you want to calculate the lists for multiple columns, you can do the following:
import pandas as pd
df = pd.DataFrame(
    {'A': [1, 1, 2, 2, 2, 2, 3],
     'B': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
     'C': ['x', 'y', 'z', 'x', 'y', 'z', 'x']})
df.groupby('A').agg({'B': lambda x: list(x), 'C': lambda x: list(x)})
Which will calculate lists of B and C at the same time:
              B             C
A
1        [a, b]        [x, y]
2  [c, d, e, f]  [z, x, y, z]
3           [g]           [x]
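When every non-grouping column should become a list, the per-column lambdas can likely be replaced by the built-in list. Here is a minimal sketch on the same data. Named aggregation is shown as an alternative that also renames the outputs. The column names B_list and C_list are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 1, 2, 2, 2, 2, 3],
     'B': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
     'C': ['x', 'y', 'z', 'x', 'y', 'z', 'x']})

# Apply list to every column other than the grouping key 'A'
all_lists = df.groupby('A').agg(list)

# Named aggregation: output_name=(input_column, function)
named = df.groupby('A').agg(B_list=('B', list), C_list=('C', list))

print(all_lists)
print(named)
```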