Pandas DataFrame.groupby() to dictionary with multiple columns for value
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not this page): Stack Overflow
Original source: http://stackoverflow.com/questions/49017178/
Asked by Micks Ketches
type(Table)
pandas.core.frame.DataFrame
Table
Column1  Column2  Column3
=======  =======  =======
0        23       1
1        5        2
1        2        3
1        19       5
2        56       1
2        22       2
3        2        4
3        14       5
4        59       1
5        44       1
5        1        2
5        87       3
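To reproduce the examples, the table above can be rebuilt as a DataFrame. This is only a minimal sketch: the question does not show how Table was created, so the integer dtypes and default index here are assumptions.

```python
import pandas as pd

# Sample data copied from the table above. How the original Table was
# constructed is not shown in the question, so this is an assumption.
Table = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# The answers refer to the same frame as df.
df = Table
```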
For anyone familiar with pandas, how would I build a multi-value dictionary with the .groupby() method?
I would like the output to resemble this format:
{
0: [(23, 1)],
1: [(5, 2), (2, 3), (19, 5)],
# etc...
}
where the Column1 values are the keys, and the corresponding Column2 and Column3 values are packed as tuples into a list for each Column1 key.
My syntax works when pooling only one column with .groupby():
Table.groupby('Column1')['Column2'].apply(list).to_dict()
# Result as expected
{
0: [23],
1: [5, 2, 19],
2: [56, 22],
3: [2, 14],
4: [59],
5: [44, 1, 87]
}
However, selecting multiple columns this way returns the column names as each value:
Table.groupby('Column1')[('Column2', 'Column3')].apply(list).to_dict()
# Result has column namespace as array value
{
0: ['Column2', 'Column3'],
1: ['Column2', 'Column3'],
2: ['Column2', 'Column3'],
3: ['Column2', 'Column3'],
4: ['Column2', 'Column3'],
5: ['Column2', 'Column3']
}
How would I return a list of tuples as each value instead?
Answer by Psidom
Customize the function you use in apply so it returns a list of lists for each group:
df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: g.values.tolist()).to_dict()
# {0: [[23, 1]],
# 1: [[5, 2], [2, 3], [19, 5]],
# 2: [[56, 1], [22, 2]],
# 3: [[2, 4], [14, 5]],
# 4: [[59, 1]],
# 5: [[44, 1], [1, 2], [87, 3]]}
If you need a list of tuples explicitly, use list(map(tuple, ...)) to convert:
df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
# {0: [(23, 1)],
# 1: [(5, 2), (2, 3), (19, 5)],
# 2: [(56, 1), (22, 2)],
# 3: [(2, 4), (14, 5)],
# 4: [(59, 1)],
# 5: [(44, 1), (1, 2), (87, 3)]}
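The attempt in the question returned column names because each group passed to apply is itself a DataFrame, and iterating a DataFrame (which is what list does) yields its column labels rather than its rows. The sketch below checks that and reproduces the tuple result. The sample frame is rebuilt from the question's table, and the itertuples variant is an alternative that this answer does not use.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (an assumption about
# how the original data was constructed).
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# list() on a DataFrame iterates over column labels, not rows.
labels = list(df[["Column2", "Column3"]])

# Converting each group's rows explicitly gives the desired tuples.
result = (
    df.groupby("Column1")[["Column2", "Column3"]]
      .apply(lambda g: list(map(tuple, g.values.tolist())))
      .to_dict()
)

# Alternative, not from the answer: itertuples with index=False and
# name=None yields plain tuples directly.
alt = (
    df.groupby("Column1")[["Column2", "Column3"]]
      .apply(lambda g: list(g.itertuples(index=False, name=None)))
      .to_dict()
)
```

Both result and alt hold the same mapping, so the choice between them is mostly a matter of readability.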
Answer by jpp
One way is to create a new tup column and then create the dictionary.
df['tup'] = list(zip(df['Column2'], df['Column3']))
df.groupby('Column1')['tup'].apply(list).to_dict()
# {0: [(23, 1)],
# 1: [(5, 2), (2, 3), (19, 5)],
# 2: [(56, 1), (22, 2)],
# 3: [(2, 4), (14, 5)],
# 4: [(59, 1)],
# 5: [(44, 1), (1, 2), (87, 3)]}
@Psidom's solution is more efficient, but if performance isn't an issue, use whichever makes more sense to you:
# Repeat the sample 10,000 times to get a measurable workload
df = pd.concat([df]*10000)

def jp(df):
    # Note: this adds a 'tup' column to the frame that is passed in
    df['tup'] = list(zip(df['Column2'], df['Column3']))
    return df.groupby('Column1')['tup'].apply(list).to_dict()

def psi(df):
    return df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()

%timeit jp(df) # 110ms
%timeit psi(df) # 80ms
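%timeit is an IPython magic, so the benchmark above only runs in IPython or Jupyter. A rough equivalent for a plain Python script, using the standard timeit module, might look like the following. The sample data, the repeat count of 5, and the non-mutating assign variant of jp are assumptions, and timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample frame rebuilt from the question's table, repeated 10,000 times.
small = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})
df = pd.concat([small] * 10000, ignore_index=True)


def jp(df):
    # assign returns a new frame, so the caller's df is left unchanged
    tmp = df.assign(tup=list(zip(df["Column2"], df["Column3"])))
    return tmp.groupby("Column1")["tup"].apply(list).to_dict()


def psi(df):
    return (
        df.groupby("Column1")[["Column2", "Column3"]]
          .apply(lambda g: list(map(tuple, g.values.tolist())))
          .to_dict()
    )


# Both approaches should agree before their speed is compared.
same = jp(df) == psi(df)

for fn in (jp, psi):
    seconds = timeit.timeit(lambda: fn(df), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```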
Answer by piRSquared
I'd rather use defaultdict:
from collections import defaultdict
d = defaultdict(list)
for row in df.values.tolist():
    d[row[0]].append(tuple(row[1:]))
dict(d)
{0: [(23, 1)],
1: [(5, 2), (2, 3), (19, 5)],
2: [(56, 1), (22, 2)],
3: [(2, 4), (14, 5)],
4: [(59, 1)],
5: [(44, 1), (1, 2), (87, 3)]}
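The loop above relies on column position: row[0] must be Column1 and the rest of the row must be exactly the value columns. A variation that selects the columns by name, so it still works when the frame has extra columns or a different column order, could look like this. The sample frame is rebuilt from the question, and the Extra column is hypothetical, added only to illustrate the point.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "Extra": list("abcdefghijkl"),  # hypothetical column, not in the question
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

d = defaultdict(list)
cols = ["Column1", "Column2", "Column3"]

# Select by name first, then unpack: the first field is the key and the
# remaining fields become the tuple stored under that key.
for key, *rest in df[cols].itertuples(index=False, name=None):
    d[key].append(tuple(rest))

result = dict(d)
```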