Pandas DataFrame.groupby() to a dictionary with multiple columns as values

Question

Asked by Micks Ketches

type(Table)
pandas.core.frame.DataFrame

Table
======= ======= =======
Column1 Column2 Column3
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3
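
To reproduce the examples, the sample table can be built as a DataFrame. This is a minimal sketch that assumes every column holds plain integers. The variable name `Table` matches the question; the answers below call the same frame `df`.

```python
import pandas as pd

# Sample data from the question; all three columns are integer-typed
Table = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

print(type(Table))  # <class 'pandas.core.frame.DataFrame'>
print(Table.shape)  # (12, 3)
```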

For anyone familiar with pandas: how would I build a multi-value dictionary with the .groupby() method?

I would like the output to resemble this format:

{
    0: [(23, 1)],
    1: [(5, 2), (2, 3), (19, 5)],
    # etc...
}

where the Column1 values are the keys and, for each key, the corresponding Column2 and Column3 values are packed as tuples into a list.

My syntax works when I select only one column after .groupby():

Table.groupby('Column1')['Column2'].apply(list).to_dict()
# Result as expected
{
    0: [23], 
    1: [5, 2, 19], 
    2: [56, 22], 
    3: [2, 14], 
    4: [59], 
    5: [44, 1, 87]
}
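
As an aside not taken from the question, the same single-column dictionary can likely also be built with `.agg(list)` instead of `.apply(list)`. This sketch rebuilds the sample frame under the name `df` (an assumption; the question calls it `Table`) and should reproduce the dictionary above.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# Each group of Column2 is a Series; list() collects its values in row order
single = df.groupby("Column1")["Column2"].agg(list).to_dict()
print(single)
```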

However, selecting multiple columns returns the column names as the value:

Table.groupby('Column1')[('Column2', 'Column3')].apply(list).to_dict()
# Result has the column names as each value
{
    0: ['Column2', 'Column3'],
    1: ['Column2', 'Column3'],
    2: ['Column2', 'Column3'],
    3: ['Column2', 'Column3'],
    4: ['Column2', 'Column3'],
    5: ['Column2', 'Column3']
}
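
A likely explanation for this output: inside apply, each group arrives as a DataFrame, and iterating a DataFrame yields its column labels, so list(group) returns column names rather than rows. The small frame below is a hand-built stand-in for the Column1 == 1 group (an assumption for illustration). Note also that recent pandas versions reject a tuple for column selection after groupby; a list such as [['Column2', 'Column3']] is the supported form.

```python
import pandas as pd

# Stand-in for the rows where Column1 == 1, value columns only
sub = pd.DataFrame({"Column2": [5, 2, 19], "Column3": [2, 3, 5]})

# Iterating a DataFrame yields column labels, so list() gives the names
print(list(sub))             # ['Column2', 'Column3']

# Converting the underlying values gives one list per row instead
print(sub.values.tolist())   # [[5, 2], [2, 3], [19, 5]]
```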

How would I return a list of tuples as the value for each key?

Answer 1

Answered by Psidom

Customize the function you use in apply so that it returns a list of lists for each group:

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: g.values.tolist()).to_dict()
# {0: [[23, 1]], 
#  1: [[5, 2], [2, 3], [19, 5]], 
#  2: [[56, 1], [22, 2]], 
#  3: [[2, 4], [14, 5]], 
#  4: [[59, 1]], 
#  5: [[44, 1], [1, 2], [87, 3]]}

If you explicitly need a list of tuples, use list(map(tuple, ...)) to convert:

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
# {0: [(23, 1)], 
#  1: [(5, 2), (2, 3), (19, 5)], 
#  2: [(56, 1), (22, 2)], 
#  3: [(2, 4), (14, 5)], 
#  4: [(59, 1)], 
#  5: [(44, 1), (1, 2), (87, 3)]}
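
An alternative that skips apply altogether (a different technique, not part of the answer): iterating a GroupBy yields (key, sub-frame) pairs, and DataFrame.itertuples(index=False, name=None) turns each row into a plain tuple. This is a sketch that assumes the sample frame is named `df`.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# One dict entry per group; name=None makes itertuples yield plain tuples
result = {
    key: list(group[["Column2", "Column3"]].itertuples(index=False, name=None))
    for key, group in df.groupby("Column1")
}
print(result)
```

Because the dictionary is built directly in Python, there is no intermediate Series of lists and no to_dict() call.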

Answer 2

Answered by jpp

One way is to create a new tup column and then build the dictionary from it.

df['tup'] = list(zip(df['Column2'], df['Column3']))
df.groupby('Column1')['tup'].apply(list).to_dict()

# {0: [(23, 1)],
#  1: [(5, 2), (2, 3), (19, 5)],
#  2: [(56, 1), (22, 2)],
#  3: [(2, 4), (14, 5)],
#  4: [(59, 1)],
#  5: [(44, 1), (1, 2), (87, 3)]}
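
The approach above adds a tup column to df in place. If the original frame should stay untouched, one option (an adaptation, not from the answer) is DataFrame.assign, which returns a new frame. A minimal sketch, assuming the sample frame is named `df`:

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# Pair up the two value columns row by row
tup = list(zip(df["Column2"], df["Column3"]))

# assign() returns a copy with the extra column, so df keeps its three columns
result = df.assign(tup=tup).groupby("Column1")["tup"].apply(list).to_dict()

print(result)
print(list(df.columns))  # ['Column1', 'Column2', 'Column3']
```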

@Psidom's solution is more efficient (faster in the timing below), but if performance isn't an issue, use whichever approach makes more sense to you:

import pandas as pd

# Scale the sample frame up 10,000x for timing
df = pd.concat([df]*10000)

def jp(df):
    df['tup'] = list(zip(df['Column2'], df['Column3']))
    return df.groupby('Column1')['tup'].apply(list).to_dict()

def psi(df):
    return df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()

# %timeit is an IPython/Jupyter magic command
%timeit jp(df)   # 110ms
%timeit psi(df)  # 80ms
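
Outside IPython, a comparable measurement can be sketched with the standard timeit module. This version also checks that both approaches build the same dictionary before timing them. Absolute numbers will depend on the machine and pandas version, so none are claimed here; `base` and `big` are illustrative names.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})
# Enlarge the sample 10,000x, as in the answer
big = pd.concat([base] * 10000)

def jp(df):
    # Adds a 'tup' column to the input in place, as in the answer
    df["tup"] = list(zip(df["Column2"], df["Column3"]))
    return df.groupby("Column1")["tup"].apply(list).to_dict()

def psi(df):
    return (
        df.groupby("Column1")[["Column2", "Column3"]]
        .apply(lambda g: list(map(tuple, g.values.tolist())))
        .to_dict()
    )

# Both approaches should produce identical dictionaries
assert jp(big) == psi(big)

for fn in (jp, psi):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds / 5 * 1000:.1f} ms per call")
```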

Answer 3

Answered by piRSquared

I'd rather use a defaultdict:

from collections import defaultdict

d = defaultdict(list)

for row in df.values.tolist():
    d[row[0]].append(tuple(row[1:]))

dict(d)

{0: [(23, 1)],
 1: [(5, 2), (2, 3), (19, 5)],
 2: [(56, 1), (22, 2)],
 3: [(2, 4), (14, 5)],
 4: [(59, 1)],
 5: [(44, 1), (1, 2), (87, 3)]}
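
One caveat with df.values.tolist(): when the columns have different dtypes, .values upcasts them to a single common type, so integer keys can turn into floats. A variant of the defaultdict loop (an adaptation, not from the answer) iterates with itertuples(index=False, name=None), which reads each column separately. The float Column3 below is hypothetical, added only to show the difference.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "Column1": [0, 1, 1],
    "Column2": [23, 5, 2],
    "Column3": [1.0, 2.5, 3.0],  # hypothetical float column
})

# .values upcasts the int and float columns to one float64 array
print(df.values.tolist()[0])  # [0.0, 23.0, 1.0]

d = defaultdict(list)
# itertuples reads each column separately, so the keys stay integers
for key, *rest in df.itertuples(index=False, name=None):
    d[key].append(tuple(rest))

print(dict(d))  # {0: [(23, 1.0)], 1: [(5, 2.5), (2, 3.0)]}
```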
