Creating a 2D array from a Pandas DataFrame

Question

Asked by mgutsche

Probably a very simple question, but I couldn't come up with a solution. I have a data frame with 9 columns and ~100000 rows. The data was extracted from an image, so two columns ('row' and 'col') give the pixel position of each entry. How can I create a numpy array A such that indexing it by row and column returns the value from another column, e.g. 'grumpiness'?

A[row, col]
#  0.1232

I want to avoid a for loop or something similar.

Answer 1

Answered by Divakar

You could do something like this -

import numpy as np

# Extract row and column information
rowIDs = df['row']
colIDs = df['col']

# Set up the image array and set values into it from the "grumpiness" column
A = np.zeros((rowIDs.max() + 1, colIDs.max() + 1))
A[rowIDs, colIDs] = df['grumpiness']

Sample run -

>>> df
   row  col  grumpiness
0    5    0    0.846412
1    0    1    0.703981
2    3    1    0.212358
3    0    2    0.101585
4    5    1    0.424694
5    5    2    0.473286
>>> A
array([[ 0.        ,  0.70398113,  0.10158488],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.21235838,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.84641194,  0.42469369,  0.47328598]])
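
For reference, here is a self-contained sketch of the same indexing approach, with the sample frame from the run above rebuilt by hand. Filling with `np.nan` instead of zero is an assumption added here, not part of the original answer; it lets pixels with no data be told apart from genuine zero values. The names `rows` and `cols` are likewise just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the run above
df = pd.DataFrame({
    "row": [5, 0, 3, 0, 5, 5],
    "col": [0, 1, 1, 2, 1, 2],
    "grumpiness": [0.846412, 0.703981, 0.212358,
                   0.101585, 0.424694, 0.473286],
})

# Plain integer arrays to use as indices
rows = df["row"].to_numpy()
cols = df["col"].to_numpy()

# Assumption: NaN (rather than 0) marks pixels that have no entry
A = np.full((rows.max() + 1, cols.max() + 1), np.nan)

# Vectorised assignment: one value per (row, col) pair, no Python loop
A[rows, cols] = df["grumpiness"].to_numpy()
```

If the same (row, col) pair appears more than once, this kind of assignment does not combine the values; only one of them is kept, so duplicates would need to be aggregated beforehand.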

Answer 2

Answered by jakevdp

One very quick and straightforward way to do this is to use a pivot_table:

>>> df
   row  col  grumpiness
0    5    0    0.846412
1    0    1    0.703981
2    3    1    0.212358
3    0    2    0.101585
4    5    1    0.424694
5    5    2    0.473286

>>> df.pivot_table('grumpiness', 'row', 'col', fill_value=0)
col         0         1         2
row                              
0    0.000000  0.703981  0.101585
3    0.000000  0.212358  0.000000
5    0.846412  0.424694  0.473286

Note that if any full rows/cols are missing, it will leave them out, and if any row/col pair is repeated, it will average the results. That said, this will generally be much faster for larger datasets than an indexing-based approach.
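
If the missing rows/cols matter (for example, to get back an array with the full image shape), one possible follow-up is to reindex the pivoted table onto the complete pixel grid and then convert it to a numpy array. This is only a sketch under the assumption that pixel indices start at 0 and run up to the largest value seen; `full_rows` and `full_cols` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the example above
df = pd.DataFrame({
    "row": [5, 0, 3, 0, 5, 5],
    "col": [0, 1, 1, 2, 1, 2],
    "grumpiness": [0.846412, 0.703981, 0.212358,
                   0.101585, 0.424694, 0.473286],
})

# Pivot as in the answer: rows 1, 2 and 4 are absent from the result
table = df.pivot_table(values="grumpiness", index="row",
                       columns="col", fill_value=0)

# Assumption: the full grid spans 0..max in both directions
full_rows = range(df["row"].max() + 1)
full_cols = range(df["col"].max() + 1)

# Reinsert the missing rows/cols as zeros, then convert to a numpy array
A = table.reindex(index=full_rows, columns=full_cols,
                  fill_value=0).to_numpy()
```

Repeated row/col pairs are still averaged by `pivot_table`, as noted above; passing a different `aggfunc` changes how they are combined.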
