Pandas - Image to DataFrame

Question

Asked by Terence Eden

I want to convert an RGB image into a DataFrame, so that I have the co-ordinates of each pixel and their RGB value.

         x   y   red  green  blue
0        0   0   154      0     0
1        1   0   149    111     0
2        2   0   153      0     5
3        0   1   154      0     9
4        1   1   154     10    10
5        2   1   154      0     0

I can extract the RGB values into a DataFrame quite easily:

import numpy as np
import pandas as pd
from PIL import Image

colourImg = Image.open("test.png")
colourPixels = colourImg.convert("RGB")
colourArray = np.array(colourPixels.getdata())

df = pd.DataFrame(colourArray, columns=["red","green","blue"])

But I don't know how to get the X & Y coordinates in there. I could write a loop, but on a large image that takes a long time.

Answer 1

Accepted answer by davidsheldon

Try using np.indices. Unfortunately it ends up with an array where the coordinate is the first dimension, but you can use np.moveaxis to fix that.

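import numpy as np
import pandas as pd
from PIL import Image

colourImg = Image.open("test.png")
colourPixels = colourImg.convert("RGB")
# Shape is (height, width, 3). Note that colourImg.size is (width, height),
# so reshaping getdata() with it only works for square images.
colourArray = np.asarray(colourPixels)
# np.indices puts the coordinate axis first; move it last to get (height, width, 2)
indicesArray = np.moveaxis(np.indices(colourArray.shape[:2]), 0, 2)
allArray = np.dstack((indicesArray, colourArray)).reshape((-1, 5))

df = pd.DataFrame(allArray, columns=["y", "x", "red","green","blue"])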

It's not the prettiest, but it seems to work (edit: fixed x,y being the wrong way around).
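
As a sanity check on the axis order, here is a minimal sketch of the same np.indices idea wrapped in a function and run on a tiny synthetic, non-square array rather than test.png. The function name array_to_dataframe and the synthetic pixel values are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd


def array_to_dataframe(pixels):
    """Return one row per pixel with columns x, y, red, green, blue.

    `pixels` is assumed to have shape (height, width, 3).
    """
    height, width = pixels.shape[:2]
    # np.indices returns shape (2, height, width): row numbers (y), then column numbers (x).
    ys, xs = np.indices((height, width))
    flat = np.column_stack((xs.ravel(), ys.ravel(), pixels.reshape(-1, 3)))
    return pd.DataFrame(flat, columns=["x", "y", "red", "green", "blue"])


# Hypothetical 3-wide, 2-high image: red encodes x and green encodes y,
# so a swapped axis would show up immediately in the output.
data = np.zeros((2, 3, 3), dtype=np.uint8)
data[..., 0] = np.arange(3)[None, :] * 10   # red   = 10 * x
data[..., 1] = np.arange(2)[:, None] * 20   # green = 20 * y

df = array_to_dataframe(data)
print(df)
```

For a real file, the input array could be obtained with np.asarray(Image.open("test.png").convert("RGB")), which Pillow returns in (height, width, 3) order.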

Answer 2

Answer by eugenhu

I've named the coordinates 'col' and 'row' to be explicit and avoid confusion over whether the x-coordinate refers to the column number or the row number of your original pixel array:

# A must be a 3-D array of shape (rows, cols, 3), not the flat (n_pixels, 3) array
A = colourArray

# Create the multiindex we'll need for the series
index = pd.MultiIndex.from_product(
    (*map(range, A.shape[:2]), ('r', 'g', 'b')),
    names=('row', 'col', None)
)

# Can be chained but separated for use in explanation
df = pd.Series(A.flatten(), index=index)
df = df.unstack()
df = df.reset_index().reindex(columns=['col', 'row', 'r', 'g', 'b'])

Explanation:

pd.Series(A.flatten(), index=index) will create a multi-index Series where each channel intensity is accessible via df[row_n, col_n][channel_r_g_or_b]. The df variable (currently a Series) will now look something like this:

row  col   
0    0    r    116
          g     22
          b    220
     1    r     75
          g    134
          b     43
              ... 
255  246  r     79
          g      9
          b    218
     247  r    225
          g    172
          b    172

unstack() will pivot the third index level (the channel index), returning a DataFrame with columns b, g, r, with each row indexed by a multi-index of (row_n, col_n). The df now looks like this:

           b    g    r
row col               
0   0    220   22  116
    1     43  134   75
    2    187   97   33
... ...  ...  ...  ...
255 226  156  242  128
    227  221   63  212
    228   75  110  193

We then call reset_index() to get rid of the (row_n, col_n) multi-index and just have a flat 0..(n_pixels-1) index. The df is now:

       row  col    b    g    r
0        0    0  220   22  116
1        0    1   43  134   75
2        0    2  187   97   33
...    ...  ...  ...  ...  ...
65506  255  226  156  242  128
65507  255  227  221   63  212
65508  255  228   75  110  193

And then a simple reindex() to rearrange the columns into col, row, r, g, b order.
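
To make the steps above easier to follow, here is a small sketch that runs the same Series, unstack(), reset_index(), reindex() chain on a tiny array whose values are easy to trace by hand. The 2x3 array and the intermediate variable names s, green and wide are assumptions introduced for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-row, 3-column RGB array holding the distinct values 0..17.
A = np.arange(2 * 3 * 3).reshape(2, 3, 3)

index = pd.MultiIndex.from_product(
    (range(A.shape[0]), range(A.shape[1]), ("r", "g", "b")),
    names=("row", "col", None),
)

# Step 1: one Series entry per (row, col, channel).
s = pd.Series(A.flatten(), index=index)
green = s[1, 2]["g"]   # same value as A[1, 2, 1]

# Step 2: pivot the channel level into columns (they come out sorted: b, g, r).
wide = s.unstack()

# Steps 3 and 4: flatten the (row, col) index, then put the columns in the wanted order.
df = wide.reset_index().reindex(columns=["col", "row", "r", "g", "b"])
print(df)
```

Keeping the intermediate results in separate variables, rather than reassigning df at each step, makes it possible to inspect each stage after the fact.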

Timings:

Now as for how fast this runs, well... for a 3-channel image, here are the timings:

Size       Time
  250x250  58.2 ms
  500x500   251 ms
1000x1000  1.03 s
2500x2500  8.14 s

Admittedly not great on images > 1 MP. unstack() can take a while once the df gets very large.

I've tried @davidsheldon's solution and it ran a lot quicker: for a 2500x2500 image it took 244 ms, and for a 10000x10000 image it took 9.04 s.
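
For anyone who wants to repeat the comparison, the following is a rough sketch of a timing harness, not the script used for the numbers above. The function names via_indices and via_unstack, the random test array, and the 250x250 size are all assumptions; results will vary with hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd


def via_indices(A):
    """np.indices approach: stack (row, col) coordinates next to the channels."""
    coords = np.moveaxis(np.indices(A.shape[:2]), 0, 2)
    flat = np.dstack((coords, A)).reshape(-1, 5)
    return pd.DataFrame(flat, columns=["row", "col", "r", "g", "b"])


def via_unstack(A):
    """MultiIndex approach: build a long Series, then pivot the channel level."""
    index = pd.MultiIndex.from_product(
        (range(A.shape[0]), range(A.shape[1]), ("r", "g", "b")),
        names=("row", "col", None),
    )
    wide = pd.Series(A.flatten(), index=index).unstack()
    return wide.reset_index().reindex(columns=["row", "col", "r", "g", "b"])


# Hypothetical random image, kept small so the script finishes quickly.
rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(250, 250, 3))

for fn in (via_indices, via_unstack):
    # Best of three single runs, reported in milliseconds.
    seconds = min(timeit.repeat(lambda: fn(A), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms")
```

Increasing the size argument reproduces the larger cases, at the cost of longer runs and more memory.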
