Python Keras Conv2D and Input Channels

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute the original authors (not me). Original question: http://stackoverflow.com/questions/43306323/

Keras Conv2D and input channels

Tags: python, keras

Asked by yoki

The Keras layer documentation specifies the input and output sizes for convolutional layers: https://keras.io/layers/convolutional/

Input shape: (samples, channels, rows, cols)

Output shape: (samples, filters, new_rows, new_cols)

And the kernel size is a spatial parameter, i.e. it determines only the width and height.

So an input with c channels will yield an output with filters channels regardless of the value of c. It must therefore apply 2D convolution with a spatial height x width filter and then aggregate the results somehow for each learned filter.

What is this aggregation operator? Is it a summation across channels? Can I control it? I couldn't find any information in the Keras documentation.

Thanks.

Accepted answer by noio

It might be confusing that it is called a Conv2D layer (it was to me, which is why I came looking for this answer), because as Nilesh Birari commented:

I guess you are missing its 3D kernel [width, height, depth]. So the result is a summation across channels.

Perhaps the 2D stems from the fact that the kernel only slides along two dimensions; the third dimension is fixed and determined by the number of input channels (the input depth).
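
To make this concrete, here is a minimal numpy sketch (my own addition, not part of the original answer) of what a single Conv2D filter computes for a multi-channel input, ignoring bias, strides and padding:

import numpy as np

def conv2d_single_filter(image, kernel):
    """Naive 'valid' convolution of an (H, W, C) image with a (kh, kw, C) kernel."""
    H, W, C = image.shape
    kh, kw, _ = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]  # (kh, kw, C) window
            # The kernel slides only along height and width; the channel
            # dimension is consumed here by summing over it as well.
            out[i, j] = np.sum(patch * kernel)
    return out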

For a more elaborate explanation, read https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/

I plucked an illustrative image from there:

[Image: kernel depth]

Answered by Alaroff

I was also wondering this, and found another answer here, where it is stated (emphasis mine):

Maybe the most tangible example of a multi-channel input is when you have a color image which has 3 RGB channels. Let's get it to a convolution layer with 3 input channels and 1 output channel. (...) What it does is that it calculates the convolution of each filter with its corresponding input channel (...). The stride of all channels is the same, so they output matrices with the same size. Now, it sums up all matrices and outputs a single matrix which is the only channel at the output of the convolution layer.

Illustration:

[Image: per-channel convolutions summed into a single output channel]

Notice that the weights of the convolution kernels for each channel are different, and they are then iteratively adjusted in the back-propagation steps by e.g. gradient-descent-based algorithms such as stochastic gradient descent (SGD).

Here is a more technical answer from the TensorFlow API.
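
As an illustration of that lower-level view, here is a small sketch (my own, assuming TensorFlow 2.x; it is not quoted from that answer): the filter tensor passed to tf.nn.conv2d has an explicit channel dimension, so a single filter already spans all input channels and produces one summed output channel:

import numpy as np
import tensorflow as tf

# One 2x2 filter for a 3-channel input: shape (kh, kw, in_channels, out_channels)
kernel = np.ones((2, 2, 3, 1), dtype=np.float32)

# A random 3-channel "image" in NHWC layout: (batch, rows, cols, channels)
image = np.random.rand(1, 3, 3, 3).astype(np.float32)

out = tf.nn.conv2d(image, kernel, strides=1, padding='VALID')
print(out.shape)  # (1, 2, 2, 1): one output channel, summed over the 3 input channels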

Answered by Raimi bin Karim

I also needed to convince myself, so I ran a simple example with a 3×3 RGB image.

# red    # green        # blue
1 1 1    100 100 100    10000 10000 10000
1 1 1    100 100 100    10000 10000 10000    
1 1 1    100 100 100    10000 10000 10000

The filter is initialised to ones:

1 1
1 1

[Image: the three 2×2 filters, one per channel (bias not shown)]

I have also set the convolution to have these properties:

  • no padding
  • strides = 1
  • relu activation function
  • bias initialised to 0

We would expect the (aggregated) output to be:

40404 40404
40404 40404
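
Each 2×2 window covers 4 ones in the red channel, 4 hundreds in the green channel and 4 ten-thousands in the blue channel, and the per-channel results are summed:

4*1 + 4*100 + 4*10000  # = 40404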

Also, from the picture above, the number of parameters is

3 separate filters (one for each channel) × 4 weights + 1 (bias, not shown) = 13 parameters

Here's the code.

Import modules:

import numpy as np
from keras.layers import Input, Conv2D
from keras.models import Model

Create the red, green and blue channels:

red   = np.array([1]*9).reshape((3,3))
green = np.array([100]*9).reshape((3,3))
blue  = np.array([10000]*9).reshape((3,3))

Stack the channels to form an RGB image:

img = np.stack([red, green, blue], axis=-1)
img = np.expand_dims(img, axis=0)
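# img now has shape (1, 3, 3, 3): (batch, rows, cols, channels)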

Create a model that just does a Conv2D convolution:

inputs = Input((3, 3, 3))
conv = Conv2D(filters=1,
              strides=1,
              padding='valid',
              activation='relu',
              kernel_size=2,
              kernel_initializer='ones',
              bias_initializer='zeros')(inputs)
model = Model(inputs, conv)
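# The layer's kernel has shape (2, 2, 3, 1): 2*2*3 = 12 weights + 1 bias = 13 parameters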

Feed the image to the model:

model.predict(img)
# array([[[[40404.],
#          [40404.]],

#         [[40404.],
#          [40404.]]]], dtype=float32)
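
As an extra sanity check (my own addition, reusing the model defined above), zeroing out two channels at a time shows each channel's individual contribution before the summation:

only_red   = img * np.array([1, 0, 0])
only_green = img * np.array([0, 1, 0])
only_blue  = img * np.array([0, 0, 1])

print(model.predict(only_red)[0, 0, 0, 0])    # 4.0
print(model.predict(only_green)[0, 0, 0, 0])  # 400.0
print(model.predict(only_blue)[0, 0, 0, 0])   # 40000.0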

Run a summary to get the number of params:

model.summary()

[Image: model.summary() output showing 13 trainable parameters]