Python TensorFlow Strides Argument

Disclaimer: this page is a translated copy of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute the original authors (not the translator). Original question: http://stackoverflow.com/questions/34642595/

Tensorflow Strides Argument

python, neural-network, convolution, tensorflow, conv-neural-network

Asked by jfbeltran

I am trying to understand the strides argument in tf.nn.avg_pool, tf.nn.max_pool, tf.nn.conv2d.

The documentation repeatedly says

strides: A list of ints that has length >= 4. The stride of the sliding window for each dimension of the input tensor.

My questions are:

  1. What do each of the 4+ integers represent?
  2. Why must they have strides[0] = strides[3] = 1 for convnets?
  3. In this example we see tf.reshape(_X, shape=[-1, 28, 28, 1]). Why -1?

Sadly the examples in the docs for reshape using -1 don't translate too well to this scenario.

Accepted answer by dga

The pooling and convolutional ops slide a "window" across the input tensor. Using tf.nn.conv2d as an example: if the input tensor has 4 dimensions, [batch, height, width, channels], then the convolution operates on a 2D window over the height and width dimensions.

strides determines how much the window shifts in each of the dimensions. The typical use sets the first (the batch) and last (the depth) stride to 1.

Let's use a very concrete example: Running a 2-d convolution over a 32x32 greyscale input image. I say greyscale because then the input image has depth=1, which helps keep it simple. Let that image look like this:

00 01 02 03 04 ...
10 11 12 13 14 ...
20 21 22 23 24 ...
30 31 32 33 34 ...
...

Let's run a 2x2 convolution window over a single example (batch size = 1). We'll give the convolution an output channel depth of 8.

The input to the convolution has shape=[1, 32, 32, 1].

If you specify strides=[1,1,1,1] with padding=SAME, then the output of the filter will be [1, 32, 32, 8].

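As a rough sketch of that setup (assuming TF 2.x; the filter values are random and only meant to check the shapes):

import tensorflow as tf

# One 32x32 greyscale image: [batch, height, width, channels]
image = tf.random.normal([1, 32, 32, 1])
# A 2x2 filter with 1 input channel and 8 output channels:
# [filter_height, filter_width, in_channels, out_channels]
filters = tf.random.normal([2, 2, 1, 8])

out = tf.nn.conv2d(image, filters, strides=[1, 1, 1, 1], padding='SAME')
print(out.shape)  # (1, 32, 32, 8)
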
The filter will first create an output for:

F(00 01
  10 11)

And then for:

F(01 02
  11 12)

and so on. Then it will move to the second row, calculating:

F(10, 11
  20, 21)

then

F(11, 12
  21, 22)

If you specify a stride of [1, 2, 2, 1] it won't do overlapping windows. It will compute:

F(00, 01
  10, 11)

and then

F(02, 03
  12, 13)

The stride operates similarly for the pooling operators.

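A rough sketch of the strided (non-overlapping) case and the analogous pooling call, again assuming TF 2.x and random values:

import tensorflow as tf

image = tf.random.normal([1, 32, 32, 1])     # [batch, height, width, channels]
filters = tf.random.normal([2, 2, 1, 8])     # [filter_h, filter_w, in_channels, out_channels]

# Stride 2 in height and width: the window jumps 2 pixels at a time,
# so with padding='SAME' the spatial size is halved.
out = tf.nn.conv2d(image, filters, strides=[1, 2, 2, 1], padding='SAME')
print(out.shape)     # (1, 16, 16, 8)

# The pooling ops take strides the same way; here a 2x2 max pool with stride 2.
pooled = tf.nn.max_pool2d(image, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(pooled.shape)  # (1, 16, 16, 1)
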
Question 2: Why strides [1, x, y, 1] for convnets

The first 1 is the batch: You don't usually want to skip over examples in your batch, or you shouldn't have included them in the first place. :)

The last 1 is the depth of the convolution: You don't usually want to skip inputs, for the same reason.

The conv2d operator is more general, so you could create convolutions that slide the window along other dimensions, but that's not a typical use in convnets. The typical use is to use them spatially.

Why reshape to -1? -1 is a placeholder that says "adjust as necessary to match the size needed for the full tensor." It's a way of making the code independent of the input batch size, so that you can change your pipeline and not have to adjust the batch size everywhere in the code.

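For example (a sketch assuming a flat MNIST-style batch, with 784 = 28*28 values per example):

import tensorflow as tf

flat = tf.random.normal([50, 784])           # pretend the batch size happens to be 50

# -1 tells reshape to infer that dimension from the total number of elements,
# so the same line works for any batch size without hard-coding it.
images = tf.reshape(flat, [-1, 28, 28, 1])
print(images.shape)                          # (50, 28, 28, 1)
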
Answered by Rafał Józefowicz

The inputs are 4-dimensional and are of the form: [batch_size, image_rows, image_cols, number_of_colors]

Strides, in general, define an overlap between applications of the operation. In the case of conv2d, they specify the distance between consecutive applications of the convolutional filter. A value of 1 in a specific dimension means that we apply the operator at every row/col; a value of 2 means every second one, and so on.

Re 1) The values that matter for convolutions are the 2nd and 3rd, and they represent the overlap in the application of the convolutional filters along rows and columns. A value of [1, 2, 2, 1] says that we want to apply the filters on every second row and column.

Re 2) I don't know the technical limitations (it might be a cuDNN requirement), but typically people use strides along the row or column dimensions. It doesn't necessarily make sense to do it over the batch size. Not sure about the last dimension.

Re 3) Setting -1 for one of the dimensions means, "set the value for that dimension so that the total number of elements in the tensor is unchanged". In our case, the -1 will be equal to the batch_size.

Answered by Salvador Dali

Let's start with what stride does in the 1-dim case.

Let's assume your input = [1, 0, 2, 3, 0, 1, 1] and kernel = [2, 1, 3]. The result of the convolution is [8, 11, 7, 9, 4], which is calculated by sliding your kernel over the input, performing element-wise multiplication and summing everything. Like this:

  • 8 = 1 * 2 + 0 * 1 + 2 * 3
  • 11 = 0 * 2 + 2 * 1 + 3 * 3
  • 7 = 2 * 2 + 3 * 1 + 0 * 3
  • 9 = 3 * 2 + 0 * 1 + 1 * 3
  • 4 = 0 * 2 + 1 * 1 + 1 * 3

Here we slide by one element, but nothing stops you from using any other number. This number is your stride. You can think of it as downsampling the result of the 1-strided convolution by taking only every s-th result.

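A tiny plain-Python sketch of this 1-dim case (valid convolution, no padding; the function name is just for illustration):

def conv1d(inputs, kernel, stride=1):
    """Valid (no-padding) 1-D convolution as described above."""
    k = len(kernel)
    return [
        sum(inputs[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(inputs) - k + 1, stride)
    ]

x = [1, 0, 2, 3, 0, 1, 1]
w = [2, 1, 3]
print(conv1d(x, w, stride=1))  # [8, 11, 7, 9, 4]
print(conv1d(x, w, stride=2))  # [8, 7, 4] -- every second result of the stride-1 output
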
Knowing the input size i, kernel size k, stride s and padding p, you can easily calculate the output size of the convolution as:

output size = ⌈(i + 2p - k + 1) / s⌉

Here ⌈ ⌉ denotes the ceiling operation. For a pooling layer s = 1.

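A small helper that evaluates that formula (written from the description above, so treat the exact form as an assumption):

import math

def conv_output_size(i, k, s=1, p=0):
    # ceil((i + 2p - k + 1) / s), per the formula above
    return math.ceil((i + 2 * p - k + 1) / s)

print(conv_output_size(i=7, k=3, s=1))   # 5  -> matches [8, 11, 7, 9, 4]
print(conv_output_size(i=7, k=3, s=2))   # 3  -> matches [8, 7, 4]
print(conv_output_size(i=32, k=2, s=2))  # 16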


N-dim case.

Knowing the math for the 1-dim case, the n-dim case is easy once you see that each dimension is independent. So you just slide along each dimension separately. Here is an example for 2-d. Notice that you do not need to have the same stride in all the dimensions. So for an N-dim input/kernel you should provide N strides.



So now it is easy to answer all your questions:

  1. What do each of the 4+ integers represent? The docs for conv2d and pool tell you that this list represents the strides along each dimension. Notice that the length of the strides list is the same as the rank of the kernel tensor.
  2. Why must they have strides[0] = strides[3] = 1 for convnets? The first dimension is the batch size, the last is the channels. There is no point in skipping either the batch or the channel, so you make them 1. For width/height you can skip something, and that's why they might not be 1.
  3. tf.reshape(_X, shape=[-1, 28, 28, 1]). Why -1? tf.reshape has it covered for you:

    If one component of shape is the special value -1, the size of that dimension is computed so that the total size remains constant. In particular, a shape of [-1] flattens into 1-D. At most one component of shape can be -1.

Answered by rocksyne

@dga has done a wonderful job explaining and I can't be thankful enough for how helpful it has been. In like manner, I would like to share my findings on how stride works in 3D convolution.

According to the TensorFlow documentation on conv3d, the shape of the input must be in this order:

[batch, in_depth, in_height, in_width, in_channels]

Let's explain the variables from right to left using an example. Assume the input shape is input_shape = [1000,16,112,112,3]

input_shape[4] is the number of colour channels (RGB or whichever format it is extracted in)
input_shape[3] is the width of the image
input_shape[2] is the height of the image
input_shape[1] is the number of frames that have been lumped into one clip (one complete sample)
input_shape[0] is the number of such clips, i.e. the batch size

Below is a summary of the documentation for how stride is used.

strides: A list of ints that has length >= 5. 1-D tensor of length 5. The stride of the sliding window for each dimension of input. Must have strides[0] = strides[4] = 1

As indicated in many works, strides simply mean how many steps a window or kernel jumps away from the closest element, be it a data frame or a pixel (this is paraphrased, by the way).

From the above documentation, a stride in 3D will look like this: strides = (1, X, Y, Z, 1).

The documentation emphasizes that strides[0] = strides[4] = 1.

strides[0]=1 means that we do not want to skip any data in the batch 
strides[4]=1 means that we do not want to skip in the channel 

strides[X] means how many skips we should make over the lumped frames. So for example, if we have 16 frames, X=1 means use every frame, X=2 means use every second frame, and so on.

strides[y] and strides[z] follow the explanation by @dga, so I will not redo that part.

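A rough conv3d sketch of the shapes discussed above (assuming TF 2.x, random values, and a small batch so it runs quickly):

import tensorflow as tf

# [batch, in_depth (frames), in_height, in_width, in_channels]
clips = tf.random.normal([2, 16, 112, 112, 3])
# [filter_depth, filter_height, filter_width, in_channels, out_channels]
filters = tf.random.normal([3, 3, 3, 3, 8])

# strides = (1, X, Y, Z, 1): X=2 uses every second frame, Y=Z=2 halve height/width.
out = tf.nn.conv3d(clips, filters, strides=[1, 2, 2, 2, 1], padding='SAME')
print(out.shape)  # (2, 8, 56, 56, 8)
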
In Keras, however, you only need to specify a tuple/list of 3 integers, specifying the strides of the convolution along each spatial dimension, where the spatial dimensions are strides[x], strides[y] and strides[z]. strides[0] and strides[4] already default to 1.

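For comparison, a Keras-style sketch of the same idea (Conv3D only takes the three spatial/temporal strides; batch and channel strides are implicitly 1):

import tensorflow as tf

layer = tf.keras.layers.Conv3D(filters=8, kernel_size=3, strides=(2, 2, 2), padding='same')
out = layer(tf.random.normal([2, 16, 112, 112, 3]))
print(out.shape)  # (2, 8, 56, 56, 8)
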
I hope someone finds this helpful!
