Python Tensorflow - matmul of input matrix with batch data
Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute the original authors (not me): StackOverflow.
Original: http://stackoverflow.com/questions/38235555/
Tensorflow - matmul of input matrix with batch data
Asked by yoki
I have some data represented by input_x. It is a tensor of unknown size (it should be fed in batches), and each item there is of size n. input_x undergoes tf.nn.embedding_lookup, so that embed now has dimensions [?, n, m], where m is the embedding size and ? refers to the unknown batch size.
This is described here:
input_x = tf.placeholder(tf.int32, [None, n], name="input_x")
embed = tf.nn.embedding_lookup(W, input_x)
I'm now trying to multiply each sample in my input data (which is now expanded by the embedding dimension) by a matrix variable U, and I can't seem to figure out how to do that.
I first tried using tf.matmul, but it fails due to a mismatch in shapes. I then tried the following, expanding the dimension of U and applying batch_matmul (I also tried the function from tf.nn.math_ops., with the same result):
U = tf.Variable( ... )
U1 = tf.expand_dims(U,0)
h = tf.batch_matmul(embed, U1)
This passes the initial compilation, but then when actual data is applied, I get the following error:
In[0].dim(0) and In[1].dim(0) must be the same: [64,58,128] vs [1,128,128]
I also know why this is happening: I replicated the dimension of U and it is now 1, but the minibatch size, 64, doesn't fit.
How can I do that matrix multiplication on my tensor-matrix input correctly (for unknown batch size)?
Accepted answer by Styrke
The matmul operation only works on matrices (2D tensors). Here are two main approaches to do this; both assume that U is a 2D tensor.
1. Slice embed into 2D tensors and multiply each of them with U individually. This is probably easiest to do using tf.scan() like this:
h = tf.scan(lambda a, x: tf.matmul(x, U), embed)
2. On the other hand, if efficiency is important, it may be better to reshape embed to be a 2D tensor so that the multiplication can be done with a single matmul, like this:
embed = tf.reshape(embed, [-1, m])
h = tf.matmul(embed, U)
h = tf.reshape(h, [-1, n, c])
where c is the number of columns in U. The last reshape makes sure that h is a 3D tensor whose 0th dimension corresponds to the batch, just like the original x_input and embed.
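For concreteness, a self-contained sketch of the reshape approach; the sizes below are assumptions for illustration, loosely matching the shapes in the question's error message:
import tensorflow as tf

n, m, c = 58, 128, 128                             # sequence length, embedding size, columns of U (assumed)
embed = tf.placeholder(tf.float32, [None, n, m])   # batch size unknown at graph-construction time
U = tf.Variable(tf.random_normal([m, c]))

# Flatten the batch and sequence dimensions, multiply once, then restore them.
h = tf.reshape(tf.matmul(tf.reshape(embed, [-1, m]), U), [-1, n, c])
# h now has shape [?, n, c], with dimension 0 matching the batch of embed.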
Answer by Salvador Dali
Previous answers are obsolete. Currently, tf.matmul() supports tensors with rank > 2:
The inputs must be matrices (or tensors of rank > 2, representing batches of matrices), with matching inner dimensions, possibly after transposition.
Also, tf.batch_matmul() was removed, and tf.matmul() is now the right way to do batch multiplication. The main idea can be understood from the following code:
import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)
Now you will receive a tensor of shape (batch_size, n, k). Here is what is going on: assume you have batch_size matrices of shape nxm and batch_size matrices of shape mxk. For each pair of them, you calculate nxm X mxk, which gives you an nxk matrix. You will have batch_size of them.
Notice that something like this is also valid:
A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)
and will give you a tensor of shape (a, b, n, k).
Answer by P-Gn
1. I want to multiply a batch of matrices with a batch of matrices of the same length, pairwise
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((batch_size, m, p))
# python >= 3.5
MN = M @ N
# or the old way,
MN = tf.matmul(M, N)
# MN has shape (batch_size, n, p)
2. I want to multiply a batch of matrices with a batch of vectors of the same length, pairwise
We fall back to case 1 by adding and removing a dimension to v.
M = tf.random_normal((batch_size, n, m))
v = tf.random_normal((batch_size, m))
Mv = (M @ v[..., None])[..., 0]
# Mv has shape (batch_size, n)
3. I want to multiply a single matrix with a batch of matrices
In this case, we cannot simply add a batch dimension of 1 to the single matrix, because tf.matmul does not broadcast in the batch dimension.
3.1. The single matrix is on the right side
In that case, we can treat the matrix batch as a single large matrix, using a simple reshape.
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((m, p))
MN = tf.reshape(tf.reshape(M, [-1, m]) @ N, [-1, n, p])
# MN has shape (batch_size, n, p)
3.2. The single matrix is on the left side
This case is more complicated. We can fall back to case 3.1 by transposing the matrices.
MT = tf.matrix_transpose(M)
NT = tf.matrix_transpose(N)
NTMT = tf.reshape(tf.reshape(NT, [-1, m]) @ MT, [-1, p, n])
MN = tf.matrix_transpose(NTMT)
However, transposition can be a costly operation, and here it is done twice on an entire batch of matrices. It may be better to simply duplicate M to match the batch dimension:
MN = tf.tile(M[None], [batch_size, 1, 1]) @ N
Profiling will tell which option works better for a given problem/hardware combination.
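A rough sketch of such a comparison (all sizes are assumed; real timings depend on shapes, hardware, and TF version):
import time
import tensorflow as tf

batch_size, n, m, p = 64, 128, 128, 128     # assumed sizes
M = tf.Variable(tf.random_normal((n, m)))
N = tf.Variable(tf.random_normal((batch_size, m, p)))

# Option 1: transpose twice (case 3.2 above).
NT = tf.matrix_transpose(N)
MT = tf.matrix_transpose(M)
option1 = tf.matrix_transpose(tf.reshape(tf.reshape(NT, [-1, m]) @ MT, [-1, p, n]))

# Option 2: tile M across the batch dimension.
option2 = tf.tile(M[None], [batch_size, 1, 1]) @ N

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for name, op in [('transpose', option1), ('tile', option2)]:
        sess.run(op)                        # warm-up run
        start = time.time()
        for _ in range(100):
            sess.run(op)
        print(name, time.time() - start)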
4. I want to multiply a single matrix with a batch of vectors
This looks similar to case 3.2 since the single matrix is on the left, but it is actually simpler because transposing a vector is essentially a no-op. We end up with:
M = tf.random_normal((n, m))
v = tf.random_normal((batch_size, m))
MT = tf.matrix_transpose(M)
Mv = v @ MT
What about einsum?
All of the previous multiplications could have been written with the tf.einsum swiss army knife. For example, the first solution for 3.2 could be written simply as:
MN = tf.einsum('nm,bmp->bnp', M, N)
However, note that einsum ultimately relies on transpose and matmul for the computation.
So even though einsum is a very convenient way to write matrix multiplications, it hides the complexity of the operations underneath. For example, it is not straightforward to guess how many times an einsum expression will transpose your data, and therefore how costly the operation will be. It may also hide the fact that there can be several alternatives for the same operation (see case 3.2) and might not necessarily choose the better one.
For this reason, I would personally use explicit formulas like those above to better convey their respective complexity. That said, if you know what you are doing and like the simplicity of the einsum syntax, then by all means go for it.
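For reference, the other cases above also have compact einsum spellings (same shapes as in the corresponding cases):
# Case 1: batch of matrices times batch of matrices, pairwise.
MN = tf.einsum('bnm,bmp->bnp', M, N)
# Case 2: batch of matrices times batch of vectors, pairwise.
Mv = tf.einsum('bnm,bm->bn', M, v)
# Case 4: single matrix times batch of vectors.
Mv = tf.einsum('nm,bm->bn', M, v)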
Answer by Desh Raj
As answered by @Styrke, there are two ways to achieve this: 1. scanning, and 2. reshaping.
tf.scan requires lambda functions and is generally used for recursive operations. Some examples are here: https://rdipietro.github.io/tensorflow-scan-examples/
I personally prefer reshaping, since it is more intuitive. If you are trying to matrix-multiply each matrix in the 3D tensor by the matrix that is the 2D tensor, like Cijl = Aijk * Bkl, you can do it with a simple reshape:
Aflat = tf.reshape(Aijk, [i * j, k])
Cflat = tf.matmul(Aflat, Bkl)
C = tf.reshape(Cflat, [i, j, l])
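A minimal runnable version of that sketch, with assumed concrete sizes substituted for the index letters:
import tensorflow as tf

i, j, k, l = 64, 58, 128, 32                # assumed sizes
A = tf.random_normal((i, j, k))
B = tf.random_normal((k, l))

Aflat = tf.reshape(A, [i * j, k])           # flatten the first two dimensions
Cflat = tf.matmul(Aflat, B)                 # shape (i*j, l)
C = tf.reshape(Cflat, [i, j, l])            # C[i,j,l] = sum_k A[i,j,k] * B[k,l]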
Answer by James Fletcher
It seems that in TensorFlow 1.11.0 the docs for tf.matmul incorrectly say that it works for rank >= 2.
Instead, the best clean alternative I've found is to use tf.tensordot(a, b, (-1, 0)) (docs).
In its general form tf.tensordot(a, b, axes), this function computes the dot product of any axis of array a with any axis of array b. Providing axes as (-1, 0) gives the standard dot product of the two arrays.
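Applied to the setup from the question, a minimal sketch might look like this (the sizes n, m, c are illustrative assumptions):
import tensorflow as tf

n, m, c = 58, 128, 128                      # assumed sizes
embed = tf.placeholder(tf.float32, [None, n, m])
U = tf.Variable(tf.random_normal([m, c]))

# Contract the last axis of embed with the first axis of U.
h = tf.tensordot(embed, U, (-1, 0))
# h has shape [?, n, c]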