Python Tensorflow - matmul of input matrix with batch data
Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute the original authors (not me): StackOverflow.
Original: http://stackoverflow.com/questions/38235555/
Tensorflow - matmul of input matrix with batch data
Asked by yoki
I have some data represented by input_x. It is a tensor of unknown size (it should be fed in batches), and each item there is of size n. input_x undergoes tf.nn.embedding_lookup, so that embed now has dimensions [?, n, m], where m is the embedding size and ? refers to the unknown batch size.
This is described here:
input_x = tf.placeholder(tf.int32, [None, n], name="input_x")
embed = tf.nn.embedding_lookup(W, input_x)
I'm now trying to multiply each sample in my input data (which is now expanded by the embedding dimension) by a matrix variable U, and I can't seem to figure out how to do that.
I first tried using tf.matmul, but it fails due to a mismatch in shapes. I then tried the following, expanding the dimension of U and applying batch_matmul (I also tried the function from tf.nn.math_ops., with the same result):
U = tf.Variable( ... )
U1 = tf.expand_dims(U,0)
h = tf.batch_matmul(embed, U1)
This passes the initial compilation, but then when actual data is applied, I get the following error:
In[0].dim(0) and In[1].dim(0) must be the same: [64,58,128] vs [1,128,128]
I also know why this is happening: I replicated the dimension of U and it is now 1, but the minibatch size, 64, doesn't fit.
How can I do that matrix multiplication on my tensor-matrix input correctly (for unknown batch size)?
Accepted answer by Styrke
The matmul operation only works on matrices (2D tensors). Here are two main approaches to do this; both assume that U is a 2D tensor.
1. Slice embed into 2D tensors and multiply each of them with U individually. This is probably easiest to do using tf.scan() like this:
h = tf.scan(lambda a, x: tf.matmul(x, U), embed)
2. On the other hand, if efficiency is important, it may be better to reshape embed to be a 2D tensor so that the multiplication can be done with a single matmul, like this:
embed = tf.reshape(embed, [-1, m])
h = tf.matmul(embed, U)
h = tf.reshape(h, [-1, n, c])
where c is the number of columns in U. The last reshape makes sure that h is a 3D tensor whose 0th dimension corresponds to the batch, just like the original x_input and embed.
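For concreteness, a self-contained sketch of the reshape approach; the sizes below are assumptions for illustration, loosely matching the shapes in the question's error message:
import tensorflow as tf

n, m, c = 58, 128, 128                             # sequence length, embedding size, columns of U (assumed)
embed = tf.placeholder(tf.float32, [None, n, m])   # batch size unknown at graph-construction time
U = tf.Variable(tf.random_normal([m, c]))

# Flatten the batch and sequence dimensions, multiply once, then restore them.
h = tf.reshape(tf.matmul(tf.reshape(embed, [-1, m]), U), [-1, n, c])
# h now has shape [?, n, c], with dimension 0 matching the batch of embed.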
Answer by Salvador Dali
Previous answers are obsolete. Currently, tf.matmul() supports tensors with rank > 2:
The inputs must be matrices (or tensors of rank > 2, representing batches of matrices), with matching inner dimensions, possibly after transposition.
Also, tf.batch_matmul() was removed, and tf.matmul() is now the right way to do batch multiplication. The main idea can be understood from the following code:
import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)
Now you will receive a tensor of shape (batch_size, n, k). Here is what is going on: assume you have batch_size matrices of shape nxm and batch_size matrices of shape mxk. For each pair of them, you calculate nxm X mxk, which gives you an nxk matrix. You will have batch_size of them.
Notice that something like this is also valid:
A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)
and will give you a tensor of shape (a, b, n, k).
Answer by P-Gn
1. I want to multiply a batch of matrices with a batch of matrices of the same length, pairwise
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((batch_size, m, p))
# python >= 3.5
MN = M @ N
# or the old way,
MN = tf.matmul(M, N)
# MN has shape (batch_size, n, p)
2. I want to multiply a batch of matrices with a batch of vectors of the same length, pairwise
We fall back to case 1 by adding and removing a dimension to v.
M = tf.random_normal((batch_size, n, m))
v = tf.random_normal((batch_size, m))
Mv = (M @ v[..., None])[..., 0]
# Mv has shape (batch_size, n)
3. I want to multiply a single matrix with a batch of matrices
In this case, we cannot simply add a batch dimension of 1 to the single matrix, because tf.matmul does not broadcast in the batch dimension.
3.1. The single matrix is on the right side
In that case, we can treat the matrix batch as a single large matrix, using a simple reshape.
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((m, p))
MN = tf.reshape(tf.reshape(M, [-1, m]) @ N, [-1, n, p])
# MN has shape (batch_size, n, p)
3.2. The single matrix is on the left side
This case is more complicated. We can fall back to case 3.1 by transposing the matrices.
MT = tf.matrix_transpose(M)
NT = tf.matrix_transpose(N)
NTMT = tf.reshape(tf.reshape(NT, [-1, m]) @ MT, [-1, p, n])
MN = tf.matrix_transpose(NTMT)
However, transposition can be a costly operation, and here it is done twice on an entire batch of matrices. It may be better to simply duplicate M to match the batch dimension:
MN = tf.tile(M[None], [batch_size, 1, 1]) @ N
Profiling will tell which option works better for a given problem/hardware combination.
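A rough sketch of such a comparison (all sizes are assumed; real timings depend on shapes, hardware, and TF version):
import time
import tensorflow as tf

batch_size, n, m, p = 64, 128, 128, 128     # assumed sizes
M = tf.Variable(tf.random_normal((n, m)))
N = tf.Variable(tf.random_normal((batch_size, m, p)))

# Option 1: transpose twice (case 3.2 above).
NT = tf.matrix_transpose(N)
MT = tf.matrix_transpose(M)
option1 = tf.matrix_transpose(tf.reshape(tf.reshape(NT, [-1, m]) @ MT, [-1, p, n]))

# Option 2: tile M across the batch dimension.
option2 = tf.tile(M[None], [batch_size, 1, 1]) @ N

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for name, op in [('transpose', option1), ('tile', option2)]:
        sess.run(op)                        # warm-up run
        start = time.time()
        for _ in range(100):
            sess.run(op)
        print(name, time.time() - start)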
4. I want to multiply a single matrix with a batch of vectors
This looks similar to case 3.2 since the single matrix is on the left, but it is actually simpler because transposing a vector is essentially a no-op. We end up with:
M = tf.random_normal((n, m))
v = tf.random_normal((batch_size, m))
MT = tf.matrix_transpose(M)
Mv = v @ MT
What about einsum?
All of the previous multiplications could have been written with the tf.einsum swiss army knife. For example, the first solution for 3.2 could be written simply as:
MN = tf.einsum('nm,bmp->bnp', M, N)
However, note that einsum ultimately relies on transpose and matmul for the computation.
So even though einsum is a very convenient way to write matrix multiplications, it hides the complexity of the operations underneath. For example, it is not straightforward to guess how many times an einsum expression will transpose your data, and therefore how costly the operation will be. It may also hide the fact that there can be several alternatives for the same operation (see case 3.2) and might not necessarily choose the better one.
For this reason, I would personally use explicit formulas like those above to better convey their respective complexity. That said, if you know what you are doing and like the simplicity of the einsum syntax, then by all means go for it.
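For reference, the other cases above also have compact einsum spellings (same shapes as in the corresponding cases):
# Case 1: batch of matrices times batch of matrices, pairwise.
MN = tf.einsum('bnm,bmp->bnp', M, N)
# Case 2: batch of matrices times batch of vectors, pairwise.
Mv = tf.einsum('bnm,bm->bn', M, v)
# Case 4: single matrix times batch of vectors.
Mv = tf.einsum('nm,bm->bn', M, v)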
Answer by Desh Raj
As answered by @Styrke, there are two ways to achieve this: 1. scanning, and 2. reshaping.
tf.scan requires lambda functions and is generally used for recursive operations. Some examples are here: https://rdipietro.github.io/tensorflow-scan-examples/
I personally prefer reshaping, since it is more intuitive. If you are trying to matrix-multiply each matrix in the 3D tensor by the matrix that is the 2D tensor, like Cijl = Aijk * Bkl, you can do it with a simple reshape:
Aflat = tf.reshape(Aijk, [i * j, k])
Cflat = tf.matmul(Aflat, Bkl)
C = tf.reshape(Cflat, [i, j, l])
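A minimal runnable version of that sketch, with assumed concrete sizes substituted for the index letters:
import tensorflow as tf

i, j, k, l = 64, 58, 128, 32                # assumed sizes
A = tf.random_normal((i, j, k))
B = tf.random_normal((k, l))

Aflat = tf.reshape(A, [i * j, k])           # flatten the first two dimensions
Cflat = tf.matmul(Aflat, B)                 # shape (i*j, l)
C = tf.reshape(Cflat, [i, j, l])            # C[i,j,l] = sum_k A[i,j,k] * B[k,l]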
Answer by James Fletcher
It seems that in TensorFlow 1.11.0 the docs for tf.matmul incorrectly say that it works for rank >= 2.
Instead, the best clean alternative I've found is to use tf.tensordot(a, b, (-1, 0)) (docs).
In its general form tf.tensordot(a, b, axes), this function computes the dot product of any axis of array a with any axis of array b. Providing axes as (-1, 0) gives the standard dot product of the two arrays.
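Applied to the setup from the question, a minimal sketch might look like this (the sizes n, m, c are illustrative assumptions):
import tensorflow as tf

n, m, c = 58, 128, 128                      # assumed sizes
embed = tf.placeholder(tf.float32, [None, n, m])
U = tf.Variable(tf.random_normal([m, c]))

# Contract the last axis of embed with the first axis of U.
h = tf.tensordot(embed, U, (-1, 0))
# h has shape [?, n, c]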