Difference between np.dot and np.multiply with np.sum in binary cross-entropy loss calculation
This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/48201729/
Question by Asad Shakeel
I have tried the following code but didn't find the difference between np.dot and np.multiply with np.sum.
Here is the np.dot code:
logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
print(logprobs.shape)
print(logprobs)
cost = (-1/m) * logprobs
print(cost.shape)
print(type(cost))
print(cost)
Its output is
(1, 1)
[[-2.07917628]]
(1, 1)
<class 'numpy.ndarray'>
[[ 0.693058761039 ]]
Here is the code for np.multiply with np.sum:
logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
print(logprobs.shape)
print(logprobs)
cost = - logprobs / m
print(cost.shape)
print(type(cost))
print(cost)
Its output is
()
-2.07917628312
()
<class 'numpy.float64'>
0.693058761039
I'm unable to understand the difference in type and shape, given that the resulting value is the same in both cases.
Even after squeezing the cost from the former code, its value becomes the same as the latter's, but the type remains the same:
cost = np.squeeze(cost)
print(type(cost))
print(cost)
The output is:
<class 'numpy.ndarray'>
0.6930587610394646
Accepted answer by kmario23
What you're doing is calculating the binary cross-entropy loss, which measures how bad the model's predictions (here: A2) are when compared with the true outputs (here: Y).
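For reference, this is the standard binary cross-entropy cost that both snippets compute, averaged over the m examples, writing a_i for the entries of A2 and y_i for those of Y:

```latex
J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log a_i + (1 - y_i) \log (1 - a_i) \right]
```

The sum over i is exactly what the (1,m) x (m,1) product performs implicitly in the np.dot version, and what np.sum performs explicitly in the np.multiply version.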
Here is a reproducible example for your case, which should explain why you get a scalar in the second case using np.sum:
In [88]: Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
In [89]: A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])
In [90]: logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
# `np.dot` returns 2D array since its arguments are 2D arrays
In [91]: logprobs
Out[91]: array([[-0.78914626]])
# m is the number of examples: m = Y.shape[1], i.e. 8
In [92]: cost = (-1/m) * logprobs
In [93]: cost
Out[93]: array([[ 0.09864328]])
In [94]: logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
# np.sum returns scalar since it sums everything in the 2D array
In [95]: logprobs
Out[95]: -0.78914625761870361
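The session above can be condensed into one self-contained script that prints the shape and type of each cost. Note that it sets m = Y.shape[1]. That is an assumption, since the session never defines m, but 8 is consistent with the printed cost:

```python
import numpy as np

# Data from the session above: 8 examples stored as (1, 8) row vectors
Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])
m = Y.shape[1]  # number of examples (assumed definition of m), here 8

# Version 1: (1, 8) x (8, 1) matrix products -> (1, 1) ndarray
logprobs_dot = np.dot(Y, np.log(A2).T) + np.dot(1 - Y, np.log(1 - A2).T)
cost_dot = -logprobs_dot / m

# Version 2: element-wise products, then a sum over every entry -> NumPy scalar
logprobs_sum = np.sum(np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2)))
cost_sum = -logprobs_sum / m

print(cost_dot.shape, type(cost_dot).__name__)  # (1, 1) ndarray
print(cost_sum.shape, type(cost_sum).__name__)  # () float64
print(np.isclose(cost_dot.item(), cost_sum))    # True
```

The numbers agree. Only the container differs: a (1, 1) array in one case, a 0-d NumPy scalar in the other.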
Note that np.dot sums only along the inner dimensions, which match here: (1x8) and (8x1). So the 8s disappear during the dot product (matrix multiplication), yielding a (1x1) result, which is just a scalar but is returned as a 2D array of shape (1,1).
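Here is a small sketch that shows which axis gets summed away. It uses all-ones arrays with the same shapes as the example, with values chosen only for illustration:

```python
import numpy as np

row = np.ones((1, 8))  # shape (1, 8)
col = np.ones((8, 1))  # shape (8, 1)

# Inner dimensions (8 and 8) match and are summed away: (1, 8) x (8, 1) -> (1, 1)
inner = np.dot(row, col)
print(inner.shape, inner)  # (1, 1) [[8.]]

# Reversing the order makes the size-1 axes the inner ones: (8, 1) x (1, 8) -> (8, 8)
outer = np.dot(col, row)
print(outer.shape)  # (8, 8)
```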
Also, and most importantly, note that here np.dot is exactly the same as np.matmul, since the inputs are 2D arrays (i.e. matrices):
In [107]: logprobs = np.matmul(Y, (np.log(A2)).T) + np.matmul((1.0-Y),(np.log(1 - A2)).T)
In [108]: logprobs
Out[108]: array([[-0.78914626]])
In [109]: logprobs.shape
Out[109]: (1, 1)
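The sketch below checks that np.dot, np.matmul and the @ operator (which calls matmul) agree on the 2D arrays from this example. The equivalence only holds for 2-D inputs. For arrays with more dimensions, np.matmul treats the operands as stacks of matrices, while np.dot sums over the last axis of the first argument and the second-to-last axis of the second.

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])

# For 2-D inputs all three are ordinary matrix multiplication
via_dot = np.dot(Y, np.log(A2).T)
via_matmul = np.matmul(Y, np.log(A2).T)
via_at = Y @ np.log(A2).T

# allclose rather than exact equality: different code paths may round differently
print(np.allclose(via_dot, via_matmul), np.allclose(via_dot, via_at))  # True True
print(via_at.shape)  # (1, 1)
```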
Return result as a scalar value
np.dot or np.matmul returns whatever array shape results from the input arrays. Even with the out= argument, it's not possible to return a scalar if the inputs are 2D arrays. However, if the result array has shape (1,1) (or, more generally, is a scalar value wrapped in an nD array), we can call np.asscalar() on it to convert it to a scalar. (np.asscalar was deprecated in NumPy 1.16 and removed in NumPy 1.23; on newer versions, use the equivalent ndarray.item() method, e.g. logprobs.item().)
In [123]: np.asscalar(logprobs)
Out[123]: -0.7891462576187036
In [124]: type(np.asscalar(logprobs))
Out[124]: float
ndarray of size 1 to scalar value
In [127]: np.asscalar(np.array([[[23.2]]]))
Out[127]: 23.2
In [128]: np.asscalar(np.array([[[[23.2]]]]))
Out[128]: 23.2
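Recent NumPy versions no longer include np.asscalar, so the same conversions have to go through the ndarray.item() method. This is a minimal sketch that reuses the values from the session above:

```python
import numpy as np

logprobs = np.array([[-0.78914626]])  # a (1, 1) result like the one np.dot returns

# .item() copies the single element out as a plain Python scalar
value = logprobs.item()
print(value, type(value))  # -0.78914626 <class 'float'>

# It works for any array with exactly one element, whatever its number of dimensions
print(np.array([[[[23.2]]]]).item())  # 23.2
```

.item() raises a ValueError if the array has more than one element, so it only "unwraps" genuinely scalar results.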
Answer by Anuj Gautam
np.dot is the dot product of two matrices.
|A B| . |E F| = |A*E+B*G  A*F+B*H|
|C D|   |G H|   |C*E+D*G  C*F+D*H|
Whereas np.multiply does an element-wise multiplication of two matrices.
|A B| ⊙ |E F| = |A*E  B*F|
|C D|   |G H|   |C*G  D*H|
When used with np.sum, the results being equal is merely a coincidence.
>>> np.dot([[1,2], [3,4]], [[1,2], [2,3]])
array([[ 5,  8],
       [11, 18]])
>>> np.multiply([[1,2], [3,4]], [[1,2], [2,3]])
array([[ 1,  4],
       [ 6, 12]])
>>> np.sum(np.dot([[1,2], [3,4]], [[1,2], [2,3]]))
42
>>> np.sum(np.multiply([[1,2], [3,4]], [[1,2], [2,3]]))
23
回答by hpaulj
If Y and A2 are (1,N) arrays, then np.dot(Y, A2.T) will produce a (1,1) result. It is doing a matrix multiplication of a (1,N) with an (N,1): the N's are summed, leaving the (1,1).
With multiply the result is (1,N). Sum all values, and the result is a scalar.
If Y and A2 were (N,) shaped (same number of elements, but 1d), then np.dot(Y, A2) (no .T) would also produce a scalar. From the np.dot documentation:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors
Returns the dot product of a and b. If a and b are both scalars or both 1-D arrays then a scalar is returned; otherwise an array is returned.
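Here the values from the accepted answer are stored as 1-D arrays of shape (8,), a reshaping chosen only for illustration. With that layout the np.dot version returns a scalar directly, with no transpose needed:

```python
import numpy as np

Y = np.array([1, 0, 1, 1, 0, 1, 0, 0])                       # shape (8,)
A2 = np.array([0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02])  # shape (8,)

# Both operands are 1-D, so np.dot is the inner product and returns a scalar
logprobs = np.dot(Y, np.log(A2)) + np.dot(1 - Y, np.log(1 - A2))
print(type(logprobs).__name__, np.ndim(logprobs))  # float64 0
```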
squeeze removes all size-1 dimensions, but still returns an array. In numpy an array can have any number of dimensions (from 0 to 32; NumPy 2.0 raised the upper limit to 64). So a 0d array is possible. Compare the shapes of np.array(3), np.array([3]) and np.array([[3]]).
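The comparison suggested above takes only a few lines. The last two lines also reproduce the question's observation that a squeezed (1,1) cost is still an ndarray, now 0-dimensional. The value 0.693 is illustrative:

```python
import numpy as np

for a in (np.array(3), np.array([3]), np.array([[3]])):
    print(a.shape, a.ndim)
# () 0
# (1,) 1
# (1, 1) 2

# squeeze drops every size-1 axis but still returns an ndarray (here a 0-d one)
s = np.squeeze(np.array([[0.693]]))
print(type(s).__name__, s.shape)  # ndarray ()
```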
Answer by Ashish S
In this example it is not a coincidence. Let's take an example with two row vectors of length 3.
# Let's code
import numpy as np

x1 = np.array([1, 2, 3])  # first array, shape (3,)
x2 = np.array([3, 4, 3])  # second array, shape (3,)

X_Res = np.sum(np.multiply(x1, x2))
# Results in 20, computed as (1*3)+(2*4)+(3*3), i.e. element-wise
# multiplication followed by a sum.

Y_Res = np.dot(x1, x2.T)
# To get a (1,1) matrix from the dot of a (1,3) matrix and a (1,3) matrix,
# we need to transpose the second one:
#   |1 2 3| * |3|
#             |4| = |1*3+2*4+3*3| = |20|
#             |3|
# Results in 20, as (1*3)+(2*4)+(3*3), i.e. the dot product of the two.
# (x1 and x2 are actually 1-D here, so x2.T is a no-op and np.dot returns
# the scalar inner product directly.)

print(X_Res)  # 20
print(Y_Res)  # 20
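In the code above, x1 and x2 are actually 1-D, which is why .T has no effect there. The sketch below adapts the example to genuine (1, 3) matrices; this is an adaptation, not part of the original answer. The values still agree while the shapes differ, exactly as in the question:

```python
import numpy as np

# The same numbers stored as genuine (1, 3) row matrices
x1 = np.array([[1, 2, 3]])
x2 = np.array([[3, 4, 3]])

# Element-wise product, then a sum over all entries -> scalar 20
x_res = np.sum(np.multiply(x1, x2))

# (1, 3) x (3, 1) matrix product -> (1, 1) array [[20]]
y_res = np.dot(x1, x2.T)

print(x_res, y_res, y_res.shape)  # 20 [[20]] (1, 1)
```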