Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original: StackOverflow, http://stackoverflow.com/questions/45583472/

Date: 2020-09-14 04:12:52  Source: igfitidea

How to find the difference between two matrices in Python so that the result has no values with a minus sign

python python-3.x pandas numpy matrix

Asked by JKC

I have a pandas DataFrame with two columns (Word and Word_Position) in it. I need to find the distance between words and present the output in matrix form for better readability.

What I have done so far is create a row matrix from the DF.Word_Position column and transpose it to create a column matrix. When I subtract one of these matrices from the other, I get some values with a minus sign in front of them.

With all due respect to mathematics, this is absolutely correct, but for my requirement I just need the number and not the minus sign.

Is there a better way to do this? I appreciate your help. Thanks in advance.

Note: I am using Python 3.6

Code snippets and their corresponding output, for your reference:

m1 = np.matrix(df1['Word Position'])
print(m1)
[[ 1  2  3 ..., 19 20 21]]

m2 = np.matrix(m1.T)
print(m2)
[[ 1]
 [ 2]
 [ 3]
 ..., 
 [19]
 [20]
 [21]]

print(m2-m1)
[[  0  -1  -2 ..., -18 -19 -20]
 [  1   0  -1 ..., -17 -18 -19]
 [  2   1   0 ..., -16 -17 -18]
 ..., 
 [ 18  17  16 ...,   0  -1  -2]
 [ 19  18  17 ...,   1   0  -1]
 [ 20  19  18 ...,   2   1   0]]

Accepted answer by Alexander

Just take the absolute value?

np.abs(m2 - m1)

Your code indicates that your data consists of numpy arrays, so the solution above should work.

If they are dataframes, you could do:

m2.sub(m1).abs()
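
As a minimal sketch of the absolute-value approach, the example below builds a small DataFrame and labels the resulting matrix with the words for readability. The sample words and positions are hypothetical (the asker's df1 is not shown), and the use of plain NumPy arrays with broadcasting instead of np.matrix is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df1 is not shown in the question.
df1 = pd.DataFrame({
    "Word": ["the", "quick", "brown", "fox"],
    "Word Position": [1, 2, 3, 4],
})

pos = df1["Word Position"].to_numpy()

# Broadcasting a column vector against a row vector gives every pairwise difference.
diff = pos[:, None] - pos[None, :]

# The absolute value drops the sign, leaving the distance between positions.
dist = np.abs(diff)

# Label rows and columns with the words so the matrix is easier to read.
dist_df = pd.DataFrame(dist, index=df1["Word"], columns=df1["Word"])
print(dist_df)
```

Wrapping the result in a DataFrame only after the arithmetic is done avoids label alignment, which pandas would otherwise apply when subtracting two DataFrames.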

Answer by Daniel F

In this case, you probably want to use scipy.spatial.distance.pdist

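from scipy.spatial.distance import squareform, pdist
# Column vector of shape (n, 1); .values replaces the deprecated Series.data
m = df1['Word Position'].values[:, None]
# Minkowski distance with p=1 is the absolute difference for 1D data
dist = squareform(pdist(m, 'minkowski', p=1))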

A bit overkill for this, but extensible if you ever want to change your distance parameter, and usually faster than broadcasting (since it only does half the subtraction steps, because abs(a-b) == abs(b-a)). If you want to use broadcasting, you can always do this:

dist = np.abs(m - m.T)
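
To check that the two approaches agree, here is a small sketch comparing pdist with broadcasting. The positions array is hypothetical and stands in for df1['Word Position']; 'cityblock' is used here as the named equivalent of the Minkowski distance with p=1.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical positions standing in for df1['Word Position'].
positions = np.array([1, 2, 3, 7, 10])

# pdist expects a 2-D array of shape (n_samples, n_dimensions).
m = positions[:, None]

# pdist returns only the condensed upper triangle; squareform expands it
# into the full symmetric matrix with zeros on the diagonal.
dist_pdist = squareform(pdist(m, metric="cityblock"))

# Broadcasting computes the full matrix, including the mirrored half.
dist_broadcast = np.abs(m - m.T)

print(np.array_equal(dist_pdist, dist_broadcast))
```

Note that pdist returns floating-point values while the broadcasting result keeps the integer dtype of the input.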

Answer by Y0da

If you want the distance between two arrays, the proper way is to compute the norm:

dists = [np.linalg.norm(m - m2, axis=1) for m in m1[0]]

This assumes that the shape of the arrays is (n_sample, n_dimension).

Instead of a list comprehension, you can use numpy broadcasting on m2.
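
The broadcasting code is missing from this page, so the following is only a sketch of what such a variant might look like, not the answerer's original snippet. The arrays a and b are hypothetical and both have shape (n_sample, n_dimension).

```python
import numpy as np

# Hypothetical 2-D points; both arrays have shape (n_sample, n_dimension).
a = np.array([[0.0, 0.0], [3.0, 4.0]])
b = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])

# List-comprehension version: one row of `a` at a time against all rows of `b`.
dists_loop = np.array([np.linalg.norm(row - b, axis=1) for row in a])

# Broadcasting version: (n_a, 1, d) - (1, n_b, d) -> (n_a, n_b, d),
# then the norm over the last axis collapses the dimension axis.
dists_bcast = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

print(dists_bcast)
```

The broadcasting form builds an intermediate array of shape (n_a, n_b, d), so it trades memory for the removal of the Python-level loop.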



If you want more control over the metric, you might want to use scipy.spatial.distance.cdist. This option is faster with large arrays. An example with the Minkowski distance (p=2 gives the Euclidean distance):

import scipy.spatial.distance
dists = scipy.spatial.distance.cdist(m1, m2, 'minkowski', p=2)  # m1, m2: shape (n_sample, n_dimension)
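
A self-contained sketch of cdist with two values of p, using hypothetical 2-D points a and b (not the asker's data), might look like this:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical points; cdist needs 2-D inputs with the same number of columns.
a = np.array([[0.0, 0.0], [3.0, 4.0]])
b = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])

# p must be passed as a keyword argument.
euclid = cdist(a, b, metric="minkowski", p=2)     # Euclidean distance
manhattan = cdist(a, b, metric="minkowski", p=1)  # sum of absolute differences

print(euclid)
print(manhattan)
```

Each result has shape (len(a), len(b)), with entry [i, j] holding the distance between a[i] and b[j].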

Of course, if the array is only 1D you can achieve that using an absolute value:

dists = np.abs(m1 - m2)