pandas: How to find the difference between two matrices in Python so that the result has no values with a minus sign

Question

Asked by JKC

I have a pandas DataFrame with two columns (Word and Word_Position). I need to find the distance between words and present the output in matrix form for better readability.

So far, I have created a row matrix from the DF.Word_Position column and transposed it to create a column matrix. When I subtract one matrix from the other, some of the resulting values have a minus sign in front of them.

With all due respect to the great mathematics, this is absolutely correct, but for my requirement I just need the number and not the minus sign.

Is there a better way to do this? I appreciate your help. Thanks in advance.

Note: I am using Python 3.6.

Code snippets and their corresponding output, for your reference:

m1 = np.matrix(df1['Word Position'])
print(m1)
[[ 1  2  3 ..., 19 20 21]]

m2 = np.matrix(m1.T)
print(m2)
[[ 1]
 [ 2]
 [ 3]
 ..., 
 [19]
 [20]
 [21]]

print(m2-m1)
[[  0  -1  -2 ..., -18 -19 -20]
 [  1   0  -1 ..., -17 -18 -19]
 [  2   1   0 ..., -16 -17 -18]
 ..., 
 [ 18  17  16 ...,   0  -1  -2]
 [ 19  18  17 ...,   1   0  -1]
 [ 20  19  18 ...,   2   1   0]]

Answer 1

Accepted answer by Alexander

Just take the absolute value?

np.abs(m2 - m1)

Your code indicates that your data consists of numpy arrays, so the solution above should work.
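
As a minimal sketch of the absolute-difference approach: the DataFrame below is a hypothetical stand-in for the asker's df1 (only the 'Word Position' column name comes from the question), and the final step, which labels the matrix with the words, is an assumption about what "better readability" might mean here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data; the real df1 is not shown in full.
df1 = pd.DataFrame({
    "Word": ["the", "quick", "brown", "fox"],
    "Word Position": [1, 2, 3, 4],
})

pos = df1["Word Position"].to_numpy()

# pos[:, None] is a column and pos[None, :] is a row;
# broadcasting subtracts every pair, and np.abs drops the sign.
dist = np.abs(pos[:, None] - pos[None, :])

# Label rows and columns with the words so the matrix is easier to read.
dist_df = pd.DataFrame(dist, index=df1["Word"], columns=df1["Word"])
print(dist_df)
```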

If they are dataframes, you could do:

m2.sub(m1).abs()
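
A small sketch of the DataFrame variant, using two hypothetical frames a and b of the same shape. Note that DataFrame subtraction aligns on index and column labels instead of broadcasting shapes, so this form suits element-wise differences between matching frames; a row frame minus a column frame would not produce the pairwise matrix.

```python
import pandas as pd

# Hypothetical same-shaped frames; sub() aligns on index and column labels.
a = pd.DataFrame({"x": [1, 5], "y": [7, 2]})
b = pd.DataFrame({"x": [4, 3], "y": [1, 9]})

# Element-wise difference, then absolute value to drop the sign.
abs_diff = a.sub(b).abs()
print(abs_diff)
```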

Answer 2

Answer by Daniel F

In this case, you probably want to use scipy.spatial.distance.pdist

from scipy.spatial.distance import squareform, pdist
m = df1['Word Position'].values[:, None]  # column vector, one position per row
dist = squareform(pdist(m, 'minkowski', p=1))  # p=1 gives the absolute difference

This is a bit overkill here, but it is extensible if you ever want to change the distance metric, and it is usually faster than broadcasting, since it computes only half of the subtractions (abs(a-b) == abs(b-a)). If you prefer broadcasting, you can always do this:

dist = np.abs(m - m.T)
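
To check that the two approaches agree, here is a short sketch on hypothetical positions. It uses the 'cityblock' metric, which is the Minkowski distance with p=1.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

pos = np.array([1, 2, 3, 4])  # hypothetical word positions
m = pos[:, None]              # shape (4, 1): one observation per row

# pdist returns the condensed upper triangle; squareform expands it
# into the full symmetric matrix with zeros on the diagonal.
dist_pdist = squareform(pdist(m, "cityblock"))

# The same matrix via broadcasting.
dist_broadcast = np.abs(m - m.T)

print(np.array_equal(dist_pdist, dist_broadcast))
```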

Answer 3

Answer by Y0da

If you want the distance between two arrays, the proper way is to compute the norm:

dists = [np.linalg.norm(m - m2, axis=1) for m in m1[0]]

This assumes that the arrays have shape (n_sample, n_dimension).
Instead of a list comprehension, you can use numpy broadcasting on m2.
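
A sketch of what the broadcasting version could look like, assuming two hypothetical arrays a and b of shape (n_sample, n_dimension); these names and values are illustrative and not part of the original answer.

```python
import numpy as np

# Hypothetical points with shape (n_sample, n_dimension).
a = np.array([[0.0, 0.0], [3.0, 4.0]])
b = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])

# a[:, None, :] has shape (2, 1, 2) and b[None, :, :] has shape (1, 3, 2);
# their difference has shape (2, 3, 2), and the norm collapses the last axis.
dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
print(dists)
```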

If you want more control over the metric, you might want to use scipy.spatial.distance.cdist. This option is faster with large arrays. An example with the Minkowski distance (p=2 gives the Euclidean distance):

import scipy.spatial.distance
dists = scipy.spatial.distance.cdist(m1, m2, 'minkowski', p=2)  # all pairwise distances in one call
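
A self-contained sketch of cdist on hypothetical data, showing how changing p switches the metric. The arrays a and b are assumptions for illustration; both must be 2-D with the same number of columns.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical points with shape (n_sample, n_dimension).
a = np.array([[0.0, 0.0], [3.0, 4.0]])
b = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])

# p=2 is the Euclidean distance, p=1 the Manhattan (city-block) distance.
euclid = cdist(a, b, "minkowski", p=2)
manhattan = cdist(a, b, "minkowski", p=1)

print(euclid)
print(manhattan)
```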

Of course, if the array is only 1D you can achieve that using an absolute value:

dists = np.abs(m1 - m2)
