pandas: Cosine similarity between each row of a DataFrame in Python

Question

Asked by Jayanth Prakash Kulkarni

I have a DataFrame containing multiple vectors, each with 3 entries. Each row is one vector in my representation. I need to calculate the cosine similarity between every pair of these vectors. Is it better to convert this to a matrix representation, or is there a cleaner approach that works on the DataFrame itself?

Here is the code I have tried.

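import pandas as pd
from scipy import spatial

# X, Y, Z: the three component columns (defined elsewhere)
df = pd.DataFrame([X, Y, Z]).T
similarities = df.values.tolist()  # one list per row vector

for x in similarities:
    for y in similarities:
        # cosine similarity = 1 - cosine distance; overwritten on every pass
        result = 1 - spatial.distance.cosine(x, y)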

Answer 1

Answered by miradulo

You can use sklearn.metrics.pairwise.cosine_similarity directly. It accepts the DataFrame as-is and returns the matrix of pairwise cosine similarities between its rows.

Demo

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# 3 random rows of 0s and 1s; the values shown below are one sample draw
df = pd.DataFrame(np.random.randint(0, 2, (3, 5)))

df
##     0  1  2  3  4
##  0  1  1  1  0  0
##  1  0  0  1  1  1
##  2  0  1  0  1  0

cosine_similarity(df)
##  array([[ 1.        ,  0.33333333,  0.40824829],
##         [ 0.33333333,  1.        ,  0.40824829],
##         [ 0.40824829,  0.40824829,  1.        ]])
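Since cosine_similarity returns a plain NumPy array, the row labels are lost. The following is a minimal sketch, not part of the original answer, of wrapping the result back into a DataFrame and cross-checking it against a NumPy-only computation. The sample values and the labels a, b, c and x, y, z are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical data: each row is one 3-entry vector, as in the question.
df = pd.DataFrame(
    [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 2.0, 0.0]],
    index=["a", "b", "c"],
    columns=["x", "y", "z"],
)

# Wrap the NumPy result so rows and columns carry the original row labels.
sim = pd.DataFrame(cosine_similarity(df), index=df.index, columns=df.index)

# Cross-check with plain NumPy: scale each row to unit length,
# then take all pairwise dot products.
# Assumes no all-zero rows, which would cause a division by zero here.
values = df.to_numpy()
unit = values / np.linalg.norm(values, axis=1, keepdims=True)
manual = unit @ unit.T

print(sim)
print(np.allclose(sim.to_numpy(), manual))
```

With the labels in place, a single pair can be looked up as sim.loc["a", "b"] instead of by position.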
