Original question: http://stackoverflow.com/questions/45387476/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Cosine similarity between each row in a Dataframe in Python
Asked by Jayanth Prakash Kulkarni
I have a DataFrame containing multiple vectors, each with 3 entries. Each row is a vector in my representation. I need to calculate the cosine similarity between each pair of these vectors. Is it better to convert this to a matrix representation, or is there a cleaner approach within the DataFrame itself?
Here is the code that I have tried.
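import pandas as pd
from scipy import spatial

# X, Y and Z hold the vector components (not shown in the question)
df = pd.DataFrame([X, Y, Z]).T
similarities = df.values.tolist()

# Compare every row with every other row
for x in similarities:
    for y in similarities:
        result = 1 - spatial.distance.cosine(x, y)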
Answered by miradulo
You can use sklearn.metrics.pairwise.cosine_similarity directly.
Demo
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Random 3x5 matrix of 0s and 1s, so the values below are one possible draw
df = pd.DataFrame(np.random.randint(0, 2, (3, 5)))
df
##    0  1  2  3  4
## 0  1  1  1  0  0
## 1  0  0  1  1  1
## 2  0  1  0  1  0

# Pairwise cosine similarity between the rows of df
cosine_similarity(df)
## array([[ 1.        ,  0.33333333,  0.40824829],
##        [ 0.33333333,  1.        ,  0.40824829],
##        [ 0.40824829,  0.40824829,  1.        ]])
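
Since cosine_similarity returns a plain NumPy array, the row labels of the DataFrame are lost. The following is a minimal sketch of one way to keep them, by wrapping the result in a new DataFrame. The sample values, the index labels "a", "b", "c" and the column names "x", "y", "z" are made up for illustration and do not come from the question. It also cross-checks one entry against the SciPy call used in the question.

```python
import numpy as np
import pandas as pd
from scipy import spatial
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical example data: three row vectors with 3 entries each
df = pd.DataFrame(
    [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 2.0, 0.0]],
    index=["a", "b", "c"],
    columns=["x", "y", "z"],
)

# cosine_similarity returns an ndarray; wrap it to keep the row labels
sim = pd.DataFrame(cosine_similarity(df), index=df.index, columns=df.index)

# Cross-check one entry against the SciPy approach from the question
expected = 1 - spatial.distance.cosine(
    df.loc["a"].to_numpy(), df.loc["b"].to_numpy()
)
assert np.isclose(sim.loc["a", "b"], expected)

print(sim.round(3))
```

With the labels preserved, a single pair can be looked up as sim.loc["a", "b"] instead of by integer position.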