Python pandas: Finding cosine similarity of two columns

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/25736861/

Time: 2020-09-13 22:26:33  Source: igfitidea

Tags: python, pandas, dataframe, cosine-similarity

Asked by hlin117

Suppose I have two columns in a python pandas.DataFrame:

          col1 col2
item_1    158  173
item_2     25  191
item_3    180   33
item_4    152  165
item_5     96  108

What's the best way to take the cosine similarity of these two columns?

Answered by xbello

Is that what you're looking for?

from scipy.spatial.distance import cosine
from pandas import DataFrame


df = DataFrame({"col1": [158, 25, 180, 152, 96],
                "col2": [173, 191, 33, 165, 108]})

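# scipy's cosine() returns the cosine *distance*, so subtract it from 1
# to get the cosine similarity.
print(1 - cosine(df["col1"], df["col2"]))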
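
To see what the SciPy call computes, here is a minimal sketch of the same quantity written directly from the definition, dot(a, b) / (||a|| * ||b||), using NumPy. The helper name cosine_sim is hypothetical and is not part of the original answer; the data is the DataFrame from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [158, 25, 180, 152, 96],
                   "col2": [173, 191, 33, 165, 108]})


def cosine_sim(a, b):
    """Hypothetical helper: cosine similarity dot(a, b) / (||a|| * ||b||)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


similarity = cosine_sim(df["col1"], df["col2"])
print(round(similarity, 4))  # 0.7498
```

This should agree with 1 - cosine(...) from SciPy up to floating-point error. Note that the result is undefined if either column is all zeros, because its norm is then zero.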

Answered by Amir Imani

You can also use cosine_similarity or other similarity metrics from sklearn.metrics.pairwise.

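from sklearn.metrics.pairwise import cosine_similarity

# cosine_similarity expects 2D input of shape (n_samples, n_features), so
# reshape each column into a single row. df is the DataFrame defined in the
# previous answer.
cosine_similarity(df["col1"].to_numpy().reshape(1, -1),
                  df["col2"].to_numpy().reshape(1, -1))
Out[4]: array([[0.7498213]])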
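
Because the functions in sklearn.metrics.pairwise compare every row of the input against every other row, the same call can produce the similarity between all pairs of columns at once if the DataFrame is transposed first. The following is a sketch under that assumption; the names sim_matrix and sim_df are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

df = pd.DataFrame({"col1": [158, 25, 180, 152, 96],
                   "col2": [173, 191, 33, 165, 108]},
                  index=[f"item_{i}" for i in range(1, 6)])

# Transpose so that each original column becomes one row (one sample).
sim_matrix = cosine_similarity(df.T.to_numpy())

# Label rows and columns with the original column names for readability.
sim_df = pd.DataFrame(sim_matrix, index=df.columns, columns=df.columns)
print(sim_df.round(4))
```

The diagonal is 1 (each column compared with itself) and the off-diagonal entry is the col1/col2 similarity. With more than two columns the same code yields the full pairwise matrix.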