Python: Sklearn kNN usage with a user-defined metric

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/21052509/

Time: 2020-08-18 21:55:24  Source: igfitidea

Sklearn kNN usage with a user defined metric

python knn

Asked by user2926523

Currently I'm doing a project which may require using a kNN algorithm to find the top k nearest neighbors of a given point, say P. I'm using Python and the sklearn package to do the job, but our predefined metric is not one of the default metrics, so I have to use a user-defined metric, as described in the sklearn documentation, which can be found here and here.

It seems that the latest version of sklearn kNN supports user-defined metrics, but I can't find out how to use one:

import sklearn
from sklearn.neighbors import NearestNeighbors
import numpy as np
from sklearn.neighbors import DistanceMetric
from sklearn.neighbors import BallTree
BallTree.valid_metrics

Say I have defined a metric called mydist = max(x - y), and then use DistanceMetric.get_metric to turn it into a DistanceMetric object:

dt=DistanceMetric.get_metric('pyfunc',func=mydist)

According to the documentation, the call should look like this:

nbrs = NearestNeighbors(n_neighbors=4, algorithm='auto',metric='pyfunc').fit(A)
distances, indices = nbrs.kneighbors(A)

But where do I put dt in? Thanks.
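
For reference, a DistanceMetric object such as dt can also be used on its own to compute pairwise distances. The following is a minimal sketch rather than part of the original question. It assumes mydist is meant as max(|x - y|), i.e. the Chebyshev distance (the absolute value is an assumption added here), and it tries the sklearn.metrics import location used by newer scikit-learn releases before falling back to sklearn.neighbors. The array A is made-up sample data.

```python
import numpy as np

try:
    # Import location in newer scikit-learn releases
    from sklearn.metrics import DistanceMetric
except ImportError:
    # Import location in older releases, as used in the question
    from sklearn.neighbors import DistanceMetric


def mydist(x, y):
    # Assumed reading of the question's max(x-y): the largest absolute
    # coordinate difference (Chebyshev distance).
    return np.max(np.abs(x - y))


# Hypothetical sample points, one per row
A = np.array([[0.0, 0.0], [1.0, 3.0], [4.0, 1.0]])

# Wrap the Python function in a DistanceMetric object
dt = DistanceMetric.get_metric('pyfunc', func=mydist)
D_custom = dt.pairwise(A)

# Under the assumption above, the same distance exists as a built-in name
D_builtin = DistanceMetric.get_metric('chebyshev').pairwise(A)
```

If the metric really is the Chebyshev distance, using the built-in name 'chebyshev' is likely to be considerably faster than a Python callable, since the callable is invoked once per pair of points.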

Accepted answer by alko

You pass a metric as the metric param, and additional metric arguments as keyword parameters to the NN constructor:

>>> def mydist(x, y):
...     return np.sum((x-y)**2)
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

>>> nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree',
...            metric='pyfunc', func=mydist)
>>> nbrs.fit(X)
NearestNeighbors(algorithm='ball_tree', leaf_size=30, metric='pyfunc',
         n_neighbors=4, radius=1.0)
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
       [  0.,   1.,   2.,  13.],
       [  0.,   2.,   5.,  25.],
       [  0.,   1.,   5.,   8.],
       [  0.,   1.,   2.,  13.],
       [  0.,   2.,   5.,  25.]]), array([[0, 1, 2, 3],
       [1, 0, 2, 3],
       [2, 1, 0, 3],
       [3, 4, 5, 0],
       [4, 3, 5, 0],
       [5, 4, 3, 0]]))
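
In more recent scikit-learn releases the constructor no longer accepts extra keyword arguments such as func, and a Python callable can be passed directly as metric. The following is a minimal sketch under that assumption, reusing mydist and X from the answer above. algorithm='brute' is chosen here because squared Euclidean distance does not satisfy the triangle inequality, which tree-based searches such as ball_tree rely on.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def mydist(x, y):
    # Squared Euclidean distance between two 1-D vectors
    return np.sum((x - y) ** 2)


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Pass the callable itself as the metric; brute force evaluates it
# on every pair of points, so no metric properties are assumed.
nbrs = NearestNeighbors(n_neighbors=4, algorithm='brute', metric=mydist)
nbrs.fit(X)

# distances[i, j] is the distance from X[i] to its j-th nearest neighbor,
# indices[i, j] is the row of that neighbor in X
distances, indices = nbrs.kneighbors(X)
```

A callable metric is called from Python for each pair, so this can be slow on large datasets compared with the built-in metric names.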

Answer by Mahmoud

A small addition to the previous answer: how to use a user-defined metric that takes additional arguments.

>>> from sklearn.neighbors import KNeighborsClassifier
>>> def mydist(x, y, **kwargs):
...     return np.sum((x-y)**kwargs["metric_params"]["power"])
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([-1, -1, -2, 1, 1, 2])
>>> nbrs = KNeighborsClassifier(n_neighbors=4, algorithm='ball_tree',
...            metric=mydist, metric_params={"power": 2})
>>> nbrs.fit(X, Y)
KNeighborsClassifier(algorithm='ball_tree', leaf_size=30,
       metric=<function mydist at 0x7fd259c9cf50>, n_neighbors=4, p=2,
       weights='uniform')
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
       [  0.,   1.,   2.,  13.],
       [  0.,   2.,   5.,  25.],
       [  0.,   1.,   5.,   8.],
       [  0.,   1.,   2.,  13.],
       [  0.,   2.,   5.,  25.]]),
 array([[0, 1, 2, 3],
       [1, 0, 2, 3],
       [2, 1, 0, 3],
       [3, 4, 5, 0],
       [4, 3, 5, 0],
       [5, 4, 3, 0]]))
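
In recent scikit-learn releases, the entries of metric_params appear to be forwarded to a callable metric as individual keyword arguments rather than as a nested metric_params dict, so the lookup kwargs["metric_params"]["power"] may raise a KeyError there. The following is a minimal sketch under that assumption, with the extra argument named power as in the answer above and algorithm='brute' used so that no metric properties are assumed.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def mydist(x, y, power):
    # power is assumed to arrive as a keyword argument taken
    # from the metric_params dict given to the estimator
    return np.sum((x - y) ** power)


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([-1, -1, -2, 1, 1, 2])

clf = KNeighborsClassifier(n_neighbors=4, algorithm='brute',
                           metric=mydist, metric_params={"power": 2})
clf.fit(X, Y)

# Same neighbor query as in the answer, using the parameterized metric
distances, indices = clf.kneighbors(X)
```

With power set to 2 this reproduces the squared Euclidean distances from the previous answer. Changing the value in metric_params changes the metric without touching the function.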