ValueError: Expected n_neighbors <= 1. Got 5 - Scikit K Nearest Classifier

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/29999297/

Time: 2020-09-13 23:18:13  Source: igfitidea

Tags: python, numpy, pandas, machine-learning, scikit-learn

Asked by user2757902

I'm using scikit-learn's KNN with Levenshtein distance to do some work on strings, much like the example at the bottom of this page: http://scikit-learn.org/stable/faq.html. The difference is that my data is split into training and test sets and is in a DataFrame.

The split is listed here:

train_feature, test_feature, train_class, test_class = train_test_split(features, classes,
                                                    test_size=TEST_SET_SIZE, train_size=TRAINING_SET_SIZE,
                                                    random_state=42)

I have the following:

>>> model = KNeighborsClassifier(metric='pyfunc',func=machine_learning.custom_distance)
>>> model.fit(train_feature['id'], train_class.as_matrix(['gender']))
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='pyfunc',
       metric_params={'func': <function custom_distance at 0x7fd0236267b8>},
       n_neighbors=5, p=2, weights='uniform')

Here train_feature has a single column, id ([24000 rows x 1 columns]), and train_class (Name: gender, dtype: object) is a Series named "gender" whose values are 'M' or 'F'. The id corresponds to a key in a dict defined elsewhere.

The custom distance function is:

def custom_distance(x, y):
    i, j = int(x[0]), int(y[0])
    return damerau_levenshtein_distance(lookup_dict[i], lookup_dict[j])

When I try to get the accuracy of the model:

 accuracy = model.score(test_feature, test_class)

I receive this error:

 ValueError: Expected n_neighbors <= 1. Got 5

I'm honestly really confused. I've checked the length of each of my datasets and they are fine. Why would it be telling me I only have one data point to plot from? Any help would be greatly appreciated.

Answered by cfh

The classifier thinks that your dataset has only a single entry. It probably interprets the vector of ids as a row vector instead of a column vector.

Try

model.fit(train_feature.as_matrix(['id']), train_class.as_matrix(['gender']))

and see if it helps.
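To illustrate the row-versus-column distinction described in this answer, here is a minimal sketch using NumPy only. The array `ids` is a hypothetical stand-in for `train_feature['id']`; the real data is not shown in the question. A 1-D array of 24000 ids can be read either as one sample with 24000 features or as 24000 samples with one feature, and only the second reading gives the classifier more than one data point.

```python
import numpy as np

# Hypothetical stand-in for train_feature['id']: 24000 integer ids in a 1-D array.
ids = np.arange(24000)
print(ids.shape)                     # (24000,)

# Read as a row vector: a single sample with 24000 features.
as_single_row = ids.reshape(1, -1)
print(as_single_row.shape)           # (1, 24000)

# Read as a column vector: 24000 samples with one feature each.
as_column = ids.reshape(-1, 1)
print(as_column.shape)               # (24000, 1)
```

In pandas, selecting with a list of columns, as in `train_feature[['id']]`, keeps the data two-dimensional. Note that `DataFrame.as_matrix` was removed in pandas 1.0; `to_numpy()` is the usual replacement there. As far as I know, recent scikit-learn releases reject 1-D feature input with an error that suggests `reshape(-1, 1)` rather than silently treating it as a single row.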

Answered by viajero cósmico

I faced the same error. I have a huge db from which I get the train and test data, but for code testing purposes I use a much smaller one (~0.5% of the original). In the training procedure, I test a number of different neighbor counts, for example:

for neighbor in range(5,19): ...

The ValueError exception was raised for n_neighbors=19. This error was thrown only when I used the small db. The reason is that the small db did not contain enough data points to provide 19 different neighbors. When I tested with the full db, no such exception was raised.

Setting algorithm='brute' does not address the underlying problem, although it might appear to work. What you should do is check the number of observations in both your training and test sets, and put an upper limit on the value of n_neighbors accordingly.
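The advice above, capping n_neighbors by the number of available observations, can be sketched as a small helper. The function name `neighbor_counts` and its default range of 5 to 19 are hypothetical, chosen only to mirror the loop in this answer; the idea is simply that a classifier fitted on `n_train` samples cannot return more than `n_train` neighbors.

```python
def neighbor_counts(n_train, start=5, stop=19):
    """Return the neighbor counts from start to stop (inclusive), capped at n_train."""
    upper = min(stop, n_train)
    return list(range(start, upper + 1))

print(neighbor_counts(n_train=1000))  # the full range 5..19
print(neighbor_counts(n_train=12))    # capped: 5..12
print(neighbor_counts(n_train=3))     # [] because even 5 neighbors is too many
```

Each value returned could then be passed as `KNeighborsClassifier(n_neighbors=k)`. An empty list is a signal that the sample is too small for the smallest neighbor count you wanted to try.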

Answered by abhimanyu

Just set the n_neighbors value:

knn = KNeighborsClassifier(n_neighbors=1)

Answered by user2757902

I figured it out. I needed to set the algorithm to brute force and pass the distance function as the metric:

model = KNeighborsClassifier(metric=machine_learning.custom_distance,algorithm='brute',n_neighbors=50)
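For readers who want to try this setup end to end, here is a small self-contained sketch of a callable metric combined with algorithm='brute'. Everything in the data is an assumption made for illustration: the contents of `lookup_dict`, the labels, and the plain `levenshtein` function, which stands in for the `damerau_levenshtein_distance` used in the question. It is not the original author's code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical lookup table: integer id -> string (stands in for lookup_dict).
lookup_dict = {0: "anna", 1: "anne", 2: "hanna",
               3: "john", 4: "jon", 5: "joan", 6: "johan"}
labels = np.array(["F", "F", "F", "M", "M", "M"])  # labels for ids 0..5

def levenshtein(a, b):
    """Plain Levenshtein edit distance (stand-in for damerau_levenshtein_distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def custom_distance(x, y):
    # x and y are rows of the feature matrix; column 0 holds the id.
    return levenshtein(lookup_dict[int(x[0])], lookup_dict[int(y[0])])

# Ids as a column: shape (6, 1), i.e. six samples with one feature each.
X_train = np.arange(6).reshape(-1, 1)
X_test = np.array([[6]])  # id 6 -> "johan"

model = KNeighborsClassifier(n_neighbors=3, metric=custom_distance,
                             algorithm="brute")
model.fit(X_train, labels)
prediction = model.predict(X_test)
print(prediction)
```

With brute force, the metric is only evaluated on rows that actually appear in the data, so every id passed to `custom_distance` is a valid key of the lookup table. Note that n_neighbors (3 here) still has to stay at or below the number of training samples.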