Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/29999297/
ValueError: Expected n_neighbors <= 1. Got 5 - Scikit K Nearest Classifier
Asked by user2757902
I'm using scikit-learn's KNN with Levenshtein distance to do some work on strings, much like the example at the bottom of this page: http://scikit-learn.org/stable/faq.html. The difference is that my data is split into training and test sets and is held in a DataFrame.
The split is listed here:
train_feature, test_feature, train_class, test_class = train_test_split(
    features, classes,
    test_size=TEST_SET_SIZE, train_size=TRAINING_SET_SIZE,
    random_state=42)
I have the following:
>>> model = KNeighborsClassifier(metric='pyfunc',func=machine_learning.custom_distance)
>>> model.fit(train_feature['id'], train_class.as_matrix(['gender']))
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='pyfunc',
           metric_params={'func': <function custom_distance at 0x7fd0236267b8>},
           n_neighbors=5, p=2, weights='uniform')
Here train_feature has a single column, id ([24000 rows x 1 columns]), and train_class (Name: gender, dtype: object) is a Series holding "gender", which is either 'M' or 'F'. Each id corresponds to a key in a dict defined elsewhere.
The custom distance function is:
def custom_distance(x, y):
    i, j = int(x[0]), int(y[0])
    return damerau_levenshtein_distance(lookup_dict[i], lookup_dict[j])
When I try to get the accuracy of the model:
accuracy = model.score(test_feature, test_class)
I receive this error:
ValueError: Expected n_neighbors <= 1. Got 5
I'm honestly really confused. I've checked the length of each of my datasets and they are fine. Why would it be telling me I only have one data point to plot from? Any help would be greatly appreciated.
Answered by cfh
The classifier thinks that your dataset has only a single entry. It probably interprets the vector of ids as a row vector (one sample with many features) instead of a column vector (many samples with one feature each).
Try
model.fit(train_feature.as_matrix(['id']), train_class.as_matrix(['gender']))
and see if it helps.
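To make the row-versus-column point concrete, here is a minimal sketch of the two shapes. It uses made-up integer ids and labels rather than the asker's data, and plain NumPy arrays instead of a DataFrame. Note that as_matrix was removed in pandas 1.0; on a current pandas, train_feature[['id']].to_numpy() should give the same (n_samples, 1) column shape.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up stand-ins for the ids and gender labels (assumption, not the asker's data)
ids = np.arange(10)                  # 1-D, shape (10,)
labels = np.array(["M", "F"] * 5)

as_row = ids.reshape(1, -1)          # shape (1, 10): ONE sample with ten features
as_column = ids.reshape(-1, 1)       # shape (10, 1): ten samples with one feature

# With the column layout there are 10 samples, so n_neighbors=5 is allowed
model = KNeighborsClassifier(n_neighbors=5)
model.fit(as_column, labels)

# Query the 5 nearest neighbours of the first sample
distances, indices = model.kneighbors(as_column[:1])
```

If the data arrives in the row layout, the model sees a single sample, which would explain the "<= 1" in the error message.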
Answered by viajero cósmico
I faced the same error. I have a huge db from which I get the train and test data, but for code testing purposes I use a much smaller one (~0.5% of the original). In the training procedure, I test a number of different neighbor counts, e.g.
for neighbor in range(5,19): ...
The ValueError exception was raised for n_neighbors=19. This error was thrown only when I used the small db. The reason is that the small db didn't contain enough data to provide 19 different measurements. When I tested with the full db, no such exception was raised.
Setting algorithm='brute' does not address the underlying problem, although it might appear to work. What you should do is check the number of observations you have, both training and testing, and put an upper limit on the value of n_neighbors accordingly.
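One way to apply that upper limit is to trim the candidate range before looping. This is only a sketch: candidate_neighbor_counts is a hypothetical helper, and it assumes the bound that matters is the number of training samples, which is what the error message reports.

```python
def candidate_neighbor_counts(start, stop, n_train_samples):
    """Return the n_neighbors values in [start, stop) that the training set can support.

    Hypothetical helper: caps the range so n_neighbors never exceeds
    the number of training samples.
    """
    upper = min(stop, n_train_samples + 1)
    return list(range(start, upper))


# Small test db with 12 training rows: the range is cut off at 12
small = candidate_neighbor_counts(5, 20, n_train_samples=12)

# Full db with 24000 training rows: the whole range 5..19 is kept
large = candidate_neighbor_counts(5, 20, n_train_samples=24000)
```

The loop then becomes something like for neighbor in candidate_neighbor_counts(5, 20, len(train_feature)), so the same code runs on both the small and the full db.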
Answered by abhimanyu
Just set the n_neighbors value:
knn = KNeighborsClassifier(n_neighbors=1)
Answered by user2757902
I figured it out. I needed to set the algorithm to brute force and pass the distance function itself as the metric:
model = KNeighborsClassifier(metric=machine_learning.custom_distance,algorithm='brute',n_neighbors=50)
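For anyone who wants to try this setup end to end, below is a small self-contained sketch. The lookup table, the labels, and the helper names are all made up for illustration. A plain Levenshtein distance stands in for damerau_levenshtein_distance so that nothing beyond NumPy and scikit-learn is needed.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical lookup table: integer id -> string (stand-in for lookup_dict)
lookup = {0: "anna", 1: "anne", 2: "hanna", 3: "john", 4: "jon", 5: "johan"}
labels = np.array(["F", "F", "F", "M", "M", "M"])


def edit_distance(a, b):
    """Plain Levenshtein distance (a simpler stand-in for Damerau-Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def id_distance(x, y):
    """Metric on one-element rows: look the strings up by id, then compare them."""
    return edit_distance(lookup[int(x[0])], lookup[int(y[0])])


# Column layout: six samples, one feature (the id) each
ids = np.arange(len(lookup)).reshape(-1, 1)

# Callable metric plus brute-force search; n_neighbors stays below the sample count
model = KNeighborsClassifier(n_neighbors=3, metric=id_distance, algorithm="brute")
model.fit(ids, labels)
predictions = model.predict(ids)
```

The features are only ids, so every distance has to come from the callable, and algorithm='brute' makes the classifier evaluate it pair by pair. With a Python-level metric this gets slow on large datasets, so a small subset is a reasonable first test.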

