pandas ValueError: Expected n_neighbors <= 1. Got 5 - Scikit K Nearest Classifier

Question

Asked by user2757902

I'm using scikit-learn's KNN with Levenshtein distance to do some work on strings, much like the example at the bottom of this page: http://scikit-learn.org/stable/faq.html. The difference is that my data is split into training and test sets and is held in a DataFrame.

The split is listed here:

train_feature, test_feature, train_class, test_class = train_test_split(features, classes,
                                                    test_size=TEST_SET_SIZE, train_size=TRAINING_SET_SIZE,
                                                    random_state=42)

I have the following:

>>> model = KNeighborsClassifier(metric='pyfunc',func=machine_learning.custom_distance)
>>> model.fit(train_feature['id'], train_class.as_matrix(['gender']))
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='pyfunc',
       metric_params={'func': <function custom_distance at 0x7fd0236267b8>},
       n_neighbors=5, p=2, weights='uniform')

Here train_feature has a single column, id ([24000 rows x 1 columns]), and train_class (Name: gender, dtype: object) is a Series of "gender" values, each either 'M' or 'F'. Each id corresponds to a key in a dict defined elsewhere.

The custom distance function is:

def custom_distance(x,y):
    i, j = int(x[0]), int(y[0])
    return damerau_levenshtein_distance(lookup_dict[i],lookup_dict[j])

When I try to get the accuracy of the model:

 accuracy = model.score(test_feature, test_class)

I receive this error:

 ValueError: Expected n_neighbors <= 1. Got 5

I'm honestly really confused. I've checked the length of each of my datasets and they are fine. Why would it be telling me I only have one data point to plot from? Any help would be greatly appreciated.

Answer 1

Answered by cfh

The classifier thinks that your dataset has only a single entry. It probably interprets the vector of ids as a row vector (one sample with many features) instead of a column vector (many samples with one feature each).

Try

model.fit(train_feature.as_matrix(['id']), train_class.as_matrix(['gender']))

and see if it helps.
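
The shape difference behind this suggestion can be sketched as follows. This is only an illustration: the DataFrame contents are made up, and `to_numpy()` is used here as an assumed substitute for `as_matrix`, which was removed in pandas 1.0. If the estimator treats a 1-D array as a single row, it sees one sample, which would be consistent with the "Expected n_neighbors <= 1" message.

```python
import pandas as pd

# Hypothetical stand-in for the asker's train_feature DataFrame.
train_feature = pd.DataFrame({"id": [10, 11, 12, 13]})

# Selecting with a single label gives a Series, hence a 1-D array.
as_series = train_feature["id"].to_numpy()      # shape (4,)

# Selecting with a list of labels keeps the 2-D layout: 4 samples, 1 feature.
as_column = train_feature[["id"]].to_numpy()    # shape (4, 1)

# What a "row vector" reading of the 1-D array looks like: 1 sample, 4 features.
as_row = as_series.reshape(1, -1)               # shape (1, 4)

print(as_series.shape, as_column.shape, as_row.shape)
```

The number of samples an estimator sees is the first dimension of the array, so only the column form gives it four data points to choose neighbors from.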

Answer 2

Answered by viajero cósmico

I faced the same error. I have a huge database from which I draw the training and test data, but for code-testing purposes I use a much smaller one (~0.5% of the original). In the training procedure, I test a number of different neighbor counts, e.g.

for neighbor in range(5,19): ...

The ValueError exception was raised for n_neighbors=19. This error was thrown only when I used the small database. The reason is that it did not contain enough data points to supply 19 distinct neighbors. When I tested with the full database, no such exception was raised.

Setting algorithm='brute' will not address the underlying problem, although it might appear to work. What you should do is check the number of observations in both the training and test sets, and put an upper limit on the value of n_neighbors accordingly.
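
One way to apply that upper limit is sketched below. The helper name `capped_neighbor_counts` and its defaults are hypothetical, and the sketch assumes that the bound that matters is the number of samples the model is fitted on.

```python
def capped_neighbor_counts(n_train_samples, start=5, stop=19):
    """Return candidate n_neighbors values that never exceed the training size.

    Hypothetical helper: start/stop mirror the range(5, 19) loop above.
    """
    # range() excludes its end point, so allow n_train_samples itself.
    upper = min(stop, n_train_samples + 1)
    return list(range(start, upper))


print(capped_neighbor_counts(24000))  # large training set: nothing is cut off
print(capped_neighbor_counts(10))     # small sample: candidates stop at 10
print(capped_neighbor_counts(3))      # too small for any candidate: empty list
```

Looping over the returned list instead of a fixed range keeps the same code usable on both the reduced and the full database.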

Answer 3

Answered by abhimanyu

Just set the n_neighbors value explicitly:

knn = KNeighborsClassifier(n_neighbors=1)

Answer 4

Answered by user2757902

I figured it out. I needed to set the algorithm to brute force and pass the distance function directly as the metric:

model = KNeighborsClassifier(metric=machine_learning.custom_distance,algorithm='brute',n_neighbors=50)
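
For readers who want to try this setup end to end, here is a minimal, self-contained sketch. It is not the asker's code: the names and labels are made up, a plain Levenshtein distance written inline stands in for `damerau_levenshtein_distance`, and a list called `names` plays the role of `lookup_dict`. It assumes a scikit-learn version that accepts a callable `metric` together with `algorithm="brute"`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: the first eight names are training samples, the last two are queries.
names = ["anna", "hanna", "joanna", "anne", "john", "jon", "johnny", "jonas",
         "hannah", "johny"]
labels = ["F", "F", "F", "F", "M", "M", "M", "M"]


def levenshtein(a, b):
    """Plain edit distance (insert, delete, substitute) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def custom_distance(x, y):
    # Each sample is a one-element row holding an index into `names`.
    return levenshtein(names[int(x[0])], names[int(y[0])])


# Column vectors of ids: one row per sample, one feature per row.
train_ids = np.arange(8).reshape(-1, 1)
query_ids = np.array([[8], [9]])

model = KNeighborsClassifier(n_neighbors=3, metric=custom_distance,
                             algorithm="brute")
model.fit(train_ids, labels)
predictions = model.predict(query_ids)
print(predictions)
```

Note that `n_neighbors=3` stays well below the eight training samples, and that the ids are passed as an (n, 1) column, which ties in with the other answers.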
