pandas feature_names must be unique - Xgboost

Question

Asked by user2728024

I am running the xgboost model for a very sparse matrix.

I am getting this error: ValueError: feature_names must be unique

How can I deal with this?

This is my code.

  yprob = bst.predict(xgb.DMatrix(test_df))[:,1]

Answer 1

Answered by andrew_reece

According to the xgboost source code documentation, this error is raised in only one place: an internal DMatrix function. Here's the source code excerpt:

if len(feature_names) != len(set(feature_names)):
    raise ValueError('feature_names must be unique')

So, the error text is pretty literal here; your test_df has at least one duplicate feature/column name.

You've tagged this post pandas, which suggests test_df is a Pandas DataFrame. In that case, DMatrix literally reads df.columns to extract feature_names. Check test_df for repeated column names, remove or rename them, and then try DMatrix() again.
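
To see whether a DataFrame would trip this check before handing it to DMatrix, the same length comparison can be run on the column labels directly. This is only a minimal sketch, assuming pandas is installed; the toy test_df and the helper name has_unique_columns are made up for illustration and are not part of xgboost.

```python
import pandas as pd


def has_unique_columns(df: pd.DataFrame) -> bool:
    """Apply the uniqueness check quoted above to the DataFrame's column labels.

    Hypothetical helper for illustration; not an xgboost function.
    """
    names = list(df.columns)
    return len(names) == len(set(names))


# Toy frame with the column name "a" used twice (made-up data).
test_df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

print(has_unique_columns(test_df))  # False
```

pandas exposes the same information as test_df.columns.is_unique, so the helper is mainly there to show what the check compares.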

Answer 2

Answered by Arjan Groen

Assuming the problem is indeed that columns are duplicated, the following line should solve your problem:

test_df = test_df.loc[:,~test_df.columns.duplicated()]

Source: python pandas remove duplicate columns

This line should identify which columns are duplicated:

duplicate_columns = test_df.columns[test_df.columns.duplicated()]
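
As a small worked example of the two lines above, here is what they produce on a toy frame. The data is made up; the sketch assumes only that pandas is installed.

```python
import pandas as pd

# Toy frame where both "a" and "b" appear twice (made-up data).
test_df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["a", "b", "a", "b"])

# duplicated() flags every repeat of a label after its first occurrence.
mask = test_df.columns.duplicated()
duplicate_columns = test_df.columns[mask]

# Keep only the first occurrence of each column name.
deduped = test_df.loc[:, ~mask]

print(mask.tolist())            # [False, False, True, True]
print(list(duplicate_columns))  # ['a', 'b']
print(list(deduped.columns))    # ['a', 'b']
```

Note that the later columns are dropped even when their values differ from the first occurrence, so it is worth looking at duplicate_columns before discarding anything.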

Answer 3

Answered by Akshay

One way around this is to make sure the column names are unique when preparing the data; the error should then go away.
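
If the repeated columns hold different data and should be kept, one possible way to make the names unique is to append a numeric suffix to each repeat. This is a rough sketch, not a pandas or xgboost feature: the helper make_unique and the suffix scheme are assumptions chosen for illustration, and the toy data is made up.

```python
import pandas as pd


def make_unique(names):
    """Append _1, _2, ... to repeated names; the first occurrence is unchanged.

    Hypothetical helper. It does not guard against a generated name such as
    "a_1" colliding with a column that is already called "a_1".
    """
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return result


# Toy frame with "a" used three times (made-up data).
test_df = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "a", "a"])
test_df.columns = make_unique(test_df.columns)

print(list(test_df.columns))  # ['a', 'b', 'a_1', 'a_2']
```

Since the model is applied to test_df at prediction time, the training data would presumably need the same renaming so that the feature names line up.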
