XGBoost: AttributeError: 'DataFrame' object has no attribute 'feature_names'

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/55579610/

Time: 2020-09-14 06:22:31  Source: igfitidea

Tags: python, pandas, machine-learning, scikit-learn, xgboost

Asked by Abdullah Al Imran

I'm training an XGBoost classifier for binary classification. When I train the model on the training data with cross-validation and then predict on the test data, I get the error AttributeError: 'DataFrame' object has no attribute 'feature_names'.

My code is as follows:

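import numpy as np
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold

# X_train, y_train, X_test, and best_params are defined earlier (not shown).
# Note: recent scikit-learn releases raise a ValueError if random_state is set while shuffle=False.
folds = StratifiedKFold(n_splits=5, shuffle=False, random_state=44000)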
oof = np.zeros(len(X_train))
predictions = np.zeros(len(X_test))

for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train, y_train)):
    print("Fold {}".format(fold_+1))
    trn_data = xgb.DMatrix(X_train.iloc[trn_idx], y_train.iloc[trn_idx])
    val_data = xgb.DMatrix(X_train.iloc[val_idx], y_train.iloc[val_idx])

    clf = xgb.train(params = best_params,
                    dtrain = trn_data, 
                    num_boost_round = 2000, 
                    evals = [(trn_data, 'train'), (val_data, 'valid')],
                    maximize = False,
                    early_stopping_rounds = 100, 
                    verbose_eval=100)

    oof[val_idx] = clf.predict(X_train.iloc[val_idx], ntree_limit=clf.best_ntree_limit)
    predictions += clf.predict(X_test, ntree_limit=clf.best_ntree_limit)/folds.n_splits

How can I fix this?

Here is the complete error trace:

Fold 1
[0] train-auc:0.919667  valid-auc:0.822968
Multiple eval metrics have been passed: 'valid-auc' will be used for early stopping.

Will train until valid-auc hasn't improved in 100 rounds.
[100]   train-auc:1 valid-auc:0.974659
[200]   train-auc:1 valid-auc:0.97668
[300]   train-auc:1 valid-auc:0.977696
[400]   train-auc:1 valid-auc:0.977704
Stopping. Best iteration:
[376]   train-auc:1 valid-auc:0.977862

Exception ignored in: <bound method DMatrix.__del__ of <xgboost.core.DMatrix object at 0x7f3d9c285550>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/xgboost/core.py", line 368, in __del__
    if self.handle is not None:
AttributeError: 'DMatrix' object has no attribute 'handle'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-55-d52b20cc0183> in <module>()
     19                     verbose_eval=100)
     20 
---> 21     oof[val_idx] = clf.predict(X_train.iloc[val_idx], ntree_limit=clf.best_ntree_limit)
     22 
     23     predictions += clf.predict(X_test, ntree_limit=clf.best_ntree_limit)/folds.n_splits

/usr/local/lib/python3.6/dist-packages/xgboost/core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs)
   1042             option_mask |= 0x08
   1043 
-> 1044         self._validate_features(data)
   1045 
   1046         length = c_bst_ulong()

/usr/local/lib/python3.6/dist-packages/xgboost/core.py in _validate_features(self, data)
   1271         else:
   1272             # Booster can't accept data with different feature names
-> 1273             if self.feature_names != data.feature_names:
   1274                 dat_missing = set(self.feature_names) - set(data.feature_names)
   1275                 my_missing = set(data.feature_names) - set(self.feature_names)

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   3612             if name in self._info_axis:
   3613                 return self[name]
-> 3614             return object.__getattribute__(self, name)
   3615 
   3616     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'feature_names'

Answered by Abdullah Al Imran

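The problem has been solved. I had not converted X_train.iloc[val_idx] to an xgb.DMatrix before calling predict. As the traceback shows, the booster's predict method calls _validate_features, which reads data.feature_names, and a pandas DataFrame has no such attribute. After converting X_train.iloc[val_idx] and X_test to xgb.DMatrix, the problem was gone!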
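To see why the missing conversion produces this particular error, here is a toy sketch that uses only the standard library. ToyDMatrix, ToyFrame, and ToyBooster are hypothetical stand-ins, not xgboost or pandas classes; the only thing borrowed from the traceback is the shape of the check, which reads data.feature_names before predicting.

```python
class ToyDMatrix:
    """Stand-in for a DMatrix: it carries rows plus a feature_names list."""

    def __init__(self, rows, feature_names):
        self.rows = rows
        self.feature_names = list(feature_names)


class ToyFrame:
    """Stand-in for a DataFrame: it has columns but no feature_names."""

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = list(columns)


class ToyBooster:
    """Stand-in for a trained booster that remembers its training features."""

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)

    def predict(self, data):
        # Same shape of check as in the traceback: the attribute lookup on
        # `data` happens before any prediction work.
        if self.feature_names != data.feature_names:
            raise ValueError("feature_names mismatch")
        return [0.0 for _ in data.rows]  # dummy scores, one per row


cols = ["f0", "f1"]
rows = [[1.0, 2.0], [3.0, 4.0]]
booster = ToyBooster(cols)

# Passing the frame-like object fails on the attribute lookup itself.
try:
    booster.predict(ToyFrame(rows, cols))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Wrapping the same rows in the matrix type lets the check pass.
scores = booster.predict(ToyDMatrix(rows, cols))

print(error_message)
print(scores)
```

The failure comes from the attribute lookup, not from a mismatch in the feature names, which is why wrapping the data in the expected matrix type is enough.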

I updated the following two lines:

oof[val_idx] = clf.predict(xgb.DMatrix(X_train.iloc[val_idx]), ntree_limit=clf.best_ntree_limit)
predictions += clf.predict(xgb.DMatrix(X_test), ntree_limit=clf.best_ntree_limit)/folds.n_splits