pandas XGBoost plot_importance 不显示特征名称

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/46943314/

Date: 2020-09-14 04:42:06  Source: igfitidea

XGBoost plot_importance doesn't show feature names

python, pandas, machine-learning, xgboost

Asked by stackoverflowuser2010

I'm using XGBoost with Python and have successfully trained a model using the XGBoost train() function called on DMatrix data. The matrix was created from a Pandas dataframe, which has feature names for the columns.

# imports assumed by this snippet
import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y, \
                                    test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(Xtrain, label=ytrain)

model = xgb.train(xgb_params, dtrain, num_boost_round=60, \
                  early_stopping_rounds=50, maximize=False, verbose_eval=10)

fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model, max_num_features=5, ax=ax)

I now want to see the feature importance using the xgboost.plot_importance() function, but the resulting plot doesn't show the feature names. Instead, the features are listed as f1, f2, f3, etc., as shown below.

[feature importance plot: the bars are labeled f1, f2, f3, ... rather than with the column names]

I think the problem is that I converted my original Pandas data frame into a DMatrix. How can I associate feature names properly so that the feature importance plot shows them?

Accepted answer by piRSquared

You want to use the feature_names parameter when creating your xgb.DMatrix:

dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)
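
For example, a minimal sketch of the full flow (the variable names are the ones from the question's code, and feature_names is assumed to be the list of column names used to build the dataframe):

dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)
model = xgb.train(xgb_params, dtrain, num_boost_round=60)

# the DMatrix now carries the column names, so the plot shows them
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
xgb.plot_importance(model, max_num_features=5, ax=ax)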

Answered by Darrrrrren

If you're using the scikit-learn wrapper, you'll need to access the underlying XGBoost Booster and set the feature names on it, rather than on the scikit-learn model, like so:

import joblib
import xgboost

model = joblib.load("your_saved.model")
model.get_booster().feature_names = ["your", "feature", "name", "list"]
xgboost.plot_importance(model.get_booster())
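
Note that the list assigned to feature_names must contain one entry per training column, in the original column order; otherwise the plotted names will not line up with the importances.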

Answered by Vivek Kumar

train_test_split will convert the dataframe to a numpy array, which doesn't have the column information anymore.

Either you can do what @piRSquared suggested and pass the feature names as a parameter to the DMatrix constructor, or you can convert the numpy arrays returned from train_test_split back into DataFrames and then use your code:

import pandas as pd

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y, \
                                    test_size=0.2, random_state=42)

# Convert the split arrays back into DataFrames so the column names are kept
Xtrain = pd.DataFrame(data=Xtrain, columns=feature_names)
Xval = pd.DataFrame(data=Xval, columns=feature_names)

dtrain = xgb.DMatrix(Xtrain, label=ytrain)

Answered by Vincent M.K

With the Scikit-Learn wrapper interface "XGBClassifier", plot_importance returns a "matplotlib Axes" object. So we can use axes.set_yticklabels.

plot_importance(model).set_yticklabels(['feature1','feature2'])

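One caveat: plot_importance sorts the bars by importance, so the labels passed to set_yticklabels must be supplied in that sorted order, not in the original column order. A minimal sketch that maps the default f0, f1, ... tick labels back to the column names, assuming feature_names holds the training columns in order and the plot still shows the default labels:

ax = plot_importance(model)
# each default tick label looks like "f3"; strip the leading "f" to recover
# the column index, then look up the real column name
ax.set_yticklabels([feature_names[int(lbl.get_text()[1:])]
                    for lbl in ax.get_yticklabels()])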

Answered by Peter VanderMeer

Here's an alternate way I found while playing around with feature_names. I wrote the following, which works on XGBoost v0.80, the version I'm currently running.

## Saving the model to disk
model.save_model('foo.model')
with open('foo_fnames.txt', 'w') as f:
    f.write('\n'.join(model.feature_names))

## Later, when you want to retrieve the model...
model2 = xgb.Booster({"nthread": nThreads})
model2.load_model("foo.model")

with open("foo_fnames.txt", "r") as f:
    feature_names2 = f.read().split("\n")

model2.feature_names = feature_names2
model2.feature_types = None
fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model2, max_num_features = 5, ax=ax)

So this saves feature_names separately and adds it back in later. For some reason feature_types also needs to be initialized, even if the value is None.

Answered by Badger Titan

If trained with

from xgboost import XGBClassifier

model = XGBClassifier(
    max_depth = 8,
    learning_rate = 0.25,
    n_estimators = 50,
    objective = "binary:logistic",
    n_jobs = 4
)

# train_data_x, train_data_y are pandas DataFrames
model.fit(train_data_x, train_data_y)

you can call model.get_booster().get_fscore() to get the feature names and feature importances as a Python dict.

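For example, a minimal sketch of inspecting the returned dict (the example key names are just placeholders):

scores = model.get_booster().get_fscore()
# e.g. {'feature_a': 123, 'feature_b': 45, ...} -- the keys are typically the
# DataFrame column names when the model was fit on pandas DataFrames,
# otherwise the default f0, f1, ... names
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, score)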

Answered by Gianmario Spacagna

You should specify the feature_names when instantiating the XGBoost Classifier:


clf = xgb.XGBClassifier(feature_names=feature_names)

Be careful: if you wrap the xgb classifier in a sklearn pipeline that performs any selection on the columns (e.g. VarianceThreshold), the xgb classifier will fail when trying to fit or transform.

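Presumably this is because the feature_names supplied at construction no longer match the reduced set of columns that actually reach the classifier after the selection step.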