pandas XGBoost plot_importance doesn't show feature names
Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me) on StackOverflow.
Original: http://stackoverflow.com/questions/46943314/
XGBoost plot_importance doesn't show feature names
Asked by stackoverflowuser2010
I'm using XGBoost with Python and have successfully trained a model using the XGBoost train() function called on DMatrix data. The matrix was created from a Pandas dataframe, which has feature names for the columns.
Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y, \
                                               test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(Xtrain, label=ytrain)
model = xgb.train(xgb_params, dtrain, num_boost_round=60, \
                  early_stopping_rounds=50, maximize=False, verbose_eval=10)
fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model, max_num_features=5, ax=ax)
I want to now see the feature importance using the xgboost.plot_importance() function, but the resulting plot doesn't show the feature names. Instead, the features are listed as f1, f2, f3, etc., as shown below.
I think the problem is that I converted my original Pandas data frame into a DMatrix. How can I associate feature names properly so that the feature importance plot shows them?
Accepted answer by piRSquared
You want to use the feature_names parameter when creating your xgb.DMatrix:
dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)
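For context, a minimal sketch of where that change slots into the question's code (variable names are taken from the question; the early-stopping and figure-size details are omitted here):

dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)
model = xgb.train(xgb_params, dtrain, num_boost_round=60)
xgb.plot_importance(model, max_num_features=5)  # the y-axis now shows the real column names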
Answered by Darrrrrren
If you're using the scikit-learn wrapper you'll need to access the underlying XGBoost Booster and set the feature names on it, instead of the scikit model, like so:
model = joblib.load("your_saved.model")
model.get_booster().feature_names = ["your", "feature", "name", "list"]
xgboost.plot_importance(model.get_booster())
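For context, a rough sketch of the full round trip under this approach (the toy data, file name, and feature list below are placeholders, not part of the original answer):

import joblib
import numpy as np
import xgboost
from xgboost import XGBClassifier

# Toy data: 4 unnamed columns, so the booster only knows them as f0..f3.
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

clf = XGBClassifier(n_estimators=20)
clf.fit(X, y)
joblib.dump(clf, "your_saved.model")

# Reload and attach real names to the underlying Booster before plotting.
model = joblib.load("your_saved.model")
model.get_booster().feature_names = ["your", "feature", "name", "list"]  # one name per column of X
xgboost.plot_importance(model.get_booster())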
Answered by Vivek Kumar
train_test_split will convert the dataframe to a numpy array, which doesn't have column information anymore.
Either you can do what @piRSquared suggested and pass the feature names as a parameter to the DMatrix constructor, or you can convert the numpy arrays returned from train_test_split back to DataFrames and then use your code.
Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y, \
                                               test_size=0.2, random_state=42)
# These two lines convert the numpy arrays back to DataFrames with column names
Xtrain = pd.DataFrame(data=Xtrain, columns=feature_names)
Xval = pd.DataFrame(data=Xval, columns=feature_names)
dtrain = xgb.DMatrix(Xtrain, label=ytrain)
Answered by Vincent M.K
With the Scikit-Learn wrapper interface "XGBClassifier", plot_importance returns a "matplotlib Axes" object, so we can use axes.set_yticklabels.
plot_importance(model).set_yticklabels(['feature1','feature2'])
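One thing to keep in mind with this approach: plot_importance sorts the bars by importance, so a hard-coded label list only matches if it happens to be in that plotted order. A small sketch of one way to stay aligned, assuming feature_names is the list of original column names in training order and the plot currently shows the default f0, f1, ... labels:

ax = plot_importance(model)
ax.set_yticklabels(
    [feature_names[int(label.get_text()[1:])] for label in ax.get_yticklabels()]
)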
Answered by Peter VanderMeer
An alternate way I found while playing around with feature_names: I wrote the following, which works on XGBoost v0.80, which I'm currently running.
## Saving the model to disk
model.save_model('foo.model')
with open('foo_fnames.txt', 'w') as f:
    f.write('\n'.join(model.feature_names))

## Later, when you want to retrieve the model...
model2 = xgb.Booster({"nthread": nThreads})
model2.load_model("foo.model")
with open("foo_fnames.txt", "r") as f:
    feature_names2 = f.read().split("\n")
model2.feature_names = feature_names2
model2.feature_types = None
fig, ax = plt.subplots(1,1,figsize=(10,10))
xgb.plot_importance(model2, max_num_features = 5, ax=ax)
So this is saving feature_names separately and adding it back in later. For some reason feature_types also needs to be initialized, even if the value is None.
Answered by Badger Titan
If trained with
model = XGBClassifier(
    max_depth = 8,
    learning_rate = 0.25,
    n_estimators = 50,
    objective = "binary:logistic",
    n_jobs = 4
)
# x, y are pandas DataFrames
model.fit(train_data_x, train_data_y)
you can do model.get_booster().get_fscore() to get feature names and feature importances as a Python dict.
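A quick illustrative sketch of what that returns (the names and counts below are made up, not real output):

scores = model.get_booster().get_fscore()
# e.g. {'age': 43, 'income': 35, 'height': 12} -- split counts keyed by column name
for name, count in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, count)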
Answered by Gianmario Spacagna
You should specify the feature_names when instantiating the XGBoost Classifier:
model = xgb.XGBClassifier(feature_names=feature_names)
Be careful that if you wrap the xgb classifier in a sklearn pipeline that performs any selection on the columns (e.g. VarianceThreshold), the xgb classifier will fail when trying to fit or transform.