Feature Importance with Python's XGBClassifier

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/38212649/


Feature Importance with XGBClassifier

python, scikit-learn, xgboost

Asked by Minh Mai

Hopefully I'm reading this wrong, but the XGBoost library documentation notes that you can extract the feature importances via the feature_importances_ attribute, much like sklearn's random forest.


However, for some reason, I keep getting this error: AttributeError: 'XGBClassifier' object has no attribute 'feature_importances_'


My code snippet is below:


from sklearn import datasets
import xgboost as xg

iris = datasets.load_iris()
X = iris.data
Y = iris.target
X = X[Y < 2]  # cutting the feature rows to match Y
Y = Y[Y < 2]  # arbitrarily removing class 2 so the labels are 0 and 1
xgb = xg.XGBClassifier()
fit = xgb.fit(X, Y)
fit.feature_importances_

It seems that you can compute feature importance from the Booster object by calling its get_fscore method. The only reason I'm using XGBClassifier over Booster is that it can be wrapped in a sklearn pipeline. Any thoughts on feature extraction? Is anyone else experiencing this?

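For reference, a minimal sketch of the Booster route mentioned above; the data and parameters here are illustrative, not from the original post:

import numpy as np
import xgboost as xg

# Toy binary-classification data, purely for illustration
X = np.random.rand(100, 4)
y = (np.random.rand(100) > 0.5).astype(int)

dtrain = xg.DMatrix(X, label=y)
booster = xg.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=10)
print(booster.get_fscore())  # e.g. {'f0': 12, 'f2': 7, ...}: split counts per feature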

Answered by David

As the comments indicate, I suspect your issue is a versioning one. However, if you do not want to or cannot update, the following function should work for you.


def get_xgb_imp(xgb, feat_names):
    # Note: xgb.booster() is the pre-0.6 API; newer versions use xgb.get_booster()
    imp_vals = xgb.booster().get_fscore()
    # get_fscore keys are 'f0', 'f1', ...; map them back to the given feature names
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    total = sum(imp_dict.values())
    return {k: v / total for k, v in imp_dict.items()}


>>> import numpy as np
>>> from xgboost import XGBClassifier
>>> 
>>> feat_names = ['var1','var2','var3','var4','var5']
>>> np.random.seed(1)
>>> X = np.random.rand(100,5)
>>> y = np.random.rand(100).round()
>>> xgb = XGBClassifier(n_estimators=10)
>>> xgb = xgb.fit(X,y)
>>> 
>>> get_xgb_imp(xgb,feat_names)
{'var5': 0.0, 'var4': 0.20408163265306123, 'var1': 0.34693877551020408, 'var3': 0.22448979591836735, 'var2': 0.22448979591836735}

Answered by Minh Mai

I found out the answer. It appears that version 0.4a30 does not have the feature_importances_ attribute. Therefore, if you install the xgboost package using pip install xgboost, you will be unable to conduct feature extraction from the XGBClassifier object; you can refer to @David's answer if you want a workaround.


However, what I did was build it from source by cloning the repo and running . ./build.sh, which will install version 0.4, where the feature_importances_ attribute works.


Hope this helps others!


Answered by rosefun

For xgboost, if you use xgb.fit(), then you can use the following method to get the feature importance.


import pandas as pd
from xgboost import plot_importance

# xgb is an already-constructed XGBClassifier; x, y are your training data
xgb_model = xgb.fit(x, y)

# Build a data frame of (feature, importance) pairs, sorted by importance
xgb_fea_imp = pd.DataFrame(list(xgb_model.get_booster().get_fscore().items()),
                           columns=['feature', 'importance']).sort_values('importance', ascending=False)
print(xgb_fea_imp)
xgb_fea_imp.to_csv('xgb_fea_imp.csv', index=False)

plot_importance(xgb_model)
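Note that plot_importance draws onto a matplotlib Axes, so in a plain script (outside a notebook) you typically need to render the figure yourself; a short follow-up, assuming matplotlib is installed:

import matplotlib.pyplot as plt

ax = plot_importance(xgb_model)  # returns the matplotlib Axes it drew on
plt.show()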

Answered by Ioannis Nasios

Get Feature Importance as a sorted data frame


import pandas as pd
import numpy as np

def get_xgb_imp(xgb, feat_names):
    # xgb.booster() is the old API; on newer versions use xgb.get_booster()
    imp_vals = xgb.booster().get_fscore()
    # Broadcast the dict into two identical rows, then transpose so each feature is a row
    feats_imp = pd.DataFrame(imp_vals, index=np.arange(2)).T
    feats_imp.iloc[:, 0] = feats_imp.index  # overwrite the first column with the feature name
    feats_imp.columns = ['feature', 'importance']
    feats_imp.sort_values('importance', inplace=True, ascending=False)
    feats_imp.reset_index(drop=True, inplace=True)
    return feats_imp

feature_importance_df = get_xgb_imp(xgb, feat_names)
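A small illustrative follow-up, since the frame comes back sorted with the index reset:

print(feature_importance_df.head())  # most important features first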

Answered by connor.p

For those having the same problem as Luís Bianchin, "TypeError: 'str' object is not callable", I found a solution (that works for me at least) here.


In short, I found modifying David's code from


imp_vals = xgb.booster().get_fscore()

to

imp_vals = xgb.get_fscore()

worked for me.


For more detail I would recommend visiting the link above.


Big thanks to David and ianozsvald


Answered by Jeroen Boeye

An update of the accepted answer since it no longer works:


def get_xgb_imp(xgb_model, feat_names):
    # get_fscore maps feature names to split counts when the model
    # was trained on named features
    imp_vals = xgb_model.get_fscore()
    imp_dict = {feat: float(imp_vals.get(feat, 0.)) for feat in feat_names}
    total = sum(imp_dict.values())
    return {k: round(v / total, 5) for k, v in imp_dict.items()}
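A usage sketch, under the assumption that the model was trained on named features (e.g. via a pandas DataFrame) so that the get_fscore keys match feat_names; the data here is made up:

import numpy as np
import pandas as pd
import xgboost as xg

feat_names = ['var1', 'var2', 'var3']
X = pd.DataFrame(np.random.rand(100, 3), columns=feat_names)
y = (np.random.rand(100) > 0.5).astype(int)

booster = xg.train({'objective': 'binary:logistic'},
                   xg.DMatrix(X, label=y), num_boost_round=10)
print(get_xgb_imp(booster, feat_names))  # e.g. {'var1': 0.41, 'var2': 0.33, 'var3': 0.26}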

Answered by Aditya Mishra

It seems like the API keeps on changing. For xgboost version 1.0.2, just changing imp_vals = xgb.booster().get_fscore() to imp_vals = xgb.get_booster().get_fscore() in @David's answer does the trick. The updated code is:


def get_xgb_imp(xgb, feat_names):
    # get_booster() replaces the removed booster() method
    imp_vals = xgb.get_booster().get_fscore()
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    total = sum(imp_dict.values())  # built-in sum works on dict views in Python 3
    return {k: v / total for k, v in imp_dict.items()}
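For completeness, on recent xgboost releases the feature_importances_ attribute from the original question works directly on the fitted sklearn wrapper; a minimal sketch with made-up data:

import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(100, 5)
y = (np.random.rand(100) > 0.5).astype(int)

clf = XGBClassifier(n_estimators=10).fit(X, y)
print(clf.feature_importances_)  # one normalized importance per input column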