Feature Importance with Python's XGBClassifier

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/38212649/


Feature Importance with XGBClassifier

python, scikit-learn, xgboost

Asked by Minh Mai

Hopefully I'm reading this wrong, but the XGBoost library documentation notes that you can extract the feature importances via the feature_importances_ attribute, much like sklearn's random forest.


However, for some reason, I keep getting this error: AttributeError: 'XGBClassifier' object has no attribute 'feature_importances_'


My code snippet is below:


from sklearn import datasets
import xgboost as xg

iris = datasets.load_iris()
X = iris.data
Y = iris.target
X = X[Y < 2]  # cutting the feature rows to match Y
Y = Y[Y < 2]  # arbitrarily removing class 2 so the labels are 0 and 1
xgb = xg.XGBClassifier()
fit = xgb.fit(X, Y)
fit.feature_importances_

It seems that you can compute feature importance from the Booster object by calling its get_fscore method. The only reason I'm using XGBClassifier over Booster is that it can be wrapped in a sklearn pipeline. Any thoughts on feature extraction? Is anyone else experiencing this?

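For reference, a minimal sketch of the Booster route mentioned above; the data and parameters here are illustrative, not from the original post:

import numpy as np
import xgboost as xg

# Toy binary-classification data, purely for illustration
X = np.random.rand(100, 4)
y = (np.random.rand(100) > 0.5).astype(int)

dtrain = xg.DMatrix(X, label=y)
booster = xg.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=10)
print(booster.get_fscore())  # e.g. {'f0': 12, 'f2': 7, ...}: split counts per feature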

Answered by David

As the comments indicate, I suspect your issue is a versioning one. However, if you do not want to or cannot update, the following function should work for you.


def get_xgb_imp(xgb, feat_names):
    # Note: xgb.booster() is the pre-0.6 API; newer versions use xgb.get_booster()
    imp_vals = xgb.booster().get_fscore()
    # get_fscore keys are 'f0', 'f1', ...; map them back to the given feature names
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    total = sum(imp_dict.values())
    return {k: v / total for k, v in imp_dict.items()}


>>> import numpy as np
>>> from xgboost import XGBClassifier
>>> 
>>> feat_names = ['var1','var2','var3','var4','var5']
>>> np.random.seed(1)
>>> X = np.random.rand(100,5)
>>> y = np.random.rand(100).round()
>>> xgb = XGBClassifier(n_estimators=10)
>>> xgb = xgb.fit(X,y)
>>> 
>>> get_xgb_imp(xgb,feat_names)
{'var5': 0.0, 'var4': 0.20408163265306123, 'var1': 0.34693877551020408, 'var3': 0.22448979591836735, 'var2': 0.22448979591836735}

Answered by Minh Mai

I found out the answer. It appears that version 0.4a30 does not have the feature_importances_ attribute. Therefore, if you install the xgboost package using pip install xgboost, you will be unable to conduct feature extraction from the XGBClassifier object; you can refer to @David's answer if you want a workaround.


However, what I did was build it from source by cloning the repo and running . ./build.sh, which will install version 0.4, where the feature_importances_ attribute works.


Hope this helps others!


Answered by rosefun

For xgboost, if you use xgb.fit(), then you can use the following method to get the feature importance.


import pandas as pd
from xgboost import plot_importance

# xgb is an already-constructed XGBClassifier; x, y are your training data
xgb_model = xgb.fit(x, y)

# Build a data frame of (feature, importance) pairs, sorted by importance
xgb_fea_imp = pd.DataFrame(list(xgb_model.get_booster().get_fscore().items()),
                           columns=['feature', 'importance']).sort_values('importance', ascending=False)
print(xgb_fea_imp)
xgb_fea_imp.to_csv('xgb_fea_imp.csv', index=False)

plot_importance(xgb_model)
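Note that plot_importance draws onto a matplotlib Axes, so in a plain script (outside a notebook) you typically need to render the figure yourself; a short follow-up, assuming matplotlib is installed:

import matplotlib.pyplot as plt

ax = plot_importance(xgb_model)  # returns the matplotlib Axes it drew on
plt.show()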

Answered by Ioannis Nasios

Get Feature Importance as a sorted data frame


import pandas as pd
import numpy as np

def get_xgb_imp(xgb, feat_names):
    # xgb.booster() is the old API; on newer versions use xgb.get_booster()
    imp_vals = xgb.booster().get_fscore()
    # Broadcast the dict into two identical rows, then transpose so each feature is a row
    feats_imp = pd.DataFrame(imp_vals, index=np.arange(2)).T
    feats_imp.iloc[:, 0] = feats_imp.index  # overwrite the first column with the feature name
    feats_imp.columns = ['feature', 'importance']
    feats_imp.sort_values('importance', inplace=True, ascending=False)
    feats_imp.reset_index(drop=True, inplace=True)
    return feats_imp

feature_importance_df = get_xgb_imp(xgb, feat_names)
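A small illustrative follow-up, since the frame comes back sorted with the index reset:

print(feature_importance_df.head())  # most important features first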

Answered by connor.p

For those having the same problem as Luís Bianchin, "TypeError: 'str' object is not callable", I found a solution (that works for me at least) here.


In short, I found modifying David's code from


imp_vals = xgb.booster().get_fscore()

to

imp_vals = xgb.get_fscore()

worked for me.


For more detail I would recommend visiting the link above.


Big thanks to David and ianozsvald


Answered by Jeroen Boeye

An update of the accepted answer since it no longer works:


def get_xgb_imp(xgb_model, feat_names):
    # get_fscore maps feature names to split counts when the model
    # was trained on named features
    imp_vals = xgb_model.get_fscore()
    imp_dict = {feat: float(imp_vals.get(feat, 0.)) for feat in feat_names}
    total = sum(imp_dict.values())
    return {k: round(v / total, 5) for k, v in imp_dict.items()}
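A usage sketch, under the assumption that the model was trained on named features (e.g. via a pandas DataFrame) so that the get_fscore keys match feat_names; the data here is made up:

import numpy as np
import pandas as pd
import xgboost as xg

feat_names = ['var1', 'var2', 'var3']
X = pd.DataFrame(np.random.rand(100, 3), columns=feat_names)
y = (np.random.rand(100) > 0.5).astype(int)

booster = xg.train({'objective': 'binary:logistic'},
                   xg.DMatrix(X, label=y), num_boost_round=10)
print(get_xgb_imp(booster, feat_names))  # e.g. {'var1': 0.41, 'var2': 0.33, 'var3': 0.26}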

Answered by Aditya Mishra

It seems like the API keeps on changing. For xgboost version 1.0.2, just changing imp_vals = xgb.booster().get_fscore() to imp_vals = xgb.get_booster().get_fscore() in @David's answer does the trick. The updated code is:


def get_xgb_imp(xgb, feat_names):
    # get_booster() replaces the removed booster() method
    imp_vals = xgb.get_booster().get_fscore()
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    total = sum(imp_dict.values())  # built-in sum works on dict views in Python 3
    return {k: v / total for k, v in imp_dict.items()}
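For completeness, on recent xgboost releases the feature_importances_ attribute from the original question works directly on the fitted sklearn wrapper; a minimal sketch with made-up data:

import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(100, 5)
y = (np.random.rand(100) > 0.5).astype(int)

clf = XGBClassifier(n_estimators=10).fit(X, y)
print(clf.feature_importances_)  # one normalized importance per input column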