Feature Importance with Python's XGBClassifier
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) at StackOverflow.
Original question: http://stackoverflow.com/questions/38212649/
Feature Importance with XGBClassifier
Asked by Minh Mai
Hopefully I'm reading this wrong, but the XGBoost library documentation notes that you can extract the feature importances via the feature_importances_ attribute, much like sklearn's random forest.
However, for some reason, I keep getting this error: AttributeError: 'XGBClassifier' object has no attribute 'feature_importances_'
My code snippet is below:
from sklearn import datasets
import xgboost as xg

iris = datasets.load_iris()
X = iris.data
Y = iris.target
Y = Y[Y < 2]     # arbitrarily removing class 2 so it can be 0 and 1
X = X[:len(Y)]   # cutting the feature matrix to match the rows kept in Y
xgb = xg.XGBClassifier()
fit = xgb.fit(X, Y)
fit.feature_importances_
It seems that you can compute feature importance from the Booster object by calling its get_fscore method. The only reason I'm using XGBClassifier over Booster is that it can be wrapped in a sklearn pipeline. Any thoughts on feature extraction? Is anyone else experiencing this?
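For context, wrapping XGBClassifier in an sklearn pipeline looks roughly like this. A minimal sketch, assuming a recent xgboost where feature_importances_ exists; the step names 'scale' and 'clf' are arbitrary:

from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

iris = load_iris()
X, Y = iris.data, iris.target

# chain preprocessing and the classifier into a single estimator
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', XGBClassifier(n_estimators=10)),
])
pipe.fit(X, Y)

# pull the fitted classifier back out of the pipeline by step name
print(pipe.named_steps['clf'].feature_importances_)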
Answered by David
As the comments indicate, I suspect your issue is a versioning one. However, if you do not want to or cannot update, the following function should work for you.
def get_xgb_imp(xgb, feat_names):
    from numpy import array
    # get_fscore() keys features as 'f0', 'f1', ...; map them back to names
    imp_vals = xgb.booster().get_fscore()
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    # normalize so the importances sum to 1
    # (Python 2-era idiom; on Python 3 wrap the values in list() first, see the updates below)
    total = array(imp_dict.values()).sum()
    return {k: v / total for k, v in imp_dict.items()}
>>> import numpy as np
>>> from xgboost import XGBClassifier
>>>
>>> feat_names = ['var1','var2','var3','var4','var5']
>>> np.random.seed(1)
>>> X = np.random.rand(100,5)
>>> y = np.random.rand(100).round()
>>> xgb = XGBClassifier(n_estimators=10)
>>> xgb = xgb.fit(X,y)
>>>
>>> get_xgb_imp(xgb,feat_names)
{'var5': 0.0, 'var4': 0.20408163265306123, 'var1': 0.34693877551020408, 'var3': 0.22448979591836735, 'var2': 0.22448979591836735}
Answered by Minh Mai
I found out the answer. It appears that version 0.4a30 does not have the feature_importances_ attribute. Therefore, if you install the xgboost package using pip install xgboost, you will be unable to conduct feature extraction from the XGBClassifier object; you can refer to @David's answer if you want a workaround.
However, what I did was build it from source by cloning the repo and running ./build.sh, which installs version 0.4, where the feature_importances_ attribute works.
Hope this helps others!
Answered by rosefun
For xgboost, if you use xgb.fit(), then you can use the following method to get feature importance.
import pandas as pd
from xgboost import plot_importance

# xgb is assumed to be an XGBClassifier instance
xgb_model = xgb.fit(X, y)

# build a two-column frame from the booster's importance dict, highest first
xgb_fea_imp = pd.DataFrame(
    list(xgb_model.get_booster().get_fscore().items()),
    columns=['feature', 'importance']
).sort_values('importance', ascending=False)
print(xgb_fea_imp)
xgb_fea_imp.to_csv('xgb_fea_imp.csv')

plot_importance(xgb_model)
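A side note: get_fscore() reports the 'weight' importance, i.e. how many times a feature is used to split. On reasonably recent xgboost versions the booster also has get_score(), which accepts other importance types; a short sketch, reusing xgb_model from above:

booster = xgb_model.get_booster()

# 'weight' counts splits; 'gain' averages the loss reduction per split;
# 'cover' averages the number of samples affected by each split
for imp_type in ('weight', 'gain', 'cover'):
    print(imp_type, booster.get_score(importance_type=imp_type))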
Answered by Ioannis Nasios
Get Feature Importance as a sorted data frame
import pandas as pd
import numpy as np

def get_xgb_imp(xgb, feat_names):
    # note: feat_names is unused here; the feature keys come from get_fscore() itself
    imp_vals = xgb.booster().get_fscore()
    # duplicate-row trick: build a 2-row frame from the dict, transpose,
    # then overwrite the first column with the feature names (the index)
    feats_imp = pd.DataFrame(imp_vals, index=np.arange(2)).T
    feats_imp.iloc[:, 0] = feats_imp.index
    feats_imp.columns = ['feature', 'importance']
    feats_imp.sort_values('importance', inplace=True, ascending=False)
    feats_imp.reset_index(drop=True, inplace=True)
    return feats_imp

feature_importance_df = get_xgb_imp(xgb, feat_names)
Answered by connor.p
For those having the same problem as Luís Bianchin, "TypeError: 'str' object is not callable", I found a solution (that works for me at least) here.
In short, I found modifying David's code from
imp_vals = xgb.booster().get_fscore()
to
imp_vals = xgb.get_fscore()
worked for me.
For more detail I would recommend visiting the link above.
Big thanks to David and ianozsvald
Answered by Jeroen Boeye
An update of the accepted answer since it no longer works:
def get_xgb_imp(xgb_model, feat_names):
    # assumes get_fscore() keys are real feature names, i.e. the model
    # was trained on data with named columns (e.g. a pandas DataFrame)
    imp_vals = xgb_model.get_fscore()
    imp_dict = {feat: float(imp_vals.get(feat, 0.)) for feat in feat_names}
    total = sum(list(imp_dict.values()))
    return {k: round(v / total, 5) for k, v in imp_dict.items()}
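A usage sketch under that assumption, with hypothetical variable names; get_fscore() lives on the Booster, so the wrapper's get_booster() result is what gets passed in:

import numpy as np
import pandas as pd
from xgboost import XGBClassifier

rng = np.random.RandomState(1)
X = pd.DataFrame(rng.rand(100, 3), columns=['var1', 'var2', 'var3'])
y = rng.randint(0, 2, 100)

model = XGBClassifier(n_estimators=10).fit(X, y)
print(get_xgb_imp(model.get_booster(), X.columns))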
Answered by Aditya Mishra
It seems like the API keeps on changing. For xgboost version 1.0.2, just changing imp_vals = xgb.booster().get_fscore() to imp_vals = xgb.get_booster().get_fscore() in @David's answer does the trick. The updated code is:
from numpy import array

def get_xgb_imp(xgb, feat_names):
    imp_vals = xgb.get_booster().get_fscore()
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    # wrap in list() for Python 3, where dict.values() is a view
    total = array(list(imp_dict.values())).sum()
    return {k: v / total for k, v in imp_dict.items()}
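For completeness: on current xgboost releases the sklearn wrapper exposes feature_importances_ directly, so on an up-to-date install the original code from the question works as-is. A minimal sketch, assuming default settings:

import numpy as np
from xgboost import XGBClassifier

rng = np.random.RandomState(1)
X, y = rng.rand(100, 5), rng.randint(0, 2, 100)

clf = XGBClassifier(n_estimators=10).fit(X, y)
print(clf.feature_importances_)  # one normalized importance per input column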