pandas: Mapping column names to random forest feature importances
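Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow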
Original URL: http://stackoverflow.com/questions/41900387/
Mapping column names to random forest feature importances
Question by yogz123
I am trying to plot the feature importances of a random forest model and map each importance back to its original variable (column) name. I've managed to create a plot that shows the importances and uses the original variable names as labels, but the labels currently appear in the order the variables had in the dataset rather than in order of importance. How do I order them by feature importance? Thanks!
My code is:
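import numpy as np
import matplotlib.pyplot as plt

# brf: the fitted forest model; x_dummies / x_train: presumably the one-hot-encoded features
importances = brf.feature_importances_
# Spread of each feature's importance across the individual trees
std = np.std([tree.feature_importances_ for tree in brf.estimators_],
             axis=0)
# Feature indices ordered from most to least important
indices = np.argsort(importances)[::-1]
# Print the feature ranking
print("Feature ranking:")
for f in range(x_dummies.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))
# Plot the feature importances of the forest (the bars are in sorted order)
plt.figure(figsize=(8, 8))
plt.title("Feature importances")
plt.bar(range(x_train.shape[1]), importances[indices],
        color="r", yerr=std[indices], align="center")
# These labels are still in the original column order, not the sorted order
feature_names = x_dummies.columns
plt.xticks(range(x_dummies.shape[1]), feature_names)
plt.xticks(rotation=90)
plt.xlim([-1, x_dummies.shape[1]])
plt.show()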
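Because the bars are drawn in the order given by indices, the tick labels need the same permutation: feature_names[indices] instead of feature_names. The following is a minimal sketch of that fix, not the asker's code; the column names and per-tree importance values are made up and stand in for x_dummies.columns and the values collected from brf.estimators_, and the matplotlib calls are left as comments so the snippet runs without a display.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for x_dummies.columns and the per-tree importances
feature_names = pd.Index(["age", "income", "city_A", "city_B"])
per_tree = np.array([
    [0.10, 0.50, 0.30, 0.10],
    [0.20, 0.40, 0.30, 0.10],
    [0.15, 0.45, 0.25, 0.15],
])

importances = per_tree.mean(axis=0)  # stand-in for brf.feature_importances_
std = per_tree.std(axis=0)           # spread across trees, as in the question
indices = np.argsort(importances)[::-1]  # most important first

# Reorder names, values and error bars with the same permutation
sorted_names = feature_names[indices]
sorted_importances = importances[indices]
sorted_std = std[indices]

print("Feature ranking:")
for rank, (name, value) in enumerate(zip(sorted_names, sorted_importances), start=1):
    print("%d. %s (%f)" % (rank, name, value))

# With matplotlib, the plotting calls would then be, e.g.:
# plt.bar(range(len(indices)), sorted_importances, yerr=sorted_std, align="center")
# plt.xticks(range(len(indices)), sorted_names, rotation=90)
```

The single indices array now drives the bars, the error bars and the labels, so they cannot drift out of sync.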
Answer by Sam
A fairly generic solution is to put the feature names and importances into a DataFrame and sort them before plotting:
import pandas as pd
import matplotlib.pyplot as plt
# %matplotlib inline  <- IPython/Jupyter magic; only valid inside a notebook
# ... code to build and fit the model ...
# "data" is the X DataFrame and "model" is the fitted scikit-learn estimator
feats = {}  # a dict to hold feature_name: feature_importance
for feature, importance in zip(data.columns, model.feature_importances_):
    feats[feature] = importance  # add the name/value pair
importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
# Ascending sort, so the least important feature is the leftmost bar
importances.sort_values(by='Gini-importance').plot(kind='bar', rot=45)
plt.show()
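The dict, from_dict and rename steps above can be collapsed into a single DataFrame constructor call. A minimal sketch, with made-up column names and values standing in for data.columns and model.feature_importances_; it sorts with ascending=False so the most important feature comes first, whereas the version above sorts ascending.

```python
import pandas as pd

# Hypothetical stand-ins for data.columns and model.feature_importances_
columns = ["age", "income", "city_A", "city_B"]
feature_importances = [0.15, 0.45, 0.28, 0.12]

# Same table as the dict + from_dict + rename version, built in one step
importances = pd.DataFrame({"Gini-importance": feature_importances}, index=columns)

# Descending order: most important feature first
ranked = importances.sort_values(by="Gini-importance", ascending=False)
print(ranked)

# Plotting would then be, e.g.: ranked.plot(kind="bar", rot=45)
```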
Answer by D. Ross
I use a solution similar to Sam's:
import pandas as pd
# Pair each importance with its column name, then sort in place, most important first
important_features = pd.Series(data=brf.feature_importances_, index=x_dummies.columns)
important_features.sort_values(ascending=False, inplace=True)
I usually just print the sorted Series with print(important_features), but to plot it you can use Series.plot, e.g. important_features.plot(kind='bar').
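A minimal sketch of the Series approach, with hypothetical column names and values standing in for x_dummies.columns and brf.feature_importances_. It uses the non-inplace form of sort_values, which returns a new sorted Series and leaves the original untouched, and head() to keep only the top features.

```python
import pandas as pd

# Hypothetical stand-ins for x_dummies.columns and brf.feature_importances_
columns = ["age", "income", "city_A", "city_B"]
values = [0.15, 0.45, 0.28, 0.12]

important_features = pd.Series(data=values, index=columns)

# Without inplace=True, sort_values returns a new, sorted Series
ranked = important_features.sort_values(ascending=False)
print(ranked)

# Keep only the 3 most important features, e.g. before top3.plot(kind="bar")
top3 = ranked.head(3)
```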
Answer by Igor Manzhos
Another simple way to get a sorted list:
# (importance, feature name) pairs, sorted from most to least important
importances = list(zip(xgb_classifier.feature_importances_, df.columns))
importances.sort(reverse=True)
The following code adds a visualization, if you need one:
import pandas as pd
pd.DataFrame(importances, columns=['importance', 'feature']).set_index('feature').plot(kind='bar')
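Sorting the (importance, name) tuples with reverse=True also compares the names whenever two importances are equal, so tied features come out in reverse alphabetical order. Ties are not unusual, for example among features that are never used in a split and therefore get an importance of 0. A minimal stdlib-only sketch, with hypothetical values in place of xgb_classifier.feature_importances_ and df.columns, contrasts that with sorting on the importance alone; Python's sort stays stable even with reverse=True, so tied features keep their original column order.

```python
# Hypothetical stand-ins for xgb_classifier.feature_importances_ and df.columns
feature_importances = [0.30, 0.10, 0.30, 0.30]
columns = ["age", "income", "city_A", "city_B"]

pairs = list(zip(feature_importances, columns))

# Plain tuple sort: ties on importance are broken by comparing the names
by_tuple = sorted(pairs, reverse=True)

# Sort on the importance only; the sort is stable (also with reverse=True),
# so tied features stay in their original column order
by_importance = sorted(pairs, key=lambda pair: pair[0], reverse=True)

print(by_tuple)
print(by_importance)
```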
Answer by Joshy Joy
It's simple; I plotted it like this:
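import pandas as pd
import matplotlib.pyplot as plt
# One importance per column of X, labelled with the column name
feat_importances = pd.Series(extraTree.feature_importances_, index=X.columns)
# Plot the 15 largest importances as horizontal bars
feat_importances.nlargest(15).plot(kind='barh')
plt.title("Top 15 important features")
plt.show()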
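One detail with barh: pandas draws the first row at the bottom of the y axis, and nlargest returns its values in descending order, so the most important feature ends up at the bottom of the chart. A minimal sketch, using a hypothetical Series in place of extraTree.feature_importances_ and X.columns, reverses the selection so the largest bar lands on top; the plotting call is left as a comment so the snippet runs without matplotlib.

```python
import pandas as pd

# Hypothetical stand-in for pd.Series(extraTree.feature_importances_, index=X.columns)
feat_importances = pd.Series(
    [0.05, 0.40, 0.10, 0.25, 0.20],
    index=["f_a", "f_b", "f_c", "f_d", "f_e"],
)

top = feat_importances.nlargest(3)  # descending order: f_b, f_d, f_e

# barh draws the first row at the bottom, so reverse the order
# to put the most important feature at the top of the chart
top_for_barh = top.iloc[::-1]
print(top_for_barh)

# With matplotlib: top_for_barh.plot(kind="barh"); plt.title("Top 3 important features")
```

An alternative with the same effect is to keep nlargest's order and flip the axis afterwards with plt.gca().invert_yaxis().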