pandas: mapping column names to random forest feature importances

Question

Asked by yogz123

I am trying to plot feature importances for a random forest model and map each feature importance back to the original coefficient. I've managed to create a plot that shows the importances and uses the original variable names as labels, but right now the variable names appear in the order they had in the dataset, not in order of importance. How do I order them by feature importance? Thanks!

My code is:

importances = brf.feature_importances_
std = np.std([tree.feature_importances_ for tree in brf.estimators_],
         axis=0)
indices = np.argsort(importances)[::-1]

# Print the feature ranking
print("Feature ranking:")

for f in range(x_dummies.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

# Plot the feature importances of the forest
plt.figure(figsize=(8,8))
plt.title("Feature importances")
plt.bar(range(x_train.shape[1]), importances[indices],
   color="r", yerr=std[indices], align="center")
feature_names = x_dummies.columns
plt.xticks(range(x_dummies.shape[1]), feature_names)
plt.xticks(rotation=90)
plt.xlim([-1, x_dummies.shape[1]])
plt.show()

Answer 1

Answered by Sam

A sort of generic solution would be to throw the features/importances into a dataframe and sort them before plotting:

import pandas as pd
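# Jupyter/IPython magic on the next line; omit it in a plain Python script
%matplotlib inline
# (code that loads the data and fits the model goes here)
# "data" is the X DataFrame and "model" is the fitted scikit-learn estimator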

feats = {} # a dict to hold feature_name: feature_importance
for feature, importance in zip(data.columns, model.feature_importances_):
    feats[feature] = importance #add the name/value pair 

importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
importances.sort_values(by='Gini-importance').plot(kind='bar', rot=45)
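
The same idea can be sketched end to end on synthetic data. This is only an illustration under stated assumptions: the dataset, the `feat_0` ... `feat_5` column names, and the forest settings are made up, and the frame is sorted in descending order so the most important feature comes first.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it in a notebook
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real data; the column names are hypothetical.
X, y = make_classification(n_samples=200, n_features=6, n_informative=3, random_state=0)
data = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(data, y)

# One row per feature, indexed by column name, sorted from most to least important.
importances = pd.DataFrame(
    {"Gini-importance": model.feature_importances_}, index=data.columns
).sort_values(by="Gini-importance", ascending=False)

# The bar labels come from the sorted index, so names and bars stay aligned.
ax = importances.plot(kind="bar", rot=45)
ax.set_ylabel("importance")
```

Because the names live in the index, sorting the frame reorders the labels together with the values, which is what the plot in the question was missing.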

Answer 2

Answered by D. Ross

I use a similar solution to Sam:

import pandas as pd
important_features = pd.Series(data=brf.feature_importances_,index=x_dummies.columns)
important_features.sort_values(ascending=False,inplace=True)

I usually just print the list with print(important_features), but to plot it you can use Series.plot.
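
A minimal sketch of the Series.plot route follows. The importance values and feature names are made up for illustration and stand in for `brf.feature_importances_` and `x_dummies.columns`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it in a notebook
import pandas as pd

# Hypothetical importances and column names, not taken from a real model.
important_features = pd.Series(
    data=[0.10, 0.45, 0.05, 0.40],
    index=["age", "income", "region", "tenure"],
)
important_features = important_features.sort_values(ascending=False)

print(important_features)

# Series.plot uses the (already sorted) index for the tick labels.
ax = important_features.plot(kind="bar", rot=90, title="Feature importances")
```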

Answer 3

Answered by Igor Manzhos

Another simple way to get a sorted list:

importances = list(zip(xgb_classifier.feature_importances_, df.columns))
importances.sort(reverse=True)

The following code adds a visualization, if needed:

pd.DataFrame(importances, index=[x for (_,x) in importances]).plot(kind = 'bar')
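
As a variation on the line above, the tuple list can be given named columns before plotting, which may make the resulting frame easier to read. The scores and names here are made up and stand in for `xgb_classifier.feature_importances_` and `df.columns`; the column labels `importance` and `feature` are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it in a notebook
import pandas as pd

# Hypothetical values, not taken from a real model.
scores = [0.10, 0.45, 0.05, 0.40]
names = ["age", "income", "region", "tenure"]

# Tuples sort by their first element, so this orders by importance, largest first.
importances = sorted(zip(scores, names), reverse=True)

# Name the columns, then move the feature names into the index for the tick labels.
ranked = pd.DataFrame(importances, columns=["importance", "feature"]).set_index("feature")
ax = ranked.plot(kind="bar", legend=False)
```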

Answer 4

Answered by Joshy Joy

It's simple; I plotted it like this:

feat_importances = pd.Series(extraTree.feature_importances_, index=X.columns)
feat_importances.nlargest(15).plot(kind='barh')
plt.title("Top 15 important features")
plt.show()
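
One detail worth knowing with `kind='barh'`: the first row is drawn at the bottom of the axes, so after `nlargest` the most important feature ends up at the bottom. A small sketch of one way to flip that, using made-up importances and feature names in place of `extraTree.feature_importances_` and `X.columns`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it in a notebook
import pandas as pd

# Hypothetical importances and column names, not taken from a real model.
feat_importances = pd.Series(
    [0.10, 0.45, 0.05, 0.40],
    index=["age", "income", "region", "tenure"],
)

top = feat_importances.nlargest(3)  # already sorted, largest first

ax = top.plot(kind="barh")
ax.invert_yaxis()  # flip the y-axis so the largest bar sits at the top
ax.set_title("Top 3 important features")
```

Calling `top.sort_values().plot(kind="barh")` instead of inverting the axis should give the same visual order.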
