Python: Determining the most contributing features for an SVM classifier in sklearn

Disclaimer: this page is an English translation of a popular StackOverflow Q&A, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and link the original question: http://stackoverflow.com/questions/41592661/


Determining the most contributing features for SVM classifier in sklearn

Tags: python, machine-learning, scikit-learn, svm

Asked by Jibin Mathew

I have a dataset and I want to train my model on that data. After training, I need to know which features are the major contributors to the classification for an SVM classifier.


Forest algorithms have something called feature importance; is there anything similar for an SVM classifier?

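For reference, this is the forest-style attribute the question alludes to. A minimal, self-contained sketch (the toy data here is made up for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Made-up toy data so the sketch runs on its own
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

forest = RandomForestClassifier(random_state=0)
forest.fit(X, y)

# One importance score per feature; the scores sum to 1.0
print(forest.feature_importances_)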

Answered by Jakub Macina

Yes, there is a coef_ attribute for the SVM classifier, but it only works for an SVM with a linear kernel. For other kernels it is not possible, because the data are transformed by the kernel method into another space which is not related to the input space; check the explanation.


from matplotlib import pyplot as plt
from sklearn import svm

def f_importances(coef, names):
    # Sort features by coefficient value and plot them as a horizontal bar chart
    imp = coef
    imp, names = zip(*sorted(zip(imp, names)))
    plt.barh(range(len(names)), imp, align='center')
    plt.yticks(range(len(names)), names)
    plt.show()

features_names = ['input1', 'input2']
svm = svm.SVC(kernel='linear')
svm.fit(X, Y)  # X, Y: your training data
f_importances(svm.coef_[0], features_names)  # coef_ is 2D; take the first row

And the output of the function looks like this: [Image: feature importances bar chart]

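To illustrate the linear-kernel restriction: in scikit-learn, reading coef_ from an SVC fitted with a non-linear kernel raises an AttributeError. The sketch below demonstrates this on made-up toy data, then shows permutation importance, a kernel-agnostic alternative that is not part of the original answer:

from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.svm import SVC

# Made-up toy data for demonstration
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

rbf_svc = SVC(kernel='rbf').fit(X, y)
try:
    rbf_svc.coef_  # only defined for linear kernels
except AttributeError as e:
    print(e)  # e.g. "coef_ is only available when using a linear kernel"

# Kernel-agnostic workaround: shuffle each feature and measure the score drop
result = permutation_importance(rbf_svc, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)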

Answered by Christoph Schmidl

I created a solution which also works for Python 3 and is based on Jakub Macina's code snippet.


from matplotlib import pyplot as plt
from sklearn import svm

def f_importances(coef, names, top=-1):
    imp = coef
    imp, names = zip(*sorted(list(zip(imp, names))))

    # Show all features
    if top == -1:
        top = len(names)

    plt.barh(range(top), imp[::-1][0:top], align='center')
    plt.yticks(range(top), names[::-1][0:top])
    plt.show()

# whatever your features are called
features_names = ['input1', 'input2', ...] 
svm = svm.SVC(kernel='linear')
svm.fit(X_train, y_train)

# Specify the top n features you want to visualize.
# You can also drop the abs() function
# if you are interested in the negative contribution of features.
f_importances(abs(svm.coef_[0]), features_names, top=10)

[Image: feature importance bar chart]

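A note on the design: the (coefficient, name) pairs are sorted in ascending order and then reversed with [::-1], so the largest values come first. Passing abs(svm.coef_[0]) ranks features purely by the magnitude of their contribution, while the raw signed coefficients distinguish features that push toward one class from those pushing toward the other. Indexing coef_[0] assumes a binary problem; for multiclass SVC, coef_ holds one row per pair of classes.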

Answered by Dor

In only one line of code:


Fit an SVM model:


from sklearn import svm
svm = svm.SVC(gamma=0.001, C=100., kernel='linear')
svm.fit(X, y)  # X, y: your training data; coef_ is only available after fitting

and create the plot as follows:


import pandas as pd

# 'features' is assumed to be the training DataFrame, so features.columns holds the feature names
pd.Series(abs(svm.coef_[0]), index=features.columns).nlargest(10).plot(kind='barh')

The result will be:


[Image: the most contributing features of the SVM model, in absolute values]

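Since nlargest(10) operates on the absolute coefficients, the chart ranks the top ten features by magnitude of contribution alone. If the direction matters as well, build the Series from the signed coefficients instead and use nsmallest to see the features pushing toward the negative class.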