Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/34052115/

Date: 2020-08-19 14:24:42  Source: igfitidea

How to find the importance of the features for a logistic regression model?

python, machine-learning, scikit-learn, logistic-regression

Asked by mgokhanbakal

I have a binary prediction model trained with the logistic regression algorithm. I want to know which features (predictors) are more important for the decision between the positive and negative class. I know there is a coef_ attribute that comes from the scikit-learn package, but I don't know whether it is enough to determine feature importance. Another thing is how I can evaluate the coef_ values in terms of their importance for the negative and positive classes. I have also read about standardized regression coefficients, but I don't know what they are.

Let's say there are features like the size of a tumor, the weight of the tumor, etc., used to decide for a test case whether it is malignant or not. I want to know which of the features are more important for the malignant versus non-malignant prediction. Does that make sense?

Accepted answer by KT.

One of the simplest ways to get a feeling for the "influence" of a given parameter in a linear classification model (logistic regression being one of them) is to consider the magnitude of its coefficient times the standard deviation of the corresponding parameter in the data.

Consider this example:

import numpy as np    
from sklearn.linear_model import LogisticRegression

x1 = np.random.randn(100)
x2 = 4*np.random.randn(100)
x3 = 0.5*np.random.randn(100)
# Per-sample noise (randn(100), not a single scalar) so each observation is perturbed
y = (3 + x1 + x2 + x3 + 0.2*np.random.randn(100)) > 0
X = np.column_stack([x1, x2, x3])

m = LogisticRegression()
m.fit(X, y)

# The estimated coefficients will all be around 1:
print(m.coef_)

# Those values, however, will show that the second parameter
# is more influential
print(np.std(X, 0)*m.coef_)

An alternative way to get a similar result is to examine the coefficients of the model fit on standardized parameters:

m.fit(X / np.std(X, 0), y)
print(m.coef_)
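
As a quick sanity check (not part of the original answer; the data below is synthetic and simply mirrors the example above), one can verify that the two approaches rank the features the same way:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
# Three features with standard deviations of roughly 1, 4, and 0.5
X = np.column_stack([rng.standard_normal(n),
                     4 * rng.standard_normal(n),
                     0.5 * rng.standard_normal(n)])
y = (3 + X.sum(axis=1) + 0.2 * rng.standard_normal(n)) > 0

m = LogisticRegression().fit(X, y)
scaled = np.std(X, 0) * m.coef_[0]        # approach 1: coefficient * feature std

m2 = LogisticRegression().fit(X / np.std(X, 0), y)
standardized = m2.coef_[0]                # approach 2: fit on standardized features

# Both approaches should single out the second feature (std ~4) as most influential
print(np.argsort(-np.abs(scaled)), np.argsort(-np.abs(standardized)))
```

The two sets of numbers are not exactly identical (the default L2 regularization in LogisticRegression interacts with feature scale), but the resulting feature ranking should agree.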

Note that this is the most basic approach and a number of other techniques for finding feature importance or parameter influence exist (using p-values, bootstrap scores, various "discriminative indices", etc).
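
As a minimal sketch of one of those alternatives (not from the original answer; the data is synthetic), bootstrap resampling gives a rough idea of how stable the scaled coefficients are:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([rng.standard_normal(n),
                     4 * rng.standard_normal(n),
                     0.5 * rng.standard_normal(n)])
y = (3 + X.sum(axis=1) + 0.2 * rng.standard_normal(n)) > 0

coefs = []
for _ in range(100):
    idx = rng.integers(0, n, n)                      # resample rows with replacement
    m = LogisticRegression().fit(X[idx], y[idx])
    coefs.append(np.std(X[idx], 0) * m.coef_[0])     # scale by feature std
coefs = np.array(coefs)

# Mean and spread of each feature's scaled coefficient across resamples:
# a large mean relative to the spread suggests a robustly influential feature
print(coefs.mean(0))
print(coefs.std(0))
```

Features whose scaled coefficients stay large across resamples can be treated as more reliably influential than those whose estimates fluctuate around zero.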

I am pretty sure you would get more interesting answers at https://stats.stackexchange.com/.