Python: How do I find the feature names for the coefficients of a scikit-learn linear regression?

Question

Asked by amehta

# training the models (train_data is a pandas DataFrame)
from sklearn import linear_model

model_1_features = ['sqft_living', 'bathrooms', 'bedrooms', 'lat', 'long']
model_2_features = model_1_features + ['bed_bath_rooms']
model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long']

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])

model_2 = linear_model.LinearRegression()
model_2.fit(train_data[model_2_features], train_data['price'])

model_3 = linear_model.LinearRegression()
model_3.fit(train_data[model_3_features], train_data['price'])

# extracting the coefficients
print(model_1.coef_)
print(model_2.coef_)
print(model_3.coef_)

If I change the order of the features, the coefficients are still printed in the same order, so I would like to know which coefficient belongs to which feature.

Answer 1

Answered by Robin Spiess

The trick is that the coefficients come back in the same order as the feature columns you passed to fit, so right after training you can pair them up:

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
print(list(zip(model_1.coef_, model_1_features)))

This prints each coefficient next to its feature name. (Tested with a pandas DataFrame.)

If you want to reuse the coefficients later you can also put them in a dictionary:

coef_dict = {}
for coef, feat in zip(model_1.coef_, model_1_features):
    coef_dict[feat] = coef

(You can check this yourself by training two models on the same features in a different column order, as you described.)
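A minimal sketch of that check, assuming synthetic data in place of the housing set (the column values and the true coefficients 1.5, -4.0 and 2.0 are invented, and fit_coef_dict is a hypothetical helper):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with known coefficients (made up for illustration).
rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["lat", "long", "bedrooms"])
data["price"] = 1.5 * data["lat"] - 4.0 * data["long"] + 2.0 * data["bedrooms"]


def fit_coef_dict(features):
    """Fit on the given column order and map each feature name to its coefficient."""
    model = LinearRegression().fit(data[features], data["price"])
    return dict(zip(features, model.coef_))


original = fit_coef_dict(["lat", "long", "bedrooms"])
shuffled = fit_coef_dict(["bedrooms", "lat", "long"])

# Each feature should get the same coefficient regardless of column order.
for name in original:
    print(name, round(original[name], 6), round(shuffled[name], 6))
```

The raw coef_ arrays differ in order between the two fits, but the name-to-coefficient mapping should agree.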

Answer 2

Answered by user1761806

Here is what I use to pretty-print coefficients in Jupyter. I'm not sure why order is an issue: as far as I know, the order of the coefficients matches the order of the columns in the input data you fit on.

Note that the first line assumes you have a pandas DataFrame called df that held the data before you converted it to a numpy array for the regression. It should contain only the feature columns, in the order used for fitting; clf is the fitted model:

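import numpy as np
import pandas as pd

# column names of df as an (n_features, 1) array
fieldList = np.array(list(df)).reshape(-1, 1)

# cast to str explicitly so both columns share a dtype before concatenating
coeffs = np.reshape(np.round(clf.coef_, 5), (-1, 1)).astype(str)
coeffs = np.concatenate((fieldList, coeffs), axis=1)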
print(pd.DataFrame(coeffs,columns=['Field','Coeff']))
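A possible variation, sketched under the same assumptions (df holds only the feature columns and clf is fit on them): building the table from a dict of columns keeps the coefficients numeric, so they can still be sorted or filtered. The data here is synthetic and the true coefficients 10.0 and 1.0 are invented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic feature frame and target (values are made up for illustration).
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(40, 2)), columns=["sqft_living", "bathrooms"])
y = 10.0 * df["sqft_living"] + 1.0 * df["bathrooms"]

# Fit on plain numpy arrays, as in the answer above.
clf = LinearRegression().fit(df.to_numpy(), y.to_numpy())

# One column of names, one column of rounded float coefficients.
coef_table = pd.DataFrame({"Field": df.columns, "Coeff": np.round(clf.coef_, 5)})
print(coef_table)
```

This only works for a single target, where clf.coef_ is one-dimensional.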

Answer 3

Answered by rocksteady

@Robin posted a great answer, but I had to make one tweak to get it working the way I wanted: my 'coef_' np.array was two-dimensional, so I had to pick out its first row with model_1.coef_[0,:], as below:

coef_dict = {}
for coef, feat in zip(model_1.coef_[0, :], model_1_features):
    coef_dict[feat] = coef

Then the dict was created as I pictured it, with {'feature_name' : coefficient_value} pairs.
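A likely reason for the extra dimension, though the answer does not say so: scikit-learn documents coef_ as having shape (n_targets, n_features) when the target passed to fit is two-dimensional, for example a one-column DataFrame instead of a Series. The sketch below uses synthetic data with invented coefficients to show both shapes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic features and target (values are made up for illustration).
rng = np.random.default_rng(7)
X = pd.DataFrame(rng.normal(size=(60, 2)), columns=["lat", "long"])
y_series = 2.0 * X["lat"] - 1.0 * X["long"]  # 1-D target
y_frame = y_series.to_frame(name="price")    # 2-D target, shape (60, 1)

coef_1d = LinearRegression().fit(X, y_series).coef_
coef_2d = LinearRegression().fit(X, y_frame).coef_
print(coef_1d.shape, coef_2d.shape)  # (2,) (1, 2)

# With a single target, np.ravel flattens either shape before zipping.
coef_dict = dict(zip(X.columns, np.ravel(coef_2d)))
print(coef_dict)
```

Using np.ravel instead of coef_[0, :] is just one option; it avoids having to know in advance which shape the model produced.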

Answer 4

Answered by ZaxR

Borrowing from Robin, but simplifying the syntax:

coef_dict = dict(zip(model_1_features, model_1.coef_))

Important note about zip: zip does not check that its inputs have the same length; it silently stops at the end of the shortest one. That makes it especially important to confirm that the number of features matches the number of coefficients (which in more complicated models might not be the case). If one input is longer than the other, its extra trailing values are dropped. Notice the missing 7 in the following example:

In [1]: [i for i in zip([1, 2, 3], [4, 5, 6, 7])]
Out[1]: [(1, 4), (2, 5), (3, 6)]
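One way to guard against that silent truncation is an explicit length check before zipping. The sketch below uses a hypothetical helper named paired and made-up coefficient values; on Python 3.10 and later, zip(names, values, strict=True) raises a ValueError in the same situation.

```python
features = ["sqft_living", "bathrooms", "bedrooms"]
coefs = [310.2, 15000.0]  # made-up values; one coefficient is missing on purpose


def paired(names, values):
    """Map names to values, refusing inputs of different lengths."""
    if len(names) != len(values):
        raise ValueError(
            f"{len(names)} feature names but {len(values)} coefficients"
        )
    return dict(zip(names, values))


try:
    paired(features, coefs)
except ValueError as err:
    print("mismatch caught:", err)
```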
