How to find the feature names of the coefficients using scikit-learn linear regression?
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors on StackOverflow.
Original: http://stackoverflow.com/questions/34649969/
Asked by amehta
from sklearn import linear_model
# training the models (train_data is a pandas DataFrame loaded elsewhere)
model_1_features = ['sqft_living', 'bathrooms', 'bedrooms', 'lat', 'long']
model_2_features = model_1_features + ['bed_bath_rooms']
model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long']
model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
model_2 = linear_model.LinearRegression()
model_2.fit(train_data[model_2_features], train_data['price'])
model_3 = linear_model.LinearRegression()
model_3.fit(train_data[model_3_features], train_data['price'])
# extracting the coefficients
print(model_1.coef_)
print(model_2.coef_)
print(model_3.coef_)
If I change the order of the features, the coefficients are still printed in the same order, so I would like to know how each coefficient maps to its feature.
Answered by Robin Spiess
The trick is that right after you have trained your model, you know the order of the coefficients: coef_ follows the order of the feature columns you passed to fit.
model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
print(list(zip(model_1.coef_, model_1_features)))
This prints each coefficient paired with its feature. (Tested with a pandas DataFrame.)
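To make the pairing concrete, here is a minimal self-contained sketch. The data frame is synthetic (the values and the noise-free price formula are assumptions for illustration, not the original data set), so the fitted coefficients should come back close to 150, 10000 and -5000 in the order the features were listed.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic stand-in for train_data (hypothetical values, not the original data set).
rng = np.random.default_rng(0)
train_data = pd.DataFrame({
    "sqft_living": rng.uniform(500, 4000, size=200),
    "bathrooms": rng.integers(1, 4, size=200).astype(float),
    "bedrooms": rng.integers(1, 6, size=200).astype(float),
})
# Noise-free target so the fitted coefficients are easy to recognise.
train_data["price"] = (
    150.0 * train_data["sqft_living"]
    + 10000.0 * train_data["bathrooms"]
    - 5000.0 * train_data["bedrooms"]
)

features = ["sqft_living", "bathrooms", "bedrooms"]
model = linear_model.LinearRegression()
model.fit(train_data[features], train_data["price"])

# coef_ follows the column order of the frame passed to fit().
pairs = list(zip(features, model.coef_))
for name, coef in pairs:
    print(f"{name}: {coef:.2f}")
```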
If you want to reuse the coefficients later you can also put them in a dictionary:
coef_dict = {}
for coef, feat in zip(model_1.coef_, model_1_features):
    coef_dict[feat] = coef
(You can test this yourself by training two models on the same features but, as you said, with the feature order shuffled.)
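The suggested check can be sketched as follows. The feature names a, b, c, the synthetic data and the helper fit_and_map are all hypothetical; the point is only that the name-to-coefficient mapping should agree between the two fits even though the raw coef_ arrays are ordered differently.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

rng = np.random.default_rng(1)
# Hypothetical features and a noise-free target built from them.
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
y = 2.0 * X["a"] - 1.0 * X["b"] + 0.5 * X["c"]

def fit_and_map(feature_order):
    """Fit on the columns in the given order and return {feature: coefficient}."""
    model = linear_model.LinearRegression()
    model.fit(X[feature_order], y)
    return dict(zip(feature_order, model.coef_))

original = fit_and_map(["a", "b", "c"])
shuffled = fit_and_map(["c", "a", "b"])

# The dictionaries are built in different orders, but each name maps to the same value.
for name in original:
    print(name, round(original[name], 6), round(shuffled[name], 6))
```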
Answered by user1761806
Here is what I use to pretty-print coefficients in Jupyter. I'm not sure why order would be an issue: as far as I know, the order of the coefficients matches the order of the columns in the input data you fit on.
Note that the first line assumes you have a pandas DataFrame called df in which you originally stored the features before turning them into a NumPy array for regression, so its columns must be exactly the fitted features, in the same order:
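import numpy as np
import pandas as pd
# clf is the fitted model; df holds the feature columns in the order used for fitting
fieldList = np.array(list(df)).reshape(-1, 1)          # column names as an (n, 1) array
coeffs = np.reshape(np.round(clf.coef_, 5), (-1, 1))   # rounded coefficients as an (n, 1) array
coeffs = np.concatenate((fieldList, coeffs), axis=1)   # one row per feature: name, coefficient
print(pd.DataFrame(coeffs, columns=['Field', 'Coeff']))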
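A possible variation, not from the original answer: building the table column by column keeps the Coeff column numeric, whereas placing names and numbers in one NumPy array generally forces them into a shared (string) type. The feature names and coefficient values below are hypothetical stand-ins for list(df) and clf.coef_.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names and coefficients standing in for list(df) and clf.coef_.
feature_names = ["sqft_living", "bathrooms", "bedrooms"]
coef = np.array([150.123456, 10000.5, -5000.25])

# One column of names, one column of rounded floats.
table = pd.DataFrame({"Field": feature_names, "Coeff": np.round(coef, 5)})
print(table)
```

Because Coeff stays a float column, the table can still be sorted or filtered by coefficient value.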
Answered by rocksteady
@Robin posted a great answer, but I had to make one tweak to get it working the way I wanted: selecting the row of the coef_ array that I needed. In my case coef_ was two-dimensional (scikit-learn returns an array of shape (n_targets, n_features) when the target passed to fit is 2-D), so I used model_1.coef_[0,:], as below:
coef_dict = {}
for coef, feat in zip(model_1.coef_[0,:], model_1_features):
    coef_dict[feat] = coef
The dict then came out as I pictured it, with {'feature_name': coefficient_value} pairs.
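The two-dimensional case can be reproduced with a small sketch. The data and the column names x1, x2 are hypothetical; the target is deliberately passed as a one-column DataFrame so that coef_ comes back 2-D. np.ravel is used here as one way to flatten it, which only makes sense when there is a single target.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

rng = np.random.default_rng(2)
# Hypothetical features and a 2-D target with a single column.
X = pd.DataFrame(rng.normal(size=(50, 2)), columns=["x1", "x2"])
y = pd.DataFrame({"price": 3.0 * X["x1"] + 4.0 * X["x2"]})

model = linear_model.LinearRegression().fit(X, y)
print(model.coef_.shape)  # 2-D because y was 2-D

# np.ravel flattens a single-target coef_ whether it is 1-D or 2-D.
coef_dict = dict(zip(X.columns, np.ravel(model.coef_)))
print(coef_dict)
```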
Answered by ZaxR
Borrowing from Robin's answer, but simplifying the syntax:
coef_dict = dict(zip(model_1_features, model_1.coef_))
Important note about zip: zip does not check that its inputs have equal length; it stops at the shortest one. It is therefore worth confirming that the number of features matches the number of coefficients (which in more complicated models might not be the case). If one input is longer than the other, its extra trailing values are silently dropped. Notice the missing 7 in the following example:
In [1]: [i for i in zip([1, 2, 3], [4, 5, 6, 7])]
Out[1]: [(1, 4), (2, 5), (3, 6)]
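One way to guard against the silent truncation is an explicit length check before zipping. The helper name zip_checked is hypothetical; on Python 3.10 and later, zip(names, values, strict=True) offers a built-in alternative that raises ValueError on a mismatch.

```python
def zip_checked(names, values):
    """Pair names with values, raising instead of silently truncating."""
    names, values = list(names), list(values)
    if len(names) != len(values):
        raise ValueError(
            f"length mismatch: {len(names)} names vs {len(values)} values"
        )
    return dict(zip(names, values))

# Equal lengths: behaves like dict(zip(...)).
print(zip_checked(["a", "b", "c"], [4, 5, 6]))

# Unequal lengths: the mismatch is reported instead of dropping the 7.
try:
    zip_checked(["a", "b", "c"], [4, 5, 6, 7])
except ValueError as err:
    print("error:", err)
```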