Getting model attributes from a scikit-learn pipeline
Source: http://stackoverflow.com/questions/28822756/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow, cite the original URL, and keep the same license.
Asked by lmart999
I typically get PCA loadings like this:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_t = pca.fit(X).transform(X)
loadings = pca.components_
If I run PCA using a scikit-learn pipeline ...
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2))
])
X_t = pipeline.fit_transform(X)
... is it possible to get the loadings?
Simply trying loadings = pipeline.components_ fails:
AttributeError: 'Pipeline' object has no attribute 'components_'
Thanks!
(I am also interested in extracting attributes like coef_ from learning pipelines.)
Accepted answer by Andreas Mueller
Did you look at the documentation? http://scikit-learn.org/dev/modules/pipeline.html
I feel it is pretty clear.
Update: as of scikit-learn 0.21, you can index the pipeline directly with square brackets, using the step name:
pipeline['pca']
or an integer index:
pipeline[1]
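As a minimal end-to-end sketch of the square-bracket access above: the data here is synthetic and made up for illustration (the question does not say what X is), and the indexing assumes scikit-learn 0.21 or later.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only: 100 samples, 5 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
])
X_t = pipeline.fit_transform(X)

# Requires scikit-learn >= 0.21: index a step by name or by position.
loadings_by_name = pipeline['pca'].components_
loadings_by_index = pipeline[1].components_

# components_ has shape (n_components, n_features).
print(loadings_by_name.shape)
```

Both lookups return the same fitted PCA step, so the two arrays are identical.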
There are two ways to get to the steps in a pipeline, either using indices or using the string names you gave:
pipeline.named_steps['pca']
pipeline.steps[1][1]
This will give you the PCA object, from which you can read components_.
With named_steps you can also use attribute access with a dot (.), which allows autocompletion:
pipeline.named_steps.pca.components_
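The question also asks about coef_ in learning pipelines. The following is a hedged sketch of the same lookups applied to a pipeline that ends in a classifier. The step name 'clf', the choice of LogisticRegression, and the synthetic data are assumptions made for illustration and do not come from the original post.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

clf_pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('clf', LogisticRegression()),  # hypothetical final estimator
])
clf_pipeline.fit(X, y)

# The same fitted PCA object, reached by name and by position.
pca_by_name = clf_pipeline.named_steps['pca']
pca_by_position = clf_pipeline.steps[1][1]
same_object = pca_by_name is pca_by_position

# Fitted attributes live on the individual steps, not on the Pipeline.
loadings = pca_by_name.components_            # shape (2, 4)
coef = clf_pipeline.named_steps['clf'].coef_  # shape (1, 2)
```

Note that coef_ here refers to the two PCA components the classifier was trained on, not to the four original features.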
Answer by Guillaume Chevalier
Using Neuraxle
Working with pipelines is simpler using Neuraxle. For instance, you can do this:
from neuraxle.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Create and fit the pipeline:
pipeline = Pipeline([
    StandardScaler(),
    PCA(n_components=2)
])
pipeline, X_t = pipeline.fit_transform(X)
# Get the components:
pca = pipeline[-1]
components = pca.components_
You can access the PCA step in any of these three ways:
pipeline['PCA']
pipeline[-1]
pipeline[1]
Neuraxle is a pipelining library built on top of scikit-learn to take pipelines to the next level. It makes it easier to manage spaces of hyperparameter distributions, nested pipelines, saving and reloading, REST API serving, and more. It is also designed to work with deep learning algorithms and to allow parallel computing.
Nested pipelines:
You could have pipelines within pipelines as below.
# Create and fit the pipeline:
pipeline = Pipeline([
    StandardScaler(),
    Identity(),
    Pipeline([
        Identity(),  # Note: an Identity step is a step that does nothing.
        Identity(),  # We use it here for demonstration purposes.
        Identity(),
        Pipeline([
            Identity(),
            PCA(n_components=2)
        ])
    ])
])
pipeline, X_t = pipeline.fit_transform(X)
Then you'd need to do this:
# Get the components:
pca = pipeline["Pipeline"]["Pipeline"][-1]
components = pca.components_
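For comparison, a similar chained lookup can be sketched in plain scikit-learn, without Neuraxle. This swaps in sklearn.pipeline.Pipeline for the Neuraxle class used above. The step names 'scaling', 'inner', and 'pca' and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 4))

# A scikit-learn pipeline whose last step is itself a pipeline.
nested = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('inner', Pipeline(steps=[
        ('pca', PCA(n_components=2)),
    ])),
])
nested.fit(X)

# Walk down one level at a time with named_steps.
pca = nested.named_steps['inner'].named_steps['pca']
components = pca.components_  # shape (2, 4)
```

Each level of nesting adds one more named_steps lookup, which mirrors the chained indexing in the Neuraxle example.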