Python scikit-learn: exporting trained classifier

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/17511968/

Time: 2020-08-19 08:23:28  Source: igfitidea

Tags: python, scikit-learn

Asked by jcdmb

I am using a DBN (deep belief network) from nolearn, which is based on scikit-learn.

I have already built a network that classifies my data very well. Now I want to export the model for deployment, but I don't know how (at the moment I retrain the DBN every time I want to predict something). In MATLAB I would just export the weight matrix and import it on another machine.

Does anyone know how to export the model or its weight matrix so that it can be imported without training the whole model again?

Accepted answer by ogrisel

First, install joblib.

You can use:

>>> import joblib
>>> joblib.dump(clf, 'my_model.pkl', compress=9)

And then later, on the prediction server:

>>> import joblib
>>> model_clone = joblib.load('my_model.pkl')

This is basically a Python pickle with optimized handling for large numpy arrays. It has the same limitations as the regular pickle with respect to code changes: if the class structure of the pickled object changes, you might no longer be able to unpickle the object with new versions of nolearn or scikit-learn.
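
The dump/load round trip described above can be sketched end to end as follows. This is only an illustration under stated assumptions: an SVC trained on the iris dataset stands in for the nolearn DBN from the question, and the temporary directory stands in for whatever storage is shared between the training machine and the prediction server.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets, svm

# Stand-in classifier (assumption: the question uses a nolearn DBN instead).
X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# "Training machine": write the fitted model to disk.
model_dir = tempfile.mkdtemp()
model_path = os.path.join(model_dir, "my_model.pkl")
joblib.dump(clf, model_path, compress=9)

# "Prediction server": load the file and predict without retraining.
model_clone = joblib.load(model_path)
same_predictions = np.array_equal(clf.predict(X), model_clone.predict(X))
print(same_predictions)
```

The loading side still needs the same classes to be importable, so nolearn or scikit-learn has to be installed there in a compatible version.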

If you want a robust long-term way of storing your model parameters, you might need to write your own IO layer (e.g. using binary serialization tools such as protocol buffers or avro, or an inefficient yet portable text / json / xml representation such as PMML).
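
As a rough sketch of what such a hand-written IO layer could look like, the example below exports the parameters of a linear model to JSON and rebuilds the decision rule with numpy alone. The choice of LogisticRegression, the JSON field names, and the helper predict_from_json are all assumptions made for illustration; a DBN would need one weight matrix and bias vector per layer plus its activation functions.

```python
import json

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Stand-in model (assumption: a linear classifier, not the DBN from the question).
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Export only plain numbers and lists; no Python classes end up in the file.
exported = json.dumps({
    "classes": clf.classes_.tolist(),
    "coef": clf.coef_.tolist(),
    "intercept": clf.intercept_.tolist(),
})


def predict_from_json(payload, samples):
    """Hypothetical helper: rebuild the linear decision rule without scikit-learn."""
    params = json.loads(payload)
    coef = np.asarray(params["coef"])
    intercept = np.asarray(params["intercept"])
    scores = np.asarray(samples) @ coef.T + intercept
    # Multiclass case: pick the class with the highest score.
    return np.asarray(params["classes"])[np.argmax(scores, axis=1)]


restored = predict_from_json(exported, X)
matches = np.array_equal(restored, clf.predict(X))
print(matches)
```

The trade-off is that the prediction logic has to be reimplemented and kept in sync by hand, in exchange for a file that does not depend on any particular library version.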

Answer by Franck Dernoncourt

Section 3.4, Model persistence, in the scikit-learn documentation covers pretty much everything.

In addition to the joblib approach ogrisel pointed to (sklearn.externals.joblib in older scikit-learn releases, now the standalone joblib package), it shows how to use the regular pickle package:

>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)
SVC()

>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.predict(X[0:1])
array([0])
>>> y[0]
0

and gives a few warnings, for example that models saved in one version of scikit-learn might not load in another version.

Answer by ben26941

Pickling/unpickling has the disadvantage that it only works with matching Python versions (major and possibly also minor) and matching sklearn and joblib library versions.
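
One way to make that constraint visible is to store the relevant versions next to the pickle and compare them before unpickling. The sketch below is an assumption-laden illustration, not an established convention: the helpers environment, save_model and load_model are hypothetical, an SVC on iris stands in for the real model, and only the Python major.minor version and the sklearn version are checked.

```python
import json
import pickle
import platform

import sklearn
from sklearn import datasets, svm


def environment():
    """Versions that the pickle implicitly depends on."""
    major, minor, _ = platform.python_version_tuple()
    return {"python": f"{major}.{minor}", "sklearn": sklearn.__version__}


def save_model(model):
    """Return (metadata_json, model_bytes); the metadata is readable without unpickling."""
    return json.dumps(environment()), pickle.dumps(model)


def load_model(metadata_json, model_bytes):
    """Unpickle only if the saved versions match the running environment."""
    saved = json.loads(metadata_json)
    running = environment()
    if saved != running:
        raise RuntimeError(f"version mismatch: saved {saved}, running {running}")
    return pickle.loads(model_bytes)


X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

metadata_json, model_bytes = save_model(clf)
restored = load_model(metadata_json, model_bytes)
print(restored.predict(X[0:1]))
```

Keeping the metadata outside the pickle matters here: the check can then run before any attempt to unpickle, which is the step that would fail on a mismatch.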

There are alternative descriptive output formats for machine learning models, such as those developed by the Data Mining Group: the Predictive Model Markup Language (PMML) and the Portable Format for Analytics (PFA). Of the two, PMML is much better supported.

So you have the option of saving a scikit-learn model as PMML (for example using sklearn2pmml), and then deploying and running it in Java, Spark, or Hive using jpmml (of course, you have more choices).
