Python: Saving a MinMaxScaler model in sklearn

Question

Asked by Luis Ramon Ramirez Rodriguez

I'm using the MinMaxScaler model in sklearn to normalize the features of a model.

import numpy as np

training_set = np.random.rand(4, 4) * 10
training_set

       [[ 6.01144787,  0.59753007,  2.0014852 ,  3.45433657],
       [ 6.03041646,  5.15589559,  6.64992437,  2.63440202],
       [ 2.27733136,  9.29927394,  0.03718093,  7.7679183 ],
       [ 9.86934288,  7.59003904,  6.02363739,  2.78294206]]


from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(training_set)
scaler.transform(training_set)


   [[ 0.49184811,  0.        ,  0.29704831,  0.15972182],
   [ 0.4943466 ,  0.52384506,  1.        ,  0.        ],
   [ 0.        ,  1.        ,  0.        ,  1.        ],
   [ 1.        ,  0.80357559,  0.9052909 ,  0.02893534]]

Now I want to use the same scaler to normalize the test set:

   [[ 8.31263467,  7.99782295,  0.02031658,  9.43249727],
   [ 1.03761228,  9.53173021,  5.99539478,  4.81456067],
   [ 0.19715961,  5.97702519,  0.53347403,  5.58747666],
   [ 9.67505429,  2.76225253,  7.39944931,  8.46746594]]

But I don't want to call scaler.fit() on the training data every time. Is there a way to save the scaler and load it later from a different file?

Answer 1

Accepted answer by jlarks32

I'm not an expert on this, but from a bit of research and a few helpful links, I think pickle and sklearn.externals.joblib are going to be your friends here.

The pickle package lets you save, or "dump", models to a file.

I think this link is also helpful. It talks about model persistence. Something you're going to want to try is:

# could use: import pickle... however let's do something else
import joblib  # standalone package; sklearn.externals.joblib is deprecated

# this is more efficient than pickle for things like large numpy arrays
# ... which sklearn models often have.

# then just 'dump' your fitted object (clf here; the scaler in the question) to a file
joblib.dump(clf, 'my_dope_model.pkl')

Here is where you can learn more about the sklearn externals.

Let me know if that doesn't help or I'm not understanding something about your model.

Note: sklearn.externals.joblib is deprecated. Install and use the standalone joblib package instead.
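
To tie this back to the question, here is a minimal sketch of the full round trip with the standalone joblib package: fit the scaler once, dump it, then load it elsewhere and transform the test set without refitting. The file name scaler.joblib, the temporary directory, and the random data are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
training_set = rng.random((4, 4)) * 10  # stand-in for the training data
test_set = rng.random((4, 4)) * 10      # stand-in for the test data

# Fit once on the training data and persist the fitted scaler.
scaler = MinMaxScaler().fit(training_set)
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")  # hypothetical location
joblib.dump(scaler, scaler_path)

# Later (possibly in another script): load and reuse, with no call to fit().
loaded_scaler = joblib.load(scaler_path)
test_scaled = loaded_scaler.transform(test_set)

# The loaded scaler should behave exactly like the original one.
print(np.allclose(test_scaled, scaler.transform(test_set)))
```

In a real project the path would point somewhere permanent, so that the training script and the prediction script can both reach it.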

Answer 2

Answer by Ivan Vegner

Even better than pickle (which creates much larger files than this method), you can use joblib, the tool that sklearn used to bundle as sklearn.externals.joblib:

import joblib  # formerly sklearn.externals.joblib, now a standalone package
scaler_filename = "scaler.save"
joblib.dump(scaler, scaler_filename)

# And now to load...

scaler = joblib.load(scaler_filename)

Note: sklearn.externals.joblib is deprecated. Install and use the standalone joblib package instead.

Answer 3

Answer by Engineero

Just a note that sklearn.externals.joblib has been deprecated and is superseded by plain old joblib, which can be installed with pip install joblib:

import joblib
joblib.dump(my_scaler, 'scaler.gz')
my_scaler = joblib.load('scaler.gz')

Note that the file extension can be anything, but if it is one of ['.z', '.gz', '.bz2', '.xz', '.lzma'] then the corresponding compression protocol will be used. See the docs for the joblib.dump() and joblib.load() functions.
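
As a rough illustration of the extension rule, the sketch below saves the same scaler twice: once relying on the '.gz' extension, and once with compression requested through the compress argument of joblib.dump. The file names, the temporary directory, and the toy data are assumptions for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

my_scaler = MinMaxScaler().fit(np.arange(12, dtype=float).reshape(4, 3))
out_dir = tempfile.mkdtemp()  # hypothetical output directory

# Compression inferred from the '.gz' extension.
gz_path = os.path.join(out_dir, "scaler.gz")
joblib.dump(my_scaler, gz_path)

# Compression requested explicitly as a (method, level) tuple.
explicit_path = os.path.join(out_dir, "scaler.joblib")
joblib.dump(my_scaler, explicit_path, compress=("gzip", 3))

# joblib.load needs no extra arguments for compressed files.
restored_from_gz = joblib.load(gz_path)
restored_explicit = joblib.load(explicit_path)
print(np.array_equal(restored_from_gz.data_max_, my_scaler.data_max_))
print(np.array_equal(restored_explicit.data_max_, my_scaler.data_max_))
```

For an object as small as a MinMaxScaler the compression makes little practical difference; it matters more for models that hold large arrays.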

Answer 4

Answer by Psidom

You can use pickle to save the scaler:

import pickle
scalerfile = 'scaler.sav'
with open(scalerfile, 'wb') as f:  # the with block closes the file after writing
    pickle.dump(scaler, f)

Load it back:

import pickle
scalerfile = 'scaler.sav'
with open(scalerfile, 'rb') as f:  # the with block closes the file after reading
    scaler = pickle.load(f)
test_scaled_set = scaler.transform(test_set)
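
To see what actually gets restored, the following sketch pickles a scaler fitted on a tiny hand-made training set and inspects the result. It serializes to bytes in memory purely to keep the example short; the data values are made up for illustration.

```python
import pickle

import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
scaler = MinMaxScaler().fit(train)

# Serialize to bytes and back; pickle.dump/pickle.load on a file work the same way.
restored = pickle.loads(pickle.dumps(scaler))

# The per-feature minima (0 and 10) and maxima (10 and 30) learned in fit() survive.
print(restored.data_min_, restored.data_max_)

# The test row is scaled with the training range, so values outside that range
# land outside [0, 1]: (20 - 0) / 10 = 2.0 and (0 - 10) / 20 = -0.5.
test = np.array([[20.0, 0.0]])
print(restored.transform(test))
```

This is the behavior the question asks for: the test set is normalized with the statistics of the training set, not its own.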

Answer 5

Answer by PSN

The best way to do this is to create an ML pipeline like the following:

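from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
import joblib  # standalone package; sklearn.externals.joblib is deprecated

# YOUR_ML_MODEL is a placeholder for any sklearn estimator class
pipeline = make_pipeline(MinMaxScaler(), YOUR_ML_MODEL())

# fit() scales X_train, then trains the model on the scaled features
model = pipeline.fit(X_train, y_train)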

Now you can save it to a file:

joblib.dump(model, 'filename.mod')

Later you can load it like this:

model = joblib.load('filename.mod')
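
For a version that runs end to end, here is a minimal sketch of the same idea. LogisticRegression is only a stand-in for YOUR_ML_MODEL, and the synthetic data, labels, and temporary directory are assumptions made for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_train = rng.random((40, 4)) * 10
y_train = (X_train[:, 0] > 5).astype(int)  # synthetic labels, for illustration only
X_test = rng.random((10, 4)) * 10

# LogisticRegression is a stand-in for YOUR_ML_MODEL.
pipeline = make_pipeline(MinMaxScaler(), LogisticRegression())
model = pipeline.fit(X_train, y_train)

model_path = os.path.join(tempfile.mkdtemp(), "filename.mod")  # hypothetical location
joblib.dump(model, model_path)

# The loaded pipeline scales with the training statistics, then predicts.
loaded_model = joblib.load(model_path)
print(np.array_equal(loaded_model.predict(X_test), model.predict(X_test)))
```

Because the scaler travels inside the pipeline, there is a single file to save and no way to pair a model with the wrong scaler.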
