Sklearn: How to Save a Model Created From a Pipeline and GridSearchCV Using Joblib or Pickle?
Source: http://stackoverflow.com/questions/34143829/
Licensed under CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on StackOverflow.
Asked by Jarad
After identifying the best parameters using a pipeline and GridSearchCV, how do I pickle/joblib this process to re-use later? I see how to do this when it's a single classifier...
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
joblib.dump(clf, 'filename.pkl')
But how do I save this overall pipeline with the best parameters after performing and completing a gridsearch?
I tried:
- joblib.dump(grid, 'output.pkl') - But that dumped every gridsearch attempt (many files)
- joblib.dump(pipeline, 'output.pkl') - But I don't think that contains the best parameters
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# df is the asker's DataFrame (not shown in the question)
X_train = df['Keyword']
y_train = df['Ad Group']

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('sgd', SGDClassifier())
])
parameters = {'tfidf__ngram_range': [(1, 1), (1, 2)],
              'tfidf__use_idf': (True, False),
              'tfidf__max_df': [0.25, 0.5, 0.75, 1.0],
              'tfidf__max_features': [10, 50, 100, 250, 500, 1000, None],
              'tfidf__stop_words': ('english', None),
              'tfidf__smooth_idf': (True, False),
              'tfidf__norm': ('l1', 'l2', None),
              }
grid = GridSearchCV(pipeline, parameters, cv=2, verbose=1)
grid.fit(X_train, y_train)

# This was the best combination of tuning parameters discovered:
# best_params = {'tfidf__max_features': None, 'tfidf__use_idf': False,
#                'tfidf__smooth_idf': False, 'tfidf__ngram_range': (1, 2),
#                'tfidf__max_df': 1.0, 'tfidf__stop_words': 'english',
#                'tfidf__norm': 'l2'}
Accepted answer by Ibraim Ganiev
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
joblib.dump(grid.best_estimator_, 'filename.pkl')
To dump the object into a single file, pass compress:
joblib.dump(grid.best_estimator_, 'filename.pkl', compress=1)
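For completeness, here is a minimal sketch of the full round trip: fit the grid search, dump grid.best_estimator_, load it back, and predict. The toy keywords, the labels, the reduced parameter grid, and the file name best_pipeline.pkl are all assumptions made up for illustration; they stand in for the asker's df['Keyword'] and df['Ad Group'], which are not shown. It relies on GridSearchCV's default refit=True, which refits the whole pipeline on all of the training data with the best parameters.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for df['Keyword'] and df['Ad Group'].
X_train = ["red running shoes", "blue running shoes",
           "trail running shoes", "cheap running shoes",
           "leather office chair", "mesh office chair",
           "ergonomic office chair", "cheap office chair"]
y_train = ["shoes"] * 4 + ["chairs"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("sgd", SGDClassifier(random_state=0)),
])
# A deliberately small grid (an assumption) so the sketch runs quickly.
parameters = {"tfidf__ngram_range": [(1, 1), (1, 2)],
              "tfidf__use_idf": (True, False)}

grid = GridSearchCV(pipeline, parameters, cv=2)
grid.fit(X_train, y_train)

# best_estimator_ is the complete fitted Pipeline (vectorizer + classifier),
# so dumping it saves the preprocessing together with the model.
model_path = os.path.join(tempfile.mkdtemp(), "best_pipeline.pkl")
written = joblib.dump(grid.best_estimator_, model_path, compress=1)

# Later, possibly in another process: load the pipeline and predict.
loaded = joblib.load(model_path)
new_keywords = ["running shoes", "office chair"]
same_predictions = (list(loaded.predict(new_keywords))
                    == list(grid.predict(new_keywords)))
```

Because the loaded object is the pipeline itself, raw text can be passed straight to predict without re-creating the TfidfVectorizer. As with any pickle-based format, the file should be loaded with the same scikit-learn version that wrote it, and only from a trusted source.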