
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18165213/

Date: 2020-08-19 10:01:13  Source: igfitidea

How much time does it take to train an SVM classifier?

Tags: python, numpy, machine-learning, svm

Asked by Il'ya Zhenin

I wrote the following code and tested it on a small dataset:

from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

classif = OneVsRestClassifier(svm.SVC(kernel='rbf'))
classif.fit(X, y)

where X and y are numpy arrays (X is a 30000x784 matrix, y a 30000x1 vector). On small datasets the algorithm works well and gives me the right results.

But I started my program about 10 hours ago... and it is still running.

I want to know how long it will take, or whether it is stuck somehow. (Laptop specs: 4 GB memory, Core i5-480M)

Answered by lejlot

SVM training can take arbitrarily long; it depends on dozens of parameters:

  • the C parameter - the greater the misclassification penalty, the slower the process
  • the kernel - the more complicated the kernel, the slower the process (rbf is the most complex of the predefined ones)
  • data size/dimensionality - again, the same rule
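As a hypothetical illustration of the first two points, the sketch below (using a small synthetic dataset, not the asker's data; all sizes are made up) times SVC fits with a linear and an rbf kernel so you can compare them on your own machine:

```python
# Sketch: measure how kernel choice affects SVC training time.
# The dataset here is random and purely illustrative.
import time
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(2000, 50)           # small synthetic dataset
y = (X[:, 0] > 0.5).astype(int)  # simple binary labels

timings = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0)
    start = time.perf_counter()
    clf.fit(X, y)
    timings[kernel] = time.perf_counter() - start

print(timings)
```

The same loop can be repeated over several values of C to see the penalty effect; larger C values generally take longer to converge.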

In general, the basic SMO algorithm is O(n^3), so with 30,000 data points it has to run a number of operations proportional to 30,000^3 = 27,000,000,000,000, which is a really huge number. What are your options?
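A quick sanity check of that back-of-the-envelope count:

```python
# 30,000 data points cubed, per the O(n^3) estimate above
n = 30_000
print(n ** 3)  # 27000000000000, i.e. 2.7e13
```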

  • change the kernel to a linear one - 784 features is quite a lot, and rbf may be redundant
  • reduce the features' dimensionality (PCA?)
  • lower the C parameter
  • train the model on a subset of your data to find good parameters, then train on the whole dataset on some cluster/supercomputer
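A hypothetical sketch combining these suggestions (subsample, reduce dimensionality with PCA, and switch to a linear SVM) could look like the following; the data and all sizes are illustrative stand-ins, not the asker's dataset:

```python
# Sketch of the suggested speedups on random stand-in data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.rand(6000, 784)            # stand-in for the 30000x784 matrix
y = rng.randint(0, 10, size=6000)  # stand-in labels

# 1) tune on a small subset first
subset = rng.choice(len(X), size=2000, replace=False)
X_sub, y_sub = X[subset], y[subset]

# 2) reduce 784 features to 50 principal components
X_red = PCA(n_components=50).fit_transform(X_sub)

# 3) a linear SVM scales far better with n than kernelized SMO
clf = LinearSVC(C=0.1).fit(X_red, y_sub)
print(X_red.shape, clf.coef_.shape)
```

Once good parameters are found on the subset, the same pipeline can be refit on the full data, where the linear model remains tractable.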