Original URL: http://stackoverflow.com/questions/16223044/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Java, Weka: How to predict numeric attribute?
Question by Anton Ashanin
I was trying to use NaiveBayesUpdateable classifier from Weka. My data contains both nominal and numeric attributes:
```
@relation cars
@attribute country {FR, UK, ...}
@attribute city {London, Paris, ...}
@attribute car_make {Toyota, BMW, ...}
@attribute price numeric %% car price
@attribute sales numeric %% number of cars sold
```
I need to predict the number of sales (numeric!) based on other attributes.
I understand that I cannot use a numeric attribute as the class for Bayes classification in Weka. One technique is to split the values of the numeric attribute into N intervals of length k and use a nominal attribute instead, where each class name is an interval index, like this: @attribute class {1,2,3,...N}.
Yet the numeric attribute that I need to predict ranges from 0 to 1 000 000. Creating 1 000 000 classes makes no sense at all. How can I predict a numeric attribute with Weka, or what algorithms should I look for if Weka has no tools for this task?
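The interval approach from the question can be sketched in plain Java (no Weka) to make the class-count problem concrete. The names `EqualWidthBinning`, `numBins`, and `binOf` are hypothetical, and labels are 1-based to match `{1,2,3,...N}`. Note that a class predicted this way only identifies a range of sales, never an exact number.

```java
import java.util.Arrays;

/** Sketch: map a numeric target onto nominal interval labels 1..n (equal-width binning). */
class EqualWidthBinning {

    /** Number of intervals of width k needed to cover [min, max] (at least one). */
    static int numBins(double min, double max, double k) {
        return Math.max(1, (int) Math.ceil((max - min) / k));
    }

    /** 1-based interval label; values at or beyond max fall into the last interval. */
    static int binOf(double value, double min, double k, int n) {
        int bin = (int) Math.floor((value - min) / k) + 1;
        return Math.min(Math.max(bin, 1), n);
    }

    public static void main(String[] args) {
        double min = 0, max = 1_000_000;

        // Intervals of length 1: one class per unit of sales, as the question describes.
        System.out.println(numBins(min, max, 1)); // 1000000

        // Wider intervals give far fewer classes, but each prediction is only a range.
        double k = 100_000;
        int n = numBins(min, max, k);
        System.out.println(n); // 10

        double[] sales = {0, 99_999, 100_000, 750_000, 1_000_000};
        int[] labels = Arrays.stream(sales).mapToInt(v -> binOf(v, min, k, n)).toArray();
        System.out.println(Arrays.toString(labels)); // [1, 1, 2, 8, 10]
    }
}
```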
Answer by Sentry
What you want to do is regression, not classification. The difference is exactly what you describe/want:
- Classification has discrete classes/labels; any nominal attribute could be used as the class here.
- Regression has continuous labels; "classes" would be the wrong term here.
Most regression-based techniques can be turned into a binary classifier by defining a threshold: the class is determined by whether the predicted value is above or below that threshold.
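A minimal sketch of the thresholding idea, assuming the regression model is simply a function from an input value to a predicted number. `ThresholdClassifier` and the toy model `2 * input + 1` are hypothetical; a prediction exactly equal to the threshold is assigned to the "above" class here, which is an arbitrary choice.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch: turn any numeric predictor into a two-class classifier with a threshold. */
class ThresholdClassifier {
    private final DoubleUnaryOperator regressor; // maps an input value to a predicted number
    private final double threshold;

    ThresholdClassifier(DoubleUnaryOperator regressor, double threshold) {
        this.regressor = regressor;
        this.threshold = threshold;
    }

    /** true = "above" class (prediction >= threshold), false = "below" class. */
    boolean classify(double input) {
        return regressor.applyAsDouble(input) >= threshold;
    }

    public static void main(String[] args) {
        // Hypothetical regression model: prediction = 2 * input + 1.
        ThresholdClassifier c = new ThresholdClassifier(x -> 2.0 * x + 1.0, 10.0);
        System.out.println(c.classify(3.0)); // false (prediction 7.0)
        System.out.println(c.classify(5.0)); // true (prediction 11.0)
    }
}
```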
I don't know all of WEKA's classifiers that support regression, but you can start by looking at these two:
- MultilayerPerceptron: Basically a neural network.
- LinearRegression: As the name says, linear regression.
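To show what linear regression computes, here is a sketch of ordinary least squares for a single numeric attribute. This is not Weka's `LinearRegression` implementation, which handles many attributes at once and offers additional options; the class name and the price/sales numbers are made up for illustration.

```java
/** Sketch: ordinary least squares for one numeric attribute, y ~ intercept + slope * x. */
class SimpleLinearRegression {
    final double intercept;
    final double slope;

    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired observations");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        // slope = cov(x, y) / var(x); the 1/n factors cancel, so plain sums suffice.
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x has zero variance");
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Hypothetical data lying exactly on sales = 1000 - 20 * price.
        double[] price = {10, 20, 30, 40};
        double[] sales = {800, 600, 400, 200};
        SimpleLinearRegression model = new SimpleLinearRegression(price, sales);
        System.out.println(model.predict(25)); // 500.0
    }
}
```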
You might have to use the NominalToBinary filter to convert your nominal attributes to numerical (binary) ones.
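The idea of converting a nominal attribute into binary ones can be sketched as plain one-hot encoding: one 0/1 indicator per nominal value. This is an illustration only. `OneHotEncoder` is a hypothetical name, and Weka's filters may produce a different set of attributes; for instance, the supervised NominalToBinary filter is documented to use a k-1 attribute scheme when the class is numeric.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: one-hot encode a nominal value as one 0/1 indicator per possible value. */
class OneHotEncoder {
    private final List<String> values; // the nominal attribute's declared values, in order

    OneHotEncoder(List<String> values) {
        this.values = List.copyOf(values);
    }

    double[] encode(String value) {
        int index = values.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("unknown nominal value: " + value);
        }
        double[] indicators = new double[values.size()]; // all 0.0 by default
        indicators[index] = 1.0;
        return indicators;
    }

    public static void main(String[] args) {
        // Two of the car_make values from the question's ARFF header (the rest are elided there).
        OneHotEncoder carMake = new OneHotEncoder(List.of("Toyota", "BMW"));
        System.out.println(Arrays.toString(carMake.encode("Toyota"))); // [1.0, 0.0]
        System.out.println(Arrays.toString(carMake.encode("BMW")));    // [0.0, 1.0]
    }
}
```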
Answer by Bilal Dadanlar
You can find regression in Weka under classifiers > functions > LinearRegression. Here is an example of creating a regression model in Weka: https://www.ibm.com/developerworks/opensource/library/os-weka1/
Answer by demongolem
These days (since Weka 3.7, I believe), RandomForest would work just as you want it to. The features can be a mix of nominal and numeric attributes, and the prediction can be numeric as well.
The drawback (in your case, I would imagine) is that it is not an Updateable classifier, unlike NaiveBayesUpdateable, which works well with large amounts of data that may not fit in memory all at once.
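The core idea behind using a random forest for numeric prediction can be sketched as bagging: train many regression trees on bootstrap samples and average their outputs, where each leaf predicts the mean target of its training rows. This is a simplified illustration, not Weka's `RandomForest`. The trees here are depth-1 stumps on a single numeric attribute with no random attribute selection, and `BaggedStumps`, the toy data, and the seed are hypothetical. Like the forest itself, it needs all training rows in memory.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch of bagged regression stumps: each fit to a bootstrap sample, outputs averaged. */
class BaggedStumps {

    /** A depth-1 regression tree: each side of the split predicts its mean target. */
    static final class Stump {
        final double split, left, right;

        Stump(double split, double left, double right) {
            this.split = split;
            this.left = left;
            this.right = right;
        }

        double predict(double x) {
            return x <= split ? left : right;
        }
    }

    private final Stump[] trees;

    BaggedStumps(double[] x, double[] y, int numTrees, long seed) {
        Random rnd = new Random(seed);
        int n = x.length;
        trees = new Stump[numTrees];
        for (int t = 0; t < numTrees; t++) {
            // Bootstrap sample: n rows drawn with replacement.
            double[] bx = new double[n], by = new double[n];
            for (int i = 0; i < n; i++) {
                int j = rnd.nextInt(n);
                bx[i] = x[j];
                by[i] = y[j];
            }
            trees[t] = fitStump(bx, by);
        }
    }

    /** Tries a split at each midpoint between distinct x values; keeps the lowest squared error. */
    static Stump fitStump(double[] x, double[] y) {
        double mean = Arrays.stream(y).average().orElse(0);
        Stump best = new Stump(Double.POSITIVE_INFINITY, mean, mean); // fallback: constant prediction
        double bestSse = sse(x, y, best);
        double[] distinct = Arrays.stream(x).distinct().sorted().toArray();
        for (int k = 0; k + 1 < distinct.length; k++) {
            double split = (distinct[k] + distinct[k + 1]) / 2;
            double sumL = 0, sumR = 0;
            int nL = 0, nR = 0;
            for (int i = 0; i < x.length; i++) {
                if (x[i] <= split) { sumL += y[i]; nL++; } else { sumR += y[i]; nR++; }
            }
            Stump candidate = new Stump(split, sumL / nL, sumR / nR); // both sides are non-empty
            double e = sse(x, y, candidate);
            if (e < bestSse) {
                bestSse = e;
                best = candidate;
            }
        }
        return best;
    }

    static double sse(double[] x, double[] y, Stump s) {
        double total = 0;
        for (int i = 0; i < x.length; i++) {
            double d = y[i] - s.predict(x[i]);
            total += d * d;
        }
        return total;
    }

    /** Numeric prediction: the average of all trees' outputs. */
    double predict(double x) {
        double sum = 0;
        for (Stump s : trees) sum += s.predict(x);
        return sum / trees.length;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: the target jumps from 10 to 50 once x exceeds 4.
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        double[] y = {10, 10, 10, 10, 50, 50, 50, 50};
        BaggedStumps forest = new BaggedStumps(x, y, 50, 42L);
        System.out.println(forest.predict(2));
        System.out.println(forest.predict(7));
    }
}
```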