Java Weka: How to specify split percentage?
Original URL: http://stackoverflow.com/questions/14682057/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by rishi
I have written code to create a model and save it, and it works fine. My understanding is that, by default, the data is split into 10 folds. I want the data to be split into two sets (training and testing) when I create the model. In the Weka UI, I can do this with the "Percentage split" radio button. I want to know how to do it in code. I want it split into two parts: 80% for training and 20% for testing. Here is my code:
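import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.classifiers.meta.FilteredClassifier;
import weka.filters.unsupervised.attribute.StringToWordVector;

// StringToWordVector turns the string attribute(s) into word vectors,
// which are then fed to multinomial naive Bayes
FilteredClassifier model = new FilteredClassifier();
model.setFilter(new StringToWordVector());
model.setClassifier(new NaiveBayesMultinomial());
try {
    model.buildClassifier(trainingSet);
} catch (Exception e1) {
    e1.printStackTrace();
}

// Serialize the trained model; try-with-resources closes the stream
// even if writing fails
try (ObjectOutputStream oos = new ObjectOutputStream(
        new FileOutputStream("/Users/me/models/MyModel.model"))) {
    oos.writeObject(model);
    oos.flush();
}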
Here, trainingSet is an already-populated Instances object. Can someone help me with this?
Thanks in advance!
Answer by Jan Eglinger
In the UI class ClassifierPanel's method startClassifier(), I found the following code:
// Percent split
int trainSize = (int) Math.round(inst.numInstances() * percent / 100);
int testSize = inst.numInstances() - trainSize;
Instances train = new Instances(inst, 0, trainSize);
Instances test = new Instances(inst, trainSize, testSize);
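The split-size arithmetic can be checked on its own. This is a minimal sketch in plain Java (no Weka needed); PercentSplitSizes is a hypothetical helper that reuses the round(n * percent / 100) expression from the excerpt, assuming percent is a double.

```java
final class PercentSplitSizes {
    // Training-set size, using the same expression as the excerpt:
    // round(numInstances * percent / 100), with percent given as a double.
    static int trainSize(int numInstances, double percent) {
        return (int) Math.round(numInstances * percent / 100);
    }

    // Every instance not used for training goes into the test set,
    // so the two sizes always add up to numInstances.
    static int testSize(int numInstances, double percent) {
        return numInstances - trainSize(numInstances, percent);
    }
}
```

For 10 instances at 66%, this gives 7 training and 3 test instances, because 6.6 is rounded up. Keeping percent a double matters: with an int percent, n * percent / 100 would be integer division and truncate (10 * 66 / 100 is 6).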
So, after randomizing your dataset...
trainingSet.randomize(new java.util.Random(0));
... I suggest you split your trainingSet in the same way:
int trainSize = (int) Math.round(trainingSet.numInstances() * 0.8);
int testSize = trainingSet.numInstances() - trainSize;
Instances train = new Instances(trainingSet, 0, trainSize);
Instances test = new Instances(trainingSet, trainSize, testSize);
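The randomize-then-split idea does not depend on Weka itself. Below is a minimal sketch on a plain java.util.List, using a hypothetical RandomSplit helper: it shuffles a copy with a seeded Random, then takes the first round(n * trainFraction) items for training and the rest for testing, mirroring the two Instances(Instances, int, int) range copies above. It uses Collections.shuffle, which is not the same shuffling code as Instances.randomize, so a given seed will not reproduce Weka's ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RandomSplit {
    // Returns [train, test]. The input list is left untouched; a copy is
    // shuffled with a seeded Random so the split is reproducible.
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));

        // Same rounding rule as the Weka code: the first trainSize items train,
        // the remaining items test.
        int trainSize = (int) Math.round(shuffled.size() * trainFraction);
        List<T> train = new ArrayList<>(shuffled.subList(0, trainSize));
        List<T> test = new ArrayList<>(shuffled.subList(trainSize, shuffled.size()));
        return List.of(train, test);
    }
}
```

With 10 items and trainFraction 0.8, the train part has 8 items and the test part 2; together they contain every original item exactly once.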
Then use Classifier#buildClassifier(Instances data) to train the classifier on 80% of your instances:
model.buildClassifier(train);
UPDATE: thanks to @ChengkunWu's answer, I added the randomizing step above.
Answer by Chengkun Wu
You might also want to randomize the split as well.
data.randomize(new java.util.Random(0));
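To see why shuffling matters, suppose the instances happen to be stored class by class (hypothetical labels "spam" and "ham"). An 80/20 split without shuffling makes the test set simply the tail of the data, so it can end up containing a single class. The sketch below shows this with plain Java lists rather than Weka's Instances; SortedSplitDemo and its methods are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SortedSplitDemo {
    // Counts how often a label occurs in the test part when the list is split
    // at trainSize without shuffling: the test part is just the list's tail.
    static long countInTest(List<String> labels, int trainSize, String label) {
        return labels.subList(trainSize, labels.size()).stream()
                .filter(label::equals)
                .count();
    }

    // Ten hypothetical labels stored class by class: five "spam", then five "ham".
    static List<String> sortedLabels() {
        List<String> labels = new ArrayList<>(Collections.nCopies(5, "spam"));
        labels.addAll(Collections.nCopies(5, "ham"));
        return labels;
    }
}
```

With trainSize 8, both test instances are "ham" and no "spam" instance is ever tested. Calling randomize with a fixed seed, such as new java.util.Random(0), mixes the classes before splitting while keeping the split reproducible between runs.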