Unable to upload CSV file for WEKA analysis - java
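Notice: this page is a Chinese-English translation of a popular Stack Overflow question, published under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow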
Original URL: http://stackoverflow.com/questions/18820264/
Question by pret
I am working on a big data analysis project and I am stuck at this point. I am trying to upload a CSV file with data and want to use the WEKA Java API to perform the analysis. I want to tokenize the text, remove stop words, identify parts of speech (POS), and filter the nouns. I have no idea why I am seeing the error below. An explanation and a solution would be great!
Error:
Exception in thread "main" java.io.IOException: wrong number of values. Read 21, expected 20, read Token[EOL], line 3
at weka.core.converters.ConverterUtils.errms(ConverterUtils.java:912)
at weka.core.converters.CSVLoader.getInstance(CSVLoader.java:819)
at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:642)
Code:
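import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
public class CsvToArffNaiveBayes {
    public static void main(String[] args) throws Exception {
        // Backslashes must be escaped in Java string literals (a bare "\f" is a form feed).
        // "C:\fakepath" is the placeholder browsers report for uploaded files; use the real path.
        File csvFile = new File("C:\\fakepath\\CSVfilesample.csv");
        File arffFile = new File("C:\\fakepath\\CSVfilesample.arff");
        // Load the CSV; the "wrong number of values" IOException is thrown here
        CSVLoader loader = new CSVLoader();
        loader.setSource(csvFile);
        Instances data = loader.getDataSet();
        // Save as ARFF (setFile also sets the destination in recent WEKA versions)
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(arffFile);
        saver.writeBatch();
        // Read the ARFF back and use the last attribute as the class
        Instances train;
        try (BufferedReader br = new BufferedReader(new FileReader(arffFile))) {
            train = new Instances(br);
        }
        train.setClassIndex(train.numAttributes() - 1);
        // Train Naive Bayes, then estimate performance with 10-fold cross-validation
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(train);
        Evaluation eval = new Evaluation(train);
        eval.crossValidateModel(nb, train, 10, new Random(1));
        System.out.println(eval.toSummaryString("\nResults\n=====\n", true));
        System.out.println(eval.fMeasure(1) + " " + eval.precision(1) + " " + eval.recall(1));
    }
}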
Accepted answer by user2339071
This error is generally caused by an incorrectly formatted input file; here the stack trace shows CSVLoader failing while reading the CSV file. There are a few possible reasons. Check the following points:
- It is common practice to use the ARFF format instead of CSV, because ARFF has certain advantages over CSV (for example, attribute types are declared explicitly in the header). See "Can I use CSV?".
- Second, check the encoding of the file. If it is UTF-8, you will have to read (decode) the file as UTF-8. Reference: "Text Categorization with WEKA".
- Third, check whether your CSV contains incompatible characters, such as %2. Check for syntactically incorrect line endings and for extra commas.
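To find the offending rows before WEKA does, a small plain-Java check like the following may help. It is a minimal sketch, not part of WEKA: the class name CsvFieldCountCheck is hypothetical, and it assumes double-quoted fields, "" as the escaped quote, and no line breaks inside fields. Some WEKA versions also treat single quotes as enclosure characters, which this sketch ignores. It first decodes the file as strict UTF-8 (reporting invalid bytes instead of silently replacing them), then compares each row's field count with the header's, which is the mismatch behind "Read 21, expected 20".

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of WEKA: reports CSV rows whose field count differs from the header.
public class CsvFieldCountCheck {

    // Counts comma-separated fields; commas inside double quotes are treated as text.
    // Assumes "" escapes a quote (it toggles the quote state twice, so it has no net effect).
    static int countFields(String line) {
        int count = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    // Compares every non-empty row with the header row (line 1); line numbers are 1-based.
    static List<String> findBadRows(List<String> lines) {
        List<String> problems = new ArrayList<>();
        if (lines.isEmpty()) {
            return problems;
        }
        int expected = countFields(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int actual = countFields(line);
            if (actual != expected) {
                problems.add("line " + (i + 1) + ": " + actual + " values, expected " + expected);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text;
        try {
            // Strict UTF-8 decoding: invalid bytes raise an exception instead of being silently replaced
            text = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
            return;
        }
        List<String> lines = Arrays.asList(text.split("\\r?\\n"));
        List<String> problems = findBadRows(lines);
        if (problems.isEmpty()) {
            System.out.println("All rows match the header's field count.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run it as `java CsvFieldCountCheck CSVfilesample.csv`; each reported line is a candidate for an extra comma or a stray quote. The line numbers may not match WEKA's exactly, since WEKA's tokenizer counts lines its own way.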
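This error tells you that there is a problem with the file contents: they don't follow the format WEKA expects. Here, "Read 21, expected 20" means that a row (reported by WEKA at line 3) contains 21 values while the header row defines 20 columns, for example because a text field contains an unquoted comma. Fix those rows and the error will disappear.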
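If the CSV is produced by your own code, the usual fix for an extra value is to quote any text field that contains a comma. The following is a minimal sketch with a hypothetical class name (CsvQuoting); it follows the common RFC 4180 convention of doubling embedded quotes. Whether a particular WEKA version's CSVLoader accepts doubled quotes or line breaks inside quoted fields is an assumption to verify; removing such characters from the text beforehand is the more conservative option.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for writing CSV rows that keep one value per column.
public class CsvQuoting {

    // Wraps a field in double quotes when it contains a comma, quote, or line break,
    // doubling any embedded quotes (RFC 4180 style; WEKA support for this is assumed, not confirmed).
    static String quote(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the (quoted where necessary) fields of one row into a single CSV line.
    static String toCsvRow(List<String> fields) {
        return fields.stream().map(CsvQuoting::quote).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Unquoted, the comma in the middle field would add one extra value to the row
        System.out.println(toCsvRow(Arrays.asList("42", "big data, small budget", "yes")));
        // prints 42,"big data, small budget",yes
    }
}
```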
Hope it helps. :)