Parse CSV file containing a Unicode character using OpenCSV
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/1695699/
Asked by meysam_pro
I'm trying to parse a .csv file with OpenCSV in NetBeans 6.0.1. My file contains some Unicode characters. When I write it to the output, the characters appear in another form, like (HJ1'-E/;). When I open the file in Notepad, it looks OK.
The code that I used:
// FileReader decodes the file with the platform default charset.
CSVReader reader = new CSVReader(new FileReader("d:\\a.csv"), ',', '\'', 1);
String[] line;
while ((line = reader.readNext()) != null) {
    StringBuilder stb = new StringBuilder(400);
    for (int i = 0; i < line.length; i++) {
        stb.append(line[i]);
        stb.append(";");
    }
    System.out.println(stb);
}
Accepted answer by Jon Skeet
First you need to know what encoding your file is in, such as UTF-8 or UTF-16. What's generating this file to start with?
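If nothing is known about the program that produced the file, one rough heuristic is to look for a byte order mark (BOM) in the first bytes. The sketch below is only an illustration, not part of the original answer: the class and method names are hypothetical, and a missing BOM proves nothing, since UTF-8 files are often written without one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    /** Returns the charset indicated by a leading BOM, or null if no known BOM is present. */
    static Charset detect(byte[] head) {
        // UTF-8 BOM: EF BB BF
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // UTF-16 big-endian BOM: FE FF
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        // UTF-16 little-endian BOM: FF FE (UTF-32 is deliberately ignored here)
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }
}
```

Passing the first few bytes of the file to `detect` gives a hint at best; asking whoever generates the file is still the more reliable route.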
After that, it's relatively straightforward - you need to create a FileInputStream wrapped in an InputStreamReader instead of just a FileReader. (FileReader always uses the default encoding for the system.) Specify the encoding to use when you create the InputStreamReader, and if you've picked the right one, everything should start working.
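To see why the choice of encoding matters, here is a small self-contained sketch that decodes the same UTF-8 bytes twice, once with the right charset and once with the wrong one. The class name and the sample text are made up for illustration; `new String(bytes, charset)` performs the same decoding step that an InputStreamReader applies to a stream.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    /** Decodes raw bytes with the given charset, as an InputStreamReader would. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // illustrative text with one non-ASCII character
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Correct charset: the text round-trips unchanged.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).equals(original));

        // Wrong charset: each byte of the two-byte sequence becomes its own character.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

The second decode turns the single accented letter into two unrelated characters, which is the same kind of garbling described in the question.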
Note that you don't need to use OpenCSV to check this - you could just read the text of the file yourself and print it all out. I'm not sure I'd trust System.out to be able to handle non-ASCII characters though - you may want to find a different way of examining strings, such as printing out the individual values of characters as integers (preferably in hex) and then comparing them with the charts at unicode.org. On the other hand, you could try the right encoding and see what happens to start with...
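One possible way to print the values in hex is sketched below. The class and method names are hypothetical, and the sketch walks code points rather than individual char values so that characters outside the Basic Multilingual Plane come out as one entry each.

```java
class CodePointDump {
    /** Returns the code points of s as space-separated "U+XXXX" values. */
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The output is pure ASCII, so it survives any console encoding.
        System.out.println(toHex("A\u00e9")); // prints "U+0041 U+00E9"
    }
}
```

Calling `toHex` on each field returned by `readNext()` shows exactly which characters were decoded, independent of how the console renders them.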
EDIT: Okay, so if you're using UTF-8:
// Decode the file explicitly as UTF-8 instead of the platform default charset.
CSVReader reader = new CSVReader(
        new InputStreamReader(new FileInputStream("d:\\a.csv"), "UTF-8"),
        ',', '\'', 1);
String[] line;
while ((line = reader.readNext()) != null) {
    StringBuilder stb = new StringBuilder(400);
    for (int i = 0; i < line.length; i++) {
        stb.append(line[i]);
        stb.append(";");
    }
    System.out.println(stb);
}
(I hope you have a try/finally block to close the file in your real code.)
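As a rough sketch of that cleanup, the example below uses try-with-resources, which assumes Java 7 or later; on older versions an explicit try/finally that calls close() does the same job. To stay free of third-party dependencies it reads plain lines with a BufferedReader rather than OpenCSV, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8LineReader {
    /** Reads all lines of the file at path, decoding its bytes as UTF-8. */
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // The reader (and the stream it wraps) is closed automatically,
        // whether the loop finishes normally or throws.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

The same shape should carry over to the CSV case by declaring the CSVReader in the try header instead, since it wraps a Reader in the same way.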