Setting UTF-8 in Java and CSV files
Original question: http://stackoverflow.com/questions/4192186/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
setting a UTF-8 in java and csv file
Asked by mehdi
I am using this code to add Persian words to a CSV file via OpenCSV:
String[] entries = "\u0645 \u062E\u062F\u0627".split("#");
try {
    // The backslash must be escaped; "C:\test.csv" would contain a tab character.
    CSVWriter writer = new CSVWriter(new OutputStreamWriter(new FileOutputStream("C:\\test.csv"), "UTF-8"));
    writer.writeNext(entries);
    writer.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
When I open the resulting CSV file in Excel, it contains "?????". Other programs such as notepad.exe don't have this problem, but all of my users are using MS Excel.
Replacing OpenCSV with SuperCSV does not solve this problem.
When I type Persian characters into a CSV file manually, I don't have any problems.
Accepted answer by Michael Borgwardt
Unfortunately, CSV is a very ad hoc format with no metadata and no real standard that would mandate a flexible encoding. As long as you use CSV, you can't reliably use any characters outside of ASCII.
Your alternatives:
- Write to XML (which does have encoding metadata if you do it right) and have the users import the XML into Excel.
- Use Apache POI to create actual Excel documents.
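As a rough illustration of the first alternative, the sketch below writes a row of values to XML with an explicit encoding declaration, using the JDK's StAX API. The class name `XmlExportSketch` and the element names `rows`, `row` and `cell` are hypothetical, and whether Excel imports this particular layout cleanly has not been verified here; the point is only that the encoding travels with the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class XmlExportSketch {

    // Writes each value as a <cell> inside a single <row>; the element names are hypothetical.
    static void writeRow(OutputStream out, String[] values) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        xml.writeStartDocument("UTF-8", "1.0"); // emits the XML declaration, including the encoding
        xml.writeStartElement("rows");
        xml.writeStartElement("row");
        for (String value : values) {
            xml.writeStartElement("cell");
            xml.writeCharacters(value);
            xml.writeEndElement(); // cell
        }
        xml.writeEndElement(); // row
        xml.writeEndElement(); // rows
        xml.writeEndDocument();
        xml.flush();
        xml.close(); // closes the XML writer only, not the underlying stream
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeRow(buffer, new String[] {"\u0645 \u062E\u062F\u0627", "hello"});
        System.out.println(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In real use the `ByteArrayOutputStream` would be replaced by a `FileOutputStream` pointing at the export file.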
Answer by chkal
Excel doesn't use UTF-8 to open CSV files. That's a known problem. The actual encoding used depends on the locale settings of Microsoft Windows. With a German locale, for example, Excel would open a CSV file with CP1252.
You could create an Excel file containing some Persian characters and save it as a CSV file. Then write a small Java program to read this file and test some common encodings. That's how I figured out the correct encoding for German umlauts in CSV files.
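The "test some common encodings" step could look roughly like the sketch below. It decodes the same bytes with several candidate charsets so you can see which one reproduces the original text. The class name `EncodingProbe`, the candidate list and the file name in the comment are assumptions for illustration, not part of the original answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingProbe {

    // Decodes the same bytes with each candidate charset so the results can be compared by eye.
    static Map<String, String> decodeWith(byte[] data, String... charsetNames) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : charsetNames) {
            if (Charset.isSupported(name)) { // skip charsets this JVM does not provide
                results.put(name, new String(data, Charset.forName(name)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in for Files.readAllBytes(Paths.get("saved-by-excel.csv")); that file name is hypothetical.
        byte[] data = "\u00FC".getBytes(StandardCharsets.ISO_8859_1);
        decodeWith(data, "UTF-8", "ISO-8859-1", "windows-1252", "windows-1256")
                .forEach((name, text) -> System.out.println(name + " -> " + text));
    }
}
```

The charset whose output matches what was typed into Excel is the one Excel used when saving, and therefore a reasonable guess for what it expects when opening a CSV file on that machine.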
Answer by AlexR
I spent some time on this, but I found a solution to your problem.
First I opened Notepad and wrote the following line: ????, hello, привет. Then I saved it as the file he-en-ru.csv using UTF-8. Then I opened it with MS Excel and everything worked well.
Next, I wrote a simple Java program that prints this line to a file as follows:
PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
w.print(line);
w.flush();
w.close();
When I opened this file using Excel, I saw gibberish.
Then I read the contents of both files and (as expected) saw that the file generated by Notepad starts with a 3-byte prefix:
239 EF
187 BB
191 BF
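A check like the one described above can be sketched as follows: read the bytes and compare the first three against EF BB BF. The class name `BomCheck` and the helper `hasUtf8Bom` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomCheck {

    // True when the data starts with the UTF-8 byte order mark EF BB BF (239 187 191).
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BomCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " starts with a UTF-8 BOM: " + hasUtf8Bom(Files.readAllBytes(file)));
    }
}
```

The `& 0xFF` mask is needed because Java bytes are signed, so 0xEF is stored as a negative value.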
So, I modified my code to print this prefix first and the text after that:
String line = "????, hello, привет";
OutputStream os = new FileOutputStream("c:/temp/j.csv");
os.write(239);
os.write(187);
os.write(191);
PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
w.print(line);
w.flush();
w.close();
And it worked! I opened the file using Excel and saw the text as I expected.
Bottom line: write these 3 bytes before writing the content. This prefix indicates that the content is in 'UTF-8 with BOM' (otherwise it is just 'UTF-8 without BOM').
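One possible way to package this advice is a small helper that hands back a UTF-8 writer with the BOM already written. This is a sketch rather than the answer's own code: the names `BomWriterSketch` and `utf8WriterWithBom` are made up here, and it writes the character U+FEFF through the writer instead of the three raw bytes, which produces the same EF BB BF prefix once encoded as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomWriterSketch {

    // Returns a UTF-8 writer that has already emitted the byte order mark.
    static Writer utf8WriterWithBom(OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write('\uFEFF'); // U+FEFF is encoded as EF BB BF in UTF-8
        return writer;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // try-with-resources flushes and closes the writer even if writing fails
        try (Writer writer = utf8WriterWithBom(buffer)) {
            writer.write("\u0645 \u062E\u062F\u0627,hello\n");
        }
        // Dump the bytes in hex to confirm the prefix is present.
        for (byte b : buffer.toByteArray()) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

Since the question's code passes an `OutputStreamWriter` to `new CSVWriter(...)`, the writer returned by this helper (wrapping a `FileOutputStream`) could presumably be passed in the same way.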