Original question: http://stackoverflow.com/questions/10136343/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
opencsv CSVWriter using utf-8 doesn't seem to work for multiple languages
Asked by user1213162
I have a very annoying encoding problem using opencsv. When I export a CSV file, I set the character encoding to 'UTF-8'.
CSVWriter writer = new CSVWriter(new OutputStreamWriter(new FileOutputStream("D:/test.csv"), "UTF-8"));
But when I open the CSV file with Microsoft Office Excel 2007, it turns out that it has 'UTF-8 BOM' encoding?
Once I save the file in Notepad and re-open it, the file turns back to UTF-8 and all the letters in it appear fine. I think I've searched enough, but I haven't found any solution to prevent my file from turning into 'UTF-8 BOM'. Any ideas, please?
Answered by goodhyun
I suppose your file is encoded as 'UTF-8 without BOM'. You had better write a BOM at the start of the file. It is unnecessary in most cases; the one obvious exception is Microsoft Excel.
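// Write the UTF-8 signature (EF BB BF) first so that Excel detects the encoding.
FileOutputStream os = new FileOutputStream(file);
os.write(0xef);
os.write(0xbb);
os.write(0xbf);
// Name the charset explicitly; otherwise the platform default is used.
CSVWriter csvWrite = new CSVWriter(new OutputStreamWriter(os, "UTF-8"));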
Now Excel will recognize your file as a UTF-8 CSV.
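For readers who want to check the byte layout without opencsv on the classpath, here is a minimal sketch of the same idea using only java.io. The class name `BomCsvSketch` and the helper `utf8WriterWithBom` are made up for illustration, and the CSV rows are written by hand instead of through `CSVWriter`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of opencsv.
final class BomCsvSketch {

    // Writes the three signature bytes EF BB BF to the raw stream, then
    // returns a Writer that encodes everything after them as UTF-8.
    static Writer utf8WriterWithBom(OutputStream os) throws IOException {
        os.write(0xEF);
        os.write(0xBB);
        os.write(0xBF);
        return new OutputStreamWriter(os, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer stands in for the FileOutputStream of the answer.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer w = utf8WriterWithBom(buffer)) {
            w.write("name,city\r\n");
            // Non-ASCII (Korean) text, written as Unicode escapes.
            w.write("kim,\uC11C\uC6B8\r\n");
        }
        byte[] bytes = buffer.toByteArray();
        // Show the first three bytes of the output in hex.
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF);
    }
}
```

The returned `Writer` could presumably be handed to `new CSVWriter(writer)` in place of the hand-written rows; the important part is that the signature bytes go to the stream before any character data.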
Answered by Petr Abdulin
UTF-8 and UTF-8 Signature (sometimes incorrectly called UTF-8 BOM) are the same encoding; the signature is used only to distinguish it from other encodings. Any Unicode application should process the UTF-8 signature (the three-byte sequence EF BB BF) correctly.
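As a rough illustration of what "processing the signature" can mean on the reading side, the sketch below checks for the bytes EF BB BF and skips them before decoding. The class and method names are hypothetical. It is written under the assumption that the decoder will not drop the signature for you: Java's built-in UTF-8 decoder, for one, leaves it in the decoded text as U+FEFF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class Utf8SignatureSketch {

    // True if the data starts with the UTF-8 signature EF BB BF.
    static boolean hasUtf8Signature(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Decodes the bytes as UTF-8, skipping the signature if one is present.
    static String decode(byte[] data) {
        int offset = hasUtf8Signature(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withSignature = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', ',', 'b'};
        byte[] withoutSignature = {'a', ',', 'b'};
        // Both inputs decode to the same text once the signature is skipped.
        System.out.println(decode(withSignature).equals(decode(withoutSignature)));
    }
}
```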
I don't know why Java specifically adds this signature or how to stop it from doing so.