Original question: http://stackoverflow.com/questions/19360843/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Reading file with bad encoding. CP1252 vs UTF-8
Asked by Evgeny Mironenko
I have a byte array that I pass to an InputStreamReader and then process further.
Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr));
The JVM's default encoding is cp1252, but the file that I convert to a byte array is encoded in UTF-8. The file also contains German umlauts. When I pass the byte array to InputStreamReader, Java decodes the umlauts to the wrong symbols. For example, ü is shown as ??. I tried passing "UTF-8" and Charset.forName("UTF-8").newDecoder() to the InputStreamReader constructor, and I tried re-encoding the strings read from the reader via new String(oldStr.getBytes("cp1252"), "UTF-8"), but neither helped. In the debugger, the reader variable has a StreamDecoder field whose "decoder" has the value MS1252$Decoder. That may be the key to my problem, but I don't understand how to fix it.
Accepted answer by Pavlo K.
Try using the InputStreamReader(InputStream in, String charsetName) constructor and set the charset yourself.
Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr), "UTF-8");
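To see why the explicit charset matters, here is a minimal sketch that decodes the same UTF-8 bytes twice, once as UTF-8 and once as windows-1252 (the cp1252 default from the question). The class name ExplicitCharsetDemo and the helper decode are hypothetical, introduced only for illustration. The sketch uses the InputStreamReader(InputStream, Charset) overload with StandardCharsets.UTF_8 rather than the "UTF-8" string, which avoids handling UnsupportedEncodingException; that is a choice made here, not something the answer prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Hypothetical helper: reads every character from the byte array
    // using the given charset instead of the JVM default.
    public static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ü" (U+00FC) encoded as UTF-8 is the two bytes 0xC3 0xBC.
        byte[] byteArr = "\u00fc".getBytes(StandardCharsets.UTF_8);

        String good = decode(byteArr, StandardCharsets.UTF_8);
        String bad = decode(byteArr, Charset.forName("windows-1252"));

        System.out.println(good.length()); // 1: the single character ü
        System.out.println(bad.length());  // 2: each byte became its own character
    }
}
```

Decoded as windows-1252, each of the two UTF-8 bytes turns into a separate character, which is the kind of corruption described in the question.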
Answer by mcflyfr
I had exactly the same error and finally solved it by adding this to the JVM startup options:
-Dfile.encoding=UTF8
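One way to check whether the startup option took effect is to print the charset the JVM actually uses by default. This is only a sketch; the class name DefaultCharsetCheck and the method describe are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Hypothetical helper: reports the file.encoding system property
    // and the charset the JVM resolved as its default.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as java -Dfile.encoding=UTF8 DefaultCharsetCheck and again without the option shows whether the default changes on a given machine. Note that the option affects every reader and writer in the JVM that relies on the default charset, whereas passing the charset to the InputStreamReader constructor affects only that one reader.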