Java: convert ISO-8859-1 to UTF-8

Note: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/22017774/

Date: 2020-08-13 13:05:54  Source: igfitidea

Tags: java, utf-8

Asked by user3351706

I have a properties file with Asian translations in it, which I believe is saved as ISO-8859-1. I'm trying to convert them to UTF-8, so that è-|?: would become 警告:

I've tried several methods listed on this site as well as some other sites but have had no luck.

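import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Attempt 1: encodes and decodes with the same charset, so the text is unchanged before the UTF-8 step
byte[] isoBytes = line.getBytes("ISO-8859-1");
byte[] utf8 = new String(isoBytes, "ISO-8859-1").getBytes("UTF-8");

// Attempt 2: decode the ISO-8859-1 bytes with a UTF-8 decoder
Charset isoCharset = Charset.forName("ISO-8859-1");
CharsetDecoder utf8Decoder = Charset.forName("UTF-8").newDecoder();
byte[] sByte = line.getBytes(isoCharset);
ByteBuffer isoBuf = ByteBuffer.wrap(sByte); // isoBuf was undefined in the original snippet; assumed to wrap sByte
CharBuffer charBuf = utf8Decoder.decode(isoBuf);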

What is the easiest way to convert è-|?: to 警告:?

Thank you, Rich

@Pshemo had the answer I was looking for

byte[] isoBytes = line.getBytes("ISO-8859-1");
System.out.println(new String(isoBytes, "UTF-8"));

Thank you all for your help

Answered by Thomas

The easiest and safest way would be to save the file as UTF-8 and read it as such.
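
As a minimal sketch of "read it as such", assuming the file really has been saved as UTF-8: passing a Reader with an explicit charset to Properties.load keeps the decoding out of the platform's hands. The class name, the loadUtf8 helper, and the "warning" key are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class Utf8PropertiesDemo {
    // Loads a properties file through a Reader so the charset is explicit.
    static Properties loadUtf8(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file: written here as UTF-8 so the example is self-contained.
        Path file = Files.createTempFile("messages", ".properties");
        Files.write(file, "warning=\u8b66\u544a\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(loadUtf8(file).getProperty("warning"));
        Files.delete(file);
    }
}
```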

The answers you found here most likely already stated that ISO Latin-1 (ISO-8859-1) can't represent all the code points that UTF-8 can (Asian characters in particular), so storing properties (text resources?) as ISO Latin-1 will lose data.
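
The loss is easy to see in a small sketch. The class and the fitsInLatin1 helper are hypothetical names; the charset calls are standard JDK ones.

```java
import java.nio.charset.StandardCharsets;

public class Latin1LossDemo {
    // True if every character of the text has an ISO-8859-1 encoding.
    static boolean fitsInLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String warning = "\u8b66\u544a"; // 警告
        System.out.println(fitsInLatin1("caf\u00e9")); // prints "true"
        System.out.println(fitsInLatin1(warning));     // prints "false"

        // getBytes substitutes '?' for each character the charset cannot map.
        byte[] lossy = warning.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(lossy, StandardCharsets.ISO_8859_1)); // prints "??"
    }
}
```

Once the characters have been replaced by question marks, no later conversion can bring them back.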

Thus either store it as UTF-8 or use Unicode escapes, e.g. \u8b66\u544a for 警告 (Warning).
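
A small sketch of the escape route, assuming a hypothetical "warning" key: the file content is pure ASCII, so it survives being read as ISO-8859-1, and Properties.load turns the \uXXXX escapes back into the real characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class EscapedPropertiesDemo {
    // Loads properties from raw bytes; the InputStream overload decodes them as ISO-8859-1.
    static Properties loadFromBytes(byte[] content) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(content));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // What would be on disk: a backslash, the letter u, and four hex digits per character.
        String fileContent = "warning=\\u8b66\\u544a\n";
        Properties props = loadFromBytes(fileContent.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(props.getProperty("warning").equals("\u8b66\u544a")); // prints "true"
    }
}
```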

Answered by Joop Eggen

In fact, displaying UTF-8 content as ISO-8859-1 would yield è-|? (plus something). So that is fine.

So the file is in UTF-8. The JDK has the tool native2ascii to convert non-ASCII characters to \uXXXX escapes and back.

native2ascii -encoding UTF-8 old.properties new.properties
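
Where the tool is not at hand (as far as I know it was removed from the JDK in version 9), the escaping direction can be approximated in a few lines. This is only a rough sketch of the idea, not a reimplementation of native2ascii, and the class and method names are made up.

```java
public class UnicodeEscaper {
    // Replaces every non-ASCII UTF-16 code unit with a \\uXXXX escape; ASCII passes through.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is written with escapes only to keep this source file ASCII.
        System.out.println(escapeNonAscii("warning=\u8b66\u544a"));
    }
}
```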

Use a programmer's editor like JEdit or Notepad++ to be sure of encodings.
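
If an editor is not available, a strict decoder gives a rough programmatic check. The sketch below (hypothetical class and method names) reports whether a byte array is well-formed UTF-8. Treat a positive result as a strong hint rather than proof, since pure ASCII is valid in both encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u8b66\u544a".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // prints "true"
        System.out.println(isValidUtf8(latin1)); // prints "false"
    }
}
```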

Answered by Gustavo

This worked for me:

@Pshemo had the answer I was looking for

byte[] isoBytes = line.getBytes("ISO-8859-1");
System.out.println(new String(isoBytes, "UTF-8"));
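
Why this works can be shown with a round trip. The sketch below first simulates the bug (UTF-8 bytes decoded as ISO-8859-1) and then undoes it the same way as the two lines above; the class and method names are made up. It relies on ISO-8859-1 mapping every byte to exactly one character, so no bytes are lost on the way in. If some characters have already been replaced by '?', the original text cannot be recovered this way.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRoundTrip {
    // Re-interprets text that was really UTF-8 but had been decoded as ISO-8859-1.
    static String fixMojibake(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u8b66\u544a"; // 警告
        // Simulate the bug: UTF-8 bytes read with the wrong charset.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());                      // prints "6"
        System.out.println(fixMojibake(garbled).equals(original)); // prints "true"
    }
}
```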