Java: Convert UTF-8 encoded string to human readable string
Original question: http://stackoverflow.com/questions/15019587/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by pradeep
How can I convert any UTF-8 string to a readable string?
Like : a? (in UTF8) is
I tried using Charset, but it does not work.
Answered by jdb
You are encoding a string to ISO-8859-15 with byte[] b = "üü???ABC".getBytes("ISO-8859-15"); and then decoding it as UTF-8 with System.out.println(new String(b, "UTF-8"));. You have to decode it the same way you encoded it, with ISO-8859-15.
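The point above can be checked with a small round trip. This is a minimal sketch, not code from the original answers: the class name RoundTripDemo and the helper roundTrip are hypothetical, the sample text is shortened to "üüABC", and it assumes the JVM ships the ISO-8859-15 charset (most do, but it is not one of the charsets every JVM is required to support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Hypothetical helper: encodes text with one charset, then decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-15 is available on this JVM.
        Charset latin9 = Charset.forName("ISO-8859-15");
        // "üüABC", written with escapes so the source file encoding does not matter.
        String original = "\u00fc\u00fcABC";

        // Same charset on both sides: the text comes back unchanged.
        System.out.println(original.equals(roundTrip(original, latin9, latin9))); // true

        // Mismatched charsets: the 0xFC bytes are not valid UTF-8,
        // so they decode to replacement characters (U+FFFD).
        System.out.println(original.equals(roundTrip(original, latin9, StandardCharsets.UTF_8))); // false
    }
}
```

The ASCII letters survive the mismatch because ISO-8859-15 and UTF-8 agree on that range; only the non-ASCII characters are damaged.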
Answered by Esailija
This is not "UTF-8" but completely broken and unrepairable data. Strings do not have encodings. It makes no sense to say "UTF-8 string" in this context. A String is a sequence of abstract characters - it doesn't have any encoding except as an internal implementation detail that is not our concern and not related to your problem.
Answered by Grim
A string in Java is already a Unicode representation. When you call one of the getBytes methods on it, you get an encoded representation (as bytes, thus binary values) in a specific encoding - ISO-8859-15 in your example. If you want to convert this byte array back to a Unicode string, you can do that with one of the String constructors accepting a byte array, like you did, but you must do so using the exact same encoding the byte array was originally generated with. Only then can you convert it back to a Unicode string (which has no encoding, and doesn't need one).
Beware of the encoding-less methods, both the String constructor and the getBytes method, since they use the default encoding of the platform the code is running on, which might not be what you want.
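One way to avoid depending on the platform default is to name the charset on both sides. The sketch below is illustrative only: the class ExplicitCharsetDemo and the helpers encodeUtf8 and decodeUtf8 are hypothetical names, and UTF-8 is chosen just as an example of an explicitly named charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Encode with an explicitly named charset, independent of the platform default.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicitly named charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The platform default can differ between machines and JVM versions,
        // which is why getBytes() and new String(byte[]) without a charset are risky.
        System.out.println("Platform default charset: " + Charset.defaultCharset());

        String text = "\u00fc\u00fcABC"; // "üüABC"
        byte[] bytes = encodeUtf8(text);
        System.out.println(bytes.length); // 7: two bytes per "ü", one per ASCII letter
        System.out.println(text.equals(decodeUtf8(bytes))); // true
    }
}
```

Using the StandardCharsets constants instead of charset names as strings also avoids the checked UnsupportedEncodingException that getBytes(String) declares.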
Answered by Grim
I think the problem here is that you're assuming a Java String is encoded with whatever you've specified in the constructor. It's not. It's in UTF-16.
So, "üü???ABC".getBytes("ISO-8859-15") is actually converting a UTF-16 string to ISO-8859-15, and then getting the byte representation of that.
If you want to get the human-readable format in your Eclipse console, just keep it as it is (in UTF-16) and call System.out.println("üü???ABC"), because your Eclipse console will decode the string and display it as UTF-16.
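To make the distinction between the in-memory String and its encoded bytes concrete, the following sketch compares the number of char values with the number of bytes produced by a few charsets. The class EncodedLengthDemo and the helper encodedLength are hypothetical, the sample text is shortened to "üüABC", and availability of ISO-8859-15 on the JVM is assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {
    // Hypothetical helper: number of bytes the text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00fc\u00fcABC"; // "üüABC"

        // length() counts UTF-16 code units (char values), not bytes.
        System.out.println(text.length()); // 5

        // The same five characters produce different byte sequences per charset.
        System.out.println(encodedLength(text, Charset.forName("ISO-8859-15"))); // 5
        System.out.println(encodedLength(text, StandardCharsets.UTF_8)); // 7
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE)); // 10
    }
}
```

The String itself is the same in all four cases; only the byte arrays differ, which is why a byte array is meaningless without knowing the charset that produced it.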
Answered by PbxMan
You are trying to decode a byte array that was encoded with "ISO-8859-15" as if it were "UTF-8".
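// Note: getBytes(String) and new String(byte[], String) declare UnsupportedEncodingException.
byte[] b = "üü???ABC".getBytes("ISO-8859-15"); // bytes encoded as ISO-8859-15
byte[] u = "üü???ABC".getBytes("UTF-8"); // bytes encoded as UTF-8
System.out.println(new String(b, "ISO-8859-15")); // will be ok
System.out.println(new String(b, "UTF-8")); // will look garbled
System.out.println(new String(u, "UTF-8")); // will be ok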
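The String constructor silently substitutes replacement characters for bytes it cannot decode, which is why the mismatch only shows up as garbled output. If you would rather have the mismatch reported, one option is a strict CharsetDecoder. This is a sketch of that idea and not something the answers above propose; the class StrictDecodeDemo and the helper isValidUtf8 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {
    // Hypothetical helper: true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u00fc\u00fcABC"; // "üüABC"
        // Assumption: ISO-8859-15 is available on this JVM.
        byte[] b = text.getBytes(Charset.forName("ISO-8859-15"));
        byte[] u = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(b)); // false: 0xFC is not valid in UTF-8
        System.out.println(isValidUtf8(u)); // true
    }
}
```

A passing check does not prove the bytes were produced as UTF-8: pure ASCII bytes, for example, are valid in both charsets. It only rules out inputs that cannot be UTF-8.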