Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/15749475/

Date: 2020-10-31 20:43:56  Source: igfitidea

Java String HEX to String ASCII with accentuation

java, utf-8, hex, ascii

Asked by rcorbellini

I have the String String hex = "6174656ec3a7c3a36f"; and I want to get String output = "atenção"; but in my test I only get String output = "atenÃ§Ã£o";. What am I doing wrong?

String hex = "6174656ec3a7c3a36f";
StringBuilder output = new StringBuilder();
for (int i = 0; i < hex.length(); i+=2) {
  String str = hex.substring(i, i+2);
  output.append((char)Integer.parseInt(str, 16));
} 

System.out.println(output); //here is the output "atenÃ§Ã£o"

Answer by jedwards

Consider

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

String hex = "6174656ec3a7c3a36f";                                  // AAA
ByteBuffer buff = ByteBuffer.allocate(hex.length()/2);
for (int i = 0; i < hex.length(); i+=2) {
    buff.put((byte)Integer.parseInt(hex.substring(i, i+2), 16));
}
buff.rewind();
Charset cs = Charset.forName("UTF-8");                              // BBB
CharBuffer cb = cs.decode(buff);                                    // BBB
System.out.println(cb.toString());                                  // CCC

Which prints: atenção

Basically, your hex string is the hexadecimal encoding of the bytes that represent the characters in the string atenção when encoded in UTF-8.
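One way to check this is to go in the opposite direction: encode atenção as UTF-8 and write each byte as two lowercase hex digits, which should give back the hex string from the question. This is a minimal sketch assuming Java 7+ (for StandardCharsets); the names EncodeToHex and toHex are illustrative only.

```java
import java.nio.charset.StandardCharsets;

final class EncodeToHex {
    // Encode text as UTF-8 and render each byte as two lowercase hex digits.
    static String toHex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            // Mask with 0xff so negative bytes (0x80..0xff) print as 80..ff.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String word = "aten\u00e7\u00e3o"; // the word from the question, written with Unicode escapes
        System.out.println(toHex(word));   // prints 6174656ec3a7c3a36f
    }
}
```

The escapes keep the example independent of the source file's encoding; ç (U+00E7) becomes c3 a7 and ã (U+00E3) becomes c3 a3.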

To decode:

  • First, go from your hex string to bytes (AAA).
  • Then go from bytes to chars (BBB). This step depends on the encoding, in your case UTF-8.
  • Then go from chars to a string (CCC).
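The three steps can be wrapped in one method that takes the charset as a parameter, so the same code decodes UTF-8 or any other encoding. This is a sketch, not jedwards' exact code: the names HexDecodeSteps and decodeHex are hypothetical, it rejects odd-length input, and it calls flip() instead of rewind() so the buffer's read limit is set explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class HexDecodeSteps {
    static String decodeHex(String hex, Charset charset) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        // AAA: hex string -> bytes, two hex digits per byte
        ByteBuffer bytes = ByteBuffer.allocate(hex.length() / 2);
        for (int i = 0; i < hex.length(); i += 2) {
            bytes.put((byte) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        bytes.flip(); // switch the buffer from writing to reading
        // BBB: bytes -> chars, interpreted with the given charset
        CharBuffer chars = charset.decode(bytes);
        // CCC: chars -> String
        return chars.toString();
    }

    public static void main(String[] args) {
        String s = decodeHex("6174656ec3a7c3a36f", StandardCharsets.UTF_8);
        System.out.println(s.equals("aten\u00e7\u00e3o")); // prints true
    }
}
```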

Answer by Martin Ellis

Your hex string appears to denote a UTF-8 string, rather than ISO-8859-1.

The reason I can say this is that if it were ISO-8859-1, you'd have two hex digits per character. Your hex string has 18 hex digits (9 bytes), but your expected output is only 7 characters. Hence, the hex string must use a variable-width encoding, not one byte per character like ISO-8859-1.
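The counting argument can be checked directly by comparing the number of bytes in the hex string with the number of bytes atenção needs in each candidate encoding. A minimal sketch; WidthCheck, hexByteCount and encodedLength are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WidthCheck {
    // Bytes described by a hex string: two hex digits per byte.
    static int hexByteCount(String hex) {
        return hex.length() / 2;
    }

    // Bytes needed to encode text in the given charset.
    static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String hex = "6174656ec3a7c3a36f";
        String expected = "aten\u00e7\u00e3o"; // 7 characters
        System.out.println("bytes in hex string: " + hexByteCount(hex)); // prints 9
        System.out.println("ISO-8859-1 bytes: "
                + encodedLength(expected, StandardCharsets.ISO_8859_1)); // prints 7
        System.out.println("UTF-8 bytes: "
                + encodedLength(expected, StandardCharsets.UTF_8));      // prints 9
    }
}
```

Only UTF-8 matches the 9 bytes in the hex string.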

The following program produces the output: atenção

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.Charset;

    String hex = "6174656ec3a7c3a36f";
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    for (int i = 0; i < hex.length(); i += 2) {
      String str = hex.substring(i, i + 2);
      int byteVal = Integer.parseInt(str, 16);
      baos.write(byteVal); // write(int) stores the low 8 bits as one byte
    }
    String s = new String(baos.toByteArray(), Charset.forName("UTF-8"));
    System.out.println(s);

If you change UTF-8 to ISO-8859-1, you'll see: atenÃ§Ã£o.
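That ISO-8859-1 result is exactly what the question's loop produces: ISO-8859-1 maps byte 0xNN to character U+00NN, which is the same thing as casting each parsed byte value to char. A minimal sketch demonstrating the equivalence; the class and helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Equivalence {
    static byte[] hexToBytes(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    // The question's approach: one char per byte value.
    static String castEachByte(String hex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            out.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String hex = "6174656ec3a7c3a36f";
        String latin1 = new String(hexToBytes(hex), StandardCharsets.ISO_8859_1);
        String utf8 = new String(hexToBytes(hex), StandardCharsets.UTF_8);
        System.out.println(latin1.equals(castEachByte(hex)));         // prints true
        System.out.println(latin1.length() + " vs " + utf8.length()); // prints 9 vs 7
    }
}
```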

Answer by Aubin

Java Strings are Unicode: each char is a 16-bit UTF-16 code unit. Your string is, I suppose, a "C" string, that is, a sequence of bytes. You have to know the name of the character encoding and use a CharsetDecoder.

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class Char8859_1Decoder {

   public static void main( String[] args ) throws CharacterCodingException {
      String hex = "6174656ec3a7c3a36f";
      int len = hex.length();
      byte[] cStr = new byte[len/2];
      for( int i = 0; i < len; i+=2 ) {
         cStr[i/2] = (byte)Integer.parseInt( hex.substring( i, i+2 ), 16 );
      }
      CharsetDecoder decoder = Charset.forName( "UTF-8" ).newDecoder();
      CharBuffer cb = decoder.decode( ByteBuffer.wrap( cStr ));
      System.out.println( cb.toString());
   }
}
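One practical difference with the CharsetDecoder approach: a decoder obtained from Charset.newDecoder() reports malformed input by default (its error action is REPORT), so a truncated or corrupt hex string raises a CharacterCodingException. By contrast, new String(bytes, charset) and Charset.decode silently substitute U+FFFD. A minimal sketch; StrictVsLenient and its methods are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

final class StrictVsLenient {
    // new String(...) replaces malformed input with U+FFFD instead of failing.
    static String lenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A fresh decoder from newDecoder() throws on malformed input.
    static boolean strictAccepts(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "aten" followed by a lone 0xc3: the second byte of the c-cedilla is missing
        byte[] truncated = {0x61, 0x74, 0x65, 0x6e, (byte) 0xc3};
        System.out.println(lenient(truncated).length()); // prints 5
        System.out.println(strictAccepts(truncated));    // prints false
    }
}
```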

Answer by PaulProgrammer

The ç and ã are not single-byte characters: in UTF-8 each one takes 16 bits, i.e. two bytes (c3 a7 and c3 a3), so they are not represented by a single byte as you assume in your decode routine.
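To make the point concrete: each Java char in atenção is a single 16-bit value, but its UTF-8 form may take one or two bytes. A minimal sketch printing each char's value next to its UTF-8 bytes. PerCharBytes and utf8Hex are illustrative names, and the sketch assumes no char is a surrogate (true for this word), since a lone surrogate cannot be encoded on its own.

```java
import java.nio.charset.StandardCharsets;

final class PerCharBytes {
    // Space-separated hex of the UTF-8 encoding of one (non-surrogate) char.
    static String utf8Hex(char c) {
        StringBuilder sb = new StringBuilder();
        for (byte b : String.valueOf(c).getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "aten\u00e7\u00e3o";
        for (char c : word.toCharArray()) {
            // (int) c is the 16-bit char value; utf8Hex shows the bytes UTF-8 uses for it
            System.out.printf("U+%04X -> %s%n", (int) c, utf8Hex(c));
        }
    }
}
```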

Instead of converting each byte to a char, I would convert the hex pairs to Java bytes and then use a String routine to decode the byte array into a string, leaving Java the dull task of choosing the decoding (the platform default charset, if you don't name one).
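On Java 17 and later, the hex-to-bytes step no longer needs a hand-written loop: java.util.HexFormat can parse the whole string, and the bytes can then go straight into a String constructor. This is a sketch assuming Java 17+; HexFormatDecode and decodeUtf8Hex are illustrative names, and it names UTF-8 explicitly rather than relying on the default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

final class HexFormatDecode {
    static String decodeUtf8Hex(String hex) {
        // parseHex accepts upper- or lowercase digits and throws on odd length
        byte[] bytes = HexFormat.of().parseHex(hex);
        // Name the charset explicitly instead of using the platform default.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decodeUtf8Hex("6174656ec3a7c3a36f");
        System.out.println(s.equals("aten\u00e7\u00e3o")); // prints true
    }
}
```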

Of course, Java may guess wrong, so you might have to know the encoding ahead of time, as in the answers given by @Aubin and @Martin Ellis.

当然,java可能猜错了,所以你可能需要提前知道编码是什么,根据@Aubin或@Martin Ellis给出的答案