Java: converting bytes to chars

Question

Asked by AndyAndroid

I read from a TCP/IP socket s:

byte[] bbuf = new byte[30];
s.getInputStream().read(bbuf);
for (int i = 0; i < bbuf.length; i++)
{
     System.out.println(Integer.toHexString( (int) (bbuf[i] & 0xff)));
}

This outputs CA 68 9F 75, which is what I would expect. Now I want to use chars instead:

char[] cbuf = new char[30];
BufferedReader input = new BufferedReader(new InputStreamReader(s.getInputStream()));
input.read(cbuf); // fill cbuf; the bytes are decoded with the platform default charset
for (int i = 0; i < cbuf.length; i++)
{
     System.out.println(Integer.toHexString(cbuf[i]));
}

Now the output is CA 68 178 75. So the third byte (and only the third byte) makes the difference. I assume it has to do with character sets and that I have to specify a character set in the InputStreamReader. I have no idea how to find out which character set I have to use. Secondly, if this is due to character sets, I am surprised that only one character gets garbled. I tried all kinds of other characters, but this seems to be the only one I could find.

Who can solve the mystery?

Answer 1

Answered by morgano

Your problem is that you are comparing pears with apples: bytes are not the same as characters. In your code, the character Ÿ is represented in the following ways:

9F (byte, encoded using Windows-1252)
178 (char, encoded using UTF-16, which is the encoding Java always uses for chars internally)

As a proof of what I'm saying, check this:

String myString = "Caña";
byte[] bbuf = myString.getBytes();     // [ 43, 61, C3, B1, 61 ]   (UTF-8 on my machine)
char[] cbuf = myString.toCharArray();  // [ 43, 61, F1, 61 ]       (Java uses UTF-16 internally)
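
The comparison above depends on the machine's default encoding, so here is a minimal runnable sketch of the same idea with the charsets spelled out. The class name ByteVsCharDemo and the hex helpers are made up for illustration, and it assumes the JVM ships the windows-1252 charset (common JDKs do, but only the StandardCharsets constants are guaranteed).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteVsCharDemo {
    // Formats each byte as an unsigned two-digit hex value.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    // Formats each char (a UTF-16 code unit) as hex.
    static String hex(char[] chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(Integer.toHexString(c).toUpperCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Caña", written with an escape so the source file encoding does not matter.
        String text = "Ca\u00F1a";
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8))); // 43 61 C3 B1 61
        System.out.println(hex(text.toCharArray()));                    // 43 61 F1 61

        // Assumption: the windows-1252 charset is available on this JVM.
        Charset cp1252 = Charset.forName("windows-1252");
        String decoded = new String(new byte[] {(byte) 0x9F}, cp1252);
        System.out.println(hex(decoded.toCharArray()));                 // 178
    }
}
```

The last line shows the value from the question: the single byte 9F, decoded as Windows-1252, becomes the char U+0178.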

Now an analysis of your problem:

You took a byte array from a String, I guess by calling myString.getBytes(). As you didn't specify an encoding, the system used the default on your machine (Windows-1252).
When you read your bytes into a String using InputStreamReader, etc., there is actually no problem, because you are reading from another (or the same) Windows machine. The problem arises when you take the array of chars (instead of an array of bytes) and expect the same result (use myString.getBytes() instead of myString.toCharArray() and you'll see your bytes correctly).

Finally, some advice:

Always declare the encoding explicitly when you convert between Strings and byte arrays:

byte[] bbuf = myString.getBytes(Charset.forName("UTF-8"));

String myString = new String(bbuf, Charset.forName("UTF-8"));

Never mix chars and bytes; they are not the same.
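
The same advice applies when reading from a stream: InputStreamReader has a constructor that takes a Charset. A minimal sketch, assuming the sender encodes text as UTF-8; a ByteArrayInputStream stands in for the socket stream, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetReader {
    // Reads the whole stream as text, decoding with the given charset
    // instead of the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[64];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: both sides agree on UTF-8.
        byte[] wire = "Ca\u00F1a".getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        System.out.println(decoded.equals("Ca\u00F1a")); // true
    }
}
```

With a real socket you would pass s.getInputStream() instead of the ByteArrayInputStream, using whatever charset the sender actually used.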

Answer 2

Answered by Jesper

InputStreamReader converts the bytes from the input stream to characters using a character encoding. Since you didn't specify explicitly which character encoding should be used, it uses the default character encoding of your system.

How the bytes are converted depends on what character encoding is being used.
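
As a rough illustration, the sketch below decodes the four bytes from the question with two different charsets. The class and method names are hypothetical, and it assumes the JVM provides windows-1252. It also suggests why only one value changed: Windows-1252 and ISO-8859-1 map most byte values to the same code points and differ only in the range 80 to 9F.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SameBytesDifferentCharsets {
    // Decodes the bytes with the given charset and returns the resulting
    // UTF-16 code units as space-separated hex.
    static String codeUnits(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (char c : new String(bytes, charset).toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(Integer.toHexString(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xCA, 0x68, (byte) 0x9F, 0x75};

        // ISO-8859-1 maps every byte value to the code point with the same number.
        System.out.println(codeUnits(data, StandardCharsets.ISO_8859_1)); // ca 68 9f 75

        // Assumption: windows-1252 is available; it maps 9F to U+0178.
        System.out.println(codeUnits(data, Charset.forName("windows-1252"))); // ca 68 178 75
    }
}
```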

If the data is binary data and does not represent text encoded with some character encoding, then using InputStreamReader is the wrong way to read it.
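
For binary data, one option is to stay with bytes and skip the Reader classes entirely. A minimal sketch, assuming a fixed-length message of four bytes (the question does not describe the protocol, so that length is invented); the class and method names are hypothetical, and a ByteArrayInputStream again stands in for the socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BinaryRead {
    // Reads exactly `length` bytes, with no character decoding involved.
    // readFully keeps reading until the buffer is full and fails with an
    // EOFException if the stream ends early.
    static byte[] readExactly(InputStream in, int length) {
        byte[] buf = new byte[length];
        try {
            new DataInputStream(in).readFully(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] wire = {(byte) 0xCA, 0x68, (byte) 0x9F, 0x75};
        byte[] received = readExactly(new ByteArrayInputStream(wire), 4);
        for (byte b : received) {
            System.out.println(Integer.toHexString(b & 0xff)); // ca, 68, 9f, 75
        }
    }
}
```

Unlike a single call to InputStream.read(byte[]), which may return fewer bytes than requested, readFully either fills the buffer or throws.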
