Notice: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/40085627/
Java Byte to Char conversion
Question by AndyAndroid
I read from a TCP/IP socket s:
byte[] bbuf = new byte[30];
s.getInputStream().read(bbuf);
for (int i = 0; i < bbuf.length; i++)
{
    System.out.println(Integer.toHexString( (int) (bbuf[i] & 0xff)));
}
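The loop above prints all 30 slots of bbuf, although read only fills as many bytes as have arrived; its return value says how many (or -1 at end of stream). A minimal sketch that honours that count, assuming a ByteArrayInputStream as a stand-in for s.getInputStream() (ReadCountDemo and hexOfRead are hypothetical names):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadCountDemo {
    // Hypothetical helper: performs one read() and formats only the bytes
    // it actually filled, as unsigned hex values.
    static String hexOfRead(InputStream in, int bufferSize) {
        byte[] bbuf = new byte[bufferSize];
        int n;
        try {
            n = in.read(bbuf); // bytes read, or -1 at end of stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(bbuf[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for s.getInputStream(): the four bytes from the question.
        byte[] payload = {(byte) 0xCA, 0x68, (byte) 0x9F, 0x75};
        System.out.println(hexOfRead(new ByteArrayInputStream(payload), 30)); // ca 68 9f 75
    }
}
```

On a real socket a single read may return fewer bytes than the peer sent, so code that needs a fixed number of bytes would loop until it has them.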
This outputs CA 68 9F 75, which is what I would expect. Now I want to use chars instead:
char[] cbuf = new char[30];
BufferedReader input = new BufferedReader(new InputStreamReader(s.getInputStream()));
input.read(cbuf);
for (int i = 0; i < cbuf.length; i++)
{
    System.out.println(Integer.toHexString( (int) (cbuf[i] )));
}
Now the output is CA 68 178 75. So the third byte (and only the third byte) makes the difference. I assume it has to do with character sets and that I have to specify a character set in the InputStreamReader, but I have no idea how to find out which character set to use. Secondly, if it is due to character sets, I am surprised that exactly one character gets mangled. I tried all kinds of other characters, but this seems to be the only one I could find.
Who can solve the mystery?
Answer by morgano
Your problem is that you are comparing pears with apples: bytes are not the same as characters. In your code, the character Ÿ is represented in the following ways (values in hex):
- 9F (a byte, encoded using Windows-1252)
- 178 (a char, encoded using UTF-16, which is the encoding Java always uses for chars internally)
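Both bullets can be reproduced from the four bytes in the question. The sketch below assumes, as this answer does, that the platform default was windows-1252, and decodes the same bytes once with that charset and once with ISO-8859-1 for comparison (CharsetDecodeDemo and charHex are illustrative names). It also suggests why only one byte changed: Windows-1252 agrees with ISO-8859-1, whose byte values equal the first 256 Unicode code points, everywhere except the range 0x80 to 0x9F, where it places printable characters such as Ÿ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeDemo {
    // Decodes the bytes with the given charset and returns each resulting
    // char as hex, like the char loop in the question.
    static String charHex(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (char c : new String(bytes, cs).toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(Integer.toHexString(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, 0x68, (byte) 0x9F, 0x75};
        // Windows-1252 maps byte 0x9F to U+0178 (Latin capital Y with diaeresis).
        System.out.println(charHex(bytes, Charset.forName("windows-1252"))); // ca 68 178 75
        // ISO-8859-1 maps every byte to the char with the same numeric value.
        System.out.println(charHex(bytes, StandardCharsets.ISO_8859_1)); // ca 68 9f 75
    }
}
```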
As a proof of what I'm saying, check this:
String myString = "Caña";
byte[] bbuf = myString.getBytes(); // [ 43, 61, C3, B1, 61 ] (UTF-8 on my machine)
char[] cbuf = myString.toCharArray(); // [ 43, 61, F1, 61 ] (Java uses UTF-16 internally)
Now an analysis of your problem:
You took a byte array from a String, I guess by doing this:
myString.getBytes()
As you didn't specify an encoding, the system used the default on your machine (Windows-1252). When you read your bytes into a String using InputStreamReader etc., there is actually no problem, because you are reading on another (or the same) Windows machine with the same default. The problem appears when you take the array of chars (instead of an array of bytes) and expect the same result (use myString.getBytes() instead of myString.toCharArray() and you'll see your bytes correctly).
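The answer's guess can be checked from the encoding side. Assuming, hypothetically, that the sender's text was "ÊhŸu" and that it was encoded with windows-1252, getBytes and toCharArray on that same String give exactly the two outputs from the question (BytesVsCharsDemo, TEXT, hexBytes and hexChars are illustrative names):

```java
import java.nio.charset.Charset;

public class BytesVsCharsDemo {
    // Hypothetical sender text: E with circumflex, 'h', Y with diaeresis, 'u'.
    static final String TEXT = "\u00CAh\u0178u";

    // Formats bytes as unsigned hex, like the byte loop in the question.
    static String hexBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(Integer.toHexString(b & 0xff));
        }
        return sb.toString();
    }

    // Formats UTF-16 chars as hex, like the char loop in the question.
    static String hexChars(char[] chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(Integer.toHexString(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(hexBytes(TEXT.getBytes(cp1252))); // ca 68 9f 75
        System.out.println(hexChars(TEXT.toCharArray()));    // ca 68 178 75
    }
}
```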
Finally, some advice:
Always declare the encoding explicitly when you convert between Strings and byte arrays:
byte[] bbuf = myString.getBytes(Charset.forName("UTF-8"));
String decoded = new String(bbuf, Charset.forName("UTF-8"));
Never mix chars and bytes; they are not the same.
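A minimal sketch of that advice, using StandardCharsets.UTF_8 (available since Java 7 and equivalent to Charset.forName("UTF-8") without the name lookup). The class ExplicitCharsetDemo and helper roundTrip are illustrative; they show that text survives when both sides name the same charset and is corrupted when the two sides disagree:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Illustrative helper: encode with one charset, decode with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bbuf = text.getBytes(encodeWith);
        return new String(bbuf, decodeWith);
    }

    public static void main(String[] args) {
        String myString = "Ca\u00f1a"; // "Ca" + n with tilde + "a"
        // Same charset on both sides: the text survives unchanged.
        System.out.println(roundTrip(myString, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(myString));
        // Mismatched charsets: the UTF-8 bytes C3 B1 decode as two separate
        // ISO-8859-1 characters, U+00C3 and U+00B1.
        System.out.println(roundTrip(myString, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals("Ca\u00c3\u00b1a"));
    }
}
```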
Answer by Jesper
InputStreamReader is going to convert the bytes from the input stream to characters using a character encoding. Since you didn't explicitly specify which character encoding should be used, it's going to use the default character encoding of your system.
How the bytes are converted depends on what character encoding is being used.
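Applied to the question's char loop, this means naming the charset when constructing the InputStreamReader and using the count returned by read. The sketch below is illustrative only: it assumes the sender used windows-1252 (as the previous answer guesses; the real protocol may differ), a ByteArrayInputStream stands in for s.getInputStream(), and ReaderCountDemo and charHexFromStream are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ReaderCountDemo {
    // Reads once into a 30-char buffer, decoding with an explicitly named
    // charset, and formats only the chars that read() reported.
    static String charHexFromStream(InputStream in, Charset cs) {
        char[] cbuf = new char[30];
        // Not closed here: closing the reader would also close the socket stream.
        BufferedReader input = new BufferedReader(new InputStreamReader(in, cs));
        int n;
        try {
            n = input.read(cbuf); // chars read, or -1 at end of stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(cbuf[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0xCA, 0x68, (byte) 0x9F, 0x75};
        // Assumption: the sender encoded its text with windows-1252.
        System.out.println(charHexFromStream(new ByteArrayInputStream(payload),
                Charset.forName("windows-1252"))); // ca 68 178 75
    }
}
```

Naming the charset makes the result independent of the receiving machine's default; it does not make chars equal bytes, which is the point of the previous answer.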
If the data is binary data and does not represent text encoded with some character encoding, then using InputStreamReader is the wrong way to read this data.
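If the four bytes are a binary value rather than text, a sketch like the following reads them without any charset, here via DataInputStream.readFully, which blocks until exactly the requested number of bytes has arrived. BinaryReadDemo and readUnsigned are hypothetical names, and the fixed length of 4 is an assumption about the protocol:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BinaryReadDemo {
    // Hypothetical helper: reads exactly `length` raw bytes, waiting until all
    // have arrived, and returns them as unsigned values 0..255.
    static int[] readUnsigned(InputStream in, int length) {
        byte[] buf = new byte[length];
        try {
            // An EOFException (wrapped below) signals that the stream ended early.
            // The DataInputStream is not closed, so the underlying stream stays open.
            new DataInputStream(in).readFully(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int[] values = new int[length];
        for (int i = 0; i < length; i++) {
            values[i] = buf[i] & 0xff;
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0xCA, 0x68, (byte) 0x9F, 0x75};
        int[] values = readUnsigned(new ByteArrayInputStream(payload), 4);
        System.out.println(Arrays.toString(values)); // [202, 104, 159, 117]
    }
}
```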
See also: Streams and readers/writers