Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/699319/
Representing char as a byte in Java
Asked by jbu
I must convert a char into a byte or a byte array. In other languages I know that a char is just a single byte. However, looking at the Java Character class, its min value is \u0000 and its max value is \uFFFF. This makes it seem like a char is 2 bytes long.
Will I be able to store it as a byte or do I need to store it as two bytes?
Before anyone asks, I will say that I'm trying to do this because I'm working under an interface that expects my results to be a byte array. So I have to convert my char to one.
Please let me know and help me understand this.
Thanks, jbu
Answered by erickson
To convert characters to bytes, you need to specify a character encoding. Some character encodings use one byte per character, while others use two or more bytes. In fact, for many languages, there are far too many characters to encode with a single byte.
In Java, the simplest way to convert from characters to bytes is with the String class's getBytes(Charset) method. (The StandardCharsets class defines some common encodings.) However, if a character cannot be mapped under the specified encoding, this method silently replaces it with the charset's default replacement bytes (typically ?). If you need more control, you can configure a CharsetEncoder to report an error in this case or to use a different replacement.
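As a rough illustration of the difference between the lenient and the strict route, here is a minimal sketch. The class name EncodingDemo and its two helper methods are made up for this example; only String.getBytes(Charset), StandardCharsets, CharsetEncoder and CodingErrorAction come from the JDK. It assumes ISO-8859-1 as the target encoding, which cannot represent the euro sign.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class EncodingDemo {

    // Lenient: unmappable characters are silently replaced.
    static byte[] encodeLenient(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Strict: an unmappable character raises an exception instead.
    static byte[] encodeStrict(String s) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Rethrown unchecked to keep the sketch easy to call.
            throw new IllegalArgumentException("Cannot encode: " + s, e);
        }
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        // The same character needs a different number of bytes per encoding.
        System.out.println(euro.getBytes(StandardCharsets.UTF_8).length);    // 3
        System.out.println(euro.getBytes(StandardCharsets.UTF_16BE).length); // 2
        // ISO-8859-1 has no euro sign, so the lenient path substitutes '?'.
        System.out.println((char) encodeLenient(euro)[0]);                   // ?
        try {
            encodeStrict(euro);
        } catch (IllegalArgumentException e) {
            System.out.println("strict encoder rejected the euro sign");
        }
    }
}
```

Which of the two behaviours is right depends on whether losing a character silently is acceptable for the interface that consumes the bytes.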
Answered by TofuBeer
A char in Java is an unsigned 16-bit value. If what you have fits in 7 bits, then just cast it to a byte (ASCII, for instance, will fit).
You could check out the java.nio.charset APIs as well.
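A plain cast simply drops the upper 8 bits, so it may be worth guarding it. The following is a small sketch under that assumption; AsciiCast and toAsciiByte are illustrative names, not existing APIs.

```java
// Hypothetical helper showing a guarded char-to-byte cast.
final class AsciiCast {

    // Returns the char as a single byte, refusing anything outside 7-bit ASCII.
    static byte toAsciiByte(char c) {
        if (c > 0x7F) {
            throw new IllegalArgumentException(
                    "Not an ASCII character: U+" + Integer.toHexString(c).toUpperCase());
        }
        return (byte) c;
    }

    public static void main(String[] args) {
        System.out.println(toAsciiByte('A'));  // 65
        // Unguarded casts compile fine but lose information:
        System.out.println((byte) '\u00E9');   // -23 (0xE9 read as a signed byte)
        System.out.println((byte) '\u20AC');   // -84 (high byte 0x20 is dropped)
    }
}
```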
Answered by Eddie
To extend what others are saying, if you have a char that you need as a byte array, then you first create a String containing that char and then get the byte array from the String:
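import java.io.UnsupportedEncodingException;

// Wraps the char in a one-character String, then encodes that String.
private byte[] charToBytes(final char x) {
    String temp = new String(new char[] {x});
    try {
        return temp.getBytes("ISO-8859-1");
    } catch (UnsupportedEncodingException e) {
        // Log a complaint
        return null;
    }
}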
Of course, use the appropriate character set. It would be much more efficient to work with Strings from the start, rather than taking one char at a time, converting it to a String, and then converting that to a byte array.
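To make the "work with Strings" suggestion concrete, here is a minimal sketch that encodes all the characters in one call. WholeStringEncoding and its method names are invented for the example, and ISO-8859-1 is only carried over from the snippet above as an assumed target encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: one encoding call for the whole text.
final class WholeStringEncoding {

    // Encodes an entire String in a single call.
    static byte[] encodeString(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Same idea when the input arrives as a char array.
    static byte[] encodeChars(char[] chars) {
        return new String(chars).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 'c', 'a', 'f' are ASCII; U+00E9 becomes the single byte 0xE9.
        System.out.println(Arrays.toString(encodeString("caf\u00E9"))); // [99, 97, 102, -23]
    }
}
```

Using the Charset overload of getBytes also avoids the checked UnsupportedEncodingException that the String-name overload declares.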
Answered by Varkhan
A char is indeed 16 bits in Java (and is also the only unsigned primitive type!).
If you are sure the encoding of your characters is ASCII, then you can just cast them to a byte (since ASCII uses only the lower 7 bits of the char).
If you do not need to modify the characters, or understand their meaning within a String, you can just store each char in two bytes, like:
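char[] c = ...;
byte[] b = new byte[c.length * 2];
for (int i = 0; i < c.length; i++) {
    // Parenthesize the mask and shift so the cast applies to the final value.
    b[2 * i] = (byte) ((c[i] & 0xFF00) >> 8);  // high byte
    b[2 * i + 1] = (byte) (c[i] & 0x00FF);     // low byte
}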
(It may be advisable to replace the 2* by a left shift, i.e. i << 1, if speed matters.)
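For completeness, here is a sketch of the same two-bytes-per-char layout with a matching unpack step, so the round trip can be checked. CharPacking, pack and unpack are illustrative names; the packing uses java.nio.ByteBuffer, which writes big-endian by default, as an assumed alternative to the manual shifts.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical demo class for a raw, encoding-free char <-> byte round trip.
final class CharPacking {

    // Writes each char as two bytes, high byte first (ByteBuffer's default order).
    static byte[] pack(char[] chars) {
        ByteBuffer buf = ByteBuffer.allocate(chars.length * 2);
        for (char c : chars) {
            buf.putChar(c);
        }
        return buf.array();
    }

    // Rebuilds the chars; the 0xFF masks undo the sign extension of byte to int.
    static char[] unpack(byte[] bytes) {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++) {
            int high = bytes[2 * i] & 0xFF;
            int low = bytes[2 * i + 1] & 0xFF;
            chars[i] = (char) ((high << 8) | low);
        }
        return chars;
    }

    public static void main(String[] args) {
        char[] original = {'A', '\u20AC'};
        byte[] packed = pack(original);
        System.out.println(Arrays.toString(packed));                   // [0, 65, 32, -84]
        System.out.println(Arrays.equals(original, unpack(packed)));   // true
    }
}
```

Forgetting the masks in unpack is a common slip: a negative low byte would otherwise overwrite the high byte when the two are combined.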
Note, however, that some actual (displayed) characters (or, more accurately, Unicode code points) are represented by two consecutive chars, known as a surrogate pair. So cutting between two chars does not ensure that you are cutting between actual characters.
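The effect is easy to see with a character outside the Basic Multilingual Plane. This is a small sketch; SurrogateDemo, countCodePoints and isSafeCut are names invented here, while the Character and String methods they call are standard JDK APIs.

```java
// Hypothetical demo class illustrating surrogate pairs.
final class SurrogateDemo {

    // Number of Unicode code points, which can be smaller than length().
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if splitting the string before 'index' does not separate a surrogate pair.
    static boolean isSafeCut(String s, int index) {
        if (index <= 0 || index >= s.length()) {
            return true;
        }
        return !(Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index)));
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which needs two chars in UTF-16.
        String s = "a" + new String(Character.toChars(0x1F600));
        System.out.println(s.length());          // 3
        System.out.println(countCodePoints(s));  // 2
        System.out.println(isSafeCut(s, 1));     // true
        System.out.println(isSafeCut(s, 2));     // false
    }
}
```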
If you need to decode, encode, or otherwise manipulate your char array in a String-aware manner, you should instead decode and encode your char array or String using the java.io tools, which ensure proper character handling.
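One way to read "the java.io tools" is OutputStreamWriter and InputStreamReader, which pair a byte stream with a charset. The sketch below takes that reading and assumes UTF-8; StreamCodec, encode and decode are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: charset-aware encoding and decoding via java.io.
final class StreamCodec {

    // Encodes chars to UTF-8 bytes through a Writer.
    static byte[] encode(char[] chars) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(chars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) before the bytes are read.
        return out.toByteArray();
    }

    // Decodes UTF-8 bytes back to chars through a Reader.
    static char[] decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString().toCharArray();
    }

    public static void main(String[] args) {
        // 'a' (1 byte), U+1F600 as a surrogate pair (4 bytes), U+20AC (3 bytes).
        char[] original = "a\uD83D\uDE00\u20AC".toCharArray();
        byte[] bytes = encode(original);
        System.out.println(bytes.length);                        // 8
        System.out.println(new String(decode(bytes)).equals(new String(original))); // true
    }
}
```

Unlike the raw two-byte packing, this route keeps surrogate pairs together and produces bytes that other UTF-8 aware programs can read.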