Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5535988/

Time: 2020-10-30 11:35:17  Source: igfitidea

String to binary and vice versa: extended ASCII

Tags: java, string, binary, byte

Asked by anonymous001

I want to convert a String to binary by putting it in a byte array (String.getBytes()) and then storing the binary string for each byte (Integer.toBinaryString(bytearray[i])) in a String[]. Then I want to convert back to a normal String via Byte.parseByte(stringarray[i], 2). This works great for the standard ASCII table, but not for the extended one. For example, an A gives me 1000001, but an Ä returns

11111111111111111111111111000011
11111111111111111111111110000100

Any ideas how to manage this?

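import java.nio.charset.StandardCharsets;

public class BinString {
    public static void main(String[] args) {
        String s = "Ä";
        System.out.println(binToString(stringToBin(s)));
    }

    // Encode the string as UTF-8 and render each byte as an unsigned binary string.
    public static String[] stringToBin(String s) {
        System.out.println("Converting: " + s);
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        String[] sa = new String[b.length];
        for (int i = 0; i < b.length; i++) {
            // Mask with 0xFF so bytes above 127 are not sign-extended to 32 bits.
            sa[i] = Integer.toBinaryString(b[i] & 0xFF);
        }
        return sa;
    }

    // Parse each binary string back into a byte and decode the bytes as UTF-8.
    public static String binToString(String[] strar) {
        byte[] bar = new byte[strar.length];
        for (int i = 0; i < strar.length; i++) {
            // Byte.parseByte rejects values above 127, so parse as int and narrow.
            bar[i] = (byte) Integer.parseInt(strar[i], 2);
            System.out.println(bar[i]);
        }
        return new String(bar, StandardCharsets.UTF_8);
    }
}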

Answered by Joachim Sauer

First off: "extended ASCII" is a very misleading title that's used to refer to a ton of different encodings.

Second: byte in Java is signed, while bytes in encodings are usually handled as unsigned. Since you use Integer.toBinaryString(), the byte will be converted to an int using sign extension (because byte values > 127 are represented by negative values in Java).

To avoid this, simply use & 0xFF to mask all but the lower 8 bits, like this:

String binary = Integer.toBinaryString(byteArray[i] & 0xFF);
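
As a minimal sketch of the difference, the following compares the unmasked and masked conversions for the byte 0xC3, which is the first of the two values shown in the question. The class and method names (SignExtensionDemo, unmasked, masked) are made up for illustration and are not part of the original answer.

```java
// Illustrative sketch; the class and method names are hypothetical.
final class SignExtensionDemo {

    // Without masking: the byte is sign-extended to a 32-bit int first.
    static String unmasked(byte b) {
        return Integer.toBinaryString(b);
    }

    // With masking: only the low 8 bits survive, so the value is in 0..255.
    static String masked(byte b) {
        return Integer.toBinaryString(b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC3; // -61 as a signed Java byte
        System.out.println(unmasked(b)); // 11111111111111111111111111000011
        System.out.println(masked(b));   // 11000011
    }
}
```

Note that Integer.toBinaryString() drops leading zeros, so values below 128 produce fewer than 8 digits (for example 1000001 for A).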

Answered by McDowell

To expand on Joachim's point about "extended ASCII" I'd add...

Note that getBytes() is a transcoding operation that converts data from UTF-16 to the platform default encoding. The encoding varies from system to system and sometimes even between users on the same PC. This means that results are not consistent across platforms, and if a legacy encoding is the default (as it is on Windows), data can be lost.
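
To illustrate how much the bytes depend on the encoding, here is a small sketch that encodes the same one-character string with three explicitly chosen charsets. The sample character Ä (U+00C4) is an assumption picked because its UTF-8 bytes match the two values shown in the question; the class and method names (EncodingBytesDemo, toBits) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; the class and method names are hypothetical.
final class EncodingBytesDemo {

    // Returns the encoded bytes as space-separated 8-bit binary groups.
    static String toBits(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String bits = Integer.toBinaryString(b & 0xFF);
            // Left-pad with zeros so every byte is shown with 8 digits.
            sb.append("00000000".substring(bits.length())).append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00C4"; // the single character A-with-diaeresis
        System.out.println(toBits(s, StandardCharsets.ISO_8859_1)); // 11000100
        System.out.println(toBits(s, StandardCharsets.UTF_8));      // 11000011 10000100
        System.out.println(toBits(s, StandardCharsets.UTF_16BE));   // 00000000 11000100
    }
}
```

The same string yields one byte in ISO-8859-1 and two different byte pairs in UTF-8 and UTF-16BE, so a binary dump is only meaningful together with the charset that produced it.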

To make the operation symmetrical, you need to provide an encoding explicitly (preferably a Unicode encoding such as UTF-8 or UTF-16).

Charset encoding = Charset.forName("UTF-16");
byte[] b = s1.getBytes(encoding);
String s2 = new String(b, encoding);
assert s1.equals(s2);
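
A runnable variant of the snippet above might look like the following sketch. It wraps the encode/decode pair in a hypothetical helper (roundTrips) and also tries US-ASCII, which is an assumption added here to show what a lossy, non-Unicode encoding does to the same input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; the class and method names are hypothetical.
final class CharsetRoundTrip {

    // Encodes and decodes with the same charset; true if nothing was lost.
    static boolean roundTrips(String s, Charset charset) {
        byte[] bytes = s.getBytes(charset);
        return s.equals(new String(bytes, charset));
    }

    public static void main(String[] args) {
        String s1 = "A\u00C4"; // "A" followed by A-with-diaeresis
        System.out.println(roundTrips(s1, StandardCharsets.UTF_8));    // true
        System.out.println(roundTrips(s1, StandardCharsets.UTF_16));   // true
        System.out.println(roundTrips(s1, StandardCharsets.US_ASCII)); // false
    }
}
```

With US-ASCII the unmappable character is replaced by '?' during encoding, so the decoded string no longer equals the original. That is the kind of data loss a legacy default encoding can cause.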