How to convert Strings to and from UTF8 byte arrays in Java

Question

Asked by mcherm

In Java, I have a String and I want to encode it as a byte array (in UTF-8, or some other encoding). Alternatively, I have a byte array (in some known encoding) and I want to convert it into a Java String. How do I do these conversions?

Answer 1

Accepted answer by mcherm

Convert from String to byte[]:

String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);

Convert from byte[] to String:

byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);

You should, of course, use the correct encoding name. My examples used US-ASCII and UTF-8, the two most common encodings.
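To make the two conversions above concrete, here is a minimal round-trip sketch. The class name `Utf8RoundTrip` and the helper names `encode` and `decode` are made up for illustration; only `String.getBytes(Charset)`, `new String(byte[], Charset)` and `StandardCharsets` come from the JDK. It assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {

    // Hypothetical helper: String -> UTF-8 bytes.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: UTF-8 bytes -> String.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00e9" is 'é'; written as an escape so the source file encoding does not matter.
        String original = "h\u00e9llo";
        byte[] utf8 = encode(original);

        // The character count and the byte count differ, because 'é' needs two bytes in UTF-8.
        System.out.println(original.length() + " chars, " + utf8.length + " bytes");
        System.out.println(Arrays.toString(utf8));

        // Decoding with the same charset gives back the original text.
        System.out.println(decode(utf8).equals(original));
    }
}
```

The round trip only works when the same encoding is used in both directions; decoding these bytes as US-ASCII would not give back the 'é'.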

Answer 2

Answered by Jorge Ferreira

String original = "hello world";
byte[] utf8Bytes = original.getBytes("UTF-8");

Answer 3

Answered by McDowell

You can convert directly via the String(byte[], String) constructor and getBytes(String) method. Java exposes available character sets via the Charset class. The JDK documentation lists supported encodings.
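As a rough sketch of how the Charset class can be used when the encoding name is only known at run time: the class name `CharsetLookupSketch` and the helper `decode` are hypothetical, while `Charset.isSupported`, `Charset.forName` and `Charset.availableCharsets` are standard JDK calls.

```java
import java.nio.charset.Charset;

public class CharsetLookupSketch {

    // Hypothetical helper: decode bytes using an encoding name supplied at run time.
    static String decode(byte[] bytes, String charsetName) {
        // isSupported() reports whether this JVM knows the named charset.
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // List the canonical names of every charset this JVM provides.
        for (String name : Charset.availableCharsets().keySet()) {
            System.out.println(name);
        }
        System.out.println(decode(new byte[] {99, 97, 116}, "US-ASCII"));
    }
}
```

Passing a Charset object rather than a name to the String constructor also avoids the checked UnsupportedEncodingException that the String(byte[], String) overload declares.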

90% of the time, such conversions are performed on streams, so you'd use the Reader/Writer classes. You would not incrementally decode using the String methods on arbitrary byte streams - you would leave yourself open to bugs involving multibyte characters.
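The multibyte pitfall can be illustrated with a small sketch. It is only a demonstration under assumed conditions: the class and method names are invented, the "stream" is an in-memory byte array, and the chunk size of 2 is chosen so that a chunk boundary falls in the middle of a two-byte character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamDecodeSketch {

    // Naive approach: decode every chunk on its own with the String constructor.
    // A character whose bytes straddle two chunks is decoded as malformed input.
    static String decodeChunked(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Reader-based approach: InputStreamReader keeps decoder state between reads,
    // so a character split across reads is still decoded correctly.
    static String decodeWithReader(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[chunkSize];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "a\u00e9b"; // 'é' occupies bytes 1 and 2 of the 4-byte UTF-8 form
        byte[] data = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeChunked(data, 2).equals(original));    // chunk boundary splits 'é'
        System.out.println(decodeWithReader(data, 2).equals(original)); // decoder state is preserved
    }
}
```

In the chunked version the two halves of 'é' are each replaced by the decoder's replacement character, which is exactly the kind of bug the answer warns about.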

Answer 4

Answered by savio

Terribly late, but I just encountered this issue and this is my fix:

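import java.nio.charset.StandardCharsets;

private static String removeNonUtf8CompliantCharacters(final String inString) {
    if (null == inString) return null;
    // Encode explicitly as UTF-8 instead of relying on the platform default charset.
    byte[] byteArr = inString.getBytes(StandardCharsets.UTF_8);
    for (int i = 0; i < byteArr.length; i++) {
        // Java bytes are signed, so mask to get the unsigned value (0..255);
        // without the mask every byte above 127 is negative and would be replaced.
        int ch = byteArr[i] & 0xFF;
        // replace any bytes outside the valid UTF-8 range as well as all control characters
        // except tabs and new lines
        if (!((ch > 31 && ch < 253) || ch == '\t' || ch == '\n' || ch == '\r')) {
            byteArr[i] = ' ';
        }
    }
    return new String(byteArr, StandardCharsets.UTF_8);
}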

Answer 5

Answered by M. Leonhard

Here's a solution that avoids performing the Charset lookup for every conversion:

import java.nio.charset.Charset;

// Looked up once and shared; static so the lookup is not repeated per instance.
private static final Charset UTF8_CHARSET = Charset.forName("UTF-8");

String decodeUTF8(byte[] bytes) {
    return new String(bytes, UTF8_CHARSET);
}

byte[] encodeUTF8(String string) {
    return string.getBytes(UTF8_CHARSET);
}

Answer 6

Answered by Pacerier

If you are using 7-bit ASCII or ISO-8859-1 (an amazingly common format) then you don't have to create a new java.lang.String at all. It's much more performant to simply cast the byte into char:

Full working example:

for (byte b : new byte[] { 43, 45, (byte) 215, (byte) 247 }) {
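    // Mask before casting: bytes are signed, so (char) (byte) 215 would be U+FFD7, not U+00D7.
    char c = (char) (b & 0xFF);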
    System.out.print(c);
}

If you are not using extended characters (accented letters such as ê) and can be sure that the only transmitted values are among the first 128 Unicode characters, then this code will also work for UTF-8 and extended ASCII (like cp-1252).

Answer 7

Answered by Ran Adler

// query is your JSON string

DefaultHttpClient httpClient = new DefaultHttpClient();
HttpPost postRequest = new HttpPost("http://my.site/test/v1/product/search?qy=");

// Encode the request body as UTF-8
StringEntity input = new StringEntity(query, "UTF-8");
input.setContentType("application/json");
postRequest.setEntity(input);
HttpResponse response = httpClient.execute(postRequest);

Answer 8

Answered by paiego

My tomcat7 implementation accepts strings as ISO-8859-1, regardless of the content-type of the HTTP request. The following solution worked for me when trying to correctly interpret characters like 'é'.

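byte[] b1 = szP1.getBytes("ISO-8859-1");
// Arrays.toString prints the byte values; b1.toString() would only print the array's identity
System.out.println(java.util.Arrays.toString(b1));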

String szUT8 = new String(b1, "UTF-8");
System.out.println(szUT8);

When trying to interpret the string as US-ASCII, the byte info wasn't correctly interpreted.

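b1 = szP1.getBytes("US-ASCII");
// Arrays.toString prints the byte values; b1.toString() would only print the array's identity
System.out.println(java.util.Arrays.toString(b1));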
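The reason the ISO-8859-1 round trip works while US-ASCII does not can be sketched as follows. This is an illustration rather than the answerer's actual code: the class name `MisdecodedRepair` and its methods are hypothetical, and it assumes the text was really UTF-8 that got decoded as ISO-8859-1 somewhere upstream.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodedRepair {

    // Hypothetical helper: undo a wrong ISO-8859-1 decode of UTF-8 data.
    // ISO-8859-1 maps every byte value to exactly one char, so re-encoding
    // with it recovers the original bytes without loss.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Same idea with US-ASCII, which cannot represent chars above 127.
    // Those chars are encoded as '?', so the original bytes are lost.
    static String repairViaAscii(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.US_ASCII);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of 'é' (0xC3 0xA9) wrongly decoded as ISO-8859-1.
        String misdecoded = new String(
                "\u00e9".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        System.out.println(repair(misdecoded).equals("\u00e9"));         // bytes recovered
        System.out.println(repairViaAscii(misdecoded).equals("\u00e9")); // bytes were lost
    }
}
```

This kind of repair is a workaround; where possible it is cleaner to make the container decode the request with the right charset in the first place.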

Answer 9

Answered by vtor

As an alternative, StringUtils from Apache Commons Codec can be used.

 byte[] bytes = {(byte) 1};
 String convertedString = StringUtils.newStringUtf8(bytes);

or

 String myString = "example";
 byte[] convertedBytes = StringUtils.getBytesUtf8(myString);

If you have a non-standard charset, you can use getBytesUnchecked() or newString() accordingly.

Answer 10

Answered by Макс Даниленко

Reader reader = new BufferedReader(
    new InputStreamReader(
        new ByteArrayInputStream(
            string.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8));
