How do I convert between ISO-8859-1 and UTF-8 in Java?

Question

Does anyone know how to convert a string from ISO-8859-1 to UTF-8 and back in Java?

I'm getting a string from the web and saving it in the RMS (J2ME). I want to preserve the special characters and read the string back from the RMS in ISO-8859-1 encoding. How do I do this?

Answer 1

Accepted answer by erickson

In general, you can't do this. UTF-8 is capable of encoding any Unicode code point. ISO-8859-1 can handle only a tiny fraction of them (the first 256). So, transcoding from ISO-8859-1 to UTF-8 is no problem. Going backwards from UTF-8 to ISO-8859-1 is lossy: characters that ISO-8859-1 cannot represent are replaced with a substitute character (with String.getBytes, a "?").

To transcode text:

byte[] latin1 = ...
byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");

or

byte[] utf8 = ...
byte[] latin1 = new String(utf8, "UTF-8").getBytes("ISO-8859-1");
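
On Java 7 or later (not J2ME, which lacks this class), the same two conversions can probably be written with java.nio.charset.StandardCharsets, which avoids the checked UnsupportedEncodingException. The following is only a sketch: the class and method names are made up for illustration and are not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper; the names here are assumptions, not from the answer above.
class TranscodeSketch {

    // Decode ISO-8859-1 bytes into a String, then encode that String as UTF-8.
    // Every ISO-8859-1 byte maps to a Unicode character, so nothing is lost.
    public static byte[] latin1ToUtf8(byte[] latin1) {
        return new String(latin1, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes into a String, then encode as ISO-8859-1.
    // Characters above U+00FF cannot be represented and come out as '?'.
    public static byte[] utf8ToLatin1(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0x4D, (byte) 0xFA};   // "Mú" in ISO-8859-1
        byte[] utf8 = latin1ToUtf8(latin1);
        System.out.println(utf8.length);              // prints 3 (ú takes two bytes)

        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
        byte[] lossy = utf8ToLatin1(euro);
        System.out.println((char) lossy[0]);          // prints ? (the euro sign was lost)
    }
}
```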

You can exercise more control by using the lower-level Charset APIs. For example, you can raise an exception when an un-encodable character is found, or use a different character for replacement text.
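
As a rough illustration of that lower-level route, the sketch below uses java.nio.charset.CharsetEncoder with CodingErrorAction. It assumes Java 7+ (for StandardCharsets); the class name, method names and the choice of '_' as a replacement are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative helper; names are assumptions, not from the answer above.
class StrictLatin1Sketch {

    // Encode to ISO-8859-1 and throw if any character cannot be represented.
    public static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return toArray(encoder.encode(CharBuffer.wrap(text)));
    }

    // Encode to ISO-8859-1, substituting the given byte for unencodable characters.
    public static byte[] encodeWithReplacement(String text, byte replacement)
            throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] {replacement});
        return toArray(encoder.encode(CharBuffer.wrap(text)));
    }

    // Copy only the bytes that were actually written into the buffer.
    private static byte[] toArray(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] replaced = encodeWithReplacement("a\u20ACb", (byte) '_');
        System.out.println(new String(replaced, StandardCharsets.ISO_8859_1));
        try {
            encodeStrict("\u20AC"); // the euro sign is not in ISO-8859-1
        } catch (CharacterCodingException e) {
            System.out.println("not encodable: " + e);
        }
    }
}
```

The strict variant suits cases where silent data loss is unacceptable; the replacing variant behaves like String.getBytes but lets you pick the substitute.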

Answer 2

Answer by Johannes Weiss

If you have a String, you can do this:

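String s = "test";
try {
    // Encode the String as UTF-8 bytes (the result was discarded in the original snippet)
    byte[] utf8 = s.getBytes("UTF-8");
} catch (UnsupportedEncodingException uee) {
    uee.printStackTrace();
}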

If you have a 'broken' String, something went wrong earlier: converting a String to a String in another encoding is definitely not the way to go! You can convert a String to a byte[] and vice versa (given an encoding). In Java, Strings are AFAIK encoded with UTF-16, but that's an implementation detail.

Say you have an InputStream: you can read it into a byte[] and then convert that to a String using

byte[] bs = ...;
String s;
try {
    s = new String(bs, encoding);
} catch(UnsupportedEncodingException uee) {
    uee.printStackTrace();
}

or, even better (thanks to erickson), use an InputStreamReader like this:

InputStreamReader isr;
try {
     isr = new InputStreamReader(inputStream, encoding);
} catch(UnsupportedEncodingException uee) {
    uee.printStackTrace();
}
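
To show where the reader goes from there, here is a small sketch that drains an InputStream into a String through an InputStreamReader. It assumes Java 7+ (try-with-resources and the Charset overload of the constructor), and the class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper; names are assumptions, not from the answer above.
class ReadDecodedSketch {

    // Read the whole stream, decoding its bytes with the given charset.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0x4D 0xFA is "Mú" when the bytes are ISO-8859-1.
        InputStream in = new ByteArrayInputStream(new byte[] {(byte) 0x4D, (byte) 0xFA});
        System.out.println(readAll(in, StandardCharsets.ISO_8859_1));
    }
}
```

Decoding in the reader means a multi-byte character that straddles two reads is still handled correctly, which is harder to get right when converting byte[] chunks by hand.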

Answer 3

Answer by JLeon90

Here is an easy way with String output (I created a method to do this):

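// The method name is missing in the original post; "reinterpret" is only a placeholder.
public static String reinterpret(String input) {
    String output = "";
    try {
        /* From ISO-8859-1 to UTF-8: recovers text whose UTF-8 bytes were decoded as ISO-8859-1 */
        output = new String(input.getBytes("ISO-8859-1"), "UTF-8");
        /* From UTF-8 to ISO-8859-1: shows the UTF-8 bytes as ISO-8859-1 characters.
           This assignment overwrites the one above; keep only the line you need. */
        output = new String(input.getBytes("UTF-8"), "ISO-8859-1");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return output;
}
// Example
input = "Música";
output = "MÃºsica";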

Answer 4

Answer by Bahadir Tasdemir

What worked for me ("üzüm bağları" is the correct spelling in Turkish):

Convert ISO-8859-1 to UTF-8:

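// The escape after "baÄ" is the invisible control character U+009F (second UTF-8 byte of 'ğ')
String encodedWithISO88591 = "Ã¼zÃ¼m baÄ\u009flarÄ±";
String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");
//Result, decodedToUTF8 --> "üzüm bağları"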

Convert UTF-8 to ISO-8859-1:

String encodedWithUTF8 = "üzüm bağları";
String decodedToISO88591 = new String(encodedWithUTF8.getBytes("UTF-8"), "ISO-8859-1");
//Result, decodedToISO88591 --> "Ã¼zÃ¼m baÄ\u009flarÄ±" (U+009F is an invisible control character)

Answer 5

Answer by Alberto Segura

The Apache Commons IO Charsets class can come in handy:

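// Encode the characters as ISO-8859-1 bytes, then decode those bytes explicitly as UTF-8
// (the original relied on the platform default charset for decoding)
String utf8String = new String(org.apache.commons.io.Charsets.ISO_8859_1.encode(latinString).array(), org.apache.commons.io.Charsets.UTF_8);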

Answer 6

Answer by che.moor

Here is a function that recovers UTF-8 text from a string whose UTF-8 bytes were mistakenly decoded as ISO-8859-1:

import java.nio.charset.StandardCharsets;

public static String String_ISO_8859_1To_UTF_8(String strISO_8859_1) {
    // Each char of the input is assumed to carry one byte (0-255) of UTF-8 data.
    final StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < strISO_8859_1.length(); i++) {
        final char ch = strISO_8859_1.charAt(i);
        // Always emit two hex digits. The original appended ASCII characters
        // unencoded, which corrupted the hex parsing below.
        stringBuilder.append(String.format("%02x", ch & 0xFF));
    }
    String s = stringBuilder.toString();
    int len = s.length();
    // Turn each pair of hex digits back into one byte
    byte[] data = new byte[len / 2];
    for (int i = 0; i < len; i += 2) {
        data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                             + Character.digit(s.charAt(i + 1), 16));
    }
    // Decode the collected bytes as UTF-8
    return new String(data, StandardCharsets.UTF_8);
}

TEST

String strA_ISO_8859_1_i = new String("??????".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

System.out.println("ISO_8859_1 strA est = "+ strA_ISO_8859_1_i + "\n String_ISO_8859_1To_UTF_8 = " + String_ISO_8859_1To_UTF_8(strA_ISO_8859_1_i));

RESULT

ISO_8859_1 strA est = ?§ù?où?§ù String_ISO_8859_1To_UTF_8 = ??????

Answer 7

Answer by Pritam Banerjee

A regex can also be used effectively (this replaces every character outside the printable ISO-8859-1 ranges with a space):

String input = "Tes?ti?ng [§] all of i?t _ - à ?? with some 9umbers as"
            + " w2921**#$%!@# well ü, or ü, is a cha?racte?";
String output = input.replaceAll("[^\u0020-\u007e\u00a0-\u00ff]", " ");
System.out.println("Input = " + input);
System.out.println("Output = " + output);
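
For comparison, the sketch below puts the same regex next to a check based on CharsetEncoder.canEncode, which reports whether a string fits in ISO-8859-1 without modifying it. It assumes Java 7+ and uses invented class and method names. Note that the two are not equivalent: the regex also blanks out control characters (U+0000 to U+001F and U+007F to U+009F), which ISO-8859-1 can technically encode.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper; names are assumptions, not from the answer above.
class Latin1CheckSketch {

    // Same character class as the regex above: keep printable ASCII and
    // U+00A0..U+00FF, replace everything else with a space.
    public static String sanitize(String input) {
        return input.replaceAll("[^\u0020-\u007e\u00a0-\u00ff]", " ");
    }

    // True if every character of the text can be encoded in ISO-8859-1.
    public static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String input = "caf\u00e9 costs \u20ac5";      // "café costs €5"
        System.out.println(fitsLatin1(input));          // prints false (because of the euro sign)
        System.out.println(sanitize(input));            // euro sign replaced by a space
        System.out.println(fitsLatin1(sanitize(input))); // prints true
    }
}
```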
