Converting ANSI characters to UTF-8 in Java

Question

Asked by n002213f

Is there a way to convert an ANSI string to UTF-8 using Java?

I have a custom serializer that uses readUTF and writeUTF (from DataInputStream and DataOutputStream) to deserialize and serialize strings. If I receive a string encoded in ANSI that is too long, around 100000 chars, I get this error:

Caused by: java.io.UTFDataFormatException: encoded string too long: 106958 bytes

However, in my JUnit tests I'm able to create a string with 120000 'a's and it works perfectly.

I have checked the following posts but am still getting errors:

Answer 1

Accepted answer by ZZ Coder

This error is not caused by character encoding. It means the length of the UTF data is wrong.

EDIT: Just realized this is a writing error, not reading error.

The UTF length field written by writeUTF is only 2 bytes, so it can describe at most 65535 encoded bytes (64K). You are trying to write about 100K, which is not going to work.

This limit is hardcoded in writeUTF and there is no way to get around it:

if (utflen > 65535)
    throw new UTFDataFormatException(
            "encoded string too long: " + utflen + " bytes");
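
If you control both the writer and the reader, one possible way to avoid the 64K limit is to skip writeUTF for long strings and write the UTF-8 bytes yourself behind a 4-byte length. The following is only a sketch under that assumption, not the asker's serializer; the class and method names (LongStringIO, writeLongString, readLongString) are hypothetical. Note that it changes the binary format, so data written this way cannot be read back with readUTF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the JDK.
public class LongStringIO {

    // Writes a 4-byte length followed by the standard UTF-8 bytes of the string.
    public static void writeLongString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads a string that was written by writeLongString.
    public static String readLongString(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a string well above the 65535-byte writeUTF limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 120000; i++) {
            sb.append('a');
        }
        String original = sb.toString();

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeLongString(out, original);
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        String copy = readLongString(in);
        System.out.println(original.equals(copy)); // prints true
    }
}
```

The int length allows strings up to roughly 2 GB of encoded bytes, at the cost of two extra bytes per string compared with writeUTF.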

Answer 2

Answer by iammichael

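import java.nio.charset.StandardCharsets;

byte[] asciiBytes = ...;
// Decode the bytes as US-ASCII, then re-encode the resulting string as UTF-8.
String unicode = new String(asciiBytes, StandardCharsets.US_ASCII);
byte[] utfBytes = unicode.getBytes(StandardCharsets.UTF_8);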

Answer 3

Answer by Aaron Digulla

Which ANSI codepage? There are lots of different character encodings that are all referred to as "ANSI". The DOS codepage is 437 (without the drawing symbols). If you use codepage 850, this will work:

String unicode = new String(bytes, "IBM850");

(where bytes is an array holding the ANSI-encoded characters). After that, you can convert this string into a byte array in any encoding using unicode.getBytes(encoding).

Windows often uses the codepage 1252 (use "windows-1252" for that).
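
As a rough illustration of the decode-then-encode approach, here is a minimal sketch that converts bytes from a named source codepage to UTF-8. The class and method names (AnsiToUtf8, toUtf8) are made up for this example, and it assumes the source charset name is one your JVM supports; Charset.forName throws an exception otherwise.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the JDK.
public class AnsiToUtf8 {

    // Decodes the input using the named source charset, then re-encodes it as UTF-8.
    public static byte[] toUtf8(byte[] ansiBytes, String sourceCharsetName) {
        Charset source = Charset.forName(sourceCharsetName);
        String decoded = new String(ansiBytes, source);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0x80 is the euro sign in windows-1252.
        byte[] ansi = { (byte) 0x80 };
        byte[] utf8 = toUtf8(ansi, "windows-1252");
        for (byte b : utf8) {
            System.out.printf("%02X ", b); // prints E2 82 AC
        }
        System.out.println();
    }
}
```

Choosing the wrong source codepage does not raise an error; it silently produces the wrong characters, so it is worth confirming which "ANSI" encoding the data really uses.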

Answer 4

Answer by István

ZZ Coder already answered the question, but I have written a more detailed explanation and suggested a workaround on this blog. Basically, the problem is in DataOutputStream, because it restricts the writable String to 64KB. There are other possible workarounds to sidestep the issue, and some might work without breaking the actual binary data format one is using...
