java: Convert a string containing ASCII to Unicode

Question

Asked by Rob Hufschmitt

I send a string from my HTML page to my Java HttpServlet. In the request I receive ASCII codes that represent Chinese characters:

"& #21487;& #20197;& #21578;& #35785;& #25105;" (without the spaces)

Decoded, these references read 可以告诉我.

How can I transform this string into Unicode?

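Each &#N; reference names a Unicode code point in decimal. The following sketch (the class and method names are illustrative, not from the question) prints those decimal values in the usual U+XXXX hex notation and rebuilds the characters from them. The hex values are the same ones that appear as \u escapes in the first answer.

```java
public class EntityCodePoints {
    // Formats a code point in the conventional U+XXXX hexadecimal notation
    static String toHex(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // Decimal values copied from the &#N; references in the question
        int[] codePoints = {21487, 20197, 21578, 35785, 25105};
        StringBuilder decoded = new StringBuilder();
        for (int cp : codePoints) {
            // e.g. "21487 -> U+53EF"
            System.out.println(cp + " -> " + toHex(cp));
            decoded.appendCodePoint(cp);
        }
        // The five decoded characters (needs a console and font that can show them)
        System.out.println(decoded);
    }
}
```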
HTML code:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Find information</title>
    <link rel="stylesheet" type="text/css" href="layout.css">
</head>
<body>

<!-- accept-charset="UTF-8" asks the browser to submit the form data encoded as UTF-8 -->
<form id="lookupform" name="lookupform" action="LookupServlet" method="post" accept-charset="UTF-8">
    <table id="lookuptable" align="center">
        <tr>
            <td><label for="lookupstring">Question:</label></td>
            <td><textarea cols="30" rows="2" name="lookupstring" id="lookupstring"></textarea></td>
        </tr>
    </table>
    <input type="submit" name="Look up" id="lookup" value="Look up"/>
</form>

</body>
</html>

Java code:

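// Inside doPost(HttpServletRequest request, HttpServletResponse response) of the servlet.
// setCharacterEncoding must run before any request parameter is read, or it has no effect.
request.setCharacterEncoding("UTF-8");
javax.servlet.http.HttpSession session = request.getSession();
LoginResult lr = (LoginResult) session.getAttribute("loginResult");
String[] question = request.getParameterValues("lookupstring");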

If I print question[0] then I get this value: "& #21487;& #20197;& #21578;& #35785;& #25105;"
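That printed value is itself plain ASCII text: forty characters made of '&', '#', digits and ';'. Something upstream has already replaced each Chinese character with a numeric character reference. One common cause (an assumption here, not confirmed by the question) is a browser submitting the form in a charset it believes cannot encode those characters. A small sketch (names are illustrative) that checks this:

```java
public class AsciiCheck {
    // Returns true if every char in s is in the 7-bit ASCII range (0-127)
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        // The value of question[0] from the question, spaces removed
        String param = "&#21487;&#20197;&#21578;&#35785;&#25105;";
        System.out.println(param.length()); // 40: five references of 8 chars each
        System.out.println(isAscii(param)); // true
    }
}
```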

Answer 1

Answered by Pablo Santa Cruz

There is no such thing as ASCII codes that display Chinese characters. ASCII defines only 128 characters (codes 0 to 127), and none of them are Chinese.
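To see this concretely, the following sketch (class name illustrative) encodes the five characters as US-ASCII. String.getBytes(Charset) replaces every character the charset cannot map with that charset's replacement byte, which for US-ASCII is '?' (63), so the Chinese text is simply lost.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiLoss {
    // Encodes s as US-ASCII; unmappable chars become the replacement byte '?'
    static byte[] toAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String chinese = "\u53ef\u4ee5\u544a\u8bc9\u6211";
        byte[] ascii = toAscii(chinese);
        System.out.println(Arrays.toString(ascii));                       // [63, 63, 63, 63, 63]
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ?????
    }
}
```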

If you already have a Java string, it already has an internal representation of all characters (US, LATIN, CHINESE). You can then encode that Java string into bytes with a Unicode encoding such as UTF-8 or UTF-16:

~~String s = "可以告诉我";~~ (EDIT: This line won't display correctly on systems that lack fonts for Chinese characters.)

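String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
// getBytes returns a byte[]; the Charset overload avoids the checked UnsupportedEncodingException
byte[] utfBytes = s.getBytes(java.nio.charset.StandardCharsets.UTF_8);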
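The sketch below (class and method names are illustrative) shows what the encoding step produces for this string. All five characters lie between U+0800 and U+FFFF, so each takes 3 bytes in UTF-8 and 2 bytes in UTF-16. Java's generic "UTF-16" encoder additionally writes a 2-byte big-endian byte-order mark, while UTF-16BE does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    // Number of bytes s occupies once encoded with cs
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));    // 15 = 5 x 3
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE)); // 10 = 5 x 2
        System.out.println(encodedLength(s, StandardCharsets.UTF_16));   // 12 = BOM + 10
        // Decoding with the same charset restores an equal String
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```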

Now that I look at your updated question, you might be looking for the StringEscapeUtils class. It is part of Apache Commons Text, and it will unescape your HTML entities into a Java string:

import org.apache.commons.text.StringEscapeUtils;
String s = StringEscapeUtils.unescapeHtml4("&#21487;&#20197;&#21578;&#35785;&#25105;"); // s is "可以告诉我"
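If adding the Commons Text dependency is not an option, a regular expression from the JDK can decode numeric character references on its own. This is a substitute technique, not how StringEscapeUtils is implemented: the sketch (class name illustrative) handles only the decimal &#N; and hexadecimal &#xH; forms, leaves named entities such as &amp; untouched, and throws an exception for values that are not valid code points.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDecoder {
    // Matches decimal (&#21487;) and hexadecimal (&#x53EF;) numeric character references
    private static final Pattern NCR = Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Replaces each numeric character reference with the character it names
    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // toChars also handles code points above U+FFFF (surrogate pairs)
            String replacement = new String(Character.toChars(codePoint));
            // quoteReplacement keeps a decoded '$' or '\' from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String s = decode("&#21487;&#20197;&#21578;&#35785;&#25105;");
        System.out.println(s.equals("\u53ef\u4ee5\u544a\u8bc9\u6211")); // true
    }
}
```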

Answer 2

Answered by Thorbjørn Ravn Andersen

A Java String contains Unicode characters. The decoding already took place when the string was constructed.
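To illustrate: the charset matters only at the moment bytes are turned into a String. The sketch below (class and method names are illustrative) decodes the same six UTF-8 bytes twice; with the right charset you get two characters, with ISO-8859-1 you get six, one per byte. In the question, request.setCharacterEncoding("UTF-8") selects the charset for this step on the request body, which is why it must be called before any parameter is read.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeAtConstruction {
    // Builds a String from raw bytes; the charset decides how bytes become chars
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // UTF-8 bytes as they might arrive over the wire (3 bytes per character)
        byte[] wire = "\u53ef\u4ee5".getBytes(StandardCharsets.UTF_8);
        String right = decode(wire, StandardCharsets.UTF_8);
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);
        System.out.println(right.length()); // 2 characters
        System.out.println(wrong.length()); // 6 characters, one per byte (mojibake)
    }
}
```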

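Related questions

java: Passing parameters to a URL in Struts2?

How to speed up unzipping time in Java / Android?

java: How to iteratively generate k-element subsets from a set of size n in Java?

java: What is the difference between the == operator and equals()? (with hashCode()???)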
