Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4526192/

Time: 2020-10-30 06:45:20  Source: igfitidea

Convert a string containing ASCII to Unicode

Tags: java, unicode, utf-8, servlets

Asked by Rob Hufschmitt

My Java HttpServlet receives a string from my HTML page. In the request I get ASCII codes that stand for Chinese characters:

"& #21487;& #20197;& #21578;& #35785;& #25105;" (without the spaces)

How can I transform this string into Unicode?

HTML code:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Find information</title>
    <link rel="stylesheet" type="text/css" href="layout.css">
</head>
<body>

<form id="lookupform" name="lookupform" action="LookupServlet" method="post" accept-charset="UTF-8">
    <table id="lookuptable" align="center">
        <tr>
            <td><label for="lookupstring">Question:</label></td>
            <td><textarea cols="30" rows="2" name="lookupstring" id="lookupstring"></textarea></td>
        </tr>
    </table>
    <input type="submit" name="Look up" id="lookup" value="Look up"/>
</form>
</body>
</html>

Java code:

request.setCharacterEncoding("UTF-8");
javax.servlet.http.HttpSession session = request.getSession();
LoginResult lr = (LoginResult) session.getAttribute("loginResult");
String[] question = request.getParameterValues("lookupstring");

If I print question[0] then I get this value: "& #21487;& #20197;& #21578;& #35785;& #25105;"

Answered by Pablo Santa Cruz

There is no such thing as ASCII codes that display Chinese characters. ASCII cannot represent Chinese characters.

If you already have a Java string, it already has an internal representation of all characters (US, Latin, Chinese). You can then encode that Java string into bytes using a Unicode encoding such as UTF-8 or UTF-16:

String s = "可以告诉我"; (EDIT: This line won't display correctly on systems that lack fonts for Chinese characters.)

String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
byte[] utfString = s.getBytes("UTF-8"); // getBytes returns a byte array, not a single byte
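
To make the encoding step a bit more concrete, here is a minimal sketch of encoding the same string as UTF-8 and UTF-16. The class name EncodingDemo and its helper methods are made up for illustration. It assumes Java 7 or later, where StandardCharsets is available, which avoids the checked UnsupportedEncodingException that the charset-name overload of getBytes declares.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encodes a Java string into UTF-8 and UTF-16 bytes.
final class EncodingDemo {

    /** Encodes the string as UTF-8 bytes. */
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Encodes the string as big-endian UTF-16 bytes, without a byte order mark. */
    static byte[] toUtf16Be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";

        System.out.println(s.length());          // 5 UTF-16 code units
        System.out.println(toUtf8(s).length);    // 15 bytes: 3 per character here
        System.out.println(toUtf16Be(s).length); // 10 bytes: 2 per character here

        // Decoding with the same charset gives back the original string.
        String roundTrip = new String(toUtf8(s), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(s)); // true
    }
}
```

The string itself never changes; only the byte representation differs between the two encodings.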

Now that I look at your updated question, you might be looking for the StringEscapeUtils class from Apache Commons Text. It will unescape your HTML entities into a Java string. Note that in Commons Text (and Commons Lang 3) the method is named unescapeHtml4; the shorter name unescapeHtml exists in the older Commons Lang 2.x:

String s = StringEscapeUtils.unescapeHtml4("& #21487;& #20197;& #21578;& #35785;& #25105;"); // without spaces
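
If adding a library is not an option, the unescaping can also be sketched with the JDK alone. The following is only an illustration under a few assumptions: the class NumericEntityDecoder and its decode method are hypothetical names, and it handles only decimal numeric references like the ones in the question, not hexadecimal references or named entities such as &amp;amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes decimal numeric character references only.
final class NumericEntityDecoder {

    // Matches decimal references such as &#21487; (1 to 7 digits).
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    static String decode(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // leave out-of-range references as they are
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = "&#21487;&#20197;&#21578;&#35785;&#25105;";
        String decoded = decode(encoded);
        System.out.println(decoded.equals("\u53ef\u4ee5\u544a\u8bc9\u6211")); // true
    }
}
```

For anything beyond this narrow case, a maintained library such as the one mentioned above is likely the safer choice.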

Answered by Thorbjørn Ravn Andersen

A Java String contains Unicode characters. The decoding took place when the string was constructed.
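
As a rough illustration of that point, the sketch below builds a String from raw bytes and then lists the Unicode code points it holds. DecodeDemo and its methods are invented names for this example, and it assumes Java 8 or later for String.codePoints(). The charset only matters at the moment the bytes are decoded; afterwards the String is plain Unicode.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical demo class: decoding happens when the String is constructed.
final class DecodeDemo {

    /** Decodes UTF-8 bytes into a Java String. */
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Lists the code points of a string, e.g. "U+53EF U+4EE5". */
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // UTF-8 bytes for U+53EF followed by U+4EE5.
        byte[] bytes = {(byte) 0xE5, (byte) 0x8F, (byte) 0xAF,
                        (byte) 0xE4, (byte) 0xBB, (byte) 0xA5};

        System.out.println(describe(fromUtf8(bytes))); // U+53EF U+4EE5

        // Decoding the same bytes with the wrong charset yields six unrelated characters.
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length()); // 6
    }
}
```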