Converting HTML characters back to text using the Java standard library

Question

Asked by Cheok Yan Cheng

I would like to convert some HTML character entities back to plain text using the Java standard library. Is there a library that would do this?

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class Main {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // "Happy & Sad" in HTML form.
        String s = "Happy &amp; Sad";
        System.out.println(s);

        try {
            // Change to "Happy & Sad". DOESN'T WORK!
            s = URLDecoder.decode(s, "UTF-8");
            System.out.println(s);
        } catch (UnsupportedEncodingException ex) {
            // Not expected: every JVM is required to support UTF-8.
        }
    }
}

Answer 1

Accepted answer by Bill.D

I think the StringEscapeUtils.unescapeHtml3() and unescapeHtml4() methods are what you are looking for. They originated in the Apache Commons Lang library and are now maintained in Apache Commons Text. See https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html.

Answer 2

Answer by rogeriopvl

I'm not aware of any way to do it using the standard library. But I do know and use this class, which deals with HTML entities.

"HTMLEntities is an Open Source Java class that contains a collection of static methods (htmlentities, unhtmlentities, ...) to convert special and extended characters into HTML entities and vice versa."

http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=htmlentities

Answer 3

Answer by Zach Scrivena

java.net.URLDecoder deals only with the application/x-www-form-urlencoded MIME format (e.g. "%20" represents a space), not with HTML character entities. I don't think there's anything on the Java platform for that. You could write your own utility class to do the conversion, like this one.
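To illustrate the "write your own utility class" idea, here is a minimal sketch that uses only the standard library. It is not the class linked above: the class name HtmlUnescape, the method unescape, and the tiny entity table are assumptions made for illustration. It handles decimal and hexadecimal numeric references plus a handful of named entities, and leaves anything it does not recognize untouched. Full HTML defines far more named entities, so a library is likely the better choice for real input.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUnescape {
    // Hypothetical, deliberately small table; HTML defines many more names.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        NAMED.put("nbsp", "\u00A0");
    }

    // Matches &#123; (decimal), &#x1F; (hex) and &name; references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    public static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());          // text before the entity
            out.append(decode(m.group(1), m.group()));   // the decoded entity
            last = m.end();
        }
        out.append(input, last, input.length());         // trailing text
        return out.toString();
    }

    // Returns the decoded text, or the original reference if it is unknown.
    private static String decode(String body, String original) {
        if (body.charAt(0) != '#') {
            String named = NAMED.get(body);
            return named != null ? named : original;
        }
        try {
            boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
            int codePoint = hex ? Integer.parseInt(body.substring(2), 16)
                                : Integer.parseInt(body.substring(1));
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : original;
        } catch (NumberFormatException e) {
            return original; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("Happy &amp; Sad")); // Happy & Sad
    }
}
```

The input is scanned in a single pass, so already-decoded text is never decoded a second time: "&amp;lt;" becomes "&lt;", not "<".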

Answer 4

Answer by Rich

The URL decoder should only be used for decoding strings from URLs generated by HTML forms, which use the "application/x-www-form-urlencoded" MIME type. It does not support HTML character entities.
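A short sketch of the difference, using only java.net.URLDecoder from the standard library. The class name UrlDecoderDemo and the helper decodeForm are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecoderDemo {
    // Decodes an application/x-www-form-urlencoded string as UTF-8.
    static String decodeForm(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Form encoding: '+' is a space and %26 is '&'.
        System.out.println(decodeForm("Happy+%26+Sad"));   // Happy & Sad

        // HTML entity: contains no '%' or '+' escapes, so nothing changes.
        System.out.println(decodeForm("Happy &amp; Sad")); // Happy &amp; Sad
    }
}
```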

After a search I found a Translate class within the HTML Parser library.

Answer 5

Answer by jem

You just need to add the jsoup jar file to your application's lib folder and then use this code.

import org.jsoup.Jsoup;

public class Encoder {
    public static void main(String[] args) {
        // Parse the fragment as HTML, then extract its text content.
        String s = Jsoup.parse("&lt;Fran&ccedil;ais&gt;").text();
        System.out.print(s); // <Français>
    }
}

Link to download jsoup: http://jsoup.org/download

Answer 6

Answer by Daniele

As @jem suggested, it is possible to use jsoup.

With jsoup 1.8.3 it is possible to use the method Parser.unescapeEntities, which retains the original HTML.

import org.jsoup.parser.Parser;
...
String html = Parser.unescapeEntities(original_html, false);

It seems that this method is not present in some earlier releases.

Answer 7

Answer by Bruno Barros

You can use the class org.apache.commons.lang.StringEscapeUtils:

String s = StringEscapeUtils.unescapeHtml("Happy &amp; Sad");

This works.

Answer 8

Answer by Heriberto Gutiérrez Gutiérrez

Or you can use unescapeHtml4:

    String miCadena="GU&#205;A TELEF&#211;NICA";
    System.out.println(StringEscapeUtils.unescapeHtml4(miCadena));

This code prints the line: GUÍA TELEFÓNICA
