Java: How to get text from this HTML tag by using jsoup?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/15946200/

Date: 2020-10-31 21:23:20 Source: igfitidea

Tags: java, html, jsoup

Asked by user2269351

I ran into a problem when using jsoup to extract data. The data looks like this:

This is a <strong>strong</strong> number <date>2013</date>

I want to get only the text that is not inside the child tags, like this: This is a number

How can I do that? Can anyone help me?

Answer by ollo

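You can parse the HTML into a Document, select the body element and get its text. ownText() returns only the text directly inside that element, while text() also includes the text of its child elements.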

Example:

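import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup.parse("This is a <strong>strong</strong> number <date>2013</date>");

// ownText(): only the text nodes that are direct children of <body>
String ownText = doc.body().ownText();
// text(): all text in <body>, including the text inside child elements
String text = doc.body().text();

System.out.println(ownText);
System.out.println(text);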

Output:

This is a number  
This is a strong number 2013
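
If you already know which child tags you want to discard, another option is to delete those elements and then call text() on what remains. This is a minimal sketch, assuming jsoup is on the classpath; the class StripChildTags and the method textWithout are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StripChildTags {
    // Hypothetical helper: parses the HTML, deletes every element matching cssQuery
    // (together with the text inside it), then returns the remaining normalized text.
    static String textWithout(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        doc.body().select(cssQuery).remove();
        return doc.body().text();
    }

    public static void main(String[] args) {
        String html = "This is a <strong>strong</strong> number <date>2013</date>";
        // prints: This is a number
        System.out.println(textWithout(html, "strong, date"));
    }
}
```

Unlike ownText(), this keeps the text of any nested tags you did not list (for example an <em> inside the same element) and drops only the ones matched by the CSS query.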

Answer by Mehdi Karamosly

This should answer your question:

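import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.safety.Safelist; // this class was named Whitelist before jsoup 1.14.1

public String escapeHtml(String source) {
    Document doc = Jsoup.parseBodyFragment(source);
    // Replace each <b> element with a text node holding its markup, so it is escaped on output
    for (Element element : doc.select("b")) {
        element.replaceWith(new TextNode(element.outerHtml()));
    }
    // Remove every remaining tag except <a> (keeping only the listed attributes)
    return Jsoup.clean(doc.body().html(), new Safelist().addTags("a").addAttributes("a", "href", "name", "rel", "target"));
}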

See also the related question: Jsoup - Howto clean html by escaping not deleting the unwanted html?
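
The key step in the method above is that jsoup escapes the content of a TextNode when it serializes the document, so swapping an element for a TextNode that holds the element's markup shows the tag as literal text instead of deleting it. This is a stripped-down sketch of just that step, assuming jsoup is on the classpath and is recent enough to have the one-argument TextNode constructor; the class EscapeTags and the method escapeTags are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class EscapeTags {
    // Hypothetical helper: replaces every element matching cssQuery with a text node
    // containing that element's HTML, so the tag is printed escaped rather than removed.
    static String escapeTags(String html, String cssQuery) {
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element el : doc.body().select(cssQuery)) {
            el.replaceWith(new TextNode(el.outerHtml()));
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "This is a <strong>strong</strong> number <date>2013</date>";
        System.out.println(escapeTags(html, "strong"));
    }
}
```

Note that this does not produce "This is a number"; it keeps the <strong> markup visible as escaped text, which is what the linked question asks for.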

Answer by Mehdi Karamosly

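import android.text.Html;
import android.text.Spanned;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Android only: android.text.Html converts the markup into styled text (a Spanned)
Document doc = Jsoup.parse("This is a <strong>strong</strong> number <date>2013</date>");
Spanned htmlDoc = Html.fromHtml(doc.toString()); // deprecated since API 24; there, use Html.fromHtml(source, Html.FROM_HTML_MODE_LEGACY)
String fromHTML = htmlDoc.toString();

System.out.println(fromHTML);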