Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7164376/

Date: 2020-10-30 18:54:30  Source: igfitidea

How to extract separate text nodes with Jsoup?

Tags: java, html-parsing, jsoup

Asked by M.M

I have an element like this:

<td> TextA <br/> TextB </td>

How can I extract TextA and TextB separately?

Answered by BalusC

Several ways. That really depends on the document itself and whether the given HTML markup is consistent or not. In this particular example you could get the td's child nodes with Element#childNodes() and then test each node individually to see whether it is a TextNode.

E.g.

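import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// getItSomehow() is a placeholder for however you select the <td> element.
Element td = getItSomehow();

// Walk the direct children and print only the text nodes.
for (Node child : td.childNodes()) {
    if (child instanceof TextNode) {
        System.out.println(((TextNode) child).text());
    }
}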

which results in

 TextA 
 TextB 
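
To try the same child-node filtering without Jsoup on the classpath, here is a minimal sketch that uses the JDK's built-in XML DOM parser instead. It is not the Jsoup API from the answer: it assumes the fragment is well-formed XML (real-world HTML often is not), and the names TextNodeDemo and extractTextNodes are hypothetical, chosen only for illustration. Unlike the Jsoup snippet, it trims the surrounding whitespace from each text node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: JDK DOM analogue of the Jsoup child-node walk.
class TextNodeDemo {

    // Returns the trimmed, non-empty text of each direct child text node of the root element.
    static List<String> extractTextNodes(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();

            List<String> texts = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Keep text nodes only; element children such as <br/> are skipped.
                if (child.getNodeType() == Node.TEXT_NODE) {
                    String text = child.getNodeValue().trim();
                    if (!text.isEmpty()) {
                        texts.add(text);
                    }
                }
            }
            return texts;
        } catch (Exception e) {
            // Assumption: malformed input is reported as an unchecked exception.
            throw new IllegalArgumentException("Could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        // Prints "TextA" and "TextB" on separate lines.
        for (String text : extractTextNodes("<td> TextA <br/> TextB </td>")) {
            System.out.println(text);
        }
    }
}
```

For arbitrary HTML the Jsoup version above remains the better fit, since Jsoup tolerates markup that a strict XML parser rejects.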

I think it would be nice if Jsoup offered an Element#textNodes() or something similar to get the child text nodes, just as Element#children() gets the child elements (which would have returned the <br /> element in your example).