Original source: http://stackoverflow.com/questions/7164376/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to extract separate text nodes with Jsoup?
Asked by M.M
I have an element like this:
<td> TextA <br/> TextB </td>
How can I extract TextA and TextB separately?
Answered by BalusC
There are several ways; which is best depends on the document itself and on whether the given HTML markup is consistent. In this particular example, you could get the td's child nodes with Element#childNodes() and then test each node individually to see whether it is a TextNode.
For example:
    Element td = getItSomehow();
    for (Node child : td.childNodes()) {
        if (child instanceof TextNode) {
            System.out.println(((TextNode) child).text());
        }
    }
which results in
    TextA
    TextB
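For anyone who wants to run the loop above end to end, here is a minimal, self-contained sketch. The class name TextNodeExtractor and the helper directTexts are made up for illustration, and the td is wrapped in a table only because an HTML parser would otherwise discard a bare td. Trimming the text and skipping blank nodes are additions not present in the snippet above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Jsoup.
public class TextNodeExtractor {

    // Collects the trimmed text of each direct child TextNode, skipping blank ones.
    static List<String> directTexts(Element element) {
        List<String> texts = new ArrayList<>();
        for (Node child : element.childNodes()) {
            if (child instanceof TextNode) {
                String text = ((TextNode) child).text().trim();
                if (!text.isEmpty()) {
                    texts.add(text);
                }
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Assumption: the cell is wrapped in a table so the parser keeps the <td>.
        Document doc = Jsoup.parse("<table><tr><td> TextA <br/> TextB </td></tr></table>");
        Element td = doc.select("td").first();
        for (String text : directTexts(td)) {
            System.out.println(text);
        }
    }
}
```

Returning a list rather than printing inside the loop keeps the two values separately addressable, which is what the question asks for.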
I think it would be nice if Jsoup offered an Element#textNodes() or something similar to get the child text nodes, just as Element#children() does to get the child elements (which would have returned the <br /> element in your example).
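As far as I know, current Jsoup releases do include an Element#textNodes() method that returns the direct child text nodes, so the instanceof check can be dropped when it is available. The sketch below assumes the Jsoup version on your classpath has that method; the class name TextNodesExample and the helper directTexts are hypothetical, and the trimming and blank-node filtering are illustrative choices.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of Jsoup.
public class TextNodesExample {

    // Uses Element#textNodes() to read only the direct child text nodes.
    static List<String> directTexts(Element element) {
        List<String> texts = new ArrayList<>();
        for (TextNode node : element.textNodes()) {
            String text = node.text().trim();
            if (!text.isEmpty()) {
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Assumption: the cell is wrapped in a table so the parser keeps the <td>.
        Element td = Jsoup.parse("<table><tr><td> TextA <br/> TextB </td></tr></table>")
                .select("td")
                .first();
        System.out.println(directTexts(td));
    }
}
```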