Java Jsoup: find an element with specific text
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/25517353/
Jsoup find element with specific text
Asked by tbag
I want to select an element with specific text from HTML using Jsoup. The HTML is:
<td style="vertical-align:bottom;text-align:center;width:15%">
<div style="background-color:#FFDD93;font-size:10px;margin:5px auto 0px auto;text-align:left;" class="genbg"><span class="corners-top-subtab"><span></span></span>
<div><b>Pantry/Catering</b>
<div>
<div style="color:#00700B;">✓ Pantry Car Avbl
<br />✓ Catering Avbl</div>
</div>
<div>
<div><span>Dinner is served after departure from NZM on 1st day.;</span>...
<br /><a style="font-size:10px;color:Red;" onClick="expandPost($(this).parent());" href="javascript:void(0);">Read more...</a>
</div>
<div style="display:none;">Dinner :2 chapati, rice, dal and chicken curry (NV) and paneer curry in veg &Ice cream.; Breakfast:2 bread slices with jam and butter. ; Omlet of 2 eggs (Non veg),vada and sambar(veg)..; coffee & lime juice</div>
</div>
</div><span class="corners-bottom-subtab"><span></span></span>
</div>
I want to find the div element containing the text "Pantry/Catering". I tried:
doc.select("div:contains(Pantry/Catering)").first();
But this doesn't seem to work. How can I get this element using Jsoup?
Answered by Spectre
When I run your code, it selects the outer div, while I presume what you're looking for is the inner div. The documentation says that it selects the "elements that contains the specified text". In this simple HTML:
<div><div><b>Pantry/Catering</b></div></div>
The selector div:contains(Pantry/Catering) matches twice because both divs contain the text 'Pantry/Catering':
<!-- First Match -->
<div><div><b>Pantry/Catering</b></div></div>
<!-- Second Match -->
<div><b>Pantry/Catering</b></div>
The matches are always in that order because jsoup matches from the outside in, so an outer element comes before the elements nested inside it. Therefore .first() always matches the outer div. To extract the inner div you could use .get(1).
Extracting the inner div in full:
doc.select("div:contains(Pantry/Catering)").get(1)
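The ordering described above can be checked with a small sketch. The class name, the method name, and the id attributes (outer, inner) are not part of the original answer; they are added here only so the two matches can be told apart. It assumes jsoup is on the classpath.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ContainsOrderDemo {
    // Returns the id of every div matched by div:contains(Pantry/Catering),
    // in the order jsoup reports the matches.
    static List<String> matchingDivIds(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div:contains(Pantry/Catering)").eachAttr("id");
    }

    public static void main(String[] args) {
        // Hypothetical ids added to the answer's simple HTML for illustration.
        String html = "<div id=\"outer\"><div id=\"inner\"><b>Pantry/Catering</b></div></div>";
        List<String> ids = matchingDivIds(html);
        System.out.println(ids.get(0)); // what .first() would pick
        System.out.println(ids.get(1)); // what .get(1) would pick
    }
}
```

Relying on a fixed index such as get(1) only works while the nesting depth stays the same, so it is probably best treated as a quick fix rather than a general solution.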
Answered by tbag
OK, figured it out. I had to do something like:
doc.select("b:contains(Pantry/Catering)").first().parent().children().get(1).text();
Thanks for the help!
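A slightly more defensive sketch of the same idea follows: locate the b element, step up to its parent, and read the text of the parent's second child. The class name, method name, and the simplified HTML are assumptions made for illustration, and returning null when nothing is found is a choice of this sketch, not something stated in the answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SiblingTextDemo {
    // Finds the <b> label, then returns the text of the second child of its parent
    // (index 1, i.e. the element right after the label in this layout).
    // Returns null if the label or that child is missing.
    static String textAfterLabel(String html) {
        Document doc = Jsoup.parse(html);
        Element label = doc.select("b:contains(Pantry/Catering)").first();
        if (label == null || label.parent() == null) {
            return null;
        }
        Elements children = label.parent().children();
        return children.size() > 1 ? children.get(1).text() : null;
    }

    public static void main(String[] args) {
        // Simplified stand-in for the question's markup (an assumption).
        String html = "<div><b>Pantry/Catering</b>"
                + "<div>Pantry Car Avbl</div>"
                + "<div>Dinner is served</div></div>";
        System.out.println(textAfterLabel(html));
    }
}
```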
Answered by harshainfo
This should also do the job for you:
doc.selectFirst("div:containsOwn(Pantry/Catering)").text();
Explanation:
selectFirst(selector) - Helps to avoid using select().first()
containsOwn(text) - A pseudo selector that returns elements that directly contain the specified text. In contrast with contains(text), the text must appear in the found element itself, not in any of its descendants.
Source: https://jsoup.org/apidocs/org/jsoup/select/Selector.html#selectFirst-java.lang.String-org.jsoup.nodes.Element-
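The difference between contains and containsOwn can be seen in a minimal sketch. The class and method names are hypothetical, and the HTML is the simple fragment from the earlier answer. Note that in this fragment, as in the question's markup, the text sits inside a b element, so as far as I can tell div:containsOwn does not match the div itself and selectFirst returns null; a null check before calling text() is therefore advisable.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ContainsOwnDemo {
    // Returns the tag name of the first element matched by the selector,
    // or null when nothing matches (selectFirst returns null in that case).
    static String firstMatchTag(String html, String selector) {
        Element match = Jsoup.parse(html).selectFirst(selector);
        return match == null ? null : match.tagName();
    }

    public static void main(String[] args) {
        String html = "<div><b>Pantry/Catering</b></div>";
        // contains: the div qualifies because a descendant holds the text.
        System.out.println(firstMatchTag(html, "div:contains(Pantry/Catering)"));
        // containsOwn on div: the div has no text of its own here.
        System.out.println(firstMatchTag(html, "div:containsOwn(Pantry/Catering)"));
        // containsOwn on b: the b element holds the text directly.
        System.out.println(firstMatchTag(html, "b:containsOwn(Pantry/Catering)"));
    }
}
```

If the enclosing div is what is needed, one option under these assumptions is to select the b element with containsOwn and then call parent() on it.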