Java Jsoup: find an element with specific text

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/25517353/

Time: 2020-08-11 00:33:54  Source: igfitidea

Jsoup find element with specific text

java html parsing jsoup

Asked by tbag

I want to select an element with specific text from the HTML using jsoup. The HTML is:

<td style="vertical-align:bottom;text-align:center;width:15%">
<div style="background-color:#FFDD93;font-size:10px;margin:5px auto 0px auto;text-align:left;" class="genbg"><span class="corners-top-subtab"><span></span></span>
    <div><b>Pantry/Catering</b>
        <div>
            <div style="color:#00700B;">&#10003;&nbsp;Pantry Car Avbl
                <br />&#10003;&nbsp;Catering Avbl</div>
        </div>
        <div>
            <div><span>Dinner is served after departure from NZM on 1st day.;</span>...
                <br /><a style="font-size:10px;color:Red;" onClick="expandPost($(this).parent());" href="javascript:void(0);">Read more...</a>
            </div>
            <div style="display:none;">Dinner :2 chapati, rice, dal and chicken curry (NV) and paneer curry in veg &amp;Ice cream.; Breakfast:2 bread slices with jam and butter. ; Omlet of 2 eggs (Non veg),vada and sambar(veg)..; coffee &amp; lime juice</div>
        </div>
    </div><span class="corners-bottom-subtab"><span></span></span>
</div>

I want to find the div element containing the text "Pantry/Catering". I tried:

doc.select("div:contains(Pantry/Catering)").first();

But this doesn't seem to work. How can I get this element using jsoup?

Answered by Spectre

When I run your code it selects the outer div, while I presume what you're looking for is the inner div. The documentation says that :contains selects the "elements that contain the specified text". In this simple HTML:

<div><div><b>Pantry/Catering</b></div></div>

The selector div:contains(Pantry/Catering) matches twice, because both divs contain the text 'Pantry/Catering':

<!-- First Match -->
<div><div><b>Pantry/Catering</b></div></div>

<!-- Second Match -->
<div><b>Pantry/Catering</b></div>

The matches are always in that order because jsoup returns them in document order, so an enclosing element comes before the elements nested inside it. Therefore .first() always matches the outer div. To extract the inner div you could use .get(1).

Extracting the inner div in full:

doc.select("div:contains(Pantry/Catering)").get(1)
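The match order described above can be checked with a small sketch. It assumes jsoup is on the classpath; the id attributes and the class and method names (ContainsOrderDemo, matchingDivs) are added here purely for illustration and are not part of the original question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ContainsOrderDemo {
    // Returns every div whose text (including text of descendants) contains the phrase.
    static Elements matchingDivs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div:contains(Pantry/Catering)");
    }

    public static void main(String[] args) {
        // Simplified markup; the ids are hypothetical and only label the two divs.
        String html = "<div id=\"outer\"><div id=\"inner\"><b>Pantry/Catering</b></div></div>";
        Elements matches = matchingDivs(html);

        System.out.println(matches.size());       // prints 2
        System.out.println(matches.first().id()); // prints outer
        System.out.println(matches.get(1).id());  // prints inner
    }
}
```

Note that get(1) depends on how deeply the target is nested: with one more wrapping div around the markup, the inner div would move to index 2.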

Answered by tbag

OK, I figured it out. I had to do something like:

doc.select("b:contains(Pantry/Catering)").first().parent().children().get(1).text();
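A null-safe variant of the same idea might look like the sketch below. It assumes jsoup 1.11.1 or later (for selectFirst) and uses a trimmed-down version of the question's markup; the class and method names (PantryDetails, detailsAfterLabel) are hypothetical. It uses nextElementSibling() instead of parent().children().get(1), which gives the same element here because the b label is the first child of its parent.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PantryDetails {
    // Finds the <b> label, then reads the text of the element right after it.
    // Returns null when the label or its following sibling is missing.
    static String detailsAfterLabel(String html) {
        Document doc = Jsoup.parse(html);
        Element label = doc.selectFirst("b:contains(Pantry/Catering)");
        if (label == null) {
            return null;
        }
        Element details = label.nextElementSibling();
        return details == null ? null : details.text();
    }

    public static void main(String[] args) {
        // Simplified stand-in for the question's HTML.
        String html = "<div><b>Pantry/Catering</b>"
                + "<div>Pantry Car Avbl</div>"
                + "<div>Dinner details</div></div>";
        System.out.println(detailsAfterLabel(html)); // prints Pantry Car Avbl
    }
}
```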

Thanks for the help!

Answered by harshainfo

This should also work for you:

doc.selectFirst("div:containsOwn(Pantry/Catering)").text();

Explanation:

selectFirst(selector) - Returns the first matching element directly, so you don't need select().first()

containsOwn(text) - A pseudo selector that returns elements that directly contain the specified text. In contrast with contains(text), the text must appear in the found element itself, not in any of its descendants.

Source : https://jsoup.org/apidocs/org/jsoup/select/Selector.html#selectFirst-java.lang.String-org.jsoup.nodes.Element-
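The difference between contains and containsOwn can be seen in a short sketch, again assuming jsoup 1.11.1 or later; the class and method names (ContainsOwnDemo, firstMatchTag) are made up for illustration. One caveat worth checking against your own markup: in the question's HTML the phrase sits inside a b element, so the div does not own that text and div:containsOwn(...) finds nothing there, in which case selectFirst returns null and calling .text() on it would throw a NullPointerException. Targeting the b element works in that case.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ContainsOwnDemo {
    // Returns the tag name of the first match, or null when nothing matches.
    static String firstMatchTag(String html, String selector) {
        Document doc = Jsoup.parse(html);
        Element match = doc.selectFirst(selector);
        return match == null ? null : match.tagName();
    }

    public static void main(String[] args) {
        String html = "<div><div><b>Pantry/Catering</b></div></div>";

        // contains looks at descendant text too, so the outer div matches first.
        System.out.println(firstMatchTag(html, "div:contains(Pantry/Catering)"));    // prints div

        // Neither div owns the text (the <b> does), so there is no match.
        System.out.println(firstMatchTag(html, "div:containsOwn(Pantry/Catering)")); // prints null

        // The <b> element directly contains the text.
        System.out.println(firstMatchTag(html, "b:containsOwn(Pantry/Catering)"));   // prints b
    }
}
```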
