Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/9643762/

Date: 2020-09-06 15:15:24  Source: igfitidea

XPath to locate a cell with specific text parsing HTML tables

Tags: xml, xpath, groovy, htmlunit

Asked by David Brown

Hope someone out there can quickly point me in the right direction with my XPath difficulties.

Currently I've got to the point where I'm identifying the correct table I need in my HTML source, but then I need to process only the rows that have the text 'Chapter' somewhere in the DOM.

My last attempt was this:

// get the correct table
HtmlTable table = page.getFirstByXPath("//table[2]");

// now the failing bit....
def rows = table.getByXPath("*/td[contains(text(),'Chapter')]") 

I thought the XPath above meant: get me all elements that have a child 'td' element that somewhere in its DOM contains the text 'Chapter'.

An example of a matching row from my source is:

<tr valign="top">
  <td nowrap="" align="Right">
   <font face="Verdana">
   <a href="index.cfm?a=1">Chapter 1</a>
   </font>
  </td>
  <td class="ChapterT">
    <font face="Verdana">DEFINITIONS</font>
  </td>
  <td>&nbsp;</td>
</tr>
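
To illustrate why the attempt above comes back empty, here is a minimal sketch in Java (Groovy can call the same JDK classes). It assumes the sample row is parsed as well-formed XML with the JDK's DOM parser and javax.xml.xpath rather than HtmlUnit, with &nbsp; written as &#160; and no tbody element; the class name ChapterCellDemo is hypothetical. It compares text(), which sees only a td's own text nodes, with ., the string value of the whole cell.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChapterCellDemo {
    // The question's sample row as well-formed XML (assumption: no <tbody>, &nbsp; written as &#160;).
    static final String TABLE =
          "<table>\n"
        + "<tr valign=\"top\">\n"
        + "  <td nowrap=\"\" align=\"Right\">\n"
        + "   <font face=\"Verdana\">\n"
        + "   <a href=\"index.cfm?a=1\">Chapter 1</a>\n"
        + "   </font>\n"
        + "  </td>\n"
        + "  <td class=\"ChapterT\">\n"
        + "    <font face=\"Verdana\">DEFINITIONS</font>\n"
        + "  </td>\n"
        + "  <td>&#160;</td>\n"
        + "</tr>\n"
        + "</table>";

    // Evaluates expr with the <table> element as the context node and returns the match count.
    public static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(TABLE)));
            Node table = doc.getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // text() yields only the td's own text nodes; contains() then uses the first one,
        // which is indentation whitespace, so "Chapter 1" inside <a> is never seen.
        System.out.println(count("*/td[contains(text(),'Chapter')]"));
        // "." is the string value of the whole td, including all descendant text.
        System.out.println(count("*/td[contains(., 'Chapter')]"));
    }
}
```

Running main should print 0 and then 1: the first cell's own text nodes are only whitespace, while its string value includes "Chapter 1" from the nested a element.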

Any help / pointers greatly appreciated.

Thanks,

Answer by Kirill Polishchuk

Use this XPath:

//td[contains(., 'Chapter')]
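
A quick sketch of how this expression behaves when it is evaluated on a table node rather than on the whole page, assuming the JDK's javax.xml.xpath API and a hypothetical page with two tables (the class and method names are illustrative only). Because the expression starts with //, it is resolved from the document root regardless of the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AbsolutePathDemo {
    // Hypothetical page with two tables, each containing one "Chapter" cell.
    static final String PAGE =
          "<html><body>"
        + "<table><tr><td><a>Chapter 1</a></td></tr></table>"
        + "<table><tr><td><a>Chapter 2</a></td></tr></table>"
        + "</body></html>";

    // Evaluates expr with the second <table> as the context node and returns the match count.
    public static int countFromSecondTable(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            Node secondTable = doc.getElementsByTagName("table").item(1);
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(expr, secondTable, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A leading "//" starts from the document root, so cells from both tables match.
        System.out.println(countFromSecondTable("//td[contains(., 'Chapter')]"));  // 2
        // A leading ".//" stays under the context node.
        System.out.println(countFromSecondTable(".//td[contains(., 'Chapter')]")); // 1
    }
}
```

With the second table as context, the absolute form matches the cells of both tables, while the .//-prefixed form matches only the one inside that table.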

Answer by Dimitre Novatchev

You want all td elements under your current node, not all td elements in the document, which is what the currently accepted answer selects.

Use:

.//td[.//text()[contains(., 'Chapter')]]

This selects all td descendants of the current node that have at least one descendant text node whose string value contains the string "Chapter".
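
A minimal sketch of this expression applied to a table node, assuming the JDK's javax.xml.xpath API instead of HtmlUnit and a hypothetical page whose first table is unrelated navigation (the class name, method name and the "SCOPE" cell are illustrative). Only td cells under the chosen table are returned, and the class="ChapterT" attribute does not count because attributes are not text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DescendantTextDemo {
    // Hypothetical page: a navigation table, then a chapter table modelled on the question's row.
    static final String PAGE =
          "<html><body>"
        + "<table><tr><td>Chapter index</td></tr></table>"
        + "<table>"
        + "<tr><td><font><a>Chapter 1</a></font></td><td class=\"ChapterT\"><font>DEFINITIONS</font></td></tr>"
        + "<tr><td><font><a>Chapter 2</a></font></td><td class=\"ChapterT\"><font>SCOPE</font></td></tr>"
        + "</table>"
        + "</body></html>";

    // Returns the trimmed text of every td the expression selects, relative to the given table.
    public static List<String> chapterCells(int tableIndex) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            Node table = doc.getElementsByTagName("table").item(tableIndex);
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    ".//td[.//text()[contains(., 'Chapter')]]", table, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                texts.add(hits.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only cells under the second table are considered.
        System.out.println(chapterCells(1)); // [Chapter 1, Chapter 2]
    }
}
```

Called on the second table, chapterCells(1) returns the two chapter cells; the "Chapter index" cell in the first table is not included.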

If it is known in advance that any td under this table has only a single text node, this can be simplified to just:

.//td[contains(., 'Chapter')]
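
The two forms can only disagree when the word "Chapter" is split across several text nodes, because the string value of a td is the concatenation of all its descendant text nodes. A minimal sketch of that edge case, assuming the JDK's javax.xml.xpath API and a hypothetical cell written as <b>Chap</b>ter 3 (the class name is illustrative):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SplitTextDemo {
    // Hypothetical table: in the second cell, "Chapter" is split across two text nodes.
    static final String TABLE =
          "<table>"
        + "<tr><td>Chapter 1</td></tr>"
        + "<tr><td><b>Chap</b>ter 3</td></tr>"
        + "</table>";

    // Evaluates expr with the <table> element as the context node and returns the match count.
    public static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(TABLE)));
            Node table = doc.getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Needs one single text node containing "Chapter": only the first cell qualifies.
        System.out.println(count(".//td[.//text()[contains(., 'Chapter')]]")); // 1
        // Uses the td's concatenated string value ("Chapter 3"), so both cells match.
        System.out.println(count(".//td[contains(., 'Chapter')]")); // 2
    }
}
```

When every cell holds its text in a single text node, as the simplification assumes, both expressions select the same cells.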

Answer by William Walseth

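You're on the right "path".
Passing text() to contains() only looks at the element's own text nodes (and in XPath 1.0 only the first of them), not the text in any of its children. Try this XPath, which you could read as follows: get every tr/td whose first child element contains the text 'Chapter' (contains() converts * to the string value of the first child element, which includes all of that element's descendant text).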

tr/td[contains(*,"Chapter")]
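
A minimal sketch of what this expression actually tests, assuming the JDK's javax.xml.xpath API (XPath 1.0) and a hypothetical three-cell row (the class name is illustrative). When contains() receives the node-set *, it uses only the string value of the first child element, so a cell whose "Chapter" text sits in a second child element, or directly in the td with no child element, is not matched.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstChildDemo {
    // Hypothetical row: "Chapter" in the first child, in a second child, and as the td's own text.
    static final String TABLE =
          "<table><tr>"
        + "<td><font><a>Chapter 1</a></font></td>"
        + "<td><font>Intro</font><font>Chapter 4</font></td>"
        + "<td>Chapter 5</td>"
        + "</tr></table>";

    // Evaluates expr with the <table> element as the context node and returns the match count.
    public static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(TABLE)));
            Node table = doc.getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the first child element's string value is tested: just the first cell matches.
        System.out.println(count("tr/td[contains(*,\"Chapter\")]")); // 1
        // The td's own string value covers all three cells.
        System.out.println(count("tr/td[contains(., \"Chapter\")]")); // 3
    }
}
```

For rows shaped like the one in the question, where the link text sits inside the first child element, the expression still finds the chapter cell.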

Good luck
