Using XPATH to search text containing &nbsp;

Question

Asked by Bergeroy

I use XPather Browser to check my XPATH expressions on an HTML page.

My end goal is to use these expressions in Selenium for the testing of my user interfaces.

I have an HTML file with content similar to this:

<tr>
  <td>abc</td>
  <td>&nbsp;</td>
</tr>

I want to select a node whose text contains the string "&nbsp;".

With a normal string like "abc" there is no problem. I use an XPATH similar to //td[text()="abc"].

When I try an XPATH like //td[text()="&nbsp;"], it returns nothing. Is there a special rule concerning text containing "&"?

Answer 1

Accepted answer by Bergeroy

It seems that OpenQA, the people behind Selenium, have already addressed this problem. They defined some variables to explicitly match whitespace. In my case, I need to use an XPATH similar to //td[text()="${nbsp}"].

I have reproduced here the text from OpenQA concerning this issue:

HTML automatically normalizes whitespace within elements, ignoring leading/trailing spaces and converting extra spaces, tabs and newlines into a single space. When Selenium reads text out of the page, it attempts to duplicate this behavior, so you can ignore all the tabs and newlines in your HTML and do assertions based on how the text looks in the browser when rendered. We do this by replacing all non-visible whitespace (including the non-breaking space "&nbsp;") with a single space. All visible newlines (<br>, <p>, and <pre> formatted new lines) should be preserved.
We use the same normalization logic on the text of HTML Selenese test case tables. This has a number of advantages. First, you don't need to look at the HTML source of the page to figure out what your assertions should be; "&nbsp;" symbols are invisible to the end user, and so you shouldn't have to worry about them when writing Selenese tests. (You don't need to put "&nbsp;" markers in your test case to assertText on a field that contains "&nbsp;".) You may also put extra newlines and spaces in your Selenese <td> tags; since we use the same normalization logic on the test case as we do on the text, we can ensure that assertions and the extracted text will match exactly.
This creates a bit of a problem on those rare occasions when you really want/need to insert extra whitespace in your test case. For example, you may need to type text in a field like this: "foo   ". But if you simply write <td>foo   </td> in your Selenese test case, we'll replace your extra spaces with just one space.
This problem has a simple workaround. We've defined a variable in Selenese, ${space}, whose value is a single space. You can use ${space} to insert a space that won't be automatically trimmed, like this: <td>foo${space}${space}${space}</td>. We've also included a variable ${nbsp}, that you can use to insert a non-breaking space.
Note that XPaths do not normalize whitespace the way we do. If you need to write an XPath like //div[text()="hello world"] but the HTML of the link is really "hello&nbsp;world", you'll need to insert a real "&nbsp;" into your Selenese test case to get it to match, like this: //div[text()="hello${nbsp}world"].
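
The difference between normalized text comparison and exact XPath comparison can be sketched in Python. This is only a rough approximation of the behavior described in the quoted text, not Selenium's actual code; the function name `normalize_whitespace` is hypothetical, and XPath's `text()="..."` test is modeled as plain string equality.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, newlines and U+00A0 into one space, then trim."""
    return re.sub("[ \t\r\n\u00a0]+", " ", text).strip(" ")


# What the HTML "hello&nbsp;world" looks like once the entity is decoded.
page_text = "hello\u00a0world"

# A comparison on normalized text succeeds, as with Selenese assertions.
normalized_match = normalize_whitespace(page_text) == "hello world"

# An exact comparison fails, because U+00A0 is not an ordinary space.
exact_match = page_text == "hello world"

print(normalized_match, exact_match)
```

The exact comparison only succeeds when the expected string itself contains U+00A0, which is what `${nbsp}` supplies in Selenese.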

Answer 2

Answered by PhiLho

I found I can make the match when I input a hard-coded non-breaking space (U+00A0) by typing Alt+0160 on Windows between the two quotes...

//table[@id='TableID']//td[text()=' ']

worked for me with the special char.

From what I understood, the XPath 1.0 standard doesn't handle escaping Unicode chars. There seem to be functions for that in XPath 2.0, but it looks like Firefox doesn't support it (or I misunderstood something). So you have to make do with the local code page. Ugly, I know.

Actually, it looks like the standard is relying on the programming language using XPath to provide the correct Unicode escape sequence... So, somehow, I did the right thing.
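
As a minimal sketch of that last point, assume Python as the host language with its standard-library `xml.etree.ElementTree` (neither appears in the original answer). Python resolves the `\u00a0` escape before the XPath engine sees the expression. ElementTree implements only a subset of XPath, so the predicate `[.='...']` (available from Python 3.7) stands in for `[text()='...']`, and the sample markup uses `&#160;` because a plain XML parser does not know the HTML entity `&nbsp;`.

```python
import xml.etree.ElementTree as ET

# Sample row; &#160; is the numeric reference for U+00A0 (NO-BREAK SPACE).
row = ET.fromstring("<tr><td>abc</td><td>&#160;</td></tr>")

# Python expands \u00a0 inside the string literal, so the XPath engine
# receives the real character rather than an escape sequence.
nbsp_xpath = ".//td[.='\u00a0']"
space_xpath = ".//td[.=' ']"

nbsp_cells = row.findall(nbsp_xpath)    # cells whose text is exactly U+00A0
space_cells = row.findall(space_xpath)  # cells whose text is an ordinary space

print(len(nbsp_cells), len(space_cells))
```

Only the expression built with the escaped character finds the cell; the one with an ordinary space finds nothing, which mirrors the behavior reported in the question.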

Answer 3

Answered by James Sulak

Try using the decimal entity &#160; instead of the named entity. If that doesn't work, you should be able to simply use the Unicode character for a non-breaking space instead of the &nbsp; entity.

(Note: I did not try this in XPather, but I did try it in Oxygen.)

Answer 4

Answered by ChuckB

Bear in mind that a standards-compliant XML processor will have replaced any entity references other than XML's five standard ones (&amp;, &gt;, &lt;, &apos;, &quot;) with the corresponding character in the target encoding by the time XPath expressions are evaluated. Given that behavior, PhiLho's and jsulak's suggestions are the way to go if you want to work with XML tools. When you enter &#160; in the XPath expression, it should be converted to the corresponding byte sequence before the XPath expression is applied.
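
A small sketch of this decoding step, assuming Python's standard library (`html.unescape` and `xml.etree.ElementTree`), which is not what the answer itself used. It shows that the named and numeric references all decode to the same single character, and that a parsed element already holds that character rather than the reference.

```python
import html
import xml.etree.ElementTree as ET

# The HTML named reference and both numeric forms decode to one character.
decoded = [html.unescape(ref) for ref in ("&nbsp;", "&#160;", "&#xA0;")]

# An XML parser resolves numeric references while parsing, so the element
# text is the character U+00A0, not the six characters "&#160;".
cell = ET.fromstring("<td>&#160;</td>")
cell_text = cell.text

print([hex(ord(ch)) for ch in decoded])
print(len(cell_text), cell_text == "\u00a0")
```

This is why the XPath string has to contain the actual U+00A0 character: by the time the expression is evaluated, the document no longer contains any entity reference to match against.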

Answer 5

Answered by DebanjanB

As per the HTML you have provided:

<tr>
  <td>abc</td>
  <td>&nbsp;</td>
</tr>

To locate the node with the string &nbsp; you can use either of the following XPath-based solutions:

Using text():
```
"//td[text()='\u00A0']"
```
Using contains():
```
"//td[contains(., '\u00A0')]"
```

However, ideally you may like to avoid the NO-BREAK SPACE character and use any of the following locator strategies:

Using the parent <tr> node and following-sibling:
```
"//tr//following-sibling::td[2]"
```
Using last():
```
"//tr//td[last()]"
```
Using the preceding <td> node and following:
```
"//td[text()='abc']//following::td[1]"
```
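
The position-based idea can be tried outside a browser with a short sketch in Python's standard-library `xml.etree.ElementTree`. This is an assumption for illustration, not what the answer used: ElementTree does not implement the `following` or `following-sibling` axes, so the sketch swaps in the positional predicates `td[2]` and `td[last()]`, and it wraps the row in a hypothetical `<table>` element.

```python
import xml.etree.ElementTree as ET

# The question's row, with &#160; standing in for the HTML entity &nbsp;.
table = ET.fromstring(
    "<table><tr><td>abc</td><td>&#160;</td></tr></table>"
)

second_cell = table.find(".//tr/td[2]")     # second <td> child of the row
last_cell = table.find(".//tr/td[last()]")  # last <td> child of the row

print(second_cell is last_cell)
print(last_cell.text == "\u00a0")
```

Both expressions select the cell by position, so neither needs the non-breaking space to appear in the XPath string.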

Reference

You can find a relevant detailed discussion in:

How to find an element which contains &nbsp; using Selenium

tl; dr

Unicode Character 'NO-BREAK SPACE' (U+00A0)

Answer 6

Answered by Zack The Human

I cannot get a match using Xpather, but the following worked for me with plain XML and XSL files in Microsoft's XML Notepad:

<xsl:value-of select="count(//td[text()='&nbsp;'])" />

The value returned is 1, which is the correct value in my test case.

However, I did have to declare nbsp as an entity within my XML and XSL using the following:

<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>
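
The effect of such a declaration can be sketched in Python's standard-library `xml.etree.ElementTree`. This is an assumption for illustration; the answer itself used XML Notepad and XSL. The `<table>` wrapper and the DOCTYPE name `table` are hypothetical, and the predicate `[.='...']` (Python 3.7+) stands in for `[text()='...']`, which ElementTree does not support.

```python
import xml.etree.ElementTree as ET

body = "<table><tr><td>abc</td><td>&nbsp;</td></tr></table>"
doctype = '<!DOCTYPE table [ <!ENTITY nbsp "&#160;"> ]>'

# Without a declaration, nbsp is not one of XML's predefined entities,
# so the parser rejects the document.
try:
    ET.fromstring(body)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# With the internal DTD subset, the parser expands &nbsp; to U+00A0.
root = ET.fromstring(doctype + body)
match_count = len(root.findall(".//td[.='\u00a0']"))

print(undeclared_ok, match_count)
```

Once the entity is declared, the cell text is the single character U+00A0, and a search for that character finds it.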

I'm not sure if that helps you, but I was actually able to find nbsp using an XPath expression.

Edit: My code sample actually contains the characters '&nbsp;' but the JavaScript syntax highlighting converts it to the space character. Don't be misled!

Answer 7

Answered by Zack The Human

Search for &nbsp; or only nbsp. Did you try this?

Answer 8

Answered by Raghwendra Sonu

You can use the XPath contains, sibling, and ancestor functions in Selenium WebDriver to locate elements that have no unique properties to identify them.

For more details, read this page: https://www.guru99.com/using-contains-sbiling-ancestor-to-find-element-in-selenium.html
