Python: get the text of the second element using XPath?

Question

Asked by

<span class='python'>
  <a>google</a>
  <a>chrome</a>
</span>

I want to get "chrome", and I already have it working like this:

q = item.findall('.//span[@class="python"]//a')
t = q[1].text # first element = 0

I'd like to combine it into a single XPath expression and just get one item instead of a list.
I tried this but it doesn't work.

t = item.findtext('.//span[@class="python"]//a[2]') # first element = 1

The actual, unsimplified HTML looks like this:

<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>

Answer 1

Accepted answer by Dimitre Novatchev

I tried this but it doesn't work.
t = item.findtext('.//span[@class="python"]//a[2]')

This is a FAQ about the // abbreviation.

.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So it may select more than one element, or none at all, depending on the concrete XML document.
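
As a minimal sketch of this per-parent behavior, the following uses Python's standard xml.etree.ElementTree, which is an assumption, since the question does not say which library item comes from. The wrapping <div> and all variable names are added here for illustration only. ElementTree's limited XPath subset applies a[2] in the same per-parent way.

```python
import xml.etree.ElementTree as ET

# Simplified markup from the question, wrapped in a <div> (added here) so that
# './/span' can find the span as a descendant of the root.
simple = ET.fromstring(
    "<div><span class='python'><a>google</a><a>chrome</a></span></div>"
)

# The actual, nested markup from the question, wrapped the same way.
nested = ET.fromstring(
    "<div><span class='python'><span>"
    "<span><img></img><a>google</a></span>"
    "<a>chrome</a>"
    "</span></span></div>"
)

path = './/span[@class="python"]//a[2]'

# Both <a> elements share one parent, so a second <a> child exists.
simple_text = simple.findtext(path)

# Each <a> is the first <a> child of its own parent, so nothing matches.
nested_text = nested.findtext(path)

# Collecting all <a> descendants first and indexing the list works here.
all_links = nested.findall('.//span[@class="python"]//a')
second_link_text = all_links[1].text

print(simple_text, nested_text, second_link_text)
```

The same path therefore succeeds on the simplified markup and finds nothing on the nested one, which matches the behavior described in the question.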

To put it more simply, the [] operator has higher precedence than //.

If you want just one (the second) of all the nodes returned, you have to use parentheses to force the precedence you want:

(.//a)[2]

This really selects the second a descendant of the current node.

For the actual expression used in the question, change it to:

(.//span[@class="python"]//a)[2]

or change it to:

(.//span[@class="python"]//a)[2]/text()
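
A hedged sketch of running the parenthesized expression against the nested markup from the question. It assumes lxml is installed and goes through its xpath() method, because find() and findtext() accept only a limited path subset that has no parenthesized expressions. The variable names are illustrative, and the <img> is written as an HTML void element.

```python
from lxml import etree

# Nested markup from the question; etree.HTML wraps it in <html><body>.
doc = etree.HTML(
    "<span class='python'><span>"
    "<span><img><a>google</a></span>"
    "<a>chrome</a>"
    "</span></span>"
)

# Per-parent position: no parent has a second <a> child, so this is empty.
per_parent = doc.xpath('.//span[@class="python"]//a[2]/text()')

# Parentheses collect all matching <a> elements first, then take the second.
second_overall = doc.xpath('(.//span[@class="python"]//a)[2]/text()')

print(per_parent, second_overall)
```

Note that xpath() returns a list even when the expression selects a single text node, so the caller still has to take the first item.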

Answer 2

Answer by MattH

I'm not sure what the problem is...

>>> d = """<span class='python'>
...   <a>google</a>
...   <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>

Answer 3

Answer by MattH

From Comments:

or the simplification of the actual HTML I posted is too simple

You are right. What is the meaning of .//span[@class="python"]//a[2]? This will be expanded to:

self::node()
 /descendant-or-self::node()
  /child::span[attribute::class="python"]
   /descendant-or-self::node()
    /child::a[position()=2]

It will finally select the second a child (fn:position() refers to the child axis). So nothing will be selected if your document looks like this:

<span class='python'> 
  <span> 
    <span> 
      <img></img> 
      <a>google</a><!-- This is the first "a" child of its parent --> 
    </span> 
    <a>chrome</a><!-- This is also the first "a" child of its parent --> 
  </span> 
</span>

If you want the second of all descendants, use:

descendant::span[@class="python"]/descendant::a[2]
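
A small sketch of this expression against the nested markup, again assuming lxml and its xpath() method, with illustrative variable names. On the descendant axis, the position in a[2] is counted among all a descendants of the matched span in document order, not per parent.

```python
from lxml import etree

# Nested markup from the question; etree.HTML wraps it in <html><body>.
doc = etree.HTML(
    "<span class='python'><span>"
    "<span><img><a>google</a></span>"
    "<a>chrome</a>"
    "</span></span>"
)

# Second <a> among all descendants of the span with class "python".
matches = doc.xpath('descendant::span[@class="python"]/descendant::a[2]')
texts = [a.text for a in matches]

print(texts)
```

If the document contained several span elements with class "python", this would select the second a descendant of each one, so the result can still hold more than one element.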
