Python: get the second element's text with XPath?

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/4117953/

Date: 2020-08-18 14:21:06  Source: igfitidea

Tags: python, xpath, lxml

Asked by

<span class='python'>
  <a>google</a>
  <a>chrome</a>
</span>

I want to get "chrome", and I already have it working like this:

q = item.findall('.//span[@class="python"]//a')
t = q[1].text # first element = 0

I'd like to combine it into a single XPath expression and just get one item instead of a list.
I tried this, but it doesn't work:

t = item.findtext('.//span[@class="python"]//a[2]') # first element = 1

The actual, not simplified, HTML looks like this:

<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
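
A minimal sketch reproducing the problem with the standard library's xml.etree.ElementTree, whose find/findall/findtext accept the limited path syntax used in the question. The question does not show what item is, so here it is assumed to be a hypothetical <div> element wrapping the actual HTML, parsed as XML because the snippet is well-formed.

```python
import xml.etree.ElementTree as ET

# The question's actual HTML, wrapped in a hypothetical <div> so that
# span[@class="python"] is a descendant of the context element "item".
ACTUAL = """<div>
<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
</div>"""

item = ET.fromstring(ACTUAL)

# Working approach from the question: collect every <a> descendant, index the list.
links = item.findall('.//span[@class="python"]//a')
second_text = links[1].text  # Python lists are 0-based

# Failing attempt: a[2] means "the second <a> child of its own parent",
# and no parent in this document has two <a> children.
single = item.findtext('.//span[@class="python"]//a[2]')

print(second_text)  # chrome
print(single)       # None
```

ElementTree's path language has no parenthesized grouping such as (.//a)[2], so selecting "the second of all matches" in one expression needs a full XPath 1.0 engine such as lxml.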

Accepted answer by Dimitre Novatchev

I tried this but it doesn't work.

t = item.findtext('.//span[@class="python"]//a[2]')

This is a FAQ about the // abbreviation.

.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So this may select more than one element, or no element at all, depending on the concrete XML document.
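
To see both outcomes, here is a small sketch using lxml (assumed to be installed, as in the other answer). Both documents are hypothetical, chosen so that .//a[2] matches once per parent in the first and never in the second.

```python
from lxml import etree

# Hypothetical document: two parents, each with two <a> children.
two_per_parent = etree.fromstring(
    "<root><p><a>1</a><a>2</a></p><p><a>3</a><a>4</a></p></root>")

# Hypothetical document: two parents, each with a single <a> child.
one_per_parent = etree.fromstring(
    "<root><p><a>1</a></p><p><a>2</a></p></root>")

# .//a[2]: every <a> that is the second <a> child of its own parent.
many = [a.text for a in two_per_parent.xpath(".//a[2]")]
no_match = [a.text for a in one_per_parent.xpath(".//a[2]")]

print(many)      # ['2', '4']
print(no_match)  # []
```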

To put it more simply, the [] operator has higher precedence than //.

If you want just one node (the second) of all the nodes returned, you have to use parentheses to force the precedence you want:

(.//a)[2]

This really selects the second a descendant of the current node.
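
A sketch contrasting the two forms with lxml (assumed installed) on a hypothetical document where each parent has exactly one <a> child: the per-step predicate finds nothing, while the parenthesized form picks the second <a> in document order.

```python
from lxml import etree

# Hypothetical document: each <p> has exactly one <a> child.
doc = etree.fromstring("<root><p><a>1</a></p><p><a>2</a></p></root>")

# Predicate bound to the a step: no parent has a second <a> child.
per_parent = doc.xpath(".//a[2]")

# Predicate applied to the whole node-set, taken in document order.
overall = doc.xpath("(.//a)[2]")

print(len(per_parent))  # 0
print(overall[0].text)  # 2
```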

For the actual expression used in the question, change it to:

(.//span[@class="python"]//a)[2]

or change it to:

(.//span[@class="python"]//a)[2]/text()
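
Applied with lxml (assumed installed) to the actual HTML from the question, both corrected expressions find "chrome". The item element is again a hypothetical <div> wrapper, since the question does not show how item was obtained. The string() variant is not part of the answer: it is the XPath 1.0 string() function, shown here as one way to get a single value instead of a list.

```python
from lxml import etree

# The question's actual HTML inside a hypothetical <div> standing in for "item".
item = etree.fromstring("""<div>
<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
</div>""")

elems = item.xpath('(.//span[@class="python"]//a)[2]')
texts = item.xpath('(.//span[@class="python"]//a)[2]/text()')

# string() converts the node-set to the string value of its first node,
# so lxml returns one string rather than a list ("" if nothing matches).
single = item.xpath('string((.//span[@class="python"]//a)[2])')

print(elems[0].text)  # chrome
print(texts)          # ['chrome']
print(single)         # chrome
```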

Answer by MattH

I'm not sure what the problem is...

>>> d = """<span class='python'>
...   <a>google</a>
...   <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>
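
The expression in this answer uses a single /a[2] step, which works for the simplified HTML but not for the actual HTML, where neither <a> is a direct child of span[@class="python"]. A sketch comparing both, assuming lxml is installed; the snippets are parsed with etree.fromstring inside a hypothetical <div> wrapper rather than with etree.HTML, to keep the tree exactly as written.

```python
from lxml import etree

SIMPLIFIED = """<div><span class='python'>
  <a>google</a>
  <a>chrome</a>
</span></div>"""

ACTUAL = """<div><span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span></div>"""

expr = './/span[@class="python"]/a[2]/text()'

simplified_result = etree.fromstring(SIMPLIFIED).xpath(expr)
# In the actual HTML, span[@class="python"] has no <a> children at all.
actual_result = etree.fromstring(ACTUAL).xpath(expr)

print(simplified_result)  # ['chrome']
print(actual_result)      # []
```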

Answer by MattH

From Comments:

or the simplification of the actual HTML I posted is too simple

You are right. What is the meaning of .//span[@class="python"]//a[2]? This will be expanded to:

self::node()
 /descendant-or-self::node()
  /child::span[attribute::class="python"]
   /descendant-or-self::node()
    /child::a[position()=2]
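
A quick check, assuming lxml is installed and using the simplified HTML inside a hypothetical <div> wrapper, that the abbreviated expression and its full expansion select the same nodes:

```python
from lxml import etree

# Simplified HTML from the question, inside a hypothetical <div> wrapper.
item = etree.fromstring("""<div><span class='python'>
  <a>google</a>
  <a>chrome</a>
</span></div>""")

abbreviated = './/span[@class="python"]//a[2]'
unabbreviated = ('self::node()'
                 '/descendant-or-self::node()'
                 '/child::span[attribute::class="python"]'
                 '/descendant-or-self::node()'
                 '/child::a[position()=2]')

short_form = [a.text for a in item.xpath(abbreviated)]
long_form = [a.text for a in item.xpath(unabbreviated)]

print(short_form)  # ['chrome']
print(long_form)   # ['chrome']
```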

It will finally select the second a child (fn:position() refers to the child axis). So nothing will be selected if your document looks like this:

<span class='python'> 
  <span> 
    <span> 
      <img></img> 
      <a>google</a><!-- This is the first "a" child of its parent --> 
    </span> 
    <a>chrome</a><!-- This is also the first "a" child of its parent --> 
  </span> 
</span> 
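
Because each <a> is the first <a> child of its own parent, as the comments in that markup point out, changing the predicate to [1] selects both links. A sketch with lxml (assumed installed) on the same markup, inside a hypothetical <div> wrapper:

```python
from lxml import etree

item = etree.fromstring("""<div>
<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
</div>""")

# No <a> is the second <a> child of its parent...
second_child = item.xpath('.//span[@class="python"]//a[2]')
# ...but both are the first <a> child of their respective parents.
first_child = [a.text for a in item.xpath('.//span[@class="python"]//a[1]')]

print(second_child)  # []
print(first_child)   # ['google', 'chrome']
```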

If you want the second of all descendants, use:

descendant::span[@class="python"]/descendant::a[2]
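
Applied with lxml (assumed installed) to the actual HTML from the question, again inside a hypothetical <div> standing in for item, this returns the chrome link. Unlike (.//span[@class="python"]//a)[2], the [2] here is evaluated on the descendant axis of each matching span, so a document with several span elements of class python could yield one result per span.

```python
from lxml import etree

item = etree.fromstring("""<div>
<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
</div>""")

# [2] counts <a> descendants of the span in document order: google, chrome.
result = item.xpath('descendant::span[@class="python"]/descendant::a[2]')
texts = [a.text for a in result]

print(texts)  # ['chrome']
```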