Python: how to match a text node then follow parent nodes using XPath

Notice: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/598722/

Time: 2020-11-03 20:25:47  Source: igfitidea

How to match a text node then follow parent nodes using XPath

Tags: python, html, xpath, lxml

Asked by Mat

I'm trying to parse some HTML with XPath. Following the simplified XML example below, I want to match the string 'Text 1', then grab the contents of the corresponding content node.

<doc>
    <block>
        <title>Text 1</title>
        <content>Stuff I want</content>
    </block>

    <block>
        <title>Text 2</title>
        <content>Stuff I don't want</content>
    </block>
</doc>

My Python code raises an error:

>>> from lxml import etree
>>>
>>> tree = etree.XML("<doc><block><title>Text 1</title><content>Stuff I want</content></block><block><title>Text 2</title><content>Stuff I don't want</content></block></doc>")
>>>
>>> # get all titles
... tree.xpath('//title/text()')
['Text 1', 'Text 2']
>>>
>>> # match 'Text 1'
... tree.xpath('//title/text()="Text 1"')
True
>>>
>>> # Follow parent from selected nodes
... tree.xpath('//title/text()/../..//text()')
['Text 1', 'Stuff I want', 'Text 2', "Stuff I don't want"]
>>>
>>> # Follow parent from selected node
... tree.xpath('//title/text()="Text 1"/../..//text()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1330, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:14542)
  File "xpath.pxi", line 287, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:90093)
  File "xpath.pxi", line 209, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:89446)
  File "xpath.pxi", line 194, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:89281)
lxml.etree.XPathEvalError: Invalid type

Is this possible in XPath? Do I need to express what I want to do in a different way?

Answered by Johannes Weiss

Is this what you want?

//title[text()='Text 1']/../content/text()
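For reference, here is a minimal sketch of how that expression behaves in lxml, assuming the same sample document as in the question. The names XML, is_present and wanted are illustrative only. It also shows why the original attempt failed: a bare comparison evaluates to a boolean, so no further path steps can follow it, whereas the same comparison inside a predicate filters the title nodes.

```python
from lxml import etree

# Same sample document as in the question, built as one string.
XML = (
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)
tree = etree.XML(XML)

# A bare comparison is a boolean expression, not a node-set,
# which is why appending "/../.." to it raises XPathEvalError.
is_present = tree.xpath('//title/text()="Text 1"')

# Inside a predicate, the comparison filters the <title> elements,
# and the path can continue from the matching element.
wanted = tree.xpath("//title[text()='Text 1']/../content/text()")

print(is_present)  # True
print(wanted)      # ['Stuff I want']
```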

Answered by Dimitre Novatchev

Use:

string(/*/*/title[. = 'Text 1']/following-sibling::content)

This represents at least two improvements over the currently accepted solution by Johannes Weiss:

  1. The very expensive abbreviation "//" (which usually causes the whole XML document to be scanned) is avoided, as it should be whenever the structure of the XML document is known in advance.

  2. There is no step back up to the parent (the location step "/.." is avoided).
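As a rough illustration, the string(...) expression above can be run through lxml like this, again assuming the sample document from the question. The variable names are hypothetical. Because the expression is wrapped in string(), the call yields a single string rather than a list, and an empty string when nothing matches.

```python
from lxml import etree

# Same sample document as in the question, built as one string.
XML = (
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)
tree = etree.XML(XML)

# Explicit path from the root (no "//"), then sideways to the sibling
# <content> element (no "/.." step back to the parent).
result = tree.xpath(
    "string(/*/*/title[. = 'Text 1']/following-sibling::content)"
)

# With no matching title, string() of the empty node-set is ''.
missing = tree.xpath(
    "string(/*/*/title[. = 'No such title']/following-sibling::content)"
)

print(result)         # Stuff I want
print(repr(missing))  # ''
```

Note that /*/*/title only matches titles exactly two levels below the root, so this form relies on the document structure being known in advance, as the first point says.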
