Python: how to match a text node and then follow its parent node using XPath

Question

Asked by Mat

I'm trying to parse some HTML with XPath. Following the simplified XML example below, I want to match the string 'Text 1', then grab the contents of the relevant content node.

<doc>
    <block>
        <title>Text 1</title>
        <content>Stuff I want</content>
    </block>

    <block>
        <title>Text 2</title>
        <content>Stuff I don't want</content>
    </block>
</doc>

My Python code throws a wobbly:

>>> from lxml import etree
>>>
>>> tree = etree.XML("<doc><block><title>Text 1</title><content>Stuff I want</content></block><block><title>Text 2</title><content>Stuff I don't want</content></block></doc>")
>>>
>>> # get all titles
... tree.xpath('//title/text()')
['Text 1', 'Text 2']
>>>
>>> # match 'Text 1'
... tree.xpath('//title/text()="Text 1"')
True
>>>
>>> # Follow parent from selected nodes
... tree.xpath('//title/text()/../..//text()')
['Text 1', 'Stuff I want', 'Text 2', "Stuff I don't want"]
>>>
>>> # Follow parent from selected node
... tree.xpath('//title/text()="Text 1"/../..//text()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1330, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:14542)
  File "xpath.pxi", line 287, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:90093)
  File "xpath.pxi", line 209, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:89446)
  File "xpath.pxi", line 194, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:89281)
lxml.etree.XPathEvalError: Invalid type

Is this possible in XPath? Do I need to express what I want to do in a different way?

Answer 1

Answered by Johannes Weiss

Is this what you want?

//title[text()='Text 1']/../content/text()
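
A minimal sketch of how this expression could be run with lxml against the sample document from the question. The original attempt most likely failed because, outside a predicate, the '=' comparison takes the rest of the path as its right-hand side, so the path steps end up applied to the string "Text 1" rather than to a node. Putting the test inside [...] keeps the result a node-set that further steps can follow. The variable names (XML, tree, result) are only illustrative.

```python
from lxml import etree

# Same sample document as in the question.
XML = (
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)
tree = etree.XML(XML)

# The predicate filters the title elements; the path then steps up to the
# parent block and down into its content child.
result = tree.xpath("//title[text()='Text 1']/../content/text()")
print(result)  # ['Stuff I want']
```

The result is a list because text() selects a node-set; take result[0] if exactly one match is expected.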

Answer 2

Answered by Dimitre Novatchev

Use:

string(/*/*/title[. = 'Text 1']/following-sibling::content)

This represents at least two improvements as compared to the currently accepted solution of Johannes Weiss:

The very expensive abbreviation "//" (which usually causes the whole XML document to be scanned) is avoided, as it should be whenever the structure of the XML document is known in advance.
There is no step back up to the parent (the location step "/.." is avoided).
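
As a rough illustration, the expression above could be evaluated with lxml as follows, assuming the same sample document as in the question. Because the expression is wrapped in string(), lxml returns a single string rather than a list, and an empty string when nothing matches. The variable names (XML, tree, content, missing) are only illustrative.

```python
from lxml import etree

# Same sample document as in the question.
XML = (
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)
tree = etree.XML(XML)

# /*/*/title walks the known structure (doc -> block -> title) instead of
# scanning with //, and following-sibling::content moves sideways to the
# content element without stepping up to the parent.
content = tree.xpath("string(/*/*/title[. = 'Text 1']/following-sibling::content)")
print(content)  # Stuff I want

# With no matching title, string() of an empty node-set is the empty string.
missing = tree.xpath("string(/*/*/title[. = 'No such title']/following-sibling::content)")
print(repr(missing))  # ''
```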
