How to use lxml to find an element by text?

Source: Stack Overflow, http://stackoverflow.com/questions/14299978/
License: CC BY-SA 4.0. You are free to use and share this content, but you must keep the same license and attribute it to the original authors on Stack Overflow, not to this page.
Time: 2020-08-18 11:01:00, source: igfitidea

Tags: python, html, lxml

Asked by user1973386

Assume we have the following html:

<html>
    <body>
        <a href="/1234.html">TEXT A</a>
        <a href="/3243.html">TEXT B</a>
        <a href="/7445.html">TEXT C</a>
    </body>
</html>

How do I make it find the element "a", which contains "TEXT A"?

So far I've got:

import lxml.html

root = lxml.html.document_fromstring(the_html_above)
e = root.find('.//a')

I've tried:

e = root.find('.//a[@text="TEXT A"]')

but that didn't work, as the "a" tags have no attribute "text".

Is there any way I can solve this in a similar fashion to what I've tried?

Accepted answer by unutbu

You are very close. Use text()= rather than @text (which denotes an attribute). Note that the examples here call xpath(), which returns a list of matching elements, rather than find().

e = root.xpath('.//a[text()="TEXT A"]')

Or, if you know only that the text contains "TEXT A",

e = root.xpath('.//a[contains(text(),"TEXT A")]')

Or, if you know only that text starts with "TEXT A",

e = root.xpath('.//a[starts-with(text(),"TEXT A")]')

See the docs for more on the available string functions.

For example,

import lxml.html as LH

text = '''\
<html>
    <body>
        <a href="/1234.html">TEXT A</a>
        <a href="/3243.html">TEXT B</a>
        <a href="/7445.html">TEXT C</a>
    </body>
</html>'''

root = LH.fromstring(text)
e = root.xpath('.//a[text()="TEXT A"]')
print(e)

yields

[<Element a at 0xb746d2cc>]
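
The contains() and starts-with() variants can be checked the same way. The sketch below is only an illustration, not part of the original answer: the variable names and the second, nested HTML snippet are made up for the demonstration. It also shows one caveat worth knowing: in XPath 1.0, a string function given text() looks only at the first text node of the element, whereas "." uses the element's full string value, including text inside child elements.

```python
import lxml.html as LH

html = '''\
<html>
    <body>
        <a href="/1234.html">TEXT A</a>
        <a href="/3243.html">TEXT B</a>
        <a href="/7445.html">TEXT C</a>
    </body>
</html>'''

root = LH.fromstring(html)

# xpath() returns a list for node-set expressions, even for a single match.
exact = root.xpath('.//a[text()="TEXT A"]')
partial = root.xpath('.//a[contains(text(), "XT B")]')
prefix = root.xpath('.//a[starts-with(text(), "TEXT")]')

print([a.get('href') for a in exact])    # ['/1234.html']
print([a.get('href') for a in partial])  # ['/3243.html']
print([a.get('href') for a in prefix])   # all three links

# Take the first match, or None when nothing matched.
first = exact[0] if exact else None

# Hypothetical nested markup: the link text is split by a <b> child.
nested = LH.fromstring('<div><a href="/x.html"><b>TEXT</b> A</a></div>')

# text() yields only the direct text node " A", so this finds nothing.
by_text_node = nested.xpath('.//a[contains(text(), "TEXT")]')

# "." is the full string value "TEXT A", so this finds the link.
by_string_value = nested.xpath('.//a[contains(., "TEXT")]')

print(by_text_node)          # []
print(len(by_string_value))  # 1
```

If the links may contain nested markup such as <b> or <span>, matching on "." is usually the safer choice; for flat text like the HTML in the question, both forms behave the same.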

Answered by ToonAlfrink

Another way that looks more straightforward to me:

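import lxml.html

results = []
root = lxml.html.fromstring(the_html_above)
for tag in root.iter():
    # tag.text is None when an element has no leading text, so guard against it
    if tag.text and "TEXT A" in tag.text:
        results.append(tag)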
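
A slightly tighter variant of this loop, sketched under the assumption that only <a> elements are of interest. The sample HTML and the variable names are invented for illustration. Two points it tries to show: tag.text can be None (for example when a link contains only an image), so a bare "in" test needs a guard, and tag.text covers only the text before the first child element, while lxml.html's text_content() also gathers text from nested children.

```python
import lxml.html

# Hypothetical sample: one plain link, one with nested markup, one with no text.
html = '''<div>
<a href="/1234.html">TEXT A</a>
<a href="/3243.html"><b>TEXT</b> B</a>
<a href="/7445.html"><img src="c.png"></a>
</div>'''

root = lxml.html.fromstring(html)

# iter('a') restricts the walk to <a> elements.
# .text is None for the second and third links, hence the guard.
by_text = [a for a in root.iter('a') if a.text and "TEXT" in a.text]

# text_content() joins the text of the element and all its descendants.
by_content = [a for a in root.iter('a') if "TEXT" in a.text_content()]

print([a.get('href') for a in by_text])     # ['/1234.html']
print([a.get('href') for a in by_content])  # ['/1234.html', '/3243.html']
```

Compared with the XPath approach, the loop is easier to extend with arbitrary Python conditions, at the cost of visiting every candidate element in Python code.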