Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/2415115/

Time: 2020-11-04 00:36:20  Source: igfitidea

Find element with attribute with minidom

python xml minidom

Asked by xster

Given

<field name="frame.time_delta_displayed" showname="Time delta from previous displayed frame: 0.000008000 seconds" size="0" pos="0" show="0.000008000"/>
<field name="frame.time_relative" showname="Time since reference or first frame: 0.000008000 seconds" size="0" pos="0" show="0.000008000"/>
<field name="frame.number" showname="Frame Number: 2" size="0" pos="0" show="2"/>
<field name="frame.pkt_len" showname="Packet Length: 1506 bytes" hide="yes" size="0" pos="0" show="1506"/>
<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>
<field name="frame.cap_len" showname="Capture Length: 1506 bytes" size="0" pos="0" show="1506"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="frame.protocols" showname="Protocols in frame: eth:ip:tcp:http:data" size="0" pos="0" show="eth:ip:tcp:http:data"/>

How do I get the field with name="frame.len" right away without iterating through every tag and checking the attributes?

Answered by Tim Pietzcker

I don't think you can.

From the parent element, you need to

for subelement in element.getElementsByTagName("field"):
    # Compare the value of the name attribute, not the attribute's own name
    if subelement.getAttribute("name") == "frame.len":
        do_something(subelement)
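
For reference, here is a self-contained sketch of that loop using xml.dom.minidom. The <packet> root element and the shortened sample are assumptions made for illustration (the question only shows the <field> lines), and find_fields_by_name is a hypothetical helper name.

```python
from xml.dom import minidom

# Assumed sample: two of the question's fields wrapped in a made-up <packet> root.
SAMPLE = """<packet>
<field name="frame.number" showname="Frame Number: 2" size="0" pos="0" show="2"/>
<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>
</packet>"""


def find_fields_by_name(parent, wanted):
    """Return all <field> descendants of parent whose name attribute equals wanted."""
    return [
        field
        for field in parent.getElementsByTagName("field")
        if field.getAttribute("name") == wanted
    ]


doc = minidom.parseString(SAMPLE)
matches = find_fields_by_name(doc.documentElement, "frame.len")
for field in matches:
    print(field.getAttribute("show"))
```

The iteration is still there; the helper only hides it behind one call.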

Reacting to your comment from March 11, if the structure of your documents is stable and free of nasty surprises (like angle brackets inside attributes), you might want to try the unthinkable and use a regular expression. This is not recommended practice but could work and be much easier than actually parsing the file. I admit that I've done that sometimes myself. Haven't gone blind yet.

So in your case you could (assuming that a <field> tag doesn't span multiple lines):

import re

with open("myfile.xml") as xmlfile:
    for line in xmlfile:
        # The dot is escaped so it matches a literal "." only
        match = re.search(r'<field\s+name="frame\.len"\s+([^>]+)/>', line)
        if match:
            result = match.group(1)
            do_something(result)

If a <field> tag can span multiple lines, you could try loading the entire file as plain text into memory and then scan it for matches:

import re

with open("myfile.xml") as f:
    filedump = f.read()
for match in re.finditer(r'<field\s+name="frame\.len"\s+([^>]+)/>', filedump):
    result = match.group(1)
    do_something(result)

In both cases, result will contain the tag's attributes other than name="frame.len", as one raw string. The regex assumes that name="frame.len" is always the first attribute inside the tag.
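
To make the contents of result concrete, here is a small sketch run against one line from the question instead of a file. The second regex, which splits the captured text into name/value pairs, is an addition for illustration and assumes double-quoted attribute values with no embedded quotes.

```python
import re

# Sample line copied from the question's XML.
line = '<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>'

match = re.search(r'<field\s+name="frame\.len"\s+([^>]+)/>', line)
result = match.group(1) if match else ""

# Split the captured text into attribute name/value pairs
# (assumes double-quoted values without embedded quotes).
attributes = dict(re.findall(r'([\w.:-]+)="([^"]*)"', result))
print(attributes["show"])
```

Here result holds everything between name="frame.len" and the closing />, so the remaining attributes still have to be pulled apart by hand.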

Answered by Alex Martelli

You don't -- the DOM API, somewhat poorly designed (by w3c, not by Python!-) doesn't have such a search function to do the iteration for you. Either accept the need to loop (not through every tag in general, but through all with a given tag name), or upgrade to a richer interface, such as BeautifulSoup or lxml.
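
As an illustration of the "richer interface" route, the sketch below uses the standard library's xml.etree.ElementTree rather than BeautifulSoup or lxml. That is a swapped-in choice so the example runs without extra installs; lxml.etree accepts the same find() call. The <packet> root element is an assumption, and the library still iterates internally.

```python
import xml.etree.ElementTree as ET

# Assumed sample: two of the question's fields wrapped in a made-up <packet> root.
SAMPLE = """<packet>
<field name="frame.number" showname="Frame Number: 2" size="0" pos="0" show="2"/>
<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>
</packet>"""

root = ET.fromstring(SAMPLE)

# The attribute predicate selects the first <field> whose name is "frame.len".
field = root.find(".//field[@name='frame.len']")
if field is not None:
    print(field.get("show"))
```

find() returns None when nothing matches, hence the explicit check before using the element.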

Answered by Rápli András

Wow, that regex is horrible! As of 2016, there is a .getAttribute() method for each DOMElement that makes things a bit easier, but you still have to iterate through the elements.

l = []
for e in elements:
    # Keep only elements whose name attribute is exactly "frame.len"
    if e.hasAttribute('name') and e.getAttribute('name') == 'frame.len':
        l.append(e)
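
If several fields have to be looked up by name, one option is to pay for the iteration once and build a dictionary. This is only a sketch: the <packet> root and the fields_by_name variable are assumptions, not part of the original answer.

```python
from xml.dom import minidom

# Assumed sample: two of the question's fields wrapped in a made-up <packet> root.
SAMPLE = """<packet>
<field name="frame.number" showname="Frame Number: 2" size="0" pos="0" show="2"/>
<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>
</packet>"""

doc = minidom.parseString(SAMPLE)
elements = doc.getElementsByTagName("field")

# One pass over the elements; later lookups by name are plain dict accesses.
fields_by_name = {}
for e in elements:
    if e.hasAttribute("name"):
        # setdefault keeps the first element when a name occurs more than once
        fields_by_name.setdefault(e.getAttribute("name"), e)

frame_len = fields_by_name.get("frame.len")
if frame_len is not None:
    print(frame_len.getAttribute("show"))
```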