Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18159221/

Time: 2020-08-19 10:00:28  Source: igfitidea

Remove namespace and prefix from xml in python using lxml

Tags: python, xml, namespaces, lxml

Asked by speedyrazor

I have an XML file that I need to open and make some changes to. One of those changes is to remove the namespace and prefix and then save the result to another file. Here is the XML:

<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data</provider>
  <language>en-GB</language>
</package>

I can make the other changes I need, but I can't find out how to remove the namespace and prefix. This is the resulting XML I need:

<?xml version='1.0' encoding='UTF-8'?>
<package>
  <provider>some data</provider>
  <language>en-GB</language>
</package>

And here is my script which will open and parse the xml and save it:

from lxml import etree

metadata = '/Users/user1/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()
tree.write('/Users/user1/Desktop/Python/done.xml', pretty_print=True, xml_declaration=True, encoding='UTF-8')

So how would I add code in my script which will remove the namespace and prefix?

Accepted answer by falsetru

Replace tag as Uku Loskit suggests. In addition to that, use lxml.objectify.deannotate.

from lxml import etree, objectify

metadata = '/Users/user1/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()

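#### Strip the namespace from every tag, then drop unused declarations
for elem in root.iter():
    # Comments and processing instructions have a function as .tag
    if not hasattr(elem.tag, 'find'): continue  # (1)
    # Tags look like '{namespace-uri}localname'; keep the part after '}'
    i = elem.tag.find('}')
    if i >= 0:
        elem.tag = elem.tag[i+1:]
objectify.deannotate(root, cleanup_namespaces=True)
####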

tree.write('/Users/user1/Desktop/Python/done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')

UPDATE

Some nodes, such as Comment, return a function when their tag attribute is accessed. I added a guard for that. (1)
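
The guard can be seen in action on a small in-memory document. This is only a sketch: the sample input (the question's XML plus one comment) and the variable names are assumptions made for illustration, and an isinstance check stands in for the hasattr test.

```python
from lxml import etree, objectify

# Assumed sample input: the question's document plus one comment node.
SAMPLE = b"""<package xmlns="http://apple.com/itunes/importer">
  <!-- exported by a hypothetical tool -->
  <provider>some data</provider>
</package>"""

root = etree.fromstring(SAMPLE)
for elem in root.iter():
    # Comments and processing instructions expose a function as .tag,
    # so only string tags are rewritten.
    if not isinstance(elem.tag, str):
        continue
    # Tags use the form "{namespace-uri}localname"; keep what follows "}".
    elem.tag = elem.tag.rpartition('}')[2]

# Drop namespace declarations that are no longer used by any element.
objectify.deannotate(root, cleanup_namespaces=True)

result = etree.tostring(root).decode()
print(result)
```

The comment node is left untouched, while the element tags lose their namespace.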

Answer by Uku Loskit

import xml.etree.ElementTree as ET

def remove_namespace(doc, namespace):
    """Remove namespace in the passed document in place."""
    ns = u'{%s}' % namespace
    nsl = len(ns)
    # Element.getiterator() was removed in Python 3.9; iter() replaces it.
    for elem in doc.iter():
        if elem.tag.startswith(ns):
            elem.tag = elem.tag[nsl:]

metadata = '/Users/user1/Desktop/Python/metadata.xml'
tree = ET.parse(metadata)
root = tree.getroot()

remove_namespace(root, u'http://apple.com/itunes/importer')
# ElementTree.write() has no pretty_print option (that is lxml-only).
tree.write('/Users/user1/Desktop/Python/done.xml',
           xml_declaration=True, encoding='UTF-8')

Used a snippet of code from here. This method could easily be extended to delete any namespace attributes by searching for tags that begin with "xmlns".
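
One possible reading of that extension, sketched with the standard library only. It assumes "namespace attributes" means namespace-qualified attribute names, since xml.etree.ElementTree does not expose xmlns declarations as attributes. The sample document and the `i` prefix are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://apple.com/itunes/importer"
# Assumed sample: the default namespace plus a prefixed attribute in the same namespace.
SAMPLE = ('<package xmlns="%s" xmlns:i="%s">'
          '<provider i:id="7">some data</provider></package>' % (NS, NS))

def remove_namespace(doc, namespace):
    """Strip one namespace from tags and attribute names, in place."""
    prefix = "{%s}" % namespace
    for elem in doc.iter():
        if elem.tag.startswith(prefix):
            elem.tag = elem.tag[len(prefix):]
        # Copy the keys first because the mapping is modified in the loop.
        for key in list(elem.attrib):
            if key.startswith(prefix):
                elem.attrib[key[len(prefix):]] = elem.attrib.pop(key)

root = ET.fromstring(SAMPLE)
remove_namespace(root, NS)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Because ElementTree regenerates namespace declarations from the tags it serializes, no separate cleanup step is needed here.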

Answer by kmonsoor

All you need to do is:

objectify.deannotate(root, cleanup_namespaces=True)

after you have obtained the root with root = tree.getroot()
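
A quick check, assuming the question's sample document, suggests that cleanup_namespaces only drops declarations that no element or attribute still uses. So the tags may need to be stripped first, as the other answers do. The variable names below are illustrative.

```python
from lxml import etree, objectify

SAMPLE = (b'<package xmlns="http://apple.com/itunes/importer">'
          b'<provider>some data</provider></package>')

root = etree.fromstring(SAMPLE)

# Cleanup alone: the namespace is still in use, so the tag keeps it.
objectify.deannotate(root, cleanup_namespaces=True)
tag_after_cleanup = root.tag

# Strip the namespace from each element tag, then clean up again.
for elem in root.iter(etree.Element):
    elem.tag = etree.QName(elem).localname
objectify.deannotate(root, cleanup_namespaces=True)
tag_after_strip = root.tag

serialized = etree.tostring(root).decode()
print(tag_after_cleanup, tag_after_strip, serialized)
```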

Answer by Bruce

Here are two other ways of removing namespaces. The first uses the lxml.etree.QName helper while the second uses regexes. Both functions allow an optional list of namespaces to match against. If no namespace list is supplied then all namespaces are removed. Attribute keys are also cleaned.

from lxml import etree
import re

def remove_namespaces_qname(doc, namespaces=None):

    for el in doc.iter(etree.Element):  # elements only; skips comments and PIs

        # clean tag
        q = etree.QName(el.tag)
        if q is not None:
            if namespaces is not None:
                if q.namespace in namespaces:
                    el.tag = q.localname
            else:
                el.tag = q.localname

            # clean attributes
            for a, v in el.items():
                q = etree.QName(a)
                if q is not None:
                    if namespaces is not None:
                        if q.namespace in namespaces:
                            del el.attrib[a]
                            el.attrib[q.localname] = v
                    else:
                        del el.attrib[a]
                        el.attrib[q.localname] = v
    return doc


def remove_namespace_re(doc, namespaces=None):

    if namespaces is not None:
        ns = list(map(lambda n: u'{%s}' % n, namespaces))

    for el in doc.iter(etree.Element):  # elements only; skips comments and PIs

        # clean tag
        m = re.match(r'({.+})(.+)', el.tag)
        if m is not None:
            if namespaces is not None:
                if m.group(1) in ns:
                    el.tag = m.group(2)
            else:
                el.tag = m.group(2)

            # clean attributes
            for a, v in el.items():
                m = re.match(r'({.+})(.+)', a)
                if m is not None:
                    if namespaces is not None:
                        if m.group(1) in ns:
                            del el.attrib[a]
                            el.attrib[m.group(2)] = v
                    else:
                        del el.attrib[a]
                        el.attrib[m.group(2)] = v
    return doc
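
To illustrate the optional namespace list, here is a condensed sketch in the spirit of the QName variant. It is not the functions above. The second namespace URI, the `k` prefix, and the name `strip_selected` are hypothetical, and attribute handling is left out for brevity.

```python
from lxml import etree

DROP = "http://apple.com/itunes/importer"
KEEP = "http://example.com/keep"  # hypothetical second namespace

SAMPLE = ('<package xmlns="%s" xmlns:k="%s">'
          '<provider>some data</provider><k:note>kept</k:note>'
          '</package>' % (DROP, KEEP)).encode()

def strip_selected(root, namespaces=None):
    """Strip the listed namespaces from element tags (all of them if None)."""
    for el in root.iter(etree.Element):  # elements only
        q = etree.QName(el)
        if namespaces is None or q.namespace in namespaces:
            el.tag = q.localname
    # Remove declarations that no longer have any users.
    etree.cleanup_namespaces(root)
    return root

root = strip_selected(etree.fromstring(SAMPLE), namespaces=[DROP])
tags = [el.tag for el in root.iter(etree.Element)]
serialized = etree.tostring(root).decode()
print(tags)
```

Elements in the namespace that was not listed keep their qualified tag, so the declaration for that namespace stays in the output.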

Answer by SergiyKolesnikov

First, use lxml.etree.QName to remove namespace prefixes from the tag names:

>>> root.tag
'{http://apple.com/itunes/importer}package'
>>> etree.QName(root).localname
'package'

Afterwards, use lxml.etree.cleanup_namespaces() to remove unused namespace declarations from the tree.

Putting it all together:

from lxml import etree

input_xml = '''
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data</provider>
  <language>en-GB</language>
</package>
'''
root = etree.fromstring(input_xml)

# Remove namespace prefixes
for elem in root.iter():
    elem.tag = etree.QName(elem).localname
# Remove unused namespace declarations
etree.cleanup_namespaces(root)

print(etree.tostring(root).decode())

Output XML:

<package>
  <provider>some data</provider>
  <language>en-GB</language>
</package>

Answer by Daniel Haley

You could also use XSLT to strip the namespaces...

XSLT 1.0 (test.xsl)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" priority="1">
    <xsl:element name="{local-name()}" namespace="">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}" namespace="">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>

Python

from lxml import etree

tree = etree.parse("metadata.xml")
xslt = etree.parse("test.xsl")

new_tree = tree.xslt(xslt)

print(etree.tostring(new_tree, pretty_print=True, xml_declaration=True, 
                     encoding="UTF-8").decode("UTF-8"))

Output

<?xml version='1.0' encoding='UTF-8'?>
<package>
  <provider>some data</provider>
  <language>en-GB</language>
</package>
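
The same idea can be tried without any files on disk. The following is a minimal sketch that assumes a trimmed-down stylesheet held in a string and compiled with etree.XSLT. The constant names and the reduced template set are choices made for illustration, not the stylesheet above.

```python
from lxml import etree

# Trimmed-down stylesheet: rebuild every element and attribute under its local name.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}" namespace="">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>"""

SOURCE = (b'<package xmlns="http://apple.com/itunes/importer">'
          b'<provider>some data</provider></package>')

# Compile the stylesheet once; the resulting object is callable.
transform = etree.XSLT(etree.fromstring(STYLESHEET))
new_tree = transform(etree.fromstring(SOURCE))

output = etree.tostring(new_tree).decode()
print(output)
```

Compiling the stylesheet into a reusable transform object may be convenient when many documents have to be processed.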

Answer by lechat

You can try:

# Remove namespace prefixes
for elem in root.iter():
    namespace_removed = elem.xpath('local-name()')