Java: clean namespace handling with dom4j

Question

Asked by Antoine Claval

We are using dom4j 1.6.1 to parse XML coming from an external source. Sometimes the tags mention the namespace and sometimes they do not, and this makes calls to Element.selectSingleNode(String s) fail.

For now we have three solutions, and we are not happy with any of them.

1 - Remove every namespace occurrence before doing anything with the XML document:

xml = xml.replaceAll("xmlns=\"[^\"]*\"", "");
xml = xml.replaceAll("ds:", "");
xml = xml.replaceAll("etm:", "");
[...] // and so on for each kind of namespace

2 - Remove the namespace just before getting a node, by calling

Element.remove(Namespace ns)

But it works only for a node and its first level of children.

3 - Clutter the code with

node = rootElement.selectSingleNode(NameWithoutNameSpace)
if ( node == null )
    node = rootElement.selectSingleNode(NameWithNameSpace)

So, what do you think? Which one is the least bad? Do you have another solution to propose?

Answer 1

Answered by mestachs

I wanted to remove all namespace information (declarations and tag prefixes) to ease XPath evaluation. I ended up with this solution:

String xml = ...
SAXReader reader = new SAXReader();
Document document = reader.read(new ByteArrayInputStream(xml.getBytes()));
document.accept(new NameSpaceCleaner());
return document.asXML();

where NameSpaceCleaner is a dom4j visitor:

private static final class NameSpaceCleaner extends VisitorSupport {
    public void visit(Document document) {
        ((DefaultElement) document.getRootElement())
                .setNamespace(Namespace.NO_NAMESPACE);
        document.getRootElement().additionalNamespaces().clear();
    }

    public void visit(Namespace namespace) {
        namespace.detach();
    }

    public void visit(Attribute node) {
        if (node.toString().contains("xmlns")
                || node.toString().contains("xsi:")) {
            node.detach();
        }
    }

    public void visit(Element node) {
        if (node instanceof DefaultElement) {
            ((DefaultElement) node).setNamespace(Namespace.NO_NAMESPACE);
        }
    }
}

Answer 2

Answered by Abhishek

The following is some code that I found and now use. It might be useful if you are looking for a generic way to remove all namespaces from a dom4j document.

    public static void removeAllNamespaces(Document doc) {
        Element root = doc.getRootElement();
        if (root.getNamespace() != Namespace.NO_NAMESPACE) {
            removeNamespaces(root.content());
        }
    }

    public static void unfixNamespaces(Document doc, Namespace original) {
        Element root = doc.getRootElement();
        if (original != null) {
            setNamespaces(root.content(), original);
        }
    }

    public static void setNamespace(Element elem, Namespace ns) {

        elem.setQName(QName.get(elem.getName(), ns,
                elem.getQualifiedName()));
    }

    /**
     * Recursively removes the namespace of the element and all its
     * children: sets them to Namespace.NO_NAMESPACE.
     */
    public static void removeNamespaces(Element elem) {
        setNamespaces(elem, Namespace.NO_NAMESPACE);
    }

    /**
     * Recursively removes the namespace of every node in the list and all
     * their children: sets them to Namespace.NO_NAMESPACE.
     */
    public static void removeNamespaces(List l) {
        setNamespaces(l, Namespace.NO_NAMESPACE);
    }

    /**
     * Recursively sets the namespace of the element and all its children.
     */
    public static void setNamespaces(Element elem, Namespace ns) {
        setNamespace(elem, ns);
        setNamespaces(elem.content(), ns);
    }

    /**
     * Recursively sets the namespace of every attribute and element in the
     * list, and of all their children.
     */
    public static void setNamespaces(List l, Namespace ns) {
        Node n = null;
        for (int i = 0; i < l.size(); i++) {
            n = (Node) l.get(i);

            if (n.getNodeType() == Node.ATTRIBUTE_NODE) {
                ((Attribute) n).setNamespace(ns);
            }
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                setNamespaces((Element) n, ns);
            }            
        }
    }

Hope this is useful for someone who needs it!

Answer 3

Answered by Jherico

Option 1 is dangerous because you can't guarantee the prefixes for a given namespace without pre-parsing the document, and because you can end up with namespace collision. If you're consuming a document and not outputting anything, it might be ok, depending on the source of the doc, but otherwise it just loses too much information.
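
To make the risks of option 1 concrete, here is a minimal sketch of the textual stripping from the question applied to a few made-up inputs. The class name, method name, and sample XML strings are assumptions for illustration only; the prefixes "ds" and "etm" come from the question.

```java
final class PrefixStrippingDemo {
    private PrefixStrippingDemo() {}

    // Option 1 from the question: delete every textual occurrence of "<prefix>:".
    static String stripPrefix(String xml, String prefix) {
        return xml.replace(prefix + ":", "");
    }

    // Hypothetical inputs showing three ways this goes wrong.
    static String unexpectedPrefix() {
        // The sender chose a prefix the code did not anticipate: nothing is stripped.
        return stripPrefix("<x:a xmlns:x=\"urn:one\"><x:b/></x:a>", "ds");
    }

    static String collision() {
        // Two elements from different namespaces end up with the same name.
        String xml = "<ds:id>1</ds:id><etm:id>2</etm:id>";
        return stripPrefix(stripPrefix(xml, "ds"), "etm");
    }

    static String corruptedText() {
        // Text content that happens to contain "ds:" is modified as well.
        return stripPrefix("<ds:ref>ds:42</ds:ref>", "ds");
    }
}
```

In this sketch, unexpectedPrefix() returns its input unchanged, collision() returns `<id>1</id><id>2</id>`, and corruptedText() returns `<ref>42</ref>`, so both the element name and the value have been altered.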

Option 2 could be applied recursively, but it has many of the same problems as option 1.

Option 3 sounds like the best approach, but rather than clutter your code, make a static method that does both checks rather than putting the same if statement throughout your codebase.
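
One way such a static method might look is sketched below. It is deliberately generic so that it does not depend on dom4j: it takes any lookup function and tries candidate names in order. The class name Lookups and the method name firstMatch are hypothetical.

```java
import java.util.function.Function;

final class Lookups {
    private Lookups() {}

    /**
     * Applies the lookup to each candidate name in order and returns the
     * first non-null result, or null if no candidate matches.
     */
    static <T> T firstMatch(Function<String, T> lookup, String... candidateNames) {
        for (String name : candidateNames) {
            T result = lookup.apply(name);
            if (result != null) {
                return result;
            }
        }
        return null;
    }
}
```

With dom4j, a call might then read `Node node = Lookups.firstMatch(rootElement::selectSingleNode, nameWithoutNamespace, nameWithNamespace);`, which keeps the fallback logic in one place instead of repeating the if statement.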

The best approach is to get whoever is sending you the bad XML to fix it. Of course, this raises the question of whether it is actually broken. Specifically, are you getting XML where the default namespace is defined as X and then a namespace also representing X is given a prefix of 'es'? If this is the case then the XML is well formed and you just need code that is agnostic about the prefix, but still uses a qualified name to fetch the element. I'm not familiar enough with dom4j to know if creating a Namespace with a null prefix will cause it to match all elements with a matching URI or only those with no prefix, but it's worth experimenting with.
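
As an illustration of prefix-agnostic lookup, the sketch below matches on namespace URI and local name using the XPath functions namespace-uri() and local-name(). It uses the JDK's built-in DOM and XPath APIs rather than dom4j, so it shows the idea only; the class name, method name, and sample documents are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class PrefixAgnosticLookup {
    private PrefixAgnosticLookup() {}

    /**
     * Finds the first element with the given namespace URI and local name,
     * whatever prefix (or default namespace) the document uses for it.
     * The two arguments are assumed to be trusted constants, since they are
     * concatenated into the XPath expression.
     */
    static Node find(String xml, String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so namespace URIs are recorded
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String expr = "//*[local-name()='" + localName
                    + "' and namespace-uri()='" + namespaceUri + "']";
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("XML lookup failed", e);
        }
    }
}
```

Here `find("<es:order xmlns:es=\"urn:x\"><es:id>7</es:id></es:order>", "urn:x", "id")` and `find("<order xmlns=\"urn:x\"><id>7</id></order>", "urn:x", "id")` locate the same logical element. Dropping the namespace-uri() test would also match documents that carry no namespace at all, which is the situation described in the question. dom4j evaluates XPath through Jaxen, so a similar expression may work with selectSingleNode, but that is not verified here.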

Answer 4

Answered by vdr

Like Abhishek, I needed to strip the namespaces from XML to simplify XPath queries in system-testing scripts (the XML is XSD-validated first).

Here are the problems I faced:

I needed to process deeply nested XML that had a tendency to blow the stack.
On most complex XML, for a reason I didn't fully investigate, stripping all the namespaces only worked reliably when traversing the DOM tree depth first. That ruled out the visitor, as well as getting the list of nodes with document.selectNodes("//*").

I ended up with the following (not the most elegant, but it may help solve somebody's problem):

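// Needs: java.util.ArrayDeque, java.util.Deque, java.util.Iterator, org.dom4j.*
public static String normaliseXml(final String message) throws DocumentException {
    org.dom4j.Document document = DocumentHelper.parseText(message);

    // Explicit LIFO stack instead of recursion, so that deeply nested
    // documents cannot overflow the call stack. It holds parent elements
    // and the child iterators that are still being walked.
    Deque<Object> stack = new ArrayDeque<Object>();

    Object current = document.getRootElement();

    while (current != null) {
        if (current instanceof Element) {
            Element element = (Element) current;

            Iterator<?> iterator = element.elementIterator();

            if (iterator.hasNext()) {
                // Children first: remember the parent and descend.
                stack.push(element);
                current = iterator;
            } else {
                // Leaf element: strip it and go back to the enclosing iterator.
                stripNamespace(element);

                current = stack.poll();
            }
        } else {
            Iterator<?> iterator = (Iterator<?>) current;

            if (iterator.hasNext()) {
                stack.push(iterator);
                current = iterator.next();
            } else {
                // All children done: the parent element is next on the stack.
                current = stack.poll();

                if (current instanceof Element) {
                    stripNamespace((Element) current);

                    current = stack.poll();
                }
            }
        }
    }

    return document.asXML();
}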

// Needs: java.util.ArrayList
private static void stripNamespace(Element element) {
    // Re-create the element name without a namespace or prefix.
    QName name = new QName(element.getName(), Namespace.NO_NAMESPACE, element.getName());
    element.setQName(name);

    // Iterate over copies, because the loop bodies modify the element itself.
    for (Object o : new ArrayList<Object>(element.attributes())) {
        Attribute attribute = (Attribute) o;

        QName attributeName = new QName(attribute.getName(), Namespace.NO_NAMESPACE, attribute.getName());
        String attributeValue = attribute.getValue();

        // Replace the attribute with an unqualified one carrying the same value.
        element.remove(attribute);

        element.addAttribute(attributeName, attributeValue);
    }

    for (Object o : new ArrayList<Object>(element.declaredNamespaces())) {
        Namespace namespace = (Namespace) o;
        element.remove(namespace);
    }
}
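
The children-before-parents order used above can also be sketched in isolation. The example below is not the author's code: it uses the JDK's org.w3c.dom API instead of dom4j and a different, two-stack technique, and only records element names to show the visiting order without recursion. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class PostOrderWalk {
    private PostOrderWalk() {}

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Returns element names with every child listed before its parent. */
    static List<String> postOrderNames(Element root) {
        Deque<Element> pending = new ArrayDeque<>();  // elements still to expand
        Deque<Element> reversed = new ArrayDeque<>(); // collects reverse post-order
        pending.push(root);
        while (!pending.isEmpty()) {
            Element element = pending.pop();
            reversed.push(element);
            for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element) { // skips text and comment nodes
                    pending.push((Element) c);
                }
            }
        }
        List<String> names = new ArrayList<>();
        while (!reversed.isEmpty()) {
            names.add(reversed.pop().getNodeName());
        }
        return names;
    }
}
```

For `<root><a><c/></a><b/></root>` this yields c, a, b, root. Memory use grows with the number of elements rather than with the call-stack depth, which is the property that matters for very deep documents.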

Answer 5

Answered by user2368526

This code actually works:

public void visit(Document document) {
    ((DefaultElement) document.getRootElement())
            .setNamespace(Namespace.NO_NAMESPACE);
    document.getRootElement().additionalNamespaces().clear();
}

public void visit(Namespace namespace) {
    if (namespace.getParent() != null) {
        namespace.getParent().remove(namespace);
    }
}

public void visit(Attribute node) {
    if (node.toString().contains("xmlns")
            || node.toString().contains("xsi:")) {
        node.getParent().remove(node);
    }
}

public void visit(Element node) {
    if (node instanceof DefaultElement) {
        ((DefaultElement) node).setNamespace(Namespace.NO_NAMESPACE);
        node.additionalNamespaces().clear();
    }
}
