Original question: http://stackoverflow.com/questions/1422395/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Clean namespace handling with dom4j
Asked by Antoine Claval
We are using dom4j 1.6.1 to parse XML coming from an external source. Sometimes the tags mention the namespace and sometimes they do not, and this makes calls to Element.selectSingleNode(String s) fail.
So far we have three solutions, and we are not happy with any of them:
1 - Remove every namespace occurrence before doing anything with the XML document:
    xml = xml.replaceAll("xmlns=\"[^\"]*\"", "");
    xml = xml.replaceAll("ds:", "");
    xml = xml.replaceAll("etm:", "");
    [...] // and so on for each kind of namespace
2 - Remove the namespace just before getting a node, by calling
Element.remove(Namespace ns)
But this works only for the node itself and its first level of children.
3 - Clutter the code with a fallback lookup:
    node = rootElement.selectSingleNode(NameWithoutNameSpace);
    if (node == null)
        node = rootElement.selectSingleNode(NameWithNameSpace);
So, what do you think? Which one is the least bad? Do you have another solution to propose?
Answered by mestachs
I wanted to remove all namespace information (declarations and tag prefixes) to ease XPath evaluation. I ended up with this solution:
    String xml = ...
    SAXReader reader = new SAXReader();
    Document document = reader.read(new ByteArrayInputStream(xml.getBytes()));
    document.accept(new NameSpaceCleaner());
    return document.asXML();
where NameSpaceCleaner is a dom4j visitor:
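    import org.dom4j.Attribute;
    import org.dom4j.Document;
    import org.dom4j.Element;
    import org.dom4j.Namespace;
    import org.dom4j.VisitorSupport;
    import org.dom4j.tree.DefaultElement;

    private static final class NameSpaceCleaner extends VisitorSupport {
        // Clear the root element's namespace and its additional declarations.
        public void visit(Document document) {
            ((DefaultElement) document.getRootElement())
                    .setNamespace(Namespace.NO_NAMESPACE);
            document.getRootElement().additionalNamespaces().clear();
        }

        // Detach every namespace node the visitor encounters.
        public void visit(Namespace namespace) {
            namespace.detach();
        }

        // Detach attributes whose text form mentions xmlns or the xsi: prefix.
        public void visit(Attribute node) {
            if (node.toString().contains("xmlns")
                    || node.toString().contains("xsi:")) {
                node.detach();
            }
        }

        // Move each element into "no namespace".
        public void visit(Element node) {
            if (node instanceof DefaultElement) {
                ((DefaultElement) node).setNamespace(Namespace.NO_NAMESPACE);
            }
        }
    }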
Answered by Abhishek
The following is some code that I found and now use. It might be useful if you are looking for a generic way to remove all namespaces from a dom4j document.
    import java.util.List;
    import org.dom4j.Attribute;
    import org.dom4j.Document;
    import org.dom4j.Element;
    import org.dom4j.Namespace;
    import org.dom4j.Node;
    import org.dom4j.QName;

    // Note: only the root's content is processed here; the root element's
    // own QName is left unchanged.
    public static void removeAllNamespaces(Document doc) {
        Element root = doc.getRootElement();
        if (root.getNamespace() != Namespace.NO_NAMESPACE) {
            removeNamespaces(root.content());
        }
    }

    public static void unfixNamespaces(Document doc, Namespace original) {
        Element root = doc.getRootElement();
        if (original != null) {
            setNamespaces(root.content(), original);
        }
    }

    public static void setNamespace(Element elem, Namespace ns) {
        elem.setQName(QName.get(elem.getName(), ns,
                elem.getQualifiedName()));
    }

    /**
     * Recursively removes the namespace of the element and all its
     * children: sets them to Namespace.NO_NAMESPACE.
     */
    public static void removeNamespaces(Element elem) {
        setNamespaces(elem, Namespace.NO_NAMESPACE);
    }

    /**
     * Recursively removes the namespace of every node in the list and of
     * all their children: sets them to Namespace.NO_NAMESPACE.
     */
    public static void removeNamespaces(List l) {
        setNamespaces(l, Namespace.NO_NAMESPACE);
    }

    /**
     * Recursively sets the namespace of the element and all its children.
     */
    public static void setNamespaces(Element elem, Namespace ns) {
        setNamespace(elem, ns);
        setNamespaces(elem.content(), ns);
    }

    /**
     * Recursively sets the namespace of every attribute and element in the
     * list, and of all their children.
     */
    public static void setNamespaces(List l, Namespace ns) {
        Node n = null;
        for (int i = 0; i < l.size(); i++) {
            n = (Node) l.get(i);
            if (n.getNodeType() == Node.ATTRIBUTE_NODE) {
                ((Attribute) n).setNamespace(ns);
            }
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                setNamespaces((Element) n, ns);
            }
        }
    }
Hope this is useful for someone who needs it!
Answered by Jherico
Option 1 is dangerous because you can't guarantee the prefixes for a given namespace without pre-parsing the document, and because you can end up with namespace collision. If you're consuming a document and not outputting anything, it might be ok, depending on the source of the doc, but otherwise it just loses too much information.
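To make that risk concrete, here is a small sketch that uses only the JDK. The sample XML, the "ids:42" text and the class name RegexStripDemo are invented for illustration and do not come from the question; only the two replaceAll calls mirror option 1.

```java
public class RegexStripDemo {
    // Same style of replacement as option 1 in the question.
    static String strip(String xml) {
        xml = xml.replaceAll("xmlns=\"[^\"]*\"", "");
        xml = xml.replaceAll("ds:", "");
        return xml;
    }

    public static void main(String[] args) {
        // Hypothetical input: a prefixed declaration plus text that happens to contain "ds:".
        String xml = "<ds:Note xmlns:ds=\"urn:example:ds\">See ids:42</ds:Note>";
        // The prefixed declaration survives (the first pattern only matches the
        // default-namespace form), while the text content is silently altered.
        System.out.println(strip(xml));
    }
}
```

In this made-up input the element prefix is removed as intended, but the character data is damaged too, which is the kind of information loss described above.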
Option 2 could be applied recursively, but it has many of the same problems as option 1.
Option 3 sounds like the best approach, but rather than cluttering your code, write a static method that does both checks instead of repeating the same if statement throughout your codebase.
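One possible shape for such a helper is sketched below. It is deliberately library-agnostic so it runs with the JDK alone: the class name FallbackLookup, the method firstMatch and the map standing in for a document are all assumptions made for this example. With dom4j, the lookup argument could be rootElement::selectSingleNode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FallbackLookup {
    /** Tries each name in order and returns the first non-null result, or null if none matches. */
    static <T> T firstMatch(Function<String, T> lookup, String... names) {
        for (String name : names) {
            T result = lookup.apply(name);
            if (result != null) {
                return result;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for a parsed document that only knows the prefixed name.
        Map<String, String> nodes = new HashMap<>();
        nodes.put("ds:Signature", "signature node");

        // Unprefixed name first, prefixed name as the fallback.
        System.out.println(firstMatch(nodes::get, "Signature", "ds:Signature"));
    }
}
```

Call sites then shrink to a single line, and the order in which the names are tried lives in one place.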
The best approach is to get whoever is sending you the bad XML to fix it. Of course, this raises the question of whether it is actually broken. Specifically, are you getting XML where the default namespace is defined as X and then a namespace also representing X is given a prefix of 'es'? If so, the XML is well formed and you just need code that is agnostic about the prefix but still uses a qualified name to fetch the element. I'm not familiar enough with dom4j to know whether creating a Namespace with a null prefix will cause it to match all elements with a matching URI or only those with no prefix, but it's worth experimenting with.
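As an illustration of prefix-agnostic lookup by qualified name, the sketch below uses the JDK's own DOM and XPath classes rather than dom4j, so that it runs without extra dependencies. The URI urn:example:X, the element names and the class name PrefixAgnosticLookup are invented for the example. The idea is to bind a prefix of your own choosing ("x" here) to the namespace URI, so the prefixes used inside the document no longer matter. dom4j's XPath interface appears to offer a comparable hook through setNamespaceURIs, but treat that as something to verify against your dom4j version.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PrefixAgnosticLookup {
    /** Returns the text of the first element with the given namespace URI and local name. */
    static String valueOf(String xml, final String uri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for URI-based matching
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind our own prefix "x" to the URI; document prefixes are irrelevant.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "x".equals(prefix) ? uri : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String namespaceURI) { return null; }
                public Iterator<String> getPrefixes(String namespaceURI) { return null; }
            });
            return xpath.evaluate("//x:" + localName, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String uri = "urn:example:X";
        // Same namespace, once as the default and once with the prefix "es".
        String defaulted = "<root xmlns=\"urn:example:X\"><item>a</item></root>";
        String prefixed = "<es:root xmlns:es=\"urn:example:X\"><es:item>a</es:item></es:root>";
        System.out.println(valueOf(defaulted, uri, "item"));
        System.out.println(valueOf(prefixed, uri, "item"));
    }
}
```

Both documents yield the same value because the match is made on the namespace URI and local name, not on the prefix text.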
Answered by vdr
Like Abhishek, I needed to strip the namespaces from XML to simplify XPath queries in system testing scripts (the XML is XSD-validated first).
Here are the problems I faced:
- I needed to process deeply nested XML that tended to blow up the stack.
- On most complex XML, for a reason I didn't fully investigate, stripping all the namespaces only worked reliably when traversing the DOM tree depth first. That ruled out the visitor, as well as getting the list of nodes with
document.selectNodes("//*")
I ended up with the following (not the most elegant, but it may help solve somebody's problem):
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.Iterator;
    import org.dom4j.Attribute;
    import org.dom4j.DocumentException;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.Namespace;
    import org.dom4j.QName;

    public static String normaliseXml(final String message) throws DocumentException {
        org.dom4j.Document document = DocumentHelper.parseText(message);
        // Explicit LIFO stack of pending elements and child iterators,
        // so deep documents do not exhaust the call stack.
        Deque<Object> stack = new ArrayDeque<Object>();
        Object current = document.getRootElement();
        while (current != null) {
            if (current instanceof Element) {
                Element element = (Element) current;
                Iterator<?> iterator = element.elementIterator();
                if (iterator.hasNext()) {
                    // Visit the children first, then come back to this element.
                    stack.push(element);
                    current = iterator;
                } else {
                    stripNamespace(element);
                    current = stack.poll();
                }
            } else {
                Iterator<?> iterator = (Iterator<?>) current;
                if (iterator.hasNext()) {
                    stack.push(iterator);
                    current = iterator.next();
                } else {
                    // All children done: strip the parent, then resume its siblings.
                    current = stack.poll();
                    if (current instanceof Element) {
                        stripNamespace((Element) current);
                        current = stack.poll();
                    }
                }
            }
        }
        return document.asXML();
    }

    private static void stripNamespace(Element element) {
        QName name = new QName(element.getName(), Namespace.NO_NAMESPACE, element.getName());
        element.setQName(name);
        // Iterate over a copy, because the loop removes and re-adds attributes.
        for (Object o : new ArrayList<Object>(element.attributes())) {
            Attribute attribute = (Attribute) o;
            QName attributeName = new QName(attribute.getName(), Namespace.NO_NAMESPACE, attribute.getName());
            String attributeValue = attribute.getValue();
            element.remove(attribute);
            element.addAttribute(attributeName, attributeValue);
        }
        for (Object o : element.declaredNamespaces()) {
            Namespace namespace = (Namespace) o;
            element.remove(namespace);
        }
    }
Answered by user2368526
This code actually works:
    public void visit(Document document) {
        ((DefaultElement) document.getRootElement())
                .setNamespace(Namespace.NO_NAMESPACE);
        document.getRootElement().additionalNamespaces().clear();
    }

    public void visit(Namespace namespace) {
        if (namespace.getParent() != null) {
            namespace.getParent().remove(namespace);
        }
    }

    public void visit(Attribute node) {
        if (node.toString().contains("xmlns")
                || node.toString().contains("xsi:")) {
            node.getParent().remove(node);
        }
    }

    public void visit(Element node) {
        if (node instanceof DefaultElement) {
            ((DefaultElement) node).setNamespace(Namespace.NO_NAMESPACE);
            node.additionalNamespaces().clear();
        }
    }

