Java/DOM: Get the XML content of a node

Question

I am parsing an XML file in Java using the W3C DOM. I am stuck on one specific problem: I can't figure out how to get the whole inner XML of a node.

The node looks like this:

<td><b>this</b> is a <b>test</b></td>

Which function do I have to call to get this:

"<b>this</b> is a <b>test</b>"

Answer 1

Accepted answer by Pierre

You have to use the transform/XSLT API, with your <b> node as the node to be transformed, and put the result into a new StreamResult(new StringWriter()). See how-to-pretty-print-xml-from-java.
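A minimal sketch of that approach, under a few assumptions: the class name TransformerInnerXml and the helpers innerXml and parse are hypothetical names chosen for illustration, and the sketch serializes each child of the node in turn (rather than the node itself) so that the enclosing tags are left out. It uses only the standard javax.xml.transform identity transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TransformerInnerXml {

    // Hypothetical helper: serializes every child of the node and concatenates the results.
    static String innerXml(Node node) {
        try {
            // newTransformer() without a stylesheet yields an identity transform.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Without this, each serialized child would start with an XML declaration.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<td><b>this</b> is a <b>test</b></td>");
        System.out.println(innerXml(doc.getDocumentElement()));
    }
}
```

Calling innerXml on the td element from the question should give back the markup between the td tags. Special characters in text nodes are re-escaped by the transformer, so the result stays well-formed.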

Answer 2

Answered by Joel P.

I know this was asked long ago, but for the next person searching (that was me today), this works with JDOM:

// "/td/node()" selects the children of <td>; "/td" alone would select the <td> element itself.
JDOMXPath xpath = new JDOMXPath("/td/node()");
String innerXml = new XMLOutputter().outputString(xpath.selectNodes(document));

This passes a list of all child nodes of td into outputString, which serializes them in document order.

Answer 3

Answered by Kryštof Hilar

What do you think of this? I had the same problem today on Android, and I managed to write a simple serializer:

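import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

private String innerXml(Node node) {
    StringBuilder sb = new StringBuilder();
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        sb.append(serializeNode(children.item(i)));
    }
    return sb.toString();
}

private String serializeNode(Node node) {
    // Text and CDATA: return the content, re-escaping characters that are special in XML.
    if (node.getNodeType() == Node.TEXT_NODE || node.getNodeType() == Node.CDATA_SECTION_NODE) {
        return escape(node.getNodeValue());
    }
    // Comments, processing instructions, etc. are skipped by this simple serializer.
    if (node.getNodeType() != Node.ELEMENT_NODE) {
        return "";
    }
    StringBuilder sb = new StringBuilder("<").append(node.getNodeName());
    NamedNodeMap attributes = node.getAttributes();
    for (int i = 0; i < attributes.getLength(); i++) {
        Node attr = attributes.item(i);
        // The leading space separates each attribute from the tag name and from the previous attribute.
        sb.append(' ').append(attr.getNodeName())
          .append("=\"").append(escape(attr.getNodeValue())).append('"');
    }
    NodeList children = node.getChildNodes();
    if (children.getLength() == 0) {
        return sb.append("/>").toString();
    }
    sb.append('>');
    for (int i = 0; i < children.getLength(); i++) {
        sb.append(serializeNode(children.item(i)));
    }
    return sb.append("</").append(node.getNodeName()).append('>').toString();
}

private String escape(String s) {
    // "&" must be replaced first so the other entities are not escaped twice.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
}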

Answer 4

Answered by Jason S

Er... you could also call toString() and just chop off the start and end tags, either manually or with a regular expression.

Edit: toString() doesn't do what I expected. I pulled out the O'Reilly Java & XML book, which talks about the Load and Save module of the Java DOM.

See in particular LSSerializer, which looks very promising. You could either call writeToString(node) and chop off the start and end tags, as I suggested, or try to use LSSerializerFilter to suppress the top node's tags (not sure if that would work; I admit I've never used LSSerializer before).

The O'Reilly book seems to suggest doing something like this:

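import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// newInstance() declares checked exceptions (ClassNotFoundException, InstantiationException, IllegalAccessException).
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS lsImpl = (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer serializer = lsImpl.createLSSerializer();
// Serializes the node including its own start and end tags.
String nodeString = serializer.writeToString(node);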
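As an alternative to chopping off tags, here is a rough sketch that serializes each child of the node with LSSerializer and joins the results. Several things here are assumptions rather than part of the answer above: the class name LsInnerXml and the helpers innerXml and parse are hypothetical, the DOMImplementationLS is obtained from the node's owner document via getFeature("LS", "3.0") instead of the registry, and the node is assumed not to be the Document itself (getOwnerDocument() would return null in that case).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class LsInnerXml {

    // Hypothetical helper: serializes the children of the node, leaving out the node's own tags.
    static String innerXml(Node node) {
        // Assumes the DOM implementation behind this document supports Load and Save 3.0.
        DOMImplementationLS ls = (DOMImplementationLS)
                node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        // Suppress the XML declaration that would otherwise precede each serialized child.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);

        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<td><b>this</b> is a <b>test</b></td>");
        System.out.println(innerXml(doc.getDocumentElement()));
    }
}
```

Serializing child by child avoids string surgery on the outer tags, which gets fragile once the element carries attributes or namespace declarations.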

Answer 5

Answered by Jason S

node.getTextContent();

You ought to be using JDOM or dom4j to handle nodes, if for no other reason than to handle whitespace correctly.
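For comparison, a small sketch of what getTextContent() returns for the node from the question. The class name TextContentDemo and the helper textOf are hypothetical. Note that getTextContent() concatenates only the text of the descendants, so the <b> markup is not part of the result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextContentDemo {

    // Hypothetical helper: parses the XML and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints "this is a test": the text survives, the <b> tags do not.
        System.out.println(textOf("<td><b>this</b> is a <b>test</b></td>"));
    }
}
```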

Answer 6

Answered by javapowered

To remove unnecessary tags, code like this can probably be used:

DOMConfiguration config = serializer.getDomConfig();
config.setParameter("canonical-form", true);

But it will not always work, because support for "canonical-form=true" is optional.
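Because support for that parameter is optional, one defensive option is to ask the DOMConfiguration first with canSetParameter. The sketch below is only an illustration: the class name SerializerConfigSketch and the helpers outerXml and parse are hypothetical, and the additional "xml-declaration" parameter is an assumption about what "unnecessary" output is meant here. Whether "canonical-form" is accepted depends on the DOM implementation in use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class SerializerConfigSketch {

    // Hypothetical helper: serializes the node (including its own tags) with a cautious configuration.
    static String outerXml(Node node) {
        DOMImplementationLS ls = (DOMImplementationLS)
                node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();

        // Only set optional parameters the implementation reports as supported;
        // setting an unsupported one would raise a DOMException.
        if (config.canSetParameter("canonical-form", Boolean.TRUE)) {
            config.setParameter("canonical-form", Boolean.TRUE);
        }
        // Assumption: the XML declaration is the unwanted part of the output.
        if (config.canSetParameter("xml-declaration", Boolean.FALSE)) {
            config.setParameter("xml-declaration", Boolean.FALSE);
        }
        return serializer.writeToString(node);
    }

    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<td><b>this</b> is a <b>test</b></td>");
        System.out.println(outerXml(doc.getDocumentElement()));
    }
}
```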
