Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/484995/

Date: 2020-10-29 12:36:19 Source: igfitidea

Java/DOM: Get the XML content of a node

Tags: java, xml, dom

Question by

I am parsing an XML file in Java using the W3C DOM. I am stuck on a specific problem: I can't figure out how to get the whole inner XML of a node.

The node looks like this:

<td><b>this</b> is a <b>test</b></td>

Which function do I have to use to get this:

"<b>this</b> is a <b>test</b>"

Accepted answer by Pierre

You have to use the Transform (XSLT) API: use the node you want to serialize (for the inner XML, each child of <td>, such as your <b> nodes) as the node to be transformed, and put the result into a new StreamResult(new StringWriter()). See how-to-pretty-print-xml-from-java.
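
A minimal, self-contained sketch of this approach, assuming the goal is the inner XML of the <td> element: each child of the node is passed through an identity Transformer (TransformerFactory.newTransformer() with no stylesheet) with the XML declaration turned off, and the outputs are concatenated. The class name InnerXmlTransformer and the helper innerXmlOf are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InnerXmlTransformer {

    /** Serializes every child of node with an identity Transformer and concatenates the results. */
    public static String innerXml(Node node) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Without this, each child's output would start with <?xml ...?>
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            // Every child (element or text) is written into the same StringWriter
            transformer.transform(new DOMSource(children.item(i)), new StreamResult(writer));
        }
        return writer.toString();
    }

    /** Hypothetical helper: parses xml and returns the inner XML of its root element. */
    public static String innerXmlOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return innerXml(doc.getDocumentElement());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(innerXmlOf("<td><b>this</b> is a <b>test</b></td>"));
    }
}
```

Transforming the children one by one, rather than the <td> node itself, is what keeps the outer <td> tags out of the result.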

Answer by Joel P.

I know this was asked long ago, but for the next person searching (it was me today), this works with JDOM:

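import org.jaxen.jdom.JDOMXPath;
import org.jdom.output.XMLOutputter;
JDOMXPath xpath = new JDOMXPath("/td/node()"); // the children of <td>, not <td> itself
String innerXml = new XMLOutputter().outputString(xpath.selectNodes(document));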

This passes a list of all child nodes into outputString, which will serialize them out in order.

Answer by Kryštof Hilar

What do you think of this? I had the same problem today on Android, but I managed to write a simple "serializer":

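import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Returns the serialized children of `node`, i.e. its inner XML.
private String innerXml(Node node) {
    StringBuilder sb = new StringBuilder();
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        serializeNode(children.item(i), sb);
    }
    return sb.toString();
}

// Appends a simple serialization of `node` to `sb`. Only elements, text and
// CDATA sections are written; comments and processing instructions are skipped.
private void serializeNode(Node node, StringBuilder sb) {
    switch (node.getNodeType()) {
        case Node.TEXT_NODE:
            sb.append(escape(node.getNodeValue(), false));
            return;
        case Node.CDATA_SECTION_NODE:
            sb.append("<![CDATA[").append(node.getNodeValue()).append("]]>");
            return;
        case Node.ELEMENT_NODE:
            break;
        default:
            return;
    }
    sb.append('<').append(node.getNodeName());
    NamedNodeMap attributes = node.getAttributes();
    for (int i = 0; i < attributes.getLength(); i++) {
        Node attr = attributes.item(i);
        // A space before each attribute; the value is quoted and escaped
        sb.append(' ').append(attr.getNodeName()).append("=\"")
          .append(escape(attr.getNodeValue(), true)).append('"');
    }
    NodeList children = node.getChildNodes();
    if (children.getLength() == 0) {
        sb.append("/>");
        return;
    }
    sb.append('>');
    for (int i = 0; i < children.getLength(); i++) {
        serializeNode(children.item(i), sb);
    }
    sb.append("</").append(node.getNodeName()).append('>');
}

// Escapes characters that may not appear literally in text or attribute values.
private static String escape(String s, boolean inAttribute) {
    String out = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    return inAttribute ? out.replace("\"", "&quot;") : out;
}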

Answer by Jason S

Er... you could also call toString() and just chop off the start and end tags, either manually or with regular expressions.

Edit: toString() doesn't do what I expected. I pulled out the O'Reilly Java & XML book, which talks about the Load and Save module of the Java DOM.

See in particular LSSerializer, which looks very promising. You could either call writeToString(node) and chop off the start and end tags, as I suggested, or try using an LSSerializerFilter so that the top node's tags are not printed (not sure if that would work; I admit I've never used LSSerializer before).

Reading the O'Reilly book seems to indicate doing something like this:

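import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS lsImpl =
    (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer serializer = lsImpl.createLSSerializer();
// Includes the node's own tags and, by default, an <?xml ...?> declaration
String nodeString = serializer.writeToString(node);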
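
A sketch of the Load and Save route that avoids chopping tags off: instead of serializing the node itself, it serializes each child with writeToString and concatenates the results. It assumes the document's DOMImplementation also implements DOMImplementationLS (the case for the JDK's built-in parser), uses the standard "xml-declaration" parameter to drop the <?xml ...?> prefix, and parses namespace-aware so the elements are DOM Level 2 nodes. InnerXmlLs and innerXmlOf are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class InnerXmlLs {

    /** Serializes each child of node with an LSSerializer and concatenates the results. */
    public static String innerXml(Node node) {
        // Assumes node is not itself a Document (getOwnerDocument() would be null)
        // and that its DOMImplementation also implements DOMImplementationLS
        DOMImplementationLS lsImpl =
                (DOMImplementationLS) node.getOwnerDocument().getImplementation();
        LSSerializer serializer = lsImpl.createLSSerializer();
        // Drop the <?xml ...?> declaration that writeToString emits by default
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    /** Hypothetical helper: parses xml and returns the inner XML of its root element. */
    public static String innerXmlOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return innerXml(doc.getDocumentElement());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(innerXmlOf("<td><b>this</b> is a <b>test</b></td>"));
    }
}
```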

Answer by Jason S

node.getTextContent();

You ought to be using JDOM or dom4j to handle nodes, if for no other reason than to handle whitespace correctly.
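
For comparison, a small sketch showing what getTextContent() returns for the node from the question: it concatenates the text of all descendant text nodes and drops the markup, so it yields the plain text rather than the inner XML. TextContentDemo and textContentOf are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TextContentDemo {

    /** Hypothetical helper: parses xml and returns getTextContent() of the root element. */
    public static String textContentOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Concatenates all descendant text; the <b> tags are not included
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prints: this is a test
        System.out.println(textContentOf("<td><b>this</b> is a <b>test</b></td>"));
    }
}
```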

Answer by javapowered

To remove unnecessary tags, code like this can probably be used:

DOMConfiguration config = serializer.getDomConfig();
config.setParameter("canonical-form", true);

But this will not always work, because support for "canonical-form" = true is optional.
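
A hedged sketch of one way to deal with the optional parameter: DOMConfiguration.canSetParameter reports whether "canonical-form" = true is supported before setParameter is called (setting an unsupported value throws a DOMException). In the DOM Level 3 Load and Save spec, canonical-form = true also sets "xml-declaration" to false, so the fallback here turns off only "xml-declaration"; the assumption is that dropping the XML declaration is what is wanted. CanonicalFormCheck and serializeFirstChild are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class CanonicalFormCheck {

    /** Returns an LSSerializer that omits the XML declaration, preferring canonical-form when supported. */
    public static LSSerializer newSerializer(Document doc) {
        // Assumes the document's DOMImplementation also implements DOMImplementationLS
        DOMImplementationLS lsImpl = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = lsImpl.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        if (config.canSetParameter("canonical-form", Boolean.TRUE)) {
            // Supported: canonical-form=true also turns xml-declaration off
            config.setParameter("canonical-form", Boolean.TRUE);
        } else {
            // Not supported: turn off only the XML declaration
            config.setParameter("xml-declaration", Boolean.FALSE);
        }
        return serializer;
    }

    /** Hypothetical helper: parses xml and serializes the first child of its root element. */
    public static String serializeFirstChild(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return newSerializer(doc).writeToString(doc.getDocumentElement().getFirstChild());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeFirstChild("<td><b>this</b> is a <b>test</b></td>"));
    }
}
```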
