Get a node's inner XML as String in Java DOM

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3300839/

Date: 2020-08-13 21:53:30  Source: igfitidea

java xml dom

Asked by Marjan

I have an XML org.w3c.dom.Node that looks like this:

<variable name="variableName">
    <br /><strong>foo</strong> bar
</variable>

How do I get the <br /><strong>foo</strong> bar part as a String?

Accepted answer by Andrey M.

I had the same problem. To solve it, I wrote this helper function:

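import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public String innerXml(Node node) {
    DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    // Without this, every serialized child is prefixed with an XML declaration
    lsSerializer.getDomConfig().setParameter("xml-declaration", false);
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    // Serialize each child in document order and concatenate the results
    for (int i = 0; i < childNodes.getLength(); i++) {
        sb.append(lsSerializer.writeToString(childNodes.item(i)));
    }
    return sb.toString();
}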
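
As a usage illustration, here is a minimal, self-contained sketch that applies this approach to the XML from the question. It repeats the helper and sets the "xml-declaration" parameter to false so that no XML declaration is emitted per child. The class name InnerXmlLsDemo and the parseRoot helper are assumptions made for this example only, and the exact output (for instance how the empty br element is written) may vary between serializer implementations.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class InnerXmlLsDemo {

    // Serializes the children of the node with the DOM Load and Save API.
    static String innerXml(Node node) {
        DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument()
                .getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = lsImpl.createLSSerializer();
        // Suppress the XML declaration that would otherwise precede each child
        serializer.getDomConfig().setParameter("xml-declaration", false);
        NodeList children = node.getChildNodes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    // Hypothetical helper for this example: parse a string, return the root element.
    static Node parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Node variable = parseRoot(
                "<variable name=\"variableName\"><br /><strong>foo</strong> bar</variable>");
        System.out.println(innerXml(variable));
    }
}
```

Note that getOwnerDocument() returns null when the node is itself a Document, so this sketch expects an element (or another node inside a document) as input.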

Answer by Robert Diana

There is no simple method on org.w3c.dom.Node for this. getTextContent() gives the text of each child node concatenated together. getNodeValue() gives you the text of the current node if it is an Attribute, CDATA or Text node. So you would need to serialize the node yourself, using a combination of getChildNodes(), getNodeName() and getNodeValue() to build the string.
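
The manual approach, combining getChildNodes(), getNodeName() and getNodeValue(), could look roughly like the sketch below. It is a simplified illustration, not a complete serializer: it only handles elements, attributes, text and CDATA sections, it ignores comments, processing instructions and entity references, and the class and method names (ManualInnerXml, append, escape, parseRoot) are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ManualInnerXml {

    // Serializes the children of the given node, but not the node itself.
    static String innerXml(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            append(children.item(i), sb);
        }
        return sb.toString();
    }

    // Writes one node; node types other than element, text and CDATA are skipped.
    private static void append(Node node, StringBuilder sb) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                sb.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    sb.append(' ').append(attr.getNodeName()).append("=\"")
                      .append(escape(attr.getNodeValue()).replace("\"", "&quot;"))
                      .append('"');
                }
                if (node.hasChildNodes()) {
                    // Recurse into the children, then close the element
                    sb.append('>').append(innerXml(node))
                      .append("</").append(node.getNodeName()).append('>');
                } else {
                    sb.append("/>");
                }
                break;
            case Node.TEXT_NODE:
                sb.append(escape(node.getNodeValue()));
                break;
            case Node.CDATA_SECTION_NODE:
                sb.append("<![CDATA[").append(node.getNodeValue()).append("]]>");
                break;
            default:
                break;
        }
    }

    // Minimal escaping of markup characters in text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Hypothetical helper for this example: parse a string, return the root element.
    static Node parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Node variable = parseRoot(
                "<variable name=\"variableName\"><br /><strong>foo</strong> bar</variable>");
        System.out.println(innerXml(variable)); // prints <br/><strong>foo</strong> bar
    }
}
```

Writing this by hand gives full control over the output, but a real serializer covers many more cases (namespaces, entity references, character encoding), which is why the serializer-based answers are usually the safer choice.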

You can also do it with one of the various XML serialization libraries that exist, such as XStream or even JAXB. This is discussed here: XML serialization in Java?

Answer by AgentKnopf

If you don't want to resort to external libraries, the following solution might come in handy. If you have a node <parent><child name="Nina"/></parent> and you want to extract the children of the parent element, proceed as follows:

import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public static String innerXml(Node parent) {
    // All children are written to the same writer, so it is read only once at
    // the end; appending its content after every child would duplicate output
    StringWriter stringWriter = new StringWriter();
    // Get all children of the given parent node
    NodeList children = parent.getChildNodes();
    try {
        // Set up the output transformer
        TransformerFactory transfac = TransformerFactory.newInstance();
        Transformer trans = transfac.newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        StreamResult streamResult = new StreamResult(stringWriter);

        for (int index = 0; index < children.getLength(); index++) {
            // Serialize each child DOM node in document order
            trans.transform(new DOMSource(children.item(index)), streamResult);
        }
    } catch (TransformerException e) {
        // Error handling goes here
    }
    return stringWriter.toString();
}

Answer by Lukas Eder

If you're using jOOX, you can wrap your node in a jQuery-like syntax and just call toString() on it:

$(node).toString();

It uses an identity transformer internally, like this:

ByteArrayOutputStream out = new ByteArrayOutputStream();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
Source source = new DOMSource(element);
Result target = new StreamResult(out);
transformer.transform(source, target);
return out.toString();

Answer by Jeevan

Building on top of Lukas Eder's solution, we can extract the inner XML, much like innerXml in .NET, as shown below:

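public static String innerXml(Node node, String tag) {
    String xmlstring = toString(node);
    // replaceAll rather than replaceFirst, so that both <tag> and </tag> are
    // stripped; note that nested elements with the same name lose their tags too
    xmlstring = xmlstring.replaceAll("<[/]?" + tag + ">", "");
    return xmlstring;
}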

public static String toString(Node node){       
    String xmlString = "";
    Transformer transformer;
    try {
        transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        //transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StreamResult result = new StreamResult(new StringWriter());

        xmlString = nodeToStream(node, transformer, result);

    } catch (TransformerConfigurationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (TransformerFactoryConfigurationError e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (TransformerException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }catch (Exception ex){
        ex.printStackTrace();
    }

    return xmlString;               
}

Example:

If Node name points to xml with string representation "<Name><em>Chris</em>tian<em>Bale</em></Name>" 
String innerXml = innerXml(name,"Name"); //returns "<em>Chris</em>tian<em>Bale</em>"

Answer by MatEngel

The previous answer did not work for me because the method nodeToStream() is undefined, so here is my version:

public static String toString(Node node) {
    String xmlString = "";
    try {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        //transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        Source source = new DOMSource(node);

        StringWriter sw = new StringWriter();
        StreamResult result = new StreamResult(sw);

        transformer.transform(source, result);
        xmlString = sw.toString ();

    } catch (Exception ex) {
        ex.printStackTrace ();
    }

    return xmlString;
}

Answer by Alan

Extending Andrey M.'s answer, I had to slightly modify the code to get the complete DOM document. If you just use

 NodeList childNodes = node.getChildNodes();

the result does not include the root element. To include the root element (and get the complete .xml document) I used:

 public String innerXml(Node node) {
     DOMImplementationLS lsImpl = (DOMImplementationLS)node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
     LSSerializer lsSerializer = lsImpl.createLSSerializer();
     lsSerializer.getDomConfig().setParameter("xml-declaration", false);
     StringBuilder sb = new StringBuilder();
     sb.append(lsSerializer.writeToString(node));
     return sb.toString(); 
 }

Answer by Ralph

Here is an alternative solution to extract the content of an org.w3c.dom.Node. This solution also works if the node content contains no XML tags:

private static String innerXml(Node node) throws TransformerFactoryConfigurationError, TransformerException {
    StringWriter writer = new StringWriter();
    String xml = null;
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.transform(new DOMSource(node), new StreamResult(writer));
    // now remove the outer tag....
    xml = writer.toString();
    xml = xml.substring(xml.indexOf(">") + 1, xml.lastIndexOf("</"));
    return xml;
}
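
One edge case worth keeping in mind with the substring approach: an element without content is typically serialized as <tag/>, so there is no "</" to find and substring() would throw. The following is a hedged variant that guards against this. It assumes the node passed in is an Element, and the class name InnerXmlSubstring and the parseRoot helper are invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class InnerXmlSubstring {

    // Serializes the whole element, then cuts off its outer start and end tag.
    static String innerXml(Node node) throws TransformerException {
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        String xml = writer.toString();
        int start = xml.indexOf('>') + 1;   // end of the opening tag
        int end = xml.lastIndexOf("</");    // start of the closing tag, or -1
        // For an empty element such as <tag/> there is no closing tag to cut at
        return end < start ? "" : xml.substring(start, end);
    }

    // Hypothetical helper for this example: parse a string, return the root element.
    static Node parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(innerXml(parseRoot("<a><b>x</b>y</a>")));
        System.out.println(innerXml(parseRoot("<a/>")).isEmpty());
    }
}
```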

Answer by Ondra Žižka

The best solution so far, Andrey M.'s, needs a specific implementation, which can cause issues in the future. Here is the same approach, but using whatever serialization the JDK provides (that is, whichever implementation is configured).

public static String innerXml(Node node) throws Exception
{
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        NodeList childNodes = node.getFirstChild().getChildNodes();
        for (int i = 0; i < childNodes.getLength(); i++) {
            transformer.transform(new DOMSource(childNodes.item(i)), new StreamResult(writer));
        }
        return writer.toString();
}

If you're processing a document rather than a node, you must go one level deeper and use node.getFirstChild().getChildNodes(). To make it more robust, though, you should find the first Element rather than take it for granted that there is only one node. XML has to have a single root element, but a document can contain multiple nodes, including comments, entities and whitespace text.

        Node rootElement = docRootNode.getFirstChild();
        while (rootElement != null && rootElement.getNodeType() != Node.ELEMENT_NODE)
            rootElement = rootElement.getNextSibling();
        if (rootElement == null)
            throw new RuntimeException("No root element found in given document node.");

        NodeList childNodes = rootElement.getChildNodes();
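
Putting the pieces together, a version that accepts either a Document or any other node might look like the sketch below. Instead of looping over siblings it uses Document.getDocumentElement(), which returns the root element directly. This is an illustrative variant rather than the code from the answer, and the names InnerXmlAnyNode and parse are assumptions for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InnerXmlAnyNode {

    static String innerXml(Node node) throws TransformerException {
        // For a Document, start from its root element, skipping comments and
        // other top-level nodes; for any other node, use the node itself
        Node parent = (node.getNodeType() == Node.DOCUMENT_NODE)
                ? ((Document) node).getDocumentElement()
                : node;
        if (parent == null) {
            return ""; // a Document without a root element
        }
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        NodeList childNodes = parent.getChildNodes();
        for (int i = 0; i < childNodes.getLength(); i++) {
            // Every child is appended to the same writer
            transformer.transform(new DOMSource(childNodes.item(i)), new StreamResult(writer));
        }
        return writer.toString();
    }

    // Hypothetical helper for this example: parse a string into a Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<!-- c --><root><a>1</a>text</root>");
        System.out.println(innerXml(doc));
        System.out.println(innerXml(doc.getDocumentElement()));
    }
}
```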


If I were to recommend a library for this, try JSoup, which is primarily meant for HTML but works with XML too. I haven't tested that, though.

Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
fileContents.put(Attributes.BODY, doc.body().html());
// versus: doc.body().outerHtml()

Answer by Ralph

I want to extend the very good answer from Andrey M.:

It can happen that a node is not serializable, which results in the following exception on some implementations:

org.w3c.dom.ls.LSException: unable-to-serialize-node: 
            unable-to-serialize-node: The node could not be serialized.

I had this issue with the implementation "org.apache.xml.serialize.DOMSerializerImpl.writeToString(DOMSerializerImpl)" running on WildFly 13.

To solve this issue, I suggest changing the code example from Andrey M. slightly:

private static String innerXml(Node node) {
    DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    lsSerializer.getDomConfig().setParameter("xml-declaration", false); 
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < childNodes.getLength(); i++) {
        Node innerNode = childNodes.item(i);
        if (innerNode!=null) {
            if (innerNode.hasChildNodes()) {
                sb.append(lsSerializer.writeToString(innerNode));
            } else {
                sb.append(innerNode.getNodeValue());
            }
        }
    }
    return sb.toString();
}

I also incorporated the comment from Nyerguds. This works for me on WildFly 13.