Java: How do I strip whitespace-only text nodes from a DOM before serialization?

Question

Asked by Marc Novakowski

I have some Java (5.0) code that constructs a DOM from various (cached) data sources, then removes certain element nodes that are not required, then serializes the result into an XML string using:

// Serialize DOM back into a string
Writer out = new StringWriter();
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "no");
tf.transform(new DOMSource(doc), new StreamResult(out));
return out.toString();

However, since I'm removing several element nodes, I end up with a lot of extra whitespace in the final serialized document.

Is there a simple way to remove/collapse the extraneous whitespace from the DOM before (or while) it's serialized into a String?

Answer 1

Accepted answer by James Murty

You can find empty text nodes using XPath, then remove them programmatically like so:

XPathFactory xpathFactory = XPathFactory.newInstance();
// XPath to find empty text nodes.
XPathExpression xpathExp = xpathFactory.newXPath().compile(
        "//text()[normalize-space(.) = '']");  
NodeList emptyTextNodes = (NodeList) 
        xpathExp.evaluate(doc, XPathConstants.NODESET);

// Remove each empty text node from document.
for (int i = 0; i < emptyTextNodes.getLength(); i++) {
    Node emptyTextNode = emptyTextNodes.item(i);
    emptyTextNode.getParentNode().removeChild(emptyTextNode);
}

This approach might be useful if you want more control over node removal than is easily achieved with an XSL template.
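
For reference, here is a minimal, self-contained sketch that combines the XPath removal above with the serialization code from the question. The class name `StripEmptyTextNodes`, the method name `strip`, and the sample XML are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StripEmptyTextNodes {

    // Parses the XML, removes whitespace-only text nodes, and serializes the result.
    static String strip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Select every text node whose content is empty after whitespace normalization.
        NodeList emptyTextNodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        for (int i = 0; i < emptyTextNodes.getLength(); i++) {
            Node emptyTextNode = emptyTextNodes.item(i);
            emptyTextNode.getParentNode().removeChild(emptyTextNode);
        }

        // Serialize without an XML declaration and without added indentation.
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        tf.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The whitespace between the elements is dropped; "text" is kept.
        System.out.println(strip("<a>\n  <b>text</b>\n</a>"));
    }
}
```

Text nodes that contain anything other than whitespace are left untouched, including their leading and trailing spaces.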

Answer 2

Answer by objects

Try using the following XSL, with its strip-space element, to serialize your DOM:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

http://helpdesk.objects.com.au/java/how-do-i-remove-whitespace-from-an-xml-document
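
The answer does not show how the stylesheet is applied from Java, so here is one possible sketch. It assumes the stylesheet is embedded as a string constant; the class name `StripSpaceTransform` and the helpers `parse` and `serialize` are hypothetical names, and the parser is made namespace-aware as a precaution.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StripSpaceTransform {

    // The identity stylesheet with strip-space from the answer, embedded as a string.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Helper (assumed for this sketch): builds a namespace-aware DOM from a string.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Serializes the DOM through the stylesheet instead of the default identity transformer.
    static String serialize(Document doc) throws Exception {
        Transformer tf = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(parse("<a>\n  <b>text</b>\n</a>")));
    }
}
```

The only change relative to the serialization code in the question is that `newTransformer` receives the stylesheet as a `Source`; the DOM itself is not modified.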

Answer 3

Answer by Swapna Kasula

transformer.setOutputProperty(OutputKeys.INDENT, "yes");

This will retain the XML indentation.

Answer 4

Answer by Venkata Raju

The code below deletes comment nodes and text nodes that contain only whitespace. If a text node has other content, its value is trimmed.

public static void clean(Node node)
{
  NodeList childNodes = node.getChildNodes();

  for (int n = childNodes.getLength() - 1; n >= 0; n--)
  {
     Node child = childNodes.item(n);
     short nodeType = child.getNodeType();

     if (nodeType == Node.ELEMENT_NODE)
        clean(child);
     else if (nodeType == Node.TEXT_NODE)
     {
        String trimmedNodeVal = child.getNodeValue().trim();
        if (trimmedNodeVal.length() == 0)
           node.removeChild(child);
        else
           child.setNodeValue(trimmedNodeVal);
     }
     else if (nodeType == Node.COMMENT_NODE)
        node.removeChild(child);
  }
}

Ref: http://www.sitepoint.com/removing-useless-nodes-from-the-dom/
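
One effect of the trimming step is worth seeing on a concrete input: in mixed content, the spaces that separate words from inline elements are removed too. The sketch below repeats the `clean` method and runs it on a small sample; the class name `CleanDemo`, the `parse` helper, and the sample XML are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CleanDemo {

    // Same logic as the answer: drop comments and blank text nodes, trim the rest.
    public static void clean(Node node) {
        NodeList childNodes = node.getChildNodes();
        // Iterate backwards so removals do not shift the indexes still to be visited.
        for (int n = childNodes.getLength() - 1; n >= 0; n--) {
            Node child = childNodes.item(n);
            short nodeType = child.getNodeType();
            if (nodeType == Node.ELEMENT_NODE) {
                clean(child);
            } else if (nodeType == Node.TEXT_NODE) {
                String trimmedNodeVal = child.getNodeValue().trim();
                if (trimmedNodeVal.length() == 0) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(trimmedNodeVal);
                }
            } else if (nodeType == Node.COMMENT_NODE) {
                node.removeChild(child);
            }
        }
    }

    // Helper (assumed for this sketch): builds a DOM from a string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<p> Hello <b>world</b> again <!-- note --></p>");
        clean(doc.getDocumentElement());
        // The words now run together because the separating spaces were trimmed.
        System.out.println(doc.getDocumentElement().getTextContent()); // prints "Helloworldagain"
    }
}
```

If the document contains mixed content where such spaces matter, removing only the whitespace-only nodes and skipping the `setNodeValue` branch may be the safer choice.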

Answer 5

Answer by pimlottc

Another possible approach is to remove neighboring whitespace at the same time as you're removing the target nodes:

private void removeNodeAndTrailingWhitespace(Node node) {
    List<Node> exiles = new ArrayList<Node>();

    exiles.add(node);
    for (Node whitespace = node.getNextSibling();
            whitespace != null && whitespace.getNodeType() == Node.TEXT_NODE && whitespace.getTextContent().matches("\\s*");
            whitespace = whitespace.getNextSibling()) {
        exiles.add(whitespace);
    }

    for (Node exile: exiles) {
        exile.getParentNode().removeChild(exile);
    }
}

This has the benefit of keeping the rest of the existing formatting intact.
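
A small usage sketch of this approach follows. The class name `RemoveNodeDemo`, the `parse` helper, and the sample document are assumptions for illustration; the regex is written as `"\\s*"` because the backslash has to be escaped inside a Java string literal.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RemoveNodeDemo {

    // Removes the node plus any whitespace-only text nodes that directly follow it.
    static void removeNodeAndTrailingWhitespace(Node node) {
        List<Node> exiles = new ArrayList<Node>();
        exiles.add(node);
        for (Node whitespace = node.getNextSibling();
                whitespace != null
                        && whitespace.getNodeType() == Node.TEXT_NODE
                        && whitespace.getTextContent().matches("\\s*");
                whitespace = whitespace.getNextSibling()) {
            exiles.add(whitespace);
        }
        // Collect first, remove afterwards, so sibling traversal is not disturbed.
        for (Node exile : exiles) {
            exile.getParentNode().removeChild(exile);
        }
    }

    // Helper (assumed for this sketch): builds a DOM from a string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a>\n  <b/>\n  <c/>\n</a>");
        Element root = doc.getDocumentElement();
        removeNodeAndTrailingWhitespace(root.getElementsByTagName("b").item(0));
        // Remaining children: the indentation before <c/>, <c/> itself, and the final newline.
        System.out.println(root.getChildNodes().getLength()); // prints 3
    }
}
```

The indentation in front of the removed element is kept, so the next sibling ends up at the same indentation level the removed element had.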

Answer 6

Answer by user6615071

The following code works:

public String getSoapXmlFormatted(String pXml) {
    try {
        if (pXml != null) {
            DocumentBuilderFactory tDbFactory = DocumentBuilderFactory
                    .newInstance();
            DocumentBuilder tDBuilder;
            tDBuilder = tDbFactory.newDocumentBuilder();
            Document tDoc = tDBuilder.parse(new InputSource(
                    new StringReader(pXml)));
            removeWhitespaces(tDoc);
            final DOMImplementationRegistry tRegistry = DOMImplementationRegistry
                    .newInstance();
            final DOMImplementationLS tImpl = (DOMImplementationLS) tRegistry
                    .getDOMImplementation("LS");
            final LSSerializer tWriter = tImpl.createLSSerializer();
            tWriter.getDomConfig().setParameter("format-pretty-print",
                    Boolean.FALSE);
            tWriter.getDomConfig().setParameter(
                    "element-content-whitespace", Boolean.TRUE);
            pXml = tWriter.writeToString(tDoc);
        }
    } catch (RuntimeException | ParserConfigurationException | SAXException
            | IOException | ClassNotFoundException | InstantiationException
            | IllegalAccessException tE) {
        tE.printStackTrace();
    }
    return pXml;
}

public void removeWhitespaces(Node pRootNode) {
    if (pRootNode != null) {
        NodeList tList = pRootNode.getChildNodes();
        if (tList != null && tList.getLength() > 0) {
            ArrayList<Node> tRemoveNodeList = new ArrayList<Node>();
            for (int i = 0; i < tList.getLength(); i++) {
                Node tChildNode = tList.item(i);
                if (tChildNode.getNodeType() == Node.TEXT_NODE) {
                    if (tChildNode.getTextContent() == null
                            || "".equals(tChildNode.getTextContent().trim()))
                        tRemoveNodeList.add(tChildNode);
                } else
                    removeWhitespaces(tChildNode);
            }
            for (Node tRemoveNode : tRemoveNodeList) {
                pRootNode.removeChild(tRemoveNode);
            }
        }
    }
}
