java 8 中的漂亮打印 XML

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/25864316/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 01:21:19  来源:igfitidea点击:

Pretty print XML in java 8

javaxmldompretty-print

提问by Hungry

I have an XML file stored as a DOM Document and I would like to pretty print it to the console, preferably without using an external library. I am aware that this question has been asked multiple times on this site, however none of the previous answers have worked for me.I am using java 8, so perhaps this is where my code differs from previous questions? I have also tried to set the transformer manually using code found from the web, however this just caused a not founderror.

我有一个存储为 DOM 文档的 XML 文件,我想将它打印到控制台,最好不使用外部库。我知道这个问题在这个网站上被问过多次,但是以前的答案都没有对我有用。我使用的是 java 8,所以这可能是我的代码与以前的问题不同的地方?我还尝试使用从网上找到的代码手动设置变压器,但这只会导致not found错误。

Here is my code which currently just outputs each xml element on a new line to the left of the console.

这是我的代码,目前只在控制台左侧的新行上输出每个 xml 元素。

import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class Test {
    public Test(){
        try {
            //java.lang.System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");

            DocumentBuilderFactory dbFactory;
            DocumentBuilder dBuilder;
            Document original = null;
            try {
                dbFactory = DocumentBuilderFactory.newInstance();
                dBuilder = dbFactory.newDocumentBuilder();
                original = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream("xml Store - Copy.xml"))));
            } catch (SAXException | IOException | ParserConfigurationException e) {
                e.printStackTrace();
            }
            StringWriter stringWriter = new StringWriter();
            StreamResult xmlOutput = new StreamResult(stringWriter);
            TransformerFactory tf = TransformerFactory.newInstance();
            //tf.setAttribute("indent-number", 2);
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.transform(new DOMSource(original), xmlOutput);
            java.lang.System.out.println(xmlOutput.getWriter().toString());
        } catch (Exception ex) {
            throw new RuntimeException("Error converting to String", ex);
        }
    }

    public static void main(String[] args){
        new Test();
    }

}

采纳答案by Aldo

I guess that the problem is related to blank text nodes(i.e. text nodes with only whitespaces) in the original file. You should try to programmatically remove them just after the parsing, using the following code. If you don't remove them, the Transformeris going to preserve them.

我猜这个问题与原始文件中的空白文本节点(即只有空格的文本节点)有关。您应该尝试使用以下代码在解析后以编程方式删除它们。如果您不删除它们,Transformer它将保留它们。

original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);

for (int i = 0; i < blankTextNodes.getLength(); i++) {
     blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}

回答by iCrazybest

Create xml file :

创建 xml 文件:

new FileInputStream("xml Store - Copy.xml") ;// result xml file format incorrect ! 

so that, when parse the content of the given input source as an XML document and return a new DOM object.

这样,当将给定输入源的内容解析为 XML 文档并返回一个新的 DOM 对象时。

Document original = null;
...
original.parse("data.xml");//input source as an XML document

回答by Tom

This works on Java 8:

这适用于 Java 8:

public static void main (String[] args) throws Exception {
    String xmlString = "<hello><from>ME</from></hello>";
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document document = documentBuilder.parse(new InputSource(new StringReader(xmlString)));
    pretty(document, System.out, 2);
}

private static void pretty(Document document, OutputStream outputStream, int indent) throws Exception {
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    if (indent > 0) {
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
    }
    Result result = new StreamResult(outputStream);
    Source source = new DOMSource(document);
    transformer.transform(source, result);
}

回答by ThomasRS

I've written a simple classfor for removing whitespace in documents - supports command-line and does not use DOM / XPath.

我编写了一个简单的类来删除文档中的空格 - 支持命令行并且不使用 DOM / XPath。

Edit: Come to think of it, the project also contains a pretty-printer which handles existing whitespace:

编辑:想想看,该项目还包含一个处理现有空白的漂亮打印机:

PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

回答by Stephan

In reply to Espinosa's comment, here is a solution when "the original xml is not already (partially) indented or contain new lines".

在回复 Espinosa 的评论时,这里是“原始 xml 尚未(部分)缩进或包含新行”时的解决方案。

Background

背景

Excerpt from the article (see Referencesbelow) inspiring this solution:

启发此解决方案的文章摘录(请参阅下面的参考资料):

Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath's normalize-space to locate all the whitespace nodes and remove them first.

根据 DOM 规范,标签外的空格是完全有效的,并且会被正确保留。要删除它们,我们可以使用 XPath 的 normalize-space 来定位所有空白节点并首先删除它们。

Java Code

Java代码

public static String toPrettyString(String xml, int indent) {
    try {
        // Turn xml string into a document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

        // Remove whitespaces outside tags
        document.normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                      document,
                                                      XPathConstants.NODESET);

        for (int i = 0; i < nodeList.getLength(); ++i) {
            Node node = nodeList.item(i);
            node.getParentNode().removeChild(node);
        }

        // Setup pretty print options
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        // Return pretty print xml string
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
        return stringWriter.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Sample usage

示例用法

String xml = "<root>" + //
             "\n   "  + //
             "\n<name>Coco Puff</name>" + //
             "\n        <total>10</total>    </root>";

System.out.println(toPrettyString(xml, 4));

Output

输出

<root>
    <name>Coco Puff</name>
    <total>10</total>
</root>

References

参考

回答by Andrew

I didn't like any of the common XML formatting solutions because they all remove more than 1 consecutive new line character (for some reason, removing spaces/tabs and removing new line characters are inseparable...). Here's my solution, which was actually made for XHTML but should do the job with XML as well:

我不喜欢任何常见的 XML 格式化解决方案,因为它们都删除了超过 1 个连续的换行符(出于某种原因,删除空格/制表符和删除换行符是不可分割的......)。这是我的解决方案,它实际上是为 XHTML 制作的,但也应该用 XML 来完成这项工作:

public String GenerateTabs(int tabLevel) {
  char[] tabs = new char[tabLevel * 2];
  Arrays.fill(tabs, ' ');

  //Or:
  //char[] tabs = new char[tabLevel];
  //Arrays.fill(tabs, '\t');

  return new String(tabs);
}

public String FormatXHTMLCode(String code) {
  // Split on new lines.
  String[] splitLines = code.split("\n", 0);

  int tabLevel = 0;

  // Go through each line.
  for (int lineNum = 0; lineNum < splitLines.length; ++lineNum) {
    String currentLine = splitLines[lineNum];

    if (currentLine.trim().isEmpty()) {
      splitLines[lineNum] = "";
    } else if (currentLine.matches(".*<[^/!][^<>]+?(?<!/)>?")) {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];

      ++tabLevel;
    } else if (currentLine.matches(".*</[^<>]+?>")) {
      --tabLevel;

      if (tabLevel < 0) {
        tabLevel = 0;
      }

      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
    } else if (currentLine.matches("[^<>]*?/>")) {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];

      --tabLevel;

      if (tabLevel < 0) {
        tabLevel = 0;
      }
    } else {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
    }
  }

  return String.join("\n", splitLines);
}

It makes one assumption: that there are no <> characters except for those that comprise the XML/XHTML tags.

它做出一个假设:除了那些构成 XML/XHTML 标签的字符之外,没有 <> 字符。

回答by Valentyn Kolesnikov

Underscore-javahas static method U.formatXml(string). I am the maintainer of the project. Live example

Underscore-java有静态方法U.formatXml(string)。我是项目的维护者。活生生的例子

import com.github.underscore.lodash.U;

public class MyClass {
    public static void main(String args[]) {
        String xml = "<root>" + //
             "\n   "  + //
             "\n<name>Coco Puff</name>" + //
             "\n        <total>10</total>    </root>";

        System.out.println(U.formatXml(xml));
    }
}

Output:

输出:

<root>
   <name>Coco Puff</name>
   <total>10</total>
</root>