java 8 中的漂亮打印 XML
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/25864316/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pretty print XML in java 8
提问by Hungry
I have an XML file stored as a DOM Document and I would like to pretty print it to the console, preferably without using an external library. I am aware that this question has been asked multiple times on this site, however none of the previous answers have worked for me.I am using java 8, so perhaps this is where my code differs from previous questions? I have also tried to set the transformer manually using code found from the web, however this just caused a not found
error.
我有一个存储为 DOM 文档的 XML 文件,我想将它打印到控制台,最好不使用外部库。我知道这个问题在这个网站上被问过多次,但是以前的答案都没有对我有用。我使用的是 java 8,所以这可能是我的代码与以前的问题不同的地方?我还尝试使用从网上找到的代码手动设置变压器,但这只会导致not found
错误。
Here is my code which currently just outputs each xml element on a new line to the left of the console.
这是我的代码,目前只在控制台左侧的新行上输出每个 xml 元素。
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Test {
public Test(){
try {
//java.lang.System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");
DocumentBuilderFactory dbFactory;
DocumentBuilder dBuilder;
Document original = null;
try {
dbFactory = DocumentBuilderFactory.newInstance();
dBuilder = dbFactory.newDocumentBuilder();
original = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream("xml Store - Copy.xml"))));
} catch (SAXException | IOException | ParserConfigurationException e) {
e.printStackTrace();
}
StringWriter stringWriter = new StringWriter();
StreamResult xmlOutput = new StreamResult(stringWriter);
TransformerFactory tf = TransformerFactory.newInstance();
//tf.setAttribute("indent-number", 2);
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(original), xmlOutput);
java.lang.System.out.println(xmlOutput.getWriter().toString());
} catch (Exception ex) {
throw new RuntimeException("Error converting to String", ex);
}
}
public static void main(String[] args){
new Test();
}
}
采纳答案by Aldo
I guess that the problem is related to blank text nodes(i.e. text nodes with only whitespaces) in the original file. You should try to programmatically remove them just after the parsing, using the following code. If you don't remove them, the Transformer
is going to preserve them.
我猜这个问题与原始文件中的空白文本节点(即只有空格的文本节点)有关。您应该尝试使用以下代码在解析后以编程方式删除它们。如果您不删除它们,Transformer
它将保留它们。
original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);
for (int i = 0; i < blankTextNodes.getLength(); i++) {
blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}
回答by iCrazybest
Create xml file :
创建 xml 文件:
new FileInputStream("xml Store - Copy.xml") ;// result xml file format incorrect !
so that, when parse the content of the given input source as an XML document and return a new DOM object.
这样,当将给定输入源的内容解析为 XML 文档并返回一个新的 DOM 对象时。
Document original = null;
...
original.parse("data.xml");//input source as an XML document
回答by Tom
This works on Java 8:
这适用于 Java 8:
public static void main (String[] args) throws Exception {
String xmlString = "<hello><from>ME</from></hello>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(new InputSource(new StringReader(xmlString)));
pretty(document, System.out, 2);
}
private static void pretty(Document document, OutputStream outputStream, int indent) throws Exception {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
if (indent > 0) {
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
}
Result result = new StreamResult(outputStream);
Source source = new DOMSource(document);
transformer.transform(source, result);
}
回答by ThomasRS
I've written a simple classfor for removing whitespace in documents - supports command-line and does not use DOM / XPath.
我编写了一个简单的类来删除文档中的空格 - 支持命令行并且不使用 DOM / XPath。
Edit: Come to think of it, the project also contains a pretty-printer which handles existing whitespace:
编辑:想想看,该项目还包含一个处理现有空白的漂亮打印机:
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();
回答by Stephan
In reply to Espinosa's comment, here is a solution when "the original xml is not already (partially) indented or contain new lines".
在回复 Espinosa 的评论时,这里是“原始 xml 尚未(部分)缩进或包含新行”时的解决方案。
Background
背景
Excerpt from the article (see Referencesbelow) inspiring this solution:
启发此解决方案的文章摘录(请参阅下面的参考资料):
Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath's normalize-space to locate all the whitespace nodes and remove them first.
根据 DOM 规范,标签外的空格是完全有效的,并且会被正确保留。要删除它们,我们可以使用 XPath 的 normalize-space 来定位所有空白节点并首先删除它们。
Java Code
Java代码
public static String toPrettyString(String xml, int indent) {
try {
// Turn xml string into a document
Document document = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
// Remove whitespaces outside tags
document.normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
document,
XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); ++i) {
Node node = nodeList.item(i);
node.getParentNode().removeChild(node);
}
// Setup pretty print options
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", indent);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
// Return pretty print xml string
StringWriter stringWriter = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
return stringWriter.toString();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Sample usage
示例用法
String xml = "<root>" + //
"\n " + //
"\n<name>Coco Puff</name>" + //
"\n <total>10</total> </root>";
System.out.println(toPrettyString(xml, 4));
Output
输出
<root>
<name>Coco Puff</name>
<total>10</total>
</root>
References
参考
- Java: Properly Indenting XML Stringpublished on MyShittyCode
- Save new XML node to file
- Java:正确缩进在MyShittyCode 上发布的XML 字符串
- 将新的 XML 节点保存到文件
回答by Andrew
I didn't like any of the common XML formatting solutions because they all remove more than 1 consecutive new line character (for some reason, removing spaces/tabs and removing new line characters are inseparable...). Here's my solution, which was actually made for XHTML but should do the job with XML as well:
我不喜欢任何常见的 XML 格式化解决方案,因为它们都删除了超过 1 个连续的换行符(出于某种原因,删除空格/制表符和删除换行符是不可分割的......)。这是我的解决方案,它实际上是为 XHTML 制作的,但也应该用 XML 来完成这项工作:
public String GenerateTabs(int tabLevel) {
char[] tabs = new char[tabLevel * 2];
Arrays.fill(tabs, ' ');
//Or:
//char[] tabs = new char[tabLevel];
//Arrays.fill(tabs, '\t');
return new String(tabs);
}
public String FormatXHTMLCode(String code) {
// Split on new lines.
String[] splitLines = code.split("\n", 0);
int tabLevel = 0;
// Go through each line.
for (int lineNum = 0; lineNum < splitLines.length; ++lineNum) {
String currentLine = splitLines[lineNum];
if (currentLine.trim().isEmpty()) {
splitLines[lineNum] = "";
} else if (currentLine.matches(".*<[^/!][^<>]+?(?<!/)>?")) {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
++tabLevel;
} else if (currentLine.matches(".*</[^<>]+?>")) {
--tabLevel;
if (tabLevel < 0) {
tabLevel = 0;
}
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
} else if (currentLine.matches("[^<>]*?/>")) {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
--tabLevel;
if (tabLevel < 0) {
tabLevel = 0;
}
} else {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
}
}
return String.join("\n", splitLines);
}
It makes one assumption: that there are no <> characters except for those that comprise the XML/XHTML tags.
它做出一个假设:除了那些构成 XML/XHTML 标签的字符之外,没有 <> 字符。
回答by Valentyn Kolesnikov
Underscore-javahas static method U.formatXml(string)
. I am the maintainer of the project. Live example
Underscore-java有静态方法U.formatXml(string)
。我是项目的维护者。活生生的例子
import com.github.underscore.lodash.U;
public class MyClass {
public static void main(String args[]) {
String xml = "<root>" + //
"\n " + //
"\n<name>Coco Puff</name>" + //
"\n <total>10</total> </root>";
System.out.println(U.formatXml(xml));
}
}
Output:
输出:
<root>
<name>Coco Puff</name>
<total>10</total>
</root>