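How to pretty print XML from Java?
I have a Java String that contains XML, with no line feeds or indentation. I would like to turn it into a String with nicely formatted XML. How do I do this?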
String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);
Note: my input is a String. My output is a String.
Solution
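import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

// doc is an existing org.w3c.dom.Document
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
// initialize StreamResult with File object to save to file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);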
Note: results may vary depending on the Java version. Search for workarounds specific to your platform.
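Since you are starting with a String, you first need to convert it to a DOM object (e.g. a Node) before you can use the Transformer. However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing the string into a DOM and then running a transform over the DOM to get a string back, you could just do some old-fashioned character-by-character parsing: insert a newline and spaces after every </...> tag, and keep an indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see.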
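A minimal sketch of this character-by-character approach, assuming well-formed input with no whitespace between tags and no > character inside attribute values, comments, or CDATA sections. The class NaiveXmlIndenter and its indent method are hypothetical names. It varies the newline rule slightly: every tag starts on a new line, except a closing tag that directly follows text, so <a>text</a> stays on one line. Closing tags decrement the depth before printing; opening tags other than <.../>, <?...?> and <!...> increment it after printing.

```java
// Hypothetical helper: naive re-indenter for small, already-valid XML strings.
// Assumes no '>' inside attribute values, comments or CDATA, and no whitespace between tags.
final class NaiveXmlIndenter {
    private NaiveXmlIndenter() {}

    static String indent(String xml, int indentSize) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        boolean lastWasText = false; // true if the previous token was character data
        int i = 0;
        while (i < xml.length()) {
            if (xml.charAt(i) == '<') {
                int end = xml.indexOf('>', i);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated tag at index " + i);
                }
                String tag = xml.substring(i, end + 1);
                boolean closing = tag.startsWith("</");
                boolean noChildren = tag.endsWith("/>") || tag.startsWith("<?") || tag.startsWith("<!");
                if (closing) {
                    depth--; // a closing tag is printed one level out
                }
                // Keep "<a>text</a>" on one line; otherwise start a new indented line.
                if (!(closing && lastWasText)) {
                    if (out.length() > 0) {
                        out.append('\n');
                    }
                    appendSpaces(out, depth * indentSize);
                }
                out.append(tag);
                if (!closing && !noChildren) {
                    depth++; // children of an opening tag go one level in
                }
                lastWasText = false;
                i = end + 1;
            } else {
                // Copy character data up to the next tag unchanged.
                int next = xml.indexOf('<', i);
                if (next < 0) {
                    next = xml.length();
                }
                out.append(xml, i, next);
                lastWasText = true;
                i = next;
            }
        }
        return out.toString();
    }

    private static void appendSpaces(StringBuilder sb, int n) {
        for (int k = 0; k < n; k++) {
            sb.append(' ');
        }
    }

    public static void main(String[] args) {
        System.out.println(indent("<tag><nested>hello</nested></tag>", 2));
    }
}
```

For the question's input this yields <tag>, then <nested>hello</nested> indented by two spaces, then </tag>, each on its own line. Anything beyond simple, already-valid markup is better left to a real parser.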
Disclaimer: I cut/pasted/text-edited the functions below, so they may not compile as is.
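import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static final Element createDOM(String strXML)
        throws ParserConfigurationException, SAXException, IOException {

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true); // validates against a DTD; input without one will report validation errors
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputSource sourceXML = new InputSource(new StringReader(strXML));
    Document xmlDoc = db.parse(sourceXML);
    Element e = xmlDoc.getDocumentElement();
    e.normalize();
    return e;
}

public static final void prettyPrint(Node xml, OutputStream out)
        throws TransformerConfigurationException, TransformerFactoryConfigurationError,
        TransformerException {
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.transform(new DOMSource(xml), new StreamResult(out));
}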
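This is an answer to my own question. I combined the answers from various results to write a class that pretty prints XML.
No guarantees on how it responds to invalid XML or large documents.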
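If depending on org.apache.xml.serialize (a Xerces API that Xerces itself has deprecated) is a concern, a similar String-in, String-out formatter can be sketched with the DOM Level 3 Load and Save API (org.w3c.dom.ls) that ships with the JDK. This is a swapped-in technique, not the class below; the name LsXmlFormatter is hypothetical, and the sketch assumes the JDK's default DOM implementation, which supports the "format-pretty-print" serializer parameter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical class: pretty-prints via the JDK's DOM Load and Save serializer.
public class LsXmlFormatter {

    public static String format(String unformattedXml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(unformattedXml)));
            // Assumes the DOM implementation that built the document also implements Load and Save.
            DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
            LSSerializer serializer = ls.createLSSerializer();
            DOMConfiguration config = serializer.getDomConfig();
            if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
                config.setParameter("format-pretty-print", Boolean.TRUE);
            }
            // Input and output are plain Strings, so leave out the XML declaration.
            config.setParameter("xml-declaration", Boolean.FALSE);
            return serializer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("<tag><nested>hello</nested></tag>"));
    }
}
```

The indentation width is chosen by the serializer implementation rather than by the caller, which is the main trade-off against the Transformer-based answers.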
package ecb.sdw.pretty;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public XmlFormatter() {
    }

    public String format(String unformattedXml) {
        try {
            final Document document = parseXmlFile(unformattedXml);

            OutputFormat format = new OutputFormat(document);
            format.setLineWidth(65);
            format.setIndenting(true);
            format.setIndent(2);
            Writer out = new StringWriter();
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(document);

            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private Document parseXmlFile(String in) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new StringReader(in));
            return db.parse(is);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        " xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        " xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        " <Query>\n" +
                        " <query:CategorySchemeWhere>\n" +
                        " \t\t\t\t\t <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        " </query:CategorySchemeWhere>\n" +
                        " </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }

}
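There is a very nice command-line XML utility called xmlstarlet (http://xmlstar.sourceforge.net/) that can do a lot of things that many people use.
You could execute this program programmatically using Runtime.exec and then read in the formatted output file. It has more options and better error reporting than a few lines of Java code can provide.
Download xmlstarlet: http://sourceforge.net/project/showfiles.php?group_id=66612&package_id=64589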
I have pretty-printed XML in the past using the org.dom4j.io.OutputFormat.createPrettyPrint() method:
public String prettyPrint(final String xml) {

    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in prettyPrint()");
    }

    final StringWriter sw;

    try {
        final OutputFormat format = OutputFormat.createPrettyPrint();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    } catch (Exception e) {
        throw new RuntimeException("Error pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}
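Regarding the comment that "you must first build a DOM tree": no, you don't need to, and you shouldn't.
Instead, create a StreamSource (new StreamSource(new StringReader(str))) and feed that to the identity transformer mentioned. That will use the SAX parser, and the result will be much faster.
Building an intermediate tree is pure overhead in this case.
Otherwise, the top-ranked answer is good.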
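If using a third-party XML library is acceptable, you can get away with something significantly simpler than what the currently highest-voted answers suggest.
It was stated that both input and output should be Strings, so here's a utility method that does just that, implemented with the XOM library: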
import nu.xom.*;
import java.io.*;

[...]

public static String format(String xml) throws ParsingException, IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Serializer serializer = new Serializer(out);
    serializer.setIndent(4); // or whatever you like
    serializer.write(new Builder().build(xml, ""));
    return out.toString("UTF-8");
}
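I tested that it works, and the results do not depend on the JRE version or anything like that. To see how to customize the output format to your liking, take a look at the Serializer API.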
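This actually came out longer than I thought: some extra lines were needed because Serializer wants an OutputStream to write to. But note that there's very little code here for the actual XML twiddling.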
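(This answer is part of my evaluation of XOM, which was suggested as one option in my question about the best Java XML library to replace dom4j. For the record, with dom4j you could achieve this with similar ease using XMLWriter and OutputFormat. Edit: ...as demonstrated in mlo55's answer.)
A simpler solution based on this answer: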
public static String prettyFormat(String input, int indent) {
    try {
        Source xmlInput = new StreamSource(new StringReader(input));
        StringWriter stringWriter = new StringWriter();
        StreamResult xmlOutput = new StreamResult(stringWriter);
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(xmlInput, xmlOutput);
        return xmlOutput.getWriter().toString();
    } catch (Exception e) {
        throw new RuntimeException(e); // simple exception handling, please review it
    }
}

public static String prettyFormat(String input) {
    return prettyFormat(input, 2);
}
Test case:
prettyFormat("<root><child>aaa</child><child/></root>");
Returns:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>aaa</child>
  <child/>
</root>
Hmm... I ran into something like this, and it is a known bug.
Just add this OutputProperty:
transformer.setOutputProperty(OutputPropertiesFactory.S_KEY_INDENT_AMOUNT, "8");
Hope this helps.
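OutputPropertiesFactory is a Xalan class (the JDK carries only an internal copy), so a sketch that avoids importing it can pass the property key as a plain string instead: the same "{http://xml.apache.org/xslt}indent-amount" key used in other answers here. It also feeds a StreamSource straight to the identity transformer, so no DOM is built. The class name IndentAmountExample is hypothetical, and the key is vendor-specific: a TransformerFactory implementation that does not recognize it may simply ignore it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class: identity transform with an explicit indent amount, no internal imports.
public class IndentAmountExample {

    public static String format(String xml, int indentAmount) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Namespaced, vendor-specific key (Xalan / the JDK's built-in transformer).
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                Integer.toString(indentAmount));
        StringWriter out = new StringWriter();
        // StreamSource lets the transformer parse the input directly, without a DOM.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(format("<tag><nested>hello</nested></tag>", 4));
    }
}
```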
Here's a way to do it using dom4j:
Imports:
import java.io.StringWriter;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;
Code:
String xml = "<your xml='here'/>";
Document doc = DocumentHelper.parseText(xml);
StringWriter sw = new StringWriter();
OutputFormat format = OutputFormat.createPrettyPrint();
XMLWriter xw = new XMLWriter(sw, format);
xw.write(doc);
String result = sw.toString();
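Kevin Hakanson said:
"However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing a string into a DOM, then running a transform over the DOM to get a string back, you could just do some old-fashioned character-by-character parsing. Insert a newline and spaces after every </...> tag, keep an indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see."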
Agreed: this approach is much faster and has far fewer dependencies.
Example solution:
/**
 * XML utils, including formatting.
 */
public class XmlUtils {
    private static XmlFormatter formatter = new XmlFormatter(2, 80);

    public static String formatXml(String s) {
        return formatter.format(s, 0);
    }

    public static String formatXml(String s, int initialIndent) {
        return formatter.format(s, initialIndent);
    }

    private static class XmlFormatter {
        private int indentNumChars;
        private int lineLength;
        private boolean singleLine;

        public XmlFormatter(int indentNumChars, int lineLength) {
            this.indentNumChars = indentNumChars;
            this.lineLength = lineLength;
        }

        public synchronized String format(String s, int initialIndent) {
            int indent = initialIndent;
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < s.length(); i++) {
                char currentChar = s.charAt(i);
                if (currentChar == '<') {
                    char nextChar = s.charAt(i + 1);
                    if (nextChar == '/')
                        indent -= indentNumChars;
                    if (!singleLine) // Don't indent before closing element if we're creating opening and closing elements on a single line.
                        sb.append(buildWhitespace(indent));
                    if (nextChar != '?' && nextChar != '!' && nextChar != '/')
                        indent += indentNumChars;
                    singleLine = false; // Reset flag.
                }
                sb.append(currentChar);
                if (currentChar == '>') {
                    if (s.charAt(i - 1) == '/') {
                        indent -= indentNumChars;
                        sb.append("\n");
                    } else {
                        int nextStartElementPos = s.indexOf('<', i);
                        if (nextStartElementPos > i + 1) {
                            String textBetweenElements = s.substring(i + 1, nextStartElementPos);

                            // If the space between elements is solely newlines, let them through to preserve additional newlines in source document.
                            if (textBetweenElements.replaceAll("\n", "").length() == 0) {
                                sb.append(textBetweenElements + "\n");
                            }
                            // Put tags and text on a single line if the text is short.
                            else if (textBetweenElements.length() <= lineLength * 0.5) {
                                sb.append(textBetweenElements);
                                singleLine = true;
                            }
                            // For larger amounts of text, wrap lines to a maximum line length.
                            else {
                                sb.append("\n" + lineWrap(textBetweenElements, lineLength, indent, null) + "\n");
                            }
                            i = nextStartElementPos - 1;
                        } else {
                            sb.append("\n");
                        }
                    }
                }
            }
            return sb.toString();
        }
    }

    private static String buildWhitespace(int numChars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numChars; i++)
            sb.append(" ");
        return sb.toString();
    }

    /**
     * Wraps the supplied text to the specified line length.
     * @param lineLength the maximum length of each line in the returned string (not including indent if specified).
     * @param indent optional number of whitespace characters to prepend to each line before the text.
     * @param linePrefix optional string to append to the indent (before the text).
     * @return the supplied text wrapped so that no line exceeds the specified line length + indent, optionally with
     * indent and prefix applied to each line.
     */
    private static String lineWrap(String s, int lineLength, Integer indent, String linePrefix) {
        if (s == null)
            return null;

        StringBuilder sb = new StringBuilder();
        int lineStartPos = 0;
        int lineEndPos;
        boolean firstLine = true;
        while (lineStartPos < s.length()) {
            if (!firstLine)
                sb.append("\n");
            else
                firstLine = false;

            if (lineStartPos + lineLength > s.length())
                lineEndPos = s.length() - 1;
            else {
                lineEndPos = lineStartPos + lineLength - 1;
                while (lineEndPos > lineStartPos && (s.charAt(lineEndPos) != ' ' && s.charAt(lineEndPos) != '\t'))
                    lineEndPos--;
            }
            sb.append(buildWhitespace(indent));
            if (linePrefix != null)
                sb.append(linePrefix);

            sb.append(s.substring(lineStartPos, lineEndPos + 1));
            lineStartPos = lineEndPos + 1;
        }
        return sb.toString();
    }

    // other utils removed for brevity
}
I had the same problem and had great success with JTidy (http://jtidy.sourceforge.net/index.html).
Example:
Tidy t = new Tidy();
t.setIndentContent(true);
Document d = t.parseDOM(new ByteArrayInputStream("HTML goes here".getBytes()), null);

OutputStream out = new ByteArrayOutputStream();
t.pprint(d, out);
String html = out.toString();
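Just a note: the top-rated answer requires Xerces.
If you don't want to add this external dependency, you can simply use the standard JDK libraries (which actually use Xerces internally).
N.B. There was a bug in JDK 1.5 (see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6296446), but it has since been fixed.
(Note that if an error occurs, this returns the original text.)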
package com.test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.InputSource;

public class XmlTest {
    public static void main(String[] args) {
        XmlTest t = new XmlTest();
        System.out.println(t.formatXml("<a><b><c/><d>text D</d><e value='0'/></b></a>"));
    }

    public String formatXml(String xml) {
        try {
            Transformer serializer = SAXTransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            //serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            //serializer.setOutputProperty("{http://xml.customer.org/xslt}indent-amount", "2");
            Source xmlSource = new SAXSource(new InputSource(new ByteArrayInputStream(xml.getBytes())));
            StreamResult res = new StreamResult(new ByteArrayOutputStream());
            serializer.transform(xmlSource, res);
            return new String(((ByteArrayOutputStream) res.getOutputStream()).toByteArray());
        } catch (Exception e) {
            //TODO log error
            return xml;
        }
    }
}