如何从Java漂亮地打印XML?
我有一个包含XML的Java字符串,没有换行或者缩进。我想将其转换为格式正确的XML字符串。我该怎么做呢?
String unformattedXml = "<tag><nested>hello</nested></tag>"; String formattedXml = new [UnknownClass]().format(unformattedXml);
注意:我的输入是一个字符串。我的输出是一个字符串。
解决方案
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
//initialize StreamResult with File object to save to file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);
注意:结果可能因Java版本而异。搜索特定于平台的解决方法。
由于我们是以String开头的,因此我们需要先隐蔽到DOM对象(例如Node),然后才能使用Transformer。但是,如果我们知道XML字符串有效,并且不想增加将字符串解析为DOM的内存开销,然后对DOM进行转换以获取字符串,则可以做一些老式的字符通过字符解析。在每个</ ...>字符后插入一个换行符和空格,保持并缩进计数器(确定空格数),我们对每个<...>递增,对每个</。递减。 。>`你知道的。
免责声明我对以下函数进行了剪切/粘贴/文本编辑,因此它们可能无法按原样编译。
public static final Element createDOM(String strXML)
throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource sourceXML = new InputSource(new StringReader(strXML))
Document xmlDoc = db.parse(sourceXML);
Element e = xmlDoc.getDocumentElement();
e.normalize();
return e;
}
public static final void prettyPrint(Node xml, OutputStream out)
throws TransformerConfigurationException, TransformerFactoryConfigurationError, TransformerException {
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.transform(new DOMSource(xml), new StreamResult(out));
}
这是我自己的问题的答案。我结合了各种结果的答案,编写了一个漂亮地打印XML的类。
无法保证它如何响应无效的XML或者大型文档。
package ecb.sdw.pretty;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
/**
* Pretty-prints xml, supplied as a string.
* <p/>
* eg.
* <code>
* String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
* </code>
*/
public class XmlFormatter {
public XmlFormatter() {
}
public String format(String unformattedXml) {
try {
final Document document = parseXmlFile(unformattedXml);
OutputFormat format = new OutputFormat(document);
format.setLineWidth(65);
format.setIndenting(true);
format.setIndent(2);
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(document);
return out.toString();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
private Document parseXmlFile(String in) {
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(in));
return db.parse(is);
} catch (ParserConfigurationException e) {
throw new RuntimeException(e);
} catch (SAXException e) {
throw new RuntimeException(e);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
public static void main(String[] args) {
String unformattedXml =
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
" xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
" xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
" <Query>\n" +
" <query:CategorySchemeWhere>\n" +
" \t\t\t\t\t <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
" </query:CategorySchemeWhere>\n" +
" </Query>\n\n\n\n\n" +
"</QueryMessage>";
System.out.println(new XmlFormatter().format(unformattedXml));
}
}
有一个非常不错的命令行xml实用程序,称为xmlstarlet(http://xmlstar.sourceforge.net/),它可以完成很多人使用的事情。
我们可以使用Runtime.exec以编程方式执行该程序,然后读入格式化的输出文件。它比其他几行Java代码可以提供更多的选择和更好的错误报告。
下载xmlstarlet:http://sourceforge.net/project/showfiles.php?group_id=66612&package_id=64589
我过去使用org.dom4j.io.OutputFormat.createPrettyPrint()方法进行了漂亮的打印
public String prettyPrint(final String xml){
if (StringUtils.isBlank(xml)) {
throw new RuntimeException("xml was null or blank in prettyPrint()");
}
final StringWriter sw;
try {
final OutputFormat format = OutputFormat.createPrettyPrint();
final org.dom4j.Document document = DocumentHelper.parseText(xml);
sw = new StringWriter();
final XMLWriter writer = new XMLWriter(sw, format);
writer.write(document);
}
catch (Exception e) {
throw new RuntimeException("Error pretty printing xml:\n" + xml, e);
}
return sw.toString();
}
关于"我们必须首先构建DOM树"的评论:不,我们不需要,也不应这样做。
相反,创建一个StreamSource(新的StreamSource(new StringReader(str)),并将其提供给所提到的身份转换器。这将使用SAX解析器,结果将更快。
在这种情况下,构建中间树纯属开销。
否则,排名最高的答案是好的。
如果可以使用第三方XML库,那么可以比现在投票最高的答案所建议的简单得多。
有人说输入和输出都应该是字符串,因此这是一个使用XOM库实现的实用程序方法:
import nu.xom.*;
import java.io.*;
[...]
public static String format(String xml) throws ParsingException, IOException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
Serializer serializer = new Serializer(out);
serializer.setIndent(4); // or whatever you like
serializer.write(new Builder().build(xml, ""));
return out.toString("UTF-8");
}
我测试了它是否有效,并且结果不取决于JRE版本或者类似的内容。要查看如何根据自己的喜好自定义输出格式,请查看SerializerAPI。
实际上,这比我认为需要一些额外的行的时间更长,因为Serializer想要OutputStream进行写入。但是请注意,这里很少有用于实际的XML编码的代码。
(此答案是我对XOM评估的一部分,在我的问题中,这是关于替换dom4j的最佳Java XML库的一个建议。作为记录,使用dom4j,可以使用XMLWriter和OutputFormat轻松实现这一目标。 `。编辑:...如mlo55的答案所示。)
基于此答案的更简单的解决方案:
public static String prettyFormat(String input, int indent) {
try {
Source xmlInput = new StreamSource(new StringReader(input));
StringWriter stringWriter = new StringWriter();
StreamResult xmlOutput = new StreamResult(stringWriter);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", indent);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(xmlInput, xmlOutput);
return xmlOutput.getWriter().toString();
} catch (Exception e) {
throw new RuntimeException(e); // simple exception handling, please review it
}
}
public static String prettyFormat(String input) {
return prettyFormat(input, 2);
}
测试用例:
prettyFormat("<root><child>aaa</child><child/></root>");
返回:
<?xml version="1.0" encoding="UTF-8"?> <root> <child>aaa</child> <child/> </root>
嗯...面对这样的事情,这是一个已知的错误...
只需添加此OutputProperty ..
transformer.setOutputProperty(OutputPropertiesFactory.S_KEY_INDENT_AMOUNT, "8");
希望这可以帮助 ...
这是使用dom4j的一种方法:
进口:
import org.dom4j.Document; import org.dom4j.DocumentHelper; import org.dom4j.io.OutputFormat; import org.dom4j.io.XMLWriter;
代码:
String xml = "<your xml='here'/>"; Document doc = DocumentHelper.parseText(xml); StringWriter sw = new StringWriter(); OutputFormat format = OutputFormat.createPrettyPrint(); XMLWriter xw = new XMLWriter(sw, format); xw.write(doc); String result = sw.toString();
凯文·哈坎森(Kevin Hakanson)说:
"但是,如果我们知道XML字符串是有效的,并且不想增加将字符串解析为DOM的内存开销,然后对DOM进行转换以取回字符串,则可以执行一些老式的操作逐字符解析。在每个字符后插入换行符和空格,保持并缩进计数器(确定空格数),对于每个<...>递增,对于看到的每个递减。"
同意这种方法要快得多,依赖性要少得多。
解决方案示例:
/**
* XML utils, including formatting.
*/
public class XmlUtils
{
private static XmlFormatter formatter = new XmlFormatter(2, 80);
public static String formatXml(String s)
{
return formatter.format(s, 0);
}
public static String formatXml(String s, int initialIndent)
{
return formatter.format(s, initialIndent);
}
private static class XmlFormatter
{
private int indentNumChars;
private int lineLength;
private boolean singleLine;
public XmlFormatter(int indentNumChars, int lineLength)
{
this.indentNumChars = indentNumChars;
this.lineLength = lineLength;
}
public synchronized String format(String s, int initialIndent)
{
int indent = initialIndent;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++)
{
char currentChar = s.charAt(i);
if (currentChar == '<')
{
char nextChar = s.charAt(i + 1);
if (nextChar == '/')
indent -= indentNumChars;
if (!singleLine) // Don't indent before closing element if we're creating opening and closing elements on a single line.
sb.append(buildWhitespace(indent));
if (nextChar != '?' && nextChar != '!' && nextChar != '/')
indent += indentNumChars;
singleLine = false; // Reset flag.
}
sb.append(currentChar);
if (currentChar == '>')
{
if (s.charAt(i - 1) == '/')
{
indent -= indentNumChars;
sb.append("\n");
}
else
{
int nextStartElementPos = s.indexOf('<', i);
if (nextStartElementPos > i + 1)
{
String textBetweenElements = s.substring(i + 1, nextStartElementPos);
// If the space between elements is solely newlines, let them through to preserve additional newlines in source document.
if (textBetweenElements.replaceAll("\n", "").length() == 0)
{
sb.append(textBetweenElements + "\n");
}
// Put tags and text on a single line if the text is short.
else if (textBetweenElements.length() <= lineLength * 0.5)
{
sb.append(textBetweenElements);
singleLine = true;
}
// For larger amounts of text, wrap lines to a maximum line length.
else
{
sb.append("\n" + lineWrap(textBetweenElements, lineLength, indent, null) + "\n");
}
i = nextStartElementPos - 1;
}
else
{
sb.append("\n");
}
}
}
}
return sb.toString();
}
}
private static String buildWhitespace(int numChars)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < numChars; i++)
sb.append(" ");
return sb.toString();
}
/**
* Wraps the supplied text to the specified line length.
* @lineLength the maximum length of each line in the returned string (not including indent if specified).
* @indent optional number of whitespace characters to prepend to each line before the text.
* @linePrefix optional string to append to the indent (before the text).
* @returns the supplied text wrapped so that no line exceeds the specified line length + indent, optionally with
* indent and prefix applied to each line.
*/
private static String lineWrap(String s, int lineLength, Integer indent, String linePrefix)
{
if (s == null)
return null;
StringBuilder sb = new StringBuilder();
int lineStartPos = 0;
int lineEndPos;
boolean firstLine = true;
while(lineStartPos < s.length())
{
if (!firstLine)
sb.append("\n");
else
firstLine = false;
if (lineStartPos + lineLength > s.length())
lineEndPos = s.length() - 1;
else
{
lineEndPos = lineStartPos + lineLength - 1;
while (lineEndPos > lineStartPos && (s.charAt(lineEndPos) != ' ' && s.charAt(lineEndPos) != '\t'))
lineEndPos--;
}
sb.append(buildWhitespace(indent));
if (linePrefix != null)
sb.append(linePrefix);
sb.append(s.substring(lineStartPos, lineEndPos + 1));
lineStartPos = lineEndPos + 1;
}
return sb.toString();
}
// other utils removed for brevity
}
我遇到了同样的问题,并且使用JTidy取得了巨大的成功(http://jtidy.sourceforge.net/index.html)
例子:
Tidy t = new Tidy();
t.setIndentContent(true);
Document d = t.parseDOM(
new ByteArrayInputStream("HTML goes here", null);
OutputStream out = new ByteArrayOutputStream();
t.pprint(d, out);
String html = out.toString();
仅需注意,评分最高的答案需要使用xerces。
如果我们不想添加此外部依赖项,则只需使用标准的jdk库(实际上是内部使用xerces构建的)。
N.B. jdk 1.5版存在一个错误,请参见http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6296446,但现在已解决。
(请注意,如果发生错误,这将返回原始文本)
package com.test;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
public class XmlTest {
public static void main(String[] args) {
XmlTest t = new XmlTest();
System.out.println(t.formatXml("<a><b><c/><d>text D</d><e value='0'/></b></a>"));
}
public String formatXml(String xml){
try{
Transformer serializer= SAXTransformerFactory.newInstance().newTransformer();
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
//serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
//serializer.setOutputProperty("{http://xml.customer.org/xslt}indent-amount", "2");
Source xmlSource=new SAXSource(new InputSource(new ByteArrayInputStream(xml.getBytes())));
StreamResult res = new StreamResult(new ByteArrayOutputStream());
serializer.transform(xmlSource, res);
return new String(((ByteArrayOutputStream)res.getOutputStream()).toByteArray());
}catch(Exception e){
//TODO log error
return xml;
}
}
}

