How to preserve newlines in CDATA when generating XML in Java?
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Source: http://stackoverflow.com/questions/1216875/
Question by clamp
I want to write some text that contains whitespace characters such as newline and tab into an XML file, so I use
Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));
but when I read it back in using
Node vs = xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();
I get a string that no longer has any newlines.
When I look directly at the XML on disk, the newlines seem to be preserved, so the problem occurs when reading the XML file back in.
How can I preserve the newlines?
Thanks!
Accepted answer by Aviad Ben Dov
I don't know how you parse and write your document, but here's an enhanced code example based on yours:
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// the statements below go inside a method declared "throws Exception"
// creating the document in-memory
Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element element = xmldoc.createElement("TestElement");
xmldoc.appendChild(element);
element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));
// serializing the xml to a string
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl =
(DOMImplementationLS)registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
String str = writer.writeToString(xmldoc);
// printing the xml for verification of whitespace in cdata
System.out.println("--- XML ---");
System.out.println(str);
// de-serializing the xml from the string (writeToString declares UTF-16, so encode the bytes as UTF-16)
final ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_16));
Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);
Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
final Node child = vs.getFirstChild();
String x = child.getNodeValue();
// print the value, yay!
System.out.println("--- Node Text ---");
System.out.println(x);
Serializing with LSSerializer is the W3C way to do it (it is part of the DOM Level 3 Load and Save specification). The output is as expected, with line separators:
--- XML ---
<?xml version="1.0" encoding="UTF-16"?>
<TestElement><![CDATA[first line
second line ]]></TestElement>
--- Node Text ---
first line
second line
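The same round trip can also be sketched as a self-contained class. This version swaps in the JAXP identity Transformer instead of LSSerializer (an alternative, not what the answer above uses) and reads the value back with getTextContent() rather than getFirstChild().getNodeValue(). The class name CdataRoundTrip and its write/read methods are hypothetical helpers for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: write text into a CDATA section, then read it back.
final class CdataRoundTrip {

    // Builds <TestElement><![CDATA[text]]></TestElement> and serializes it
    // with the JAXP identity Transformer (used here instead of LSSerializer).
    static String write(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element element = doc.createElement("TestElement");
            doc.appendChild(element);
            element.appendChild(doc.createCDATASection(text));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML and returns all character data of TestElement,
    // including the content of any CDATA sections.
    static String read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("TestElement").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

getTextContent() concatenates the text of every Text and CDATA child, so it also works when the element holds more than one such child (for example, plain text followed by a CDATA section).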
Answer by fpmurphy
You need to check the type of each node using node.getNodeType(). If the type is CDATA_SECTION_NODE, you need to wrap node.getNodeValue() in the CDATA delimiters (<![CDATA[ and ]]>) yourself.
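A minimal sketch of that suggestion, assuming the parser is not in coalescing mode (the JAXP default), so CDATA sections stay separate CDATA_SECTION_NODE children. The names CdataAwareText, withCdataMarkers and parseRoot are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper that re-adds CDATA delimiters when reading text back.
final class CdataAwareText {

    // Concatenates the text children of an element, wrapping every
    // CDATA_SECTION_NODE in <![CDATA[ ... ]]>.
    static String withCdataMarkers(Node element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append("<![CDATA[").append(child.getNodeValue()).append("]]>");
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
            // other node types (elements, comments, ...) are skipped in this sketch
        }
        return sb.toString();
    }

    // Parses a string with the default (non-coalescing) parser and returns the root element.
    static Node parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the factory is configured with DocumentBuilderFactory.setCoalescing(true), CDATA sections are converted to Text nodes and merged with adjacent text, so this check would never see a CDATA_SECTION_NODE.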
Answer by LiorH
You don't necessarily have to use CDATA to preserve whitespace characters. The XML specification specifies how to encode these characters.
So, for example, if you have an element whose value contains a newline, you should encode it as a character reference:
Line feed (newline): &#10;
Carriage return: &#13;
And so forth.
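The effect of character references can be checked with a small parser test. The sketch below (CharRefs and rootText are hypothetical names) parses three kinds of input: character references for line feed and tab, a literal CR LF pair, and the same pair written as &#13;&#10;.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper for checking how the parser reports whitespace.
final class CharRefs {

    // Parses a document and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // character references for line feed and tab
        System.out.println(rootText("<e>a&#10;b&#9;c</e>").equals("a\nb\tc"));
        // a literal CR LF pair is normalized to a single LF
        System.out.println(rootText("<e>x\r\ny</e>").equals("x\ny"));
        // &#13;&#10; is not normalized and comes back as CR LF
        System.out.println(rootText("<e>x&#13;&#10;y</e>").equals("x\r\ny"));
    }
}
```

The second case shows why the encoding matters: XML 1.0 end-of-line handling requires a conforming parser to turn a literal CR LF pair into a single LF, while a character reference such as &#13; is not normalized and comes back as a real carriage return.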
Answer by McDowell
EDIT: cut all the irrelevant stuff
I'm curious which DOM implementation you're using, because it doesn't match the default behaviour of the one in the couple of JVMs I've tried (they ship with a Xerces implementation). I'm also interested in which newline characters your document contains.
I'm not sure it's a given that CDATA should preserve whitespace. I suspect many factors are involved. Don't DTDs/schemas affect how whitespace is processed?
You could try using the xml:space="preserve" attribute.
Answer by Mike Beckerle
xml:space='preserve' is not the answer. It only applies to whitespace-only nodes, i.e. when you want to keep the whitespace nodes in an element like this:
<this xml:space='preserve'> <has/>
<whitespace/>
</this>
Note that those whitespace nodes consist ONLY of whitespace.
I have been struggling to get Xerces to generate events allowing isolation of CDATA content as well. I have no solution as yet.
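To see which nodes are meant, the sketch below (hypothetical class WhitespaceNodes) parses the example element with the default, non-validating JAXP parser and counts the direct children that are whitespace-only text nodes. Without a DTD the parser keeps all of them whether or not xml:space is present; per the XML 1.0 specification, the attribute only signals to the application that such whitespace should be preserved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for inspecting whitespace-only text nodes.
final class WhitespaceNodes {

    // Counts direct children of the root element that are text nodes
    // containing nothing but whitespace.
    static int whitespaceOnlyChildren(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            NodeList children = root.getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE
                        && child.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the example above this counts three nodes: the space before <has/> and the two newlines.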

