Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/582352/
How can I ignore DTD validation but keep the Doctype when writing an XML file?
Question
I am working on a system that should be able to read any (or at least any well-formed) XML file, manipulate a few nodes and write them back into that same file. I want my code to be as generic as possible, and I don't want:
- hardcoded references to Schema/Doctype information anywhere in my code. The doctype information is in the source document; I want to keep exactly that doctype information and not provide it again from within my code. If a document has no DocType, I won't add one. I do not care about the form or content of these files at all, except for my few nodes.
- custom EntityResolvers or StreamFilters to omit or otherwise manipulate the source information. (It is already a pity that namespace information seems somehow inaccessible from the document file where it is declared, but I can manage by using uglier XPaths.)
- DTD validation. I don't have the referenced DTDs, I don't want to include them, and Node manipulation is perfectly possible without knowing about them.
The aim is to have the source file entirely unchanged except for the changed Nodes, which are retrieved via XPath. I would like to get away with the standard javax.xml stuff.
My progress so far:
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Parser features passed as attributes; the two apache.org ones are Xerces-specific
    factory.setAttribute("http://xml.org/sax/features/namespaces", true);
    factory.setAttribute("http://xml.org/sax/features/validation", false);
    factory.setAttribute("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
    factory.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    // Keep namespaces, whitespace and comments; do not validate
    factory.setNamespaceAware(true);
    factory.setIgnoringElementContentWhitespace(false);
    factory.setIgnoringComments(false);
    factory.setValidating(false);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(inStream));
This loads the XML source into an org.w3c.dom.Document successfully, ignoring DTD validation. I can do my replacements and then I use
    Source source = new DOMSource(document);
    Result result = new StreamResult(getOutputStream(getPath()));
    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    xformer.transform(source, result);
to write it back, which is nearly perfect. But the Doctype tag is gone, no matter what I do. While debugging, I saw that there is a DeferredDoctypeImpl [log4j:configuration: null] object in the Document object after parsing, but it is somehow wrong, empty or ignored. The file I tested on starts like this (but it is the same for other file types):
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" debug="false">
    [...]
I think there are a lot of (easy?) ways involving hacks or pulling additional JARs into the project. But I would rather do it with the tools I already use.
Answer
Sorry, I just got it working by using an XMLSerializer instead of the Transformer...
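The answer above switches serializers. For readers who would rather keep the Transformer from the question, one commonly used workaround is to copy the doctype identifiers from the parsed Document into the transformer's output properties, so nothing is hardcoded. The following is only a minimal sketch under that assumption: the class name TransformerDoctypeSketch and the helper names parse and write are hypothetical, the load-external-dtd feature is the Xerces-specific one from the question, and an internal DTD subset would not be carried over this way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answers.
final class TransformerDoctypeSketch {

    /** Parses XML text without validating and without fetching the external DTD. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Xerces-specific feature taken from the question; other parsers may reject it.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes the document, re-emitting the DOCTYPE found in the parsed document, if any. */
    static String write(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = doc.getDoctype();
        if (doctype != null) {
            // Copy the identifiers from the source document instead of hardcoding them.
            if (doctype.getPublicId() != null) {
                transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            if (doctype.getSystemId() != null) {
                transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Calling write(parse(xml)) on the log4j sample from the question should produce output that again contains a DOCTYPE line pointing at log4j.dtd, while a document without a DocType gets none added.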
Answer by Mihai Chintoanu
Here's how you could do it using the LSSerializer found in the JDK:
    private void writeDocument(Document doc, String filename)
            throws IOException {
        Writer writer = null;
        try {
            /*
             * Could extract "ls" to an instance attribute, so it can be reused.
             */
            DOMImplementationLS ls = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().
                            getDOMImplementation("LS");
            /*
             * If "doc" has been constructed by parsing an XML document, we
             * should keep its encoding when serializing it; if it has been
             * constructed in memory, its encoding has to be decided by the
             * client code (UTF-8 is assumed here as a fallback).
             */
            String encoding = doc.getXmlEncoding() != null ? doc.getXmlEncoding() : "UTF-8";
            // Encode the bytes with the same charset that the XML declaration will name
            writer = new OutputStreamWriter(new FileOutputStream(filename), encoding);
            LSOutput lsout = ls.createLSOutput();
            lsout.setCharacterStream(writer);
            lsout.setEncoding(encoding);
            LSSerializer serializer = ls.createLSSerializer();
            serializer.write(doc, lsout);
        } catch (Exception e) {
            throw new IOException(e);
        } finally {
            if (writer != null) writer.close();
        }
    }
Needed imports:
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import org.w3c.dom.Document;
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSOutput;
    import org.w3c.dom.ls.LSSerializer;
I know this is an old question which has already been answered, but I think the technical details might help someone.
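To see the whole round trip in one place, here is a small self-contained sketch that combines the parser settings from the question with the LSSerializer from this answer and works in memory instead of on a file. The class name LsRoundTripSketch, the method roundTrip, the XPath "/*" and the changed "debug" attribute are illustrative assumptions, and the UTF-8 fallback is a choice made here, not something the original code prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class used only for illustration.
final class LsRoundTripSketch {

    /** Parses the XML bytes, changes one attribute found via XPath, and serializes again. */
    static byte[] roundTrip(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Xerces-specific feature from the question: do not fetch the external DTD.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        // Example manipulation: select the root element via XPath and change an attribute.
        Element root = (Element) XPathFactory.newInstance().newXPath()
                .evaluate("/*", doc, XPathConstants.NODE);
        root.setAttribute("debug", "true");

        DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput output = ls.createLSOutput();
        // A byte stream lets the serializer encode the output itself.
        output.setByteStream(bytes);
        // Keep the source encoding; fall back to UTF-8 (an assumption) if none was declared.
        output.setEncoding(doc.getXmlEncoding() != null ? doc.getXmlEncoding() : "UTF-8");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.write(doc, output);
        return bytes.toByteArray();
    }
}
```

Fed the log4j sample from the question, the returned bytes should still contain the DOCTYPE declaration alongside the changed attribute, without the referenced DTD ever being loaded.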
Answer by Zee
I tried using the LSSerializer library and was unable to get anywhere with it in terms of retaining the Doctype. This is the solution that Stephan probably used. Note: this is in Scala but uses a Java library, so just convert your code.
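    import java.io.{File, FileOutputStream, OutputStreamWriter}
    import org.w3c.dom.Element
    // JDK-internal serializer classes; they may not be accessible on newer JDKs
    import com.sun.org.apache.xml.internal.serialize.{OutputFormat, XMLSerializer}

    def transformXML(root: Element, file: String): Unit = {
      val doc = root.getOwnerDocument
      // OutputFormat(doc) takes the document type identifiers from the document
      val format = new OutputFormat(doc)
      format.setIndenting(true)
      val writer = new OutputStreamWriter(new FileOutputStream(new File(file)))
      try {
        val serializer = new XMLSerializer(writer, format)
        serializer.serialize(doc)
      } finally {
        writer.close() // release the file handle even if serialization fails
      }
    }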

