Java: How to ignore DTD validation but keep the Doctype when writing an XML file?

Question


I am working on a system that should be able to read any (or at least any well-formed) XML file, manipulate a few nodes, and write them back into that same file. I want my code to be as generic as possible, and I don't want:

- hardcoded references to Schema/Doctype information anywhere in my code. The doctype information is in the source document; I want to keep exactly that doctype information and not provide it again from within my code. If a document has no DOCTYPE, I won't add one. I do not care about the form or content of these files at all, except for my few nodes.
- custom EntityResolvers or StreamFilters to omit or otherwise manipulate the source information. (It is already a pity that namespace information seems somehow inaccessible from the document file where it is declared, but I can manage by using uglier XPaths.)
- DTD validation. I don't have the referenced DTDs, I don't want to include them, and node manipulation is perfectly possible without knowing about them.

The aim is to leave the source file entirely unchanged except for the changed nodes, which are retrieved via XPath. I would like to get by with just the standard javax.xml APIs.


My progress so far:


    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    // Parser features (setFeature is the standard JAXP way to pass them):
    // no validation, no DTD grammar, and do not load the external DTD at all
    factory.setFeature("http://xml.org/sax/features/namespaces", true);
    factory.setFeature("http://xml.org/sax/features/validation", false);
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

    // Keep namespaces, whitespace and comments as they are in the source
    factory.setNamespaceAware(true);
    factory.setIgnoringElementContentWhitespace(false);
    factory.setIgnoringComments(false);
    factory.setValidating(false);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(inStream));

This loads the XML source into an org.w3c.dom.Document successfully, skipping DTD validation. I can do my replacements, and then I use


    Source source = new DOMSource(document);
    Result result = new StreamResult(getOutputStream(getPath()));

    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    xformer.transform(source, result);

to write it back. This is nearly perfect, but the DOCTYPE declaration is gone, no matter what I do. While debugging, I saw that the Document object contains a DeferredDoctypeImpl [log4j:configuration: null] after parsing, but it is somehow wrong, empty, or ignored. The file I tested with starts like this (but the same happens with other file types):


    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" debug="false">
    [...]
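As a quick check of the debugging observation above, here is a minimal sketch (the class and method names are hypothetical) that parses a shortened copy of this log4j header with external DTD loading switched off and reads the doctype back through the standard `Document.getDoctype()` DOM API. It assumes the JDK's built-in Xerces-based parser, which recognizes the `load-external-dtd` feature. The `null` in `[log4j:configuration: null]` is most likely just the node value, which is always null for a DocumentType node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeInspect {
    // Parse without validating and without fetching the external DTD,
    // using the same parser feature as in the question.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Summarize the DocumentType node kept in the DOM, or null if there is none.
    static String describeDoctype(Document doc) {
        DocumentType dt = doc.getDoctype();
        if (dt == null) {
            return null;
        }
        return dt.getName() + " public=" + dt.getPublicId()
            + " system=" + dt.getSystemId();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">\n"
            + "<log4j:configuration xmlns:log4j=\"http://jakarta.apache.org/log4j/\""
            + " debug=\"false\"/>";
        System.out.println(describeDoctype(parse(xml)));
    }
}
```

Running `main` prints `log4j:configuration public=null system=log4j.dtd` without needing `log4j.dtd` on disk, so the identifiers are available in the DOM; they are just not written out by the serializer.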

I think there are a lot of (easy?) ways involving hacks or pulling additional JARs into the project, but I would rather solve it with the tools I already use.


Answer 1


Sorry, I've got it working now by using an XMLSerializer instead of the Transformer.
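If you would rather stay with the `Transformer` from the question, one commonly used workaround (a sketch, not the answerer's method) is to copy the public and system identifiers from `document.getDoctype()` into the `OutputKeys.DOCTYPE_PUBLIC` and `OutputKeys.DOCTYPE_SYSTEM` output properties, since the identity transform does not write the DocumentType node itself. Nothing is hardcoded: the values come from the parsed document, and a document without a DOCTYPE gets none. The class name and the `parse` helper are hypothetical. One limitation: these output properties carry only the two identifiers, so an internal DTD subset would not be reproduced.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeTransformer {
    // Parse without loading the external DTD (hypothetical helper).
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        f.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Identity-transform doc to a String, re-emitting its DOCTYPE if it has one.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        DocumentType dt = doc.getDoctype();
        if (dt != null) {
            // Only set non-null identifiers; a null property value is not allowed.
            if (dt.getPublicId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, dt.getPublicId());
            }
            if (dt.getSystemId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dt.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">"
            + "<log4j:configuration xmlns:log4j=\"http://jakarta.apache.org/log4j/\""
            + " debug=\"false\"/>");
        System.out.println(serialize(doc));
    }
}
```

To write to a file as in the question, pass `new StreamResult(getOutputStream(getPath()))` instead of the `StringWriter`; the DOCTYPE name in the output is taken from the root element.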


Answer 2

Answer by Mihai Chintoanu

Here's how you could do it using the LSSerializer found in the JDK:


    private void writeDocument(Document doc, String filename)
            throws IOException {
        DOMImplementationLS ls;
        try {
            /*
             * Could extract "ls" to an instance attribute, so it can be reused.
             */
            ls = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().
                            getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
        /*
         * Give the serializer a byte stream rather than a Writer, so that it
         * encodes the output itself and the bytes match the encoding named in
         * the XML declaration (an OutputStreamWriter without an explicit
         * charset would use the JVM's default charset instead).
         */
        try (OutputStream out = new FileOutputStream(filename)) {
            LSOutput lsout = ls.createLSOutput();
            lsout.setByteStream(out);
            /*
             * If "doc" has been constructed by parsing an XML document, we
             * should keep its encoding when serializing it; if it has been
             * constructed in memory, its encoding has to be decided by the
             * client code.
             */
            lsout.setEncoding(doc.getXmlEncoding());
            LSSerializer serializer = ls.createLSSerializer();
            serializer.write(doc, lsout);
        }
    }

Needed imports:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import org.w3c.dom.Document;
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSOutput;
    import org.w3c.dom.ls.LSSerializer;

I know this is an old question which has already been answered, but I think the technical details might help someone.
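Since the next answer reports that LSSerializer did not keep the Doctype, it may be worth checking on your own JDK before relying on it. This sketch (the class name and `parse` helper are hypothetical) serializes the parsed document to a `String` with `LSSerializer.writeToString` and prints whether a `<!DOCTYPE` appears. It assumes, as holds for the JDK's built-in DOM, that `doc.getImplementation()` also implements `DOMImplementationLS`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class LsSerializerCheck {
    // Parse without loading the external DTD.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        f.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Serialize with the DOM's own LSSerializer into a String for inspection.
    static String toXmlString(Document doc) {
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">"
            + "<log4j:configuration xmlns:log4j=\"http://jakarta.apache.org/log4j/\"/>");
        String out = toXmlString(doc);
        System.out.println(out);
        System.out.println("DOCTYPE kept: " + out.contains("<!DOCTYPE"));
    }
}
```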


Answer 3

Answer by Zee

I tried using LSSerializer and could not get it to retain the Doctype. This is the solution Stephan probably used. Note: this is in Scala, but it only uses a Java library, so it converts to Java directly.


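    import java.io.{File, FileOutputStream}
    import org.w3c.dom.Element
    // Internal JDK API, not exported by java.xml on JDK 9+; may be unavailable
    import com.sun.org.apache.xml.internal.serialize.{OutputFormat, XMLSerializer}

    def transformXML(root: Element, file: String): Unit = {
      val doc = root.getOwnerDocument
      val format = new OutputFormat(doc) // derives method and DOCTYPE IDs from doc
      format.setIndenting(true)
      // Byte stream, so the output is encoded in the format's declared encoding
      val out = new FileOutputStream(new File(file))
      try new XMLSerializer(out, format).serialize(doc)
      finally out.close()
    }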
