Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1185519/

Time: 2020-10-29 15:32:09  Source: igfitidea

How to read well formed XML in Java, but skip the schema?

java xml

Asked by Will Hartung

I want to read an XML file that has a schema declaration in it.

And that's all I want to do, read it. I don't care if it's valid, but I want it to be well formed.

The problem is that the reader is trying to read the schema file, and failing.

I don't want it to even try.

I've tried disabling validation, but it still insists on trying to read the schema file.

Ideally, I'd like to do this with a stock Java 5 JDK.

Here's what I have so far, very simple:

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(file);

and here's the exception I am getting back:

java.lang.RuntimeException: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

Yes, this HAPPENS to be an XHTML schema, but this isn't an "XHTML" issue, it's an XML issue. Just pointing that out so folks don't get distracted. And, in this case, the W3C is basically saying "don't ask for this thing, it's a silly idea", and I agree. But, again, it's a detail of the issue, not the root of it. I don't want to ask for it AT ALL.

Answered by Mads Hansen

The reference is not to a schema, but to a DTD.

DTD files can contain more than just structural rules. They can also contain entity declarations. XML parsers will normally load and parse a referenced DTD, because it could declare entities that affect how the document is parsed and the content of the file (you could have an entity for a single character or even for whole phrases of text).
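
To illustrate why the DTD matters even without validation, here is a minimal sketch in which an entity declared in an internal DTD subset changes the parsed content. The class name `EntityDemo`, the `greeting` entity, and the sample document are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDemo {
    // Hypothetical sample: the DTD subset declares an entity used in the body.
    static final String SAMPLE =
        "<!DOCTYPE note [ <!ENTITY greeting \"hello world\"> ]>"
        + "<note>&greeting;</note>";

    // Parses an XML string (no validation) and returns the root element's text.
    static String rootText(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The entity reference is replaced by the text declared in the DTD.
        System.out.println(rootText(SAMPLE));
    }
}
```

Without the declaration, the reference `&greeting;` could not be expanded, which is why a parser wants to see the DTD before it reads the rest of the document.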

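If you want to avoid loading and parsing the referenced DTD, you can provide your own EntityResolver, test for the referenced DTD, and decide whether to load a local copy of the DTD file or to return an empty InputSource so that nothing is fetched. Returning null tells the parser to fall back to its default resolution.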

Code sample from the referenced answer on custom EntityResolvers:

    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("foo.dtd")) {
                return new InputSource(new StringReader(""));
            } else {
                return null;
            }
        }
    });
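
For context, here is a minimal self-contained sketch of the resolver approach wired into a DocumentBuilder. The class name `SkipDtdWithResolver`, the `SAMPLE` document, and the choice to match any system ID ending in `.dtd` are assumptions made for illustration and are not part of the original answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class SkipDtdWithResolver {
    // Hypothetical sample that references the XHTML DTD from the question.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html><body>ok</body></html>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        db.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Assumption: treat every *.dtd reference as "do not fetch".
                if (systemId != null && systemId.endsWith(".dtd")) {
                    // An empty source stands in for the DTD, so no URL is opened.
                    return new InputSource(new StringReader(""));
                }
                return null; // anything else: default resolution
            }
        });
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

One caveat: because the DTD is replaced by an empty one, a document that relies on entities declared there (for example `&nbsp;` in XHTML) will probably fail to parse with this resolver.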

Answered by lambshaanxy

The simplest answer is this one-liner, called after creating the DocumentBuilderFactory:

    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
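
Put into a complete program, the one-liner might look like the sketch below. It assumes the parser behind `DocumentBuilderFactory` is Xerces-based (as the JDK's built-in one is), since the feature URI is Xerces-specific; the class name `SkipDtdWithFeature` and the `example.invalid` DTD URL are placeholders.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SkipDtdWithFeature {
    // Xerces-specific feature; other parsers may reject it.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical sample whose DTD URL must never be fetched.
    static final String SAMPLE =
        "<!DOCTYPE note SYSTEM \"http://example.invalid/never-fetched.dtd\">"
        + "<note>hi</note>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        // Throws ParserConfigurationException if the parser lacks this feature.
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The feature only applies when validation is off, which matches the question's setup of `setValidating(false)`.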

Shamelessly cribbed from "Make DocumentBuilder.parse ignore DTD references".

Answered by skaffman

The issue here isn't one of validation. Regardless of validation settings, the parser will still attempt to resolve any references in your document, such as entities, DTDs and (sometimes) schemas. It's only later on that it decides to validate using them (or not). You need to plug in an entity resolver to "intercept" these attempts at de-referencing.

Check out Apache XML Resolver for an easy(ish) way to do this.
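
To show the idea of intercepting de-referencing without pulling in a library, here is a hand-rolled sketch of a tiny "catalog" that maps public IDs to local DTD text. It does not use the Apache XML Resolver API; the class `LocalCatalogResolver`, its `register` method, and the example public ID are all hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalCatalogResolver implements EntityResolver {
    // Maps a DOCTYPE public ID to the text of a locally held DTD.
    private final Map<String, String> dtdByPublicId = new HashMap<String, String>();

    void register(String publicId, String dtdText) {
        dtdByPublicId.put(publicId, dtdText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = dtdByPublicId.get(publicId);
        if (local == null) {
            return null; // not in the catalog: fall back to default resolution
        }
        return new InputSource(new StringReader(local));
    }

    // Parses a document whose DTD is served from the in-memory catalog.
    static String demo() throws Exception {
        LocalCatalogResolver resolver = new LocalCatalogResolver();
        resolver.register("-//EXAMPLE//DTD Demo 1.0//EN",
                          "<!ENTITY nbsp \"&#160;\">");
        String xml =
            "<!DOCTYPE doc PUBLIC \"-//EXAMPLE//DTD Demo 1.0//EN\" "
            + "\"http://example.invalid/demo.dtd\">"
            + "<doc>a&nbsp;b</doc>";
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        db.setEntityResolver(resolver);
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo().length());
    }
}
```

Unlike an empty stand-in DTD, a local copy keeps entity declarations available, so references such as `&nbsp;` still expand while no network request is made.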

Answered by Rich Seller

I've not tested this, but you could try calling setSchema on the factory, passing null.

i.e.

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    dbf.setSchema(null);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(file);

Update: Looking at DocumentBuilderImpl, it looks like this might work. In its constructor it takes the grammar from the factory and checks that it is non-null before setting up any schema validation.

From DocumentBuilderFactoryImpl:

    public void setSchema(Schema grammar) {
        this.grammar = grammar;
    }

From DocumentBuilderImpl constructor:

    ...
    this.grammar = dbf.getSchema();
    if (grammar != null) {
        XMLParserConfiguration config = domParser.getXMLParserConfiguration();
        XMLComponent validatorComponent = null;
        /** For Xerces grammars, use built-in schema validator. **/
        ...
    }

Answered by user3332322

This works well to check whether the XML is well formed, irrespective of whether it contains a DTD declaration or not.
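
The code this answer refers to is not shown on this page. As one possible reading, here is a sketch of a well-formedness check built on SAX with the Xerces `load-external-dtd` feature switched off. The class `WellFormedCheck` and the method `isWellFormed` are hypothetical names, and this is not necessarily what the poster had in mind.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if the text parses as well-formed XML; external DTDs are not loaded.
    static boolean isWellFormed(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setValidating(false);
        // Xerces-specific feature (assumes a Xerces-based parser such as the JDK's).
        spf.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        SAXParser parser = spf.newSAXParser();
        try {
            // DefaultHandler rethrows fatal errors, which covers well-formedness errors.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String withDtd =
            "<!DOCTYPE a SYSTEM \"http://example.invalid/a.dtd\"><a><b/></a>";
        System.out.println(isWellFormed(withDtd));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```

Configuration errors (for example an unsupported feature) are deliberately left to propagate as exceptions, so that only actual parse failures are reported as "not well formed".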
