Original question: http://stackoverflow.com/questions/1777878/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Is there a Java XML API that can parse a document without resolving character entities?
Asked by Kaypro II
I have a program that needs to parse XML that contains character entities. The program itself doesn't need to have them resolved, and the list of them is large and will change, so I want to avoid explicit support for these entities if I can.
Here's a simple example:
<?xml version="1.0" encoding="UTF-8"?>
<xml>Hello there &something;</xml>
Is there a Java XML API that can parse a document successfully without resolving (non-standard) character entities? Ideally it would translate them into a special event or object that could be handled specially, but I'd settle for an option that would silently suppress them.
Answer & Example:
Skaffman gave me the answer: use a StAX parser with IS_REPLACING_ENTITY_REFERENCES set to false.
Here's the code I whipped up to try it out:
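import java.io.FileInputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
// Ask the parser to report entity references as events instead of replacing them.
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader = inputFactory.createXMLEventReader(
        new FileInputStream("your file here"));
while (reader.hasNext()) {
    XMLEvent event = reader.nextEvent();
    if (event.isEntityReference()) {
        // Unresolved references arrive as EntityReference events.
        EntityReference ref = (EntityReference) event;
        System.out.println("Entity Reference: " + ref.getName());
    }
}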
For the above XML, it will print "Entity Reference: something".
Answered by skaffman
The StAX API supports the notion of not replacing character entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property:
Requires the parser to replace internal entity references with their replacement text and report them as characters
This can be set on an XMLInputFactory, which is then in turn used to construct an XMLEventReader or XMLStreamReader. However, the API is careful to say that this property is only intended to force the implementation to perform the replacement, rather than forcing it to not replace them. Still, it's got to be worth a try.
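As a rough sketch of the cursor-style variant mentioned above, the same property can be set on the factory before creating an XMLStreamReader. The class name StreamReaderEntityDemo and the helper unresolvedEntityNames are hypothetical names chosen for illustration, and whether references are actually reported depends on the StAX implementation in use, since the property is only guaranteed in the "replace" direction.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the class and method names are illustrative only.
class StreamReaderEntityDemo {

    // Collects the names of entity references that the parser reported
    // as ENTITY_REFERENCE events instead of replacing them.
    static List<String> unresolvedEntityNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Request that entity references be reported rather than replaced.
        // Assumption: the implementation honors a value of false.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);

        List<String> names = new ArrayList<>();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.ENTITY_REFERENCE) {
                    // For ENTITY_REFERENCE events, getLocalName() is the entity name.
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<xml>Hello there &something;</xml>";
        System.out.println(unresolvedEntityNames(xml));
    }
}
```

If the implementation ignores the setting, an undeclared entity such as &something; would most likely surface as an XMLStreamException instead, so it is worth testing against the parser you actually deploy with.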
Answered by Jim Ferrans
A SAX parse with an org.xml.sax.EntityResolver might suit your purpose. You could for sure suppress them, and you could probably find a way to leave them unresolved.
This tutorial seems the most relevant: it shows how to resolve entities into strings.
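One possible way to try the SAX route is sketched below; it is not taken from the tutorial referred to above. It relies on an assumption that goes beyond the question: the document must carry a DOCTYPE with an external subset, because only then is an undeclared entity a validity issue rather than a well-formedness error, and a non-validating parser may report it through ContentHandler.skippedEntity. The EntityResolver hands back an empty DTD so nothing is fetched. The class name SaxSkippedEntityDemo, the helper skippedEntityNames, and the placeholder.dtd system ID are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the class and method names are illustrative only.
class SaxSkippedEntityDemo {

    // Collects the names of entities the parser skipped instead of expanding.
    static List<String> skippedEntityNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Supply an empty external subset so no file or URL is read.
                return new InputSource(new StringReader(""));
            }

            @Override
            public void skippedEntity(String name) {
                // Called for entity references the parser chose not to expand.
                names.add(name);
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(handler);
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a DOCTYPE with an external subset is present in the input.
        String xml = "<!DOCTYPE xml SYSTEM \"placeholder.dtd\">\n"
                + "<xml>Hello there &something;</xml>";
        System.out.println(skippedEntityNames(xml));
    }
}
```

For a document with no DOCTYPE at all, like the example in the question, a conforming SAX parser is expected to raise a fatal error on &something;, so this sketch only helps when the input declares an external DTD or when you are able to add one.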
Answered by bill seacham
I am not a Java developer, but I "think" the Java XML classes support functionality similar to .NET's for accomplishing this. In .NET's XmlReaderSettings class, you set the ProhibitDtd property to false and set the XmlResolver property to null. This causes the parser to ignore externally referenced entities without throwing an exception when they are read. I just did a Google search for "Java ignore enity" and got lots of hits, some of which appear to address this topic. I realize this is not a total answer to your question, but it should point you in a useful direction.
Answered by user2050348
This works for me only when support for external entities is also disabled:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
inputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
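A self-contained sketch of the two settings together might look like the following. The class name ExternalEntityDemo, the helper reportedEntityNames, the entity name ext, and the file name ext.txt are made up for illustration; the sample document declares one external entity in its internal subset and also references one undeclared entity. Behavior may differ between StAX implementations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class; the class and method names are illustrative only.
class ExternalEntityDemo {

    // Returns the names of all entity references reported as events.
    static List<String> reportedEntityNames(String xml) throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // Do not replace entity references with their text.
        inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        // Do not try to load external entities such as SYSTEM "ext.txt".
        inputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        List<String> names = new ArrayList<>();
        XMLEventReader reader = inputFactory.createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isEntityReference()) {
                    names.add(((EntityReference) event).getName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // "ext" is declared as an external entity; "something" is not declared at all.
        String xml = "<!DOCTYPE xml [<!ENTITY ext SYSTEM \"ext.txt\">]>\n"
                + "<xml>&ext; and &something;</xml>";
        System.out.println(reportedEntityNames(xml));
    }
}
```

Disabling external entity support also stops the parser from reading files or URLs named in the document, which is generally the safer default when the XML comes from an untrusted source.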

