Best XML parser for Java

Question

Asked by Evan

I need to read smallish (few MB at the most, UTF-8 encoded) XML files, rummage around looking at various elements and attributes, perhaps modify a few and write the XML back out again to disk (preferably with nice, indented formatting).

What would be the best XML parser for my needs? There are lots to choose from. Some I'm aware of are:

And of course the one in the JDK (I'm using Java 6). I'm familiar with Xerces but find it clunky.

Recommendations?

Accepted answer by zehrer

If speed and memory are not a problem, dom4j is a really good option. If you need speed, a StAX parser such as Woodstox is the right way to go, but you have to write more code to get things done and you have to get used to processing XML as a stream.

Answer 2

Answered by Brian Matthews

I have found dom4j to be the tool for working with XML, especially compared to Xerces.

Answer 3

Answered by Fernando Miguélez

I think you should not commit to any specific parser implementation. The Java API for XML Processing (JAXP) lets you use any conforming parser implementation in a standard way. Your code will be much more portable, and when you realise that a specific parser has grown too old, you can replace it with another without changing a line of your code (if you do it correctly).

Basically there are three ways of handling XML in a standard way:

SAX: This is the simplest API. You read the XML by defining a handler class that receives the data inside elements and attributes as the XML is processed serially. It is faster and simpler if you only plan to read some attributes/elements and/or write some values back (your case).
DOM: This method builds an object tree that you can access and modify in any order, so it is better for complex XML manipulation and handling.
StAX: This sits midway between SAX and DOM. You write code that pulls the data you are interested in from the parser as it is processed.
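
For the read, tweak, and write-back task in the question, the DOM route can be sketched with nothing but the JDK's JAXP classes. This is a minimal sketch, not a definitive implementation: the class name DomEditSketch, the method setAttribute and the sample XML are made up for illustration, and it works on strings so it can be run as-is.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named only for this example.
final class DomEditSketch {

    /** Parses the XML, sets attrName=value on every element called tag, and returns the serialized result. */
    static String setAttribute(String xml, String tag, String attrName, String value) {
        try {
            // The factory picks whichever JAXP-conforming parser is configured.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Rummage around the tree and modify what we find.
            NodeList matches = doc.getElementsByTagName(tag);
            for (int i = 0; i < matches.getLength(); i++) {
                ((Element) matches.item(i)).setAttribute(attrName, value);
            }

            // An identity Transformer doubles as the standard serializer.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setAttribute("<books><book id=\"1\"/></books>", "book", "status", "read"));
    }
}
```

To work on files instead, builder.parse(File) and new StreamResult(File) can be swapped in. How wide the indentation comes out depends on the serializer implementation in use.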

Forget about proprietary APIs such as JDOM or the Apache ones (e.g. Apache Xerces XMLSerializer), because they tie you to a specific implementation that can evolve over time or lose backwards compatibility, which will force you to change your code when you want to upgrade to a new version of JDOM or whatever parser you use. If you stick to the Java standard API (using factories and interfaces), your code will be much more modular and maintainable.
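
As an illustration of the "factories and interfaces" point, here is a small SAX sketch that only touches javax.xml.parsers and org.xml.sax types, so the underlying parser can be swapped without code changes. The class SaxReadSketch and the method attributeValues are names invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named only for this example.
final class SaxReadSketch {

    /** Collects the value of attrName from every element called tag, in document order. */
    static List<String> attributeValues(String xml, final String tag, final String attrName) {
        final List<String> values = new ArrayList<String>();

        // The handler is called back (pushed to) as the parser walks the input.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(tag)) {
                    String value = attrs.getValue(attrName);
                    if (value != null) {
                        values.add(value);
                    }
                }
            }
        };

        try {
            // No implementation class is named anywhere; the factory resolves it.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<books><book id=\"1\"/><book id=\"2\"/><dvd id=\"3\"/></books>";
        System.out.println(attributeValues(xml, "book", "id"));
    }
}
```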

Needless to say, all of the proposed parsers (I haven't checked every one, but I'm almost sure) provide a JAXP implementation, so technically you can use any of them.

Answer 4

Answered by zehrer

In addition to SAX and DOM, there is StAX parsing, available through XMLStreamReader, which is an XML pull parser.
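
A rough idea of what pull parsing with XMLStreamReader looks like, using only javax.xml.stream from the JDK. The class StaxPullSketch and the method countElements are invented for this sketch. The point is that the calling code drives the loop and asks for the next event itself.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, named only for this example.
final class StaxPullSketch {

    /** Counts the start tags whose local name equals tag. */
    static int countElements(String xml, String tag) {
        int count = 0;
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // We pull events one at a time; nothing is built in memory.
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(tag)) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException("StAX parse failed", e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countElements("<books><book/><book/><dvd/></books>", "book"));
    }
}
```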

Answer 5

Answered by Uri

If you care less about performance, I'm a big fan of Apache Digester, since it essentially lets you map directly from XML to Java Beans.

Otherwise, you have to first parse, and then construct your objects.

Answer 6

Answered by Uri

I wouldn't recommend this if you've got a lot of "thinking" in your app, but using XSLT could be better (and potentially faster, with XSLT-to-bytecode compilation) than manipulating the XML in Java.
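
To make the XSLT idea concrete, here is a sketch that rewrites one attribute with an identity stylesheet plus a single override template, run through the standard javax.xml.transform API. The stylesheet, the book/status names and the class XsltEditSketch are assumptions made up for this example. Whether the stylesheet is actually compiled to bytecode depends on the TransformerFactory implementation in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named only for this example.
final class XsltEditSketch {

    // Copies everything unchanged, except that book/@status is forced to "read".
    private static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" indent=\"yes\"/>"
          + "<xsl:template match=\"@*|node()\">"
          + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
          + "</xsl:template>"
          + "<xsl:template match=\"book/@status\">"
          + "<xsl:attribute name=\"status\">read</xsl:attribute>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML and returns the result. */
    static String markAllRead(String xml) {
        try {
            // Templates is the prepared, reusable form of the stylesheet.
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            templates.newTransformer()
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(markAllRead("<books><book status=\"new\"><title>A</title></book></books>"));
    }
}
```

In a real application the Templates object would be created once and reused, since preparing the stylesheet is the expensive part.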

Answer 7

Answered by Kadir

Here is a nice comparison of DOM, SAX, StAX & TrAX (Source: http://download.oracle.com/docs/cd/E17802_01/webservices/webservices/docs/1.6/tutorial/doc/SJSXP2.html)

Feature            StAX             SAX              DOM             TrAX
API Type           Pull, streaming  Push, streaming  In-memory tree  XSLT rule
Ease of Use        High             Medium           High            Medium
XPath Capability   No               No               Yes             Yes
CPU & Memory       Good             Good             Varies          Varies
Forward Only       Yes              Yes              No              No
Read XML           Yes              Yes              Yes             Yes
Write XML          Yes              No               Yes             Yes
CRUD               No               No               Yes             No
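
The "XPath Capability" row is easiest to appreciate with a tiny example. This sketch evaluates an XPath expression against a DOM tree using javax.xml.xpath from the JDK. The class XPathSketch, the method evaluate and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named only for this example.
final class XPathSketch {

    /** Returns the string value of the first node selected by expression. */
    static String evaluate(String xml, String expression) {
        try {
            // XPath needs random access, so the document is first loaded as a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book id=\"1\"><title>A</title></book>"
                   + "<book id=\"2\"><title>B</title></book></books>";
        System.out.println(evaluate(xml, "/books/book[@id='2']/title"));
    }
}
```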

Answer 8

Answered by asdf

Simple XML (http://simple.sourceforge.net/) makes it very easy to (de)serialize objects.
