Java: fast, lightweight XML parser
Original question: http://stackoverflow.com/questions/2134507/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): Stack Overflow
Fast, lightweight XML parser
Asked by joe90
I have an XML document in a specific format that will be pushed to me. The document will always be of the same type, so the format is very strict.
I need to parse this so that I can convert it into JSON (well, a slightly bastardized version so someone else can use it with DOJO).
My question is: should I use a very fast, lightweight XML parser (no need for SAX, etc.; any ideas?) or write my own, basically converting the document into a StringBuffer and spinning through the array? Under the covers, I assume all HTML parsers spin through the string (or memory buffer) and parse, producing output on the way through.
Thanks
Edit:
The XML will be anywhere from 3 or 4 lines up to about 50 lines at the extreme.
Accepted answer by Chad Okere
No, you should not try to write your own XML parser for this.
SAX itself is very lightweight and fast, so I'm not sure why you think it's too much. Also, using a string buffer would actually be much less scalable than using SAX, because SAX doesn't require you to load the whole XML file into memory to use it. I've used SAX to parse multi-gigabyte XML files, which you wouldn't be able to do using string buffers on a 32-bit machine.
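To make the streaming point concrete, here is a minimal sketch of a SAX handler. It assumes a flat, made-up document such as <item><id>7</id><name>widget</name></item>; the class name SaxSketch and the method readPairs are only for illustration and are not from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // Returns "elementName=text" for every element that directly contains non-blank text.
    // Assumption: simple leaf elements only; mixed content is not handled.
    static List<String> readPairs(String xml) {
        List<String> pairs = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attrs) {
                    text.setLength(0); // start collecting text for the new element
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length); // may be called several times per text node
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    String value = text.toString().trim();
                    if (!value.isEmpty()) {
                        pairs.add(qName + "=" + value);
                    }
                    text.setLength(0);
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints [id=7, name=widget]
        System.out.println(readPairs("<item><id>7</id><name>widget</name></item>"));
    }
}
```

The handler only ever holds the text of the element it is currently inside, which is why the same approach keeps working when the input is a large stream rather than a short string.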
If you have small files and you don't need to worry about performance, look into using the DOM. Java's implementation can be kind of annoying to use: you create a document by using a DocumentBuilder, which comes from a DocumentBuilderFactory.
The code to create a document from a file looks like this:
Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new FileInputStream("file.xml"));
(note that keeping a reference to your document builder will speed things up if you need to parse multiple files)
Then you use the methods in org.w3c.dom.Document to read or manipulate the contents. For example, getElementsByTagName() returns all the Elements with a certain tag name.
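As a rough sketch of the two points above (keeping the DocumentBuilder around and reading elements with getElementsByTagName()), something like the following should work for small documents. The class DomSketch, the method textOf, and the sample XML are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomSketch {
    // Created once and reused for every document. A DocumentBuilder is not
    // guaranteed to be thread-safe, so use one instance per thread.
    private final DocumentBuilder builder = newBuilder();

    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM parser available", e);
        }
    }

    // Parses the XML and returns the text content of every element named tagName.
    public List<String> textOf(String xml, String tagName) {
        try {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        DomSketch dom = new DomSketch();
        // prints [a, b]
        System.out.println(dom.textOf("<items><name>a</name><name>b</name></items>", "name"));
    }
}
```

For a document of 3 to 50 lines, holding the whole tree in memory is not a concern, and the lookups are easier to read than a hand-written string scan.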
Answer by Teja Kantamneni
You can use dom4j/XStream to read the XML into an equivalent Java model and then use JSON-lib to convert it into JSON.
Answer by Quentin
Use a real XML parser. If you don't, you will probably get bitten when something changes. The document may be "very strict", but in two years' time something will probably get refactored, and the structure will change in a way that still parses to the same data with an XML parser but breaks a homebrew string parser.
Answer by Jon
It really depends on the type of XML you're parsing. I wouldn't write your own parser when there's something already there to do the job for you.
The choice between SAX and DOM really depends on what you're trying to parse. See this for how to decide which one to use:
http://geekexplains.blogspot.com/2009/04/sax-vs-dom-differences-between-dom-and.html
Even if you don't use SAX/DOM, there are still simple options available to you. Take a look at Simple : )
http://simple.sourceforge.net/
You may also want to consider StAX.
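For comparison, a minimal StAX sketch using the javax.xml.stream API that ships with Java SE. Here the code pulls events from the parser instead of receiving callbacks. The class StaxSketch, the method readFields, and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxSketch {
    // Maps each leaf element name to its trimmed text, in document order.
    // Assumption: a flat document with unique leaf names and no mixed content.
    static Map<String, String> readFields(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String current = null;
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        current = reader.getLocalName();
                        text.setLength(0);
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        String value = text.toString().trim();
                        if (current != null && !value.isEmpty()) {
                            fields.put(current, value);
                        }
                        current = null; // a parent's end tag must not reuse the child's name
                        text.setLength(0);
                        break;
                    default:
                        break; // ignore comments, processing instructions, etc.
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {id=7, name=widget}
        System.out.println(readFields("<item><id>7</id><name>widget</name></item>"));
    }
}
```

Because the loop is ordinary code rather than a set of callbacks, it is often easier to stop early or to hand control to a helper method per element.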
Answer by WildWezyr
Maybe you should look at kXML 2, a small XML pull parser specially designed for constrained environments, to access, parse, and display XML files for Java 2 Micro Edition-enabled devices. It works well with Java SE/EE too ;-). As it is designed for Micro Edition, it is really lightweight (small footprint) and IMHO really easy to use (much easier than SAX, DOM, and similar APIs).
From my own experience with kXML 2: I used it to parse XML files larger than 1 GB (Wikipedia dumps), and I was very happy with the performance, memory consumption, etc.
Finally ;-), the link: http://kxml.sourceforge.net/kxml2/
Answer by peller
Parsing on the back end and exposing JSON is probably the right way to go, since you would then have general-purpose JSON data that you can easily integrate with other sources. But if you have a simple message and this is the only place you think you'd be using JSON, you could try doing the parsing on the client side. Dojo has an experimental client-side XML parser.
Answer by Brian
Do you have to use XML?
I found that my own custom text format was much faster than either XML or JSON with any of the off-the-shelf packages. They were fast, but by controlling my own format and just doing String parsing I was able to cut the time in half compared with the fastest XML implementation.
Obviously this only works if you're fully in charge of formats and may not be appropriate to your situation, but for any others in this situation: don't think XML is the absolute fastest option you have. It's not.
Answer by Bal
Do you really need to parse/manipulate any of the data in the XML document? If not, you could just use an XSLT stylesheet. Really simple, really fast.
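A minimal sketch of the XSLT route, using the javax.xml.transform API from Java SE. The stylesheet, the class XsltSketch, and the <item> document are invented for illustration, and the stylesheet does no JSON escaping, so it only suits values known to be free of quotes and backslashes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical stylesheet: emits plain text shaped like a JSON object.
    // Assumption: <id> is numeric and <name> contains no characters that need escaping.
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/item'>"
          + "{\"id\": <xsl:value-of select='id'/>, \"name\": \"<xsl:value-of select='name'/>\"}"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static String toJson(String xml) {
        try {
            // The stylesheet is compiled on every call here; when converting many
            // documents, compile it once with TransformerFactory.newTemplates(...).
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("could not transform XML", e);
        }
    }

    public static void main(String[] args) {
        // prints {"id": 7, "name": "widget"}
        System.out.println(toJson("<item><id>7</id><name>widget</name></item>"));
    }
}
```

This keeps the mapping from XML to output in one declarative file, at the cost of having to handle escaping yourself if the values can contain arbitrary text.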