Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/16479523/
How to modify a huge XML file by StAX?
Asked by Eugene
I have a huge XML file (~2 GB) and I need to add new elements and modify existing ones. For example, I have:
<books>
    <book>....</book>
    ...
    <book>....</book>
</books>
And I want to get:
<books>
    <book>
        <index></index>
        ....
    </book>
    ...
    <book>
        <index></index>
        ....
    </book>
</books>
I used the following code:
XMLInputFactory inFactory = XMLInputFactory.newInstance();
XMLEventReader eventReader = inFactory.createXMLEventReader(new FileInputStream(file));
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = factory.createXMLStreamWriter(new FileWriter(file, true));
while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();
    if (event.getEventType() == XMLEvent.START_ELEMENT) {
        if (event.asStartElement().getName().toString().equalsIgnoreCase("book")) {
            writer.writeStartElement("index");
            writer.writeEndElement();
        }
    }
}
writer.close();
But the result was the following:
<books>
    <book>....</book>
    ....
    <book>....</book>
</books><index></index>
Any ideas?
Answered by Evgeniy Dorofeev
Try this:
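import java.io.FileInputStream;
import java.io.FileWriter;
import javax.xml.stream.*;
import javax.xml.stream.events.XMLEvent;

XMLInputFactory inFactory = XMLInputFactory.newInstance();
XMLEventReader eventReader = inFactory.createXMLEventReader(new FileInputStream("1.xml"));
XMLOutputFactory factory = XMLOutputFactory.newInstance();
// "file" must be a different file from the input, and it is not opened in append mode.
XMLEventWriter writer = factory.createXMLEventWriter(new FileWriter(file));
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();
    // Copy every event from the input to the output unchanged.
    writer.add(event);
    if (event.getEventType() == XMLEvent.START_ELEMENT) {
        // XML is case-sensitive, so compare the local name exactly.
        if (event.asStartElement().getName().getLocalPart().equals("book")) {
            // Insert an empty <index> element right after each <book> start tag.
            writer.add(eventFactory.createStartElement("", "", "index"));
            writer.add(eventFactory.createEndElement("", "", "index"));
        }
    }
}
writer.close();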
Notes
new FileWriter(file, true) appends to the end of the file, which is hardly what you need here.
equalsIgnoreCase("book") is a bad idea because XML is case-sensitive.
Answered by Stephen C
Well, it is pretty clear why it behaves the way it does. What you are actually doing is opening the existing file in append mode and writing elements at the end. That clearly contradicts what you are trying to do.
(Aside: I'm surprised that it works as well as it does, given that the input side is likely to see the elements that the output side is adding to the end of the file. And indeed, exceptions like the ones Evgeniy Dorofeev's example gives are the sort of thing I'd expect. The problem is that if you attempt to read and write a text file at the same time, and either the reader or the writer uses any form of buffering, explicit or implicit, the reader is liable to see partial states.)
To fix this, you have to start by reading from one file and writing to a different file. Appending won't work. Then you have to arrange for the elements, attributes, content, etc. that are read from the input file to be copied to the output file. Finally, you need to add the extra elements at the appropriate points.
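The three steps above can be sketched roughly as follows. This is only a minimal sketch, not the answerer's code: the class name BookIndexer, the method names, and the idea of writing to a temporary file in the same directory and then moving it over the original are assumptions made for illustration. It also assumes the elements are not namespaced and that the output may be written as UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class; the names are illustrative only.
class BookIndexer {

    /** Streams input to output, adding an empty index element after every book start tag. */
    public static void addIndex(Path input, Path output) throws IOException, XMLStreamException {
        XMLEventFactory events = XMLEventFactory.newInstance();
        try (InputStream in = Files.newInputStream(input);
             OutputStream out = Files.newOutputStream(output)) {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                writer.add(event); // step 2: copy everything that was read
                if (event.isStartElement()
                        && "book".equals(event.asStartElement().getName().getLocalPart())) {
                    // step 3: add the extra element at the appropriate point
                    writer.add(events.createStartElement("", "", "index"));
                    writer.add(events.createEndElement("", "", "index"));
                }
            }
            writer.flush();
            writer.close();
            reader.close();
        }
    }

    /** Step 1: never write to the file being read; use a temporary file, then swap it in. */
    public static void rewriteInPlace(Path file) throws IOException, XMLStreamException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "books", ".tmp");
        try {
            addIndex(file, tmp);
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op if the move succeeded
        }
    }

    public static void main(String[] args) throws Exception {
        rewriteInPlace(Paths.get(args[0]));
    }
}
```

Because only one event is held in memory at a time, the approach should scale to a file of this size, at the cost of needing enough free disk space for a second copy while the rewrite is in progress.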
And is there any possibility to open the XML file in a mode like RandomAccessFile, but write to it using StAX methods?
No. That is theoretically impossible. In order to be able to navigate around an XML file's structure in a "random" fashion, you'd first need to parse the whole thing and build an index of where all the elements are. Even when you've done that, the XML is still stored as characters in a file, and random access does not allow you to insert or remove characters in the middle of a file.
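The last point can be seen in a small experiment. The sketch below is illustrative only (the class and method names are made up): it seeks into a tiny file with RandomAccessFile and writes at that position, which overwrites the bytes already there instead of shifting them to make room.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names are illustrative only.
class RandomAccessOverwriteDemo {

    /** Writes text at the given byte offset and returns the resulting file content. */
    public static String writeAt(Path file, long offset, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(text.getBytes(StandardCharsets.UTF_8)); // overwrites, does not insert
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".xml");
        Files.write(file, "<book>text</book>".getBytes(StandardCharsets.UTF_8));
        // Try to "insert" <index/> right after <book>, at byte offset 6.
        System.out.println(writeAt(file, 6, "<index/>")); // prints <book><index/>ok>
        Files.delete(file);
    }
}
```

The eight bytes of the new element replace "text</bo", leaving a broken document. A real insertion would require rewriting everything after the insertion point, which for a 2 GB file is essentially the same work as copying it to a new file.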
Maybe your best bet would be combining XSL and a SAX style parser; e.g. something along the lines of this IBM article: http://ibm.com/developerworks/xml/library/x-tiptrax
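One way to read this suggestion is sketched below. It is not taken from the IBM article, and it swaps in a simpler technique than a real stylesheet: a SAX filter adds the index element while events stream past, and a stylesheet-free identity Transformer from the standard javax.xml.transform API serializes the result. The class name IndexInsertingFilter and the transform helper are assumptions for illustration, and the elements are assumed not to be namespaced.

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter class; the names are illustrative only.
class IndexInsertingFilter extends XMLFilterImpl {

    IndexInsertingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Pass the original start tag through first.
        super.startElement(uri, localName, qName, atts);
        if ("book".equals(localName)) {
            // Emit an empty index element as the first child of every book.
            super.startElement("", "index", "index", new AttributesImpl());
            super.endElement("", "index", "index");
        }
    }

    /** Parses the input through the filter and serializes the filtered events to the output. */
    public static void transform(InputSource in, StreamResult out) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // required so that localName is populated
        XMLReader parser = spf.newSAXParser().getXMLReader();
        // newTransformer() without a stylesheet yields an identity transformation.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new SAXSource(new IndexInsertingFilter(parser), in), out);
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the input file, args[1] a different output file.
        transform(new InputSource(new File(args[0]).toURI().toString()),
                  new StreamResult(new File(args[1])));
    }
}
```

Whether this stays within a small memory footprint depends on the Transformer implementation streaming a SAXSource rather than building a tree first, so it is worth checking on a large sample before relying on it.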
Answered by kristjanroosild
Maybe this StAX Read-and-Write Example in the Java EE tutorial helps: http://docs.oracle.com/javaee/5/tutorial/doc/bnbfl.html#bnbgq
You can download the tutorial examples here: https://java.net/projects/javaeetutorial/downloads
For quick access, the referred example is here: http://read.pudn.com/downloads79/ebook/304101/javaeetutorial5/examples/stax/readnwrite/src/readnwrite/EventProducerConsumer.java_.htm