Parsing XML file contents in Java without knowing the XML file structure
Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/21963137/
Asked by canadiancreed
I've been learning some new tech, using Java to parse files, and for the most part it's going well. However, I'm at a loss as to how to parse an XML file whose structure is not known upon receipt. There are lots of examples of how to do so if you know the structure (getElementsByTagName seems to be the way to go), but no dynamic options, at least not that I've found.
So the tl;dr version of this question: how can I parse an XML file when I cannot rely on knowing its structure?
Accepted answer by Jason C
Well, the parsing part is easy; as helderdarocha stated in the comments, the parser only requires well-formed XML and does not care about the structure. You can use Java's standard DocumentBuilder to obtain a Document:
InputStream in = new FileInputStream(...);
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
(If you're parsing multiple documents, you can keep reusing the same DocumentBuilder.)
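As a minimal sketch of that reuse (the class name ReuseBuilderSketch, the helper rootTagNames, and the sample XML strings are hypothetical, chosen only for illustration), one DocumentBuilder can parse several documents in turn. Note that a DocumentBuilder is not thread-safe, so this sketch assumes single-threaded use.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class ReuseBuilderSketch {

    // Parses every XML string with the same DocumentBuilder and
    // returns the tag name of each document's root element.
    static List<String> rootTagNames(List<String> xmlDocuments) {
        List<String> names = new ArrayList<>();
        try {
            // Created once, then reused for every parse() call below.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            for (String xml : xmlDocuments) {
                InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
                Document doc = builder.parse(in);
                names.add(doc.getDocumentElement().getTagName());
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Two made-up documents with different, unrelated structures.
        List<String> docs = new ArrayList<>();
        docs.add("<objects><circle color='red'/></objects>");
        docs.add("<shapes/>");
        System.out.println(rootTagNames(docs));
    }
}
```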
Then you can start with the root document element and use familiar DOM methods from there on out:
Element root = doc.getDocumentElement(); // perform DOM operations starting here.
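When neither the tag name nor the attribute names are known in advance, they can be discovered from the element itself. The following is only a sketch under that assumption: the class RootInspectionSketch, its helpers parseRoot and describe, and the attribute name unit are hypothetical. It relies on the standard getTagName() and getAttributes() methods.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class RootInspectionSketch {

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Builds "tagName attr=value ..." from whatever attributes the element
    // happens to carry; no attribute names are assumed beforehand.
    static String describe(Element element) {
        StringBuilder description = new StringBuilder(element.getTagName());
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            description.append(' ')
                    .append(attribute.getNodeName())
                    .append('=')
                    .append(attribute.getNodeValue());
        }
        return description.toString();
    }

    public static void main(String[] args) {
        // The "unit" attribute is made up for this example.
        Element root = parseRoot("<objects unit='cm'><circle color='red'/></objects>");
        System.out.println(describe(root));
    }
}
```

The DOM interfaces do not promise any particular attribute order, so code that depends on order should sort the names itself.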
As for processing it, that really depends on what you want to do with it, but you can use the methods of Node, like getFirstChild() and getNextSibling(), to iterate through children and process them as you see fit based on structure, tags, and attributes.
Consider the following example:
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XML {
    public static void main(String[] args) throws Exception {
        String xml = "<objects><circle color='red'/><circle color='green'/><rectangle>hello</rectangle><glumble/></objects>";

        // Parse the hard-coded XML string into a DOM Document.
        InputStream in = new ByteArrayInputStream(xml.getBytes("utf-8"));
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);

        // Process each direct child of the root element.
        Node objects = doc.getDocumentElement();
        for (Node object = objects.getFirstChild(); object != null; object = object.getNextSibling()) {
            // Skip non-element nodes such as text and comments.
            if (object instanceof Element) {
                Element e = (Element) object;
                if (e.getTagName().equalsIgnoreCase("circle")) {
                    String color = e.getAttribute("color");
                    System.out.println("It's a " + color + " circle!");
                } else if (e.getTagName().equalsIgnoreCase("rectangle")) {
                    String text = e.getTextContent();
                    System.out.println("It's a rectangle that says \"" + text + "\".");
                } else {
                    // Unknown tags are reported rather than rejected.
                    System.out.println("I don't know what a " + e.getTagName() + " is for.");
                }
            }
        }
    }
}
The input XML document (hard-coded in the example) is:
<objects>
    <circle color='red'/>
    <circle color='green'/>
    <rectangle>hello</rectangle>
    <glumble/>
</objects>
The output is:
It's a red circle!
It's a green circle!
It's a rectangle that says "hello".
I don't know what a glumble is for.
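The example above only looks at the direct children of the root. If the nesting depth is also unknown, the same getFirstChild() and getNextSibling() calls can be applied recursively. The following is a rough sketch, not part of the original answer: the class DomOutlineSketch, its methods outline and collect, and the group element in the sample input are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class DomOutlineSketch {

    // Returns one line per element, indented by two spaces per nesting level.
    static List<String> outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> lines = new ArrayList<>();
            collect(doc.getDocumentElement(), "", lines);
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Records this element, then recurses into each child element.
    private static void collect(Element element, String indent, List<String> lines) {
        lines.add(indent + element.getTagName());
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            // Text, comment, and other non-element nodes are skipped.
            if (child instanceof Element) {
                collect((Element) child, indent + "  ", lines);
            }
        }
    }

    public static void main(String[] args) {
        // The nested "group" element is made up to show deeper nesting.
        String xml = "<objects><circle color='red'/><group><rectangle>hello</rectangle><glumble/></group></objects>";
        for (String line : outline(xml)) {
            System.out.println(line);
        }
    }
}
```

Recursion keeps the sketch short; for very deeply nested documents, an explicit stack would avoid deep call chains.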