Parsing XML file with DOM (Java)

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/7901744/

Time: 2020-10-30 21:54:21  Source: igfitidea

Parsing XML file with DOM (Java)

java xml parsing dom

Asked by LordDoskias

I want to parse the following url: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=224589801

As a result I came up with the following method:

public void parseXml2(String URL) {
    DOMParser parser = new DOMParser();

    try {
        parser.parse(new InputSource(new URL(URL).openStream()));
        Document doc = parser.getDocument();

        NodeList nodeList = doc.getElementsByTagName("Item");
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node n = nodeList.item(i);
            Node actualNode = n.getFirstChild();
            if (actualNode != null) {
                System.out.println(actualNode.getNodeValue());
            }
        }

    } catch (SAXException ex) {
        Logger.getLogger(TaxMapperXml.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(TaxMapperXml.class.getName()).log(Level.SEVERE, null, ex);
    }
}

With this method I can read the values of the Item nodes, but I can't read any of their attributes. I tried experimenting with getAttribute() and NamedNodeMap, but still to no avail.

  1. Why do I have to call n.getFirstChild().getNodeValue() to get the actual value, when n.getNodeValue() returns just null? Isn't this counter-intuitive? In my case the nodes obviously don't have subnodes.

  2. Is there a more robust and widely accepted way of parsing XML files using DOM? My files aren't going to be big, 15-20 lines at most, so SAX isn't necessary (or is it?)

Answered by gigadot

  1. A text value surrounded by XML tags is also considered a Node in DOM. That's why you have to get the text Node before getting the value. If you count the nodes in an <Item>, you will see that wherever there is text, there is a node.

  2. XOM has a more intuitive interface, but it does not implement the org.w3c.dom.* interfaces.
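
To make point 1 concrete, here is a minimal sketch using the JAXP parser that ships with the JDK. The class name TextNodeDemo, the helper firstItem, and the sample XML string are invented for illustration and are not taken from the question's data.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML below is an invented sample.
class TextNodeDemo {
    static final String SAMPLE =
            "<DocSum><Item Name=\"Caption\" Type=\"String\">NC_000001</Item></DocSum>";

    // Parses the XML string and returns the first <Item> element.
    static Element firstItem(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("Item").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element item = firstItem(SAMPLE);
        Node text = item.getFirstChild();

        System.out.println(item.getNodeValue());                   // null: elements carry no node value
        System.out.println(text.getNodeType() == Node.TEXT_NODE);  // true: the child is a text node
        System.out.println(text.getNodeValue());                   // NC_000001
        System.out.println(item.getTextContent());                 // NC_000001
    }
}
```

getTextContent() concatenates the text of all descendants, so it can stand in for getFirstChild().getNodeValue() when an element contains only text.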

If you want to use the built-in parser, you should look at http://www.java-samples.com/showtutorial.php?tutorialid=152

The DOMParser you tried to use is proprietary and not portable.
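
As a rough illustration of the built-in route, the sketch below uses javax.xml.parsers.DocumentBuilderFactory from the JDK instead of the Xerces DOMParser. The class name BuiltInParserDemo, the helper itemValues, and the sample XML are assumptions made for this example; it reads from an in-memory stream so that it runs offline.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical demo class; not part of the original answer.
class BuiltInParserDemo {
    // Returns the text content of every <Item> element in the stream.
    static List<String> itemValues(InputStream in) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            NodeList items = doc.getElementsByTagName("Item");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                // getTextContent() yields "" for an empty element, so no null check is needed.
                values.add(items.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample shaped like the Item elements the question describes.
        String xml = "<DocSum><Item Name=\"Gi\" Type=\"Integer\">224589801</Item>"
                   + "<Item Name=\"Comment\" Type=\"String\"/></DocSum>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(itemValues(in));
    }
}
```

To parse the URL from the question instead, the stream from new URL(url).openStream() could be passed to the same method.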

Answered by Wivani

import java.io.IOException;
import java.net.URL;
import org.apache.xerces.parsers.DOMParser;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XMLParser {

    public static void main(String[] args) {
        // Fetch the esummary document from the question and print every Item.
        parseXml2("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=224589801");
    }

    public static void parseXml2(String URL) {
        DOMParser parser = new DOMParser();

        try {
            parser.parse(new InputSource(new URL(URL).openStream()));
            Document doc = parser.getDocument();

            NodeList nodeList = doc.getElementsByTagName("Item");
            for (int i = 0; i < nodeList.getLength(); i++) {
                System.out.print("Item "+(i+1));
                Node n = nodeList.item(i);
                NamedNodeMap m = n.getAttributes();
                System.out.print(" Name: "+m.getNamedItem("Name").getTextContent());
                System.out.print(" Type: "+m.getNamedItem("Type").getTextContent());
                Node actualNode = n.getFirstChild();
                if (actualNode != null) {
                    System.out.println(" "+actualNode.getNodeValue());
                } else {
                    System.out.println(" ");                    
                }
            }

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

Completed the sample code and added a few lines to get the attributes.
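
One caveat with the code above: getNamedItem() returns null when an attribute is missing, so the chained getTextContent() call would throw a NullPointerException for such an Item. A possible variation, sketched here with the JDK's built-in parser, casts each node to Element and uses getAttribute(), which returns an empty string instead. The class name AttributeDemo, the helper describeItems, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and sample XML are illustrative only.
class AttributeDemo {
    // Builds one description line per <Item>, tolerating missing attributes.
    static List<String> describeItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("Item");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                // getElementsByTagName only returns element nodes, so the cast is safe.
                Element item = (Element) items.item(i);
                // getAttribute returns "" (never null) when the attribute is absent.
                String name = item.getAttribute("Name");
                String type = item.getAttribute("Type");
                lines.add("Name: " + name + " Type: " + type + " Value: " + item.getTextContent());
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<DocSum><Item Name=\"Gi\" Type=\"Integer\">224589801</Item>"
                   + "<Item>no attributes</Item></DocSum>";
        describeItems(xml).forEach(System.out::println);
    }
}
```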

This should get you started, although I feel that you need to get up to date with the basic notions of DOM. This site (and many others) can help you with that. Most important is understanding the different kinds of nodes there are.

Answered by Maurice Perry

Text inside XML elements is stored in text nodes because subelements can be mixed with text. For instance:

...
<A>blah<B/>blah</A>
...

Element A has three children: a text node, element B, another text node.
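
The three children can be checked with a short sketch along these lines, again assuming the JDK's built-in parser. MixedContentDemo and childKinds are made-up names for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class that lists the direct children of the root element.
class MixedContentDemo {
    static List<String> childKinds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> kinds = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE) {
                    kinds.add("text:" + child.getNodeValue());
                } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                    kinds.add("element:" + child.getNodeName());
                } else {
                    kinds.add("other:" + child.getNodeName());
                }
            }
            return kinds;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints [text:blah, element:B, text:blah]
        System.out.println(childKinds("<A>blah<B/>blah</A>"));
    }
}
```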
