Parsing an XML file using DOM (Java)

Question

Asked by LordDoskias

I want to parse the following URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=224589801

To do this, I came up with the following method:

public void parseXml2(String URL) {
    DOMParser parser = new DOMParser();

    try {
        parser.parse(new InputSource(new URL(URL).openStream()));
        Document doc = parser.getDocument();

        NodeList nodeList = doc.getElementsByTagName("Item");
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node n = nodeList.item(i);
            Node actualNode = n.getFirstChild();
            if (actualNode != null) {
                System.out.println(actualNode.getNodeValue());
            }
        }

    } catch (SAXException ex) {
        Logger.getLogger(TaxMapperXml.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(TaxMapperXml.class.getName()).log(Level.SEVERE, null, ex);
    }
}

With this method I can get the values of the Item nodes, but I can't get any of their attributes. I tried experimenting with getAttribute() and NamedNodeMap, but to no avail.

Why do I have to call n.getFirstChild().getNodeValue() to get the actual value? Why does n.getNodeValue() just return null? Isn't this counter-intuitive, since in my case the nodes obviously don't have subnodes?
Is there a more robust and widely accepted way of parsing XML files using DOM? My files aren't going to be big (15-20 lines at most), so SAX isn't necessary (or is it?).

Answer 1

Answered by gigadot

Text enclosed by an XML tag is also represented as a node in DOM. That's why you have to get the text node before you can get its value. If you count the child nodes of an <Item>, you will see that wherever there is text, there is a text node.
XOM has a more intuitive interface, but it doesn't implement the org.w3c.dom.* interfaces.
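
To make the text-node point concrete, here is a minimal sketch using the JDK's built-in JAXP parser on a small inline document. The sample <Item> (its Name/Type values and the text ABC123) is made up for illustration and is not the real NCBI response. It shows that the element itself has a null node value, that its single child is a text node holding the value, and that DOM Level 3's getTextContent() returns the same text in one call.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextNodeDemo {

    // Reports what the DOM tree looks like inside the first <Item> element
    public static String describe(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Element item = (Element) doc.getElementsByTagName("Item").item(0);
        Node text = item.getFirstChild(); // the text between <Item> and </Item>

        return "element value=" + item.getNodeValue()          // null: elements have no node value
                + ", children=" + item.getChildNodes().getLength()
                + ", first child is text=" + (text.getNodeType() == Node.TEXT_NODE)
                + ", text value=" + text.getNodeValue()
                + ", getTextContent=" + item.getTextContent(); // DOM Level 3 shortcut
    }

    public static void main(String[] args) {
        // Hypothetical sample element, not taken from the NCBI service
        System.out.println(describe("<Item Name=\"Caption\" Type=\"String\">ABC123</Item>"));
    }
}
```

Running main should print `element value=null, children=1, first child is text=true, text value=ABC123, getTextContent=ABC123`.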

If you want to use the built-in parser, you should look at http://www.java-samples.com/showtutorial.php?tutorialid=152

The DOMParser you tried to use is proprietary and not portable.
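
For comparison, a minimal sketch of the portable route: the JAXP DocumentBuilderFactory API in javax.xml.parsers, which ships with the JDK, so no vendor-specific parser class is referenced. The method name readItemValues is hypothetical; it simply collects the text of every <Item> element. The main method reuses the URL from the question and needs network access; the parsing method itself works on any InputStream.

```java
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class PortableItemReader {

    // Uses the JAXP factory bundled with the JDK instead of a vendor-specific DOMParser
    public static List<String> readItemValues(InputStream in) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            NodeList items = doc.getElementsByTagName("Item");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                // getTextContent() concatenates all descendant text; "" for an empty element
                values.add(items.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        String url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=224589801";
        // try-with-resources closes the HTTP stream once parsing is done
        try (InputStream in = new URL(url).openStream()) {
            readItemValues(in).forEach(System.out::println);
        }
    }
}
```

Using getTextContent() also sidesteps the null check on getFirstChild(), since an empty element yields an empty string.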

Answer 2

Answered by Wivani

import java.io.InputStream;
import java.net.URL;
import org.apache.xerces.parsers.DOMParser;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XMLParser {

    public static void main(String[] args) {
        parseXml2("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=224589801");
    }

    public static void parseXml2(String url) {
        DOMParser parser = new DOMParser();

        // try-with-resources closes the HTTP stream when parsing is done
        try (InputStream in = new URL(url).openStream()) {
            parser.parse(new InputSource(in));
            Document doc = parser.getDocument();

            NodeList nodeList = doc.getElementsByTagName("Item");
            for (int i = 0; i < nodeList.getLength(); i++) {
                System.out.print("Item " + (i + 1));
                Node n = nodeList.item(i);

                // Attributes are kept in a NamedNodeMap, not among the child nodes
                NamedNodeMap m = n.getAttributes();
                System.out.print(" Name: " + attributeValue(m, "Name"));
                System.out.print(" Type: " + attributeValue(m, "Type"));

                // The element's value is held by its first child, a text node
                Node actualNode = n.getFirstChild();
                if (actualNode != null) {
                    System.out.println(" " + actualNode.getNodeValue());
                } else {
                    System.out.println();
                }
            }

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    // Returns the attribute's value, or "" if the attribute is absent (avoids a NullPointerException)
    private static String attributeValue(NamedNodeMap attrs, String name) {
        Node attr = attrs.getNamedItem(name);
        return attr == null ? "" : attr.getNodeValue();
    }
}

Completed the sample code and added a few lines to get the attributes.
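
Since the question mentions trying getAttribute(), here is a minimal sketch of that route as an alternative to NamedNodeMap: getAttribute() is declared on org.w3c.dom.Element, not on Node, so the node has to be cast first (safe here because getElementsByTagName only returns elements). It uses the JDK's JAXP parser and an inline sample document; the sample values and the method name describeItems are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ItemAttributes {

    // Returns one "Name|Type|value" line per <Item> element
    public static List<String> describeItems(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList items = doc.getElementsByTagName("Item");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            // getElementsByTagName only returns elements, so this cast is safe
            Element item = (Element) items.item(i);
            // getAttribute() returns "" (not null) when the attribute is missing
            result.add(item.getAttribute("Name") + "|" + item.getAttribute("Type")
                    + "|" + item.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample document
        String xml = "<DocSum><Item Name=\"Caption\" Type=\"String\">ABC123</Item>"
                + "<Item Name=\"Extra\" Type=\"String\"></Item></DocSum>";
        describeItems(xml).forEach(System.out::println);
    }
}
```

Because getAttribute() never returns null, this version needs no explicit null checks for missing attributes.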

This should get you started, although I feel that you need to get up to date with the basic notions of DOM. This site (and many others) can help you with that. Most important is understanding the different kinds of nodes there are.
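
As a quick way to see the different kinds of nodes, the following sketch walks the direct children of a root element and labels each one by its DOM node-type constant (Node.ELEMENT_NODE, Node.TEXT_NODE, and so on). It uses the JDK's JAXP parser with default settings, which keep comments; the class name, labels, and sample document are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeKinds {

    // Maps a few common DOM node-type constants to readable labels
    static String kind(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "element <" + n.getNodeName() + ">";
            case Node.TEXT_NODE:
                return "text \"" + n.getNodeValue() + "\"";
            case Node.COMMENT_NODE:
                return "comment \"" + n.getNodeValue() + "\"";
            case Node.CDATA_SECTION_NODE:
                return "cdata \"" + n.getNodeValue() + "\"";
            default:
                return "other (type " + n.getNodeType() + ")";
        }
    }

    // Lists the direct children of the document's root element
    public static List<String> childKinds(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String> kinds = new ArrayList<>();
        for (Node c = doc.getDocumentElement().getFirstChild(); c != null; c = c.getNextSibling()) {
            kinds.add(kind(c));
        }
        return kinds;
    }

    public static void main(String[] args) {
        childKinds("<Item>value<!-- note --><Sub/></Item>").forEach(System.out::println);
    }
}
```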

Answer 3

Answered by Maurice Perry

Text inside XML elements is stored in text nodes because subelements can be mixed with text. For instance:

...
<A>blah<B/>blah</A>
...

Element A has three children: a text node, element B, and another text node.
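
The mixed-content example can be checked directly. This sketch (JDK JAXP parser; the class and method names are hypothetical) parses <A>blah<B/>blah</A>, lists the node names of A's children (text nodes report "#text"), and contrasts getFirstChild(), which only reaches the first text run, with getTextContent(), which concatenates all descendant text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MixedContentDemo {

    // Summarizes the children and text of the root element
    public static String summarize(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList children = root.getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            // Text nodes report "#text" as their name; elements report their tag name
            names.add(children.item(i).getNodeName());
        }
        return "children=" + names
                + ", firstChild=" + root.getFirstChild().getNodeValue()
                + ", textContent=" + root.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(summarize("<A>blah<B/>blah</A>"));
    }
}
```

This should print `children=[#text, B, #text], firstChild=blah, textContent=blahblah`, which is why getFirstChild().getNodeValue() is only safe when an element contains a single run of text.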
