Java: Why am I getting extra text nodes as child nodes of the root node?

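Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/20259742/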

Date: 2020-08-13 00:38:33  Source: igfitidea

Tags: java, xml, dom

Asked by Vikas Mangal

I want to print the child elements of the root node. This is my XML file.

<?xml version="1.0"?>
<!-- Comment-->
<company>
   <staff id="1001">
       <firstname>yong</firstname>
       <lastname>mook kim</lastname>
       <nickname>mkyong</nickname>
       <salary>100000</salary>
   </staff>
   <staff id="2001">
       <firstname>low</firstname>
       <lastname>yin fong</lastname>
       <nickname>fong fong</nickname>
       <salary>200000</salary>
   </staff>
</company>

As I understand it, the root node is 'company', and its child nodes should be the two 'staff' elements. But when I try to get them with my Java code, I get 5 child nodes. Where are the 3 extra text nodes coming from?

Java Code:

package com.training.xml;

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ReadingXML {

    public static void main(String[] args) {
        try {
            // Backslashes must be escaped in Java string literals.
            File file = new File("D:\\TestFile.xml");

            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(file);
            // normalize() merges adjacent text nodes; it does not remove whitespace-only ones.
            doc.getDocumentElement().normalize();

            System.out.println("root element: " + doc.getDocumentElement().getNodeName());

            Node rootNode = doc.getDocumentElement();
            System.out.println("root: " + rootNode.getNodeName());

            // getChildNodes() returns every child node (text, comments, ...), not just elements.
            NodeList nList = rootNode.getChildNodes();

            for (int i = 0; i < nList.getLength(); i++) {
                System.out.println("node name: " + nList.item(i).getNodeName());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

OUTPUT:

root element: company
root: company
node name: #text
node name: staff
node name: #text
node name: staff
node name: #text

Why are these three text nodes showing up here?

Accepted answer by Jon Skeet

Why the three text nodes are coming over here?

They're the whitespace (newlines and indentation) before, between, and after the two staff elements. If you only want the child elements, you should just ignore nodes of other types:

for (int i = 0; i < nList.getLength(); i++) {
    Node node = nList.item(i);
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        System.out.println("node name: " + node.getNodeName());
    }
}
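
Below is a minimal sketch of the same filtering, pulled into a reusable helper. To run without the D:\ file, it parses a shortened inline stand-in for TestFile.xml, which is an assumption. It first prints what each #text child actually contains, then walks only the element children. The class name ChildElementsDemo and the helpers childElements and parse are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildElementsDemo {

    // Hypothetical helper: returns only the Element children of a node,
    // skipping whitespace text nodes, comments, and other node types.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Parses an XML string; checked parser exceptions are wrapped to keep the demo short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for TestFile.xml with the same indentation pattern.
        String xml = "<company>\n"
                + "   <staff id=\"1001\"/>\n"
                + "   <staff id=\"2001\"/>\n"
                + "</company>";
        Element root = parse(xml).getDocumentElement();

        // The root has 5 children: #text, staff, #text, staff, #text.
        NodeList all = root.getChildNodes();
        for (int i = 0; i < all.getLength(); i++) {
            Node n = all.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                // Show newlines as \n so the whitespace content is visible.
                System.out.println(n.getNodeName() + " [" + n.getNodeValue().replace("\n", "\\n") + "]");
            } else {
                System.out.println(n.getNodeName());
            }
        }

        // Only the two staff elements survive the filter.
        for (Element staff : childElements(root)) {
            System.out.println("staff id: " + staff.getAttribute("id"));
        }
    }
}
```

Returning a List<Element> means callers get typed elements and can use getAttribute directly, with no casts at each call site.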

Or you could change your document to not have that whitespace.
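
Below is a minimal sketch of two ways to get a whitespace-free tree, both assuming inline XML strings. The first is a compact document with no whitespace between elements. The second is a swapped-in alternative, not something the answer describes: keep the indented file and delete whitespace-only text nodes after parsing. WhitespaceFreeDemo, parse, and stripWhitespaceText are hypothetical names. One caveat: the helper also deletes whitespace-only text in mixed content (for example, a space between two inline elements), so it suits data-style documents like this one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceFreeDemo {

    // Parses an XML string; checked parser exceptions are wrapped to keep the demo short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Hypothetical helper: recursively removes text nodes that contain only whitespace.
    static void stripWhitespaceText(Node node) {
        NodeList children = node.getChildNodes();
        // Iterate backwards: the NodeList is live and shrinks as nodes are removed.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceText(child);
            }
        }
    }

    public static void main(String[] args) {
        // Option 1: the document itself has no whitespace between elements.
        Element compact = parse("<company><staff id=\"1001\"/><staff id=\"2001\"/></company>")
                .getDocumentElement();
        System.out.println("compact: " + compact.getChildNodes().getLength() + " children");

        // Option 2: keep the indented document, strip whitespace nodes after parsing.
        Element indented = parse("<company>\n   <staff id=\"1001\"/>\n   <staff id=\"2001\"/>\n</company>")
                .getDocumentElement();
        stripWhitespaceText(indented);
        System.out.println("stripped: " + indented.getChildNodes().getLength() + " children");
    }
}
```

Run as is, this should print "compact: 2 children" and "stripped: 2 children".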

Or you could use a different XML API which allows you to easily ask for just elements. (The DOM API is a pain in various ways.)
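
The answer does not name a specific API. One option already in the JDK is javax.xml.xpath, shown in the minimal sketch below under the assumption of inline XML. The XPath step `*` matches element nodes only, so `/*/*` selects the root's element children and never the whitespace text nodes. XPathChildrenDemo and rootChildElements are hypothetical names. This approach still builds a DOM underneath, so it sidesteps the DOM traversal API rather than replacing it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathChildrenDemo {

    // Returns the element children of the document element. "/*/*" matches
    // elements only, so whitespace text nodes and comments are never selected.
    static NodeList rootChildElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate("/*/*", doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!-- Comment-->\n"
                + "<company>\n"
                + "   <staff id=\"1001\"/>\n"
                + "   <staff id=\"2001\"/>\n"
                + "</company>";
        NodeList staff = rootChildElements(xml);
        for (int i = 0; i < staff.getLength(); i++) {
            Node n = staff.item(i);
            System.out.println("node name: " + n.getNodeName());
        }
    }
}
```

Run as is, this should print "node name: staff" twice.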

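If you only want to ignore element content whitespace, you can use Text.isElementContentWhitespace. Note that a text node is only flagged this way when the parser knows the element's content model. In practice, that means a DTD declaring element-only content, parsed with validation turned on. The question's file has no DTD, so isElementContentWhitespace() returns false for all three of its text nodes.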
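
Below is a minimal sketch of the isElementContentWhitespace route. It assumes an internal DTD that declares company as containing only staff elements; the question's file has no DTD, so this is an added assumption. The parser runs in validating mode with an ErrorHandler installed, and the code skips text nodes flagged as element content whitespace. It also shows DocumentBuilderFactory.setIgnoringElementContentWhitespace(true), which drops those nodes at parse time. ElementContentWhitespaceDemo and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ElementContentWhitespaceDemo {

    // Assumed internal DTD: company holds only staff elements, so the
    // whitespace between them is element content whitespace.
    static final String XML = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE company [\n"
            + "  <!ELEMENT company (staff*)>\n"
            + "  <!ELEMENT staff EMPTY>\n"
            + "  <!ATTLIST staff id CDATA #REQUIRED>\n"
            + "]>\n"
            + "<company>\n"
            + "   <staff id=\"1001\"/>\n"
            + "   <staff id=\"2001\"/>\n"
            + "</company>";

    static Document parse(String xml, boolean dropWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Validating mode, so the parser applies the DTD's content model.
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(dropWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Make validity errors fail the parse instead of only being printed.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    System.err.println("warning: " + e.getMessage());
                }
                @Override public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Keep the whitespace nodes, but skip those flagged as element content whitespace.
        NodeList children = parse(XML, false).getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n instanceof Text && ((Text) n).isElementContentWhitespace()) {
                continue;
            }
            System.out.println("node name: " + n.getNodeName());
        }

        // Alternatively, have the parser drop those nodes entirely.
        int count = parse(XML, true).getDocumentElement().getChildNodes().getLength();
        System.out.println("children with element content whitespace dropped: " + count);
    }
}
```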
