Handling Empty Nodes Using Java DOM

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4010726/

Time: 2020-10-30 04:21:56  Source: igfitidea

java xml parsing dom

Asked by MysteryMoose

I have a question concerning XML, Java's use of DOM, and empty nodes. I am currently working on a project wherein I take an XML descriptor file of abstract machines (for text parsing) and parse a series of input strings with them. The actual building and interpretation of these abstract machines is all done and working fine, but I have come across a rather interesting XML requirement. Specifically, I need to be able to turn an empty InputString node into an empty string ("") and still execute my parsing routines. The problem, however, occurs when I attempt to extract this blank node from my XML tree. This causes a null pointer exception and then generally bad things start happening. Here is the offending snippet of XML (Note the first element is empty):

    <InputStringList>
        <InputString></InputString>
        <InputString>000</InputString>
        <InputString>111</InputString>
        <InputString>01001</InputString>
        <InputString>1011011</InputString>
        <InputString>1011000</InputString>
        <InputString>01010</InputString>
        <InputString>1010101110</InputString>
    </InputStringList>

I extract my strings from the list using:

//Get input strings to be validated
xmlElement = (Element)xmlMachine.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
xmlNodeList = xmlElement.getElementsByTagName(XML_INPUT_STRING);
for (int j = 0; j < xmlNodeList.getLength(); j++) {

    //Add input string to list
    if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
        arrInputStrings.add(xmlNodeList.item(j).getFirstChild().getNodeValue());

    } else {
        arrInputStrings.add("");

    }
}
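A minimal, self-contained sketch that reproduces the failure. The class name EmptyNodeRepro and its helper methods are hypothetical, and the XML is trimmed to two entries. With the JDK's default DocumentBuilderFactory, an empty <InputString></InputString> element has no child nodes at all, so getFirstChild() returns null and the chained getNodeValue() call is what throws:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class EmptyNodeRepro {
    // Parses an XML string with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Runs the question's expression on the j-th <InputString> and reports
    // whether it throws a NullPointerException.
    static boolean originalCodeThrows(Document doc, int j) {
        NodeList nodes = doc.getElementsByTagName("InputString");
        try {
            nodes.item(j).getFirstChild().getNodeValue();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "</InputStringList>");
        NodeList nodes = doc.getElementsByTagName("InputString");
        // The empty element has no children, not an empty text node.
        System.out.println(nodes.item(0).hasChildNodes()); // false
        System.out.println(nodes.item(0).getFirstChild());  // null
        System.out.println(originalCodeThrows(doc, 0));     // true
        System.out.println(originalCodeThrows(doc, 1));     // false
    }
}
```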

How should I handle this empty case? I have found a lot of information on removing blank text nodes, but I still actually have to parse the blank nodes as empty strings. Ideally, I would like to avoid using a special character to denote a blank string.

Thank you in advance for your time.

Answer by bobince

if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {

nodeValue shouldn't be null; it is firstChild itself that might be null, and that is what you should check for:

Node firstChild = xmlNodeList.item(j).getFirstChild();
arrInputStrings.add(firstChild == null ? "" : firstChild.getNodeValue());
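A runnable sketch of that null check applied to the whole list. It assumes the tag names are the literal strings InputStringList and InputString, standing in for the question's XML_INPUT_STRING_LIST and XML_INPUT_STRING constants; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class FirstChildNullCheck {
    // Reads every <InputString> under the first <InputStringList>, mapping an
    // element with no child node to "" instead of dereferencing null.
    static List<String> readInputStrings(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Element listElement = (Element) doc.getElementsByTagName("InputStringList").item(0);
        NodeList nodes = listElement.getElementsByTagName("InputString");
        List<String> result = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            Node firstChild = nodes.item(j).getFirstChild();
            result.add(firstChild == null ? "" : firstChild.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "<InputString>111</InputString>"
                + "</InputStringList>";
        System.out.println(readInputStrings(xml)); // [, 000, 111]
    }
}
```

The empty element now comes back as "" and the loop carries on with the remaining strings.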

However, note that this still depends on the content being a single text node. If you had an element with another element inside it, or some text plus a CDATA section, just getting the value of the first child isn't enough to read the whole text.
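To illustrate that caveat, a small sketch (hypothetical class and helper names) that parses two elements whose content is not a single text node. With the default, non-coalescing DocumentBuilderFactory, a CDATA section becomes its own child node, so the first child holds only part of the text; and when the first child is a nested element, its nodeValue is null:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class FirstChildPitfall {
    // Parses a single-element document and returns its root element.
    static Element parseElement(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // What the null-checked first-child approach would read for this element.
    static String firstChildValue(Element element) {
        Node firstChild = element.getFirstChild();
        return firstChild == null ? "" : firstChild.getNodeValue();
    }

    public static void main(String[] args) {
        // Text split by a CDATA section: three children (text, CDATA, text).
        Element mixed = parseElement("<InputString>01<![CDATA[00]]>1</InputString>");
        System.out.println(mixed.getChildNodes().getLength()); // 3
        System.out.println(firstChildValue(mixed));            // 01

        // A nested element as first child: Element.getNodeValue() is null.
        Element nested = parseElement("<InputString><b>10</b>11</InputString>");
        System.out.println(firstChildValue(nested));           // null
    }
}
```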

What you really want is the textContent property from DOM Level 3 Core, which will give you all the text inside the element, however it is contained.

arrInputStrings.add(xmlNodeList.item(j).getTextContent());

This is available from Java 1.5 onwards.
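A sketch of the question's loop rewritten with getTextContent(), again assuming literal tag names in place of the question's constants and using hypothetical class and method names. Per DOM Level 3 Core, the textContent of an element with no children is the empty string, so no null check is needed, and text split across CDATA sections or nested elements is concatenated:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TextContentReader {
    // Reads every <InputString> under the first <InputStringList> using
    // getTextContent(), which yields "" for an element with no children.
    static List<String> readInputStrings(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Element listElement = (Element) doc.getElementsByTagName("InputStringList").item(0);
        NodeList nodes = listElement.getElementsByTagName("InputString");
        List<String> result = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            // Concatenates all descendant text and CDATA content.
            result.add(nodes.item(j).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "<InputString>01<![CDATA[00]]>1</InputString>"
                + "<InputString><b>10</b>11</InputString>"
                + "</InputStringList>";
        System.out.println(readInputStrings(xml)); // [, 000, 01001, 1011]
    }
}
```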

Answer by Lukas Eder

You could use a library like jOOX to simplify standard DOM manipulation in general. With jOOX, you'd get the list of strings like this:

List<String> strings = $(xmlMachine).find(XML_INPUT_STRING_LIST)
                                    .find(XML_INPUT_STRING)
                                    .texts();