Getting an XML node's text value with Java DOM

Question

Asked by Emilio

I can't fetch the text value with Node.getNodeValue(), Node.getFirstChild().getNodeValue(), or Node.getTextContent().

My XML looks like this:

<add job="351">
    <tag>foobar</tag>
    <tag>foobar2</tag>
</add>

I'm trying to get the text value of each tag element (fetching non-text elements works fine). My Java code looks like this:

Document doc = db.parse(new File(args[0]));
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();   
Node an,an2;

for (int i=0; i < nl.getLength(); i++) {
    an = nl.item(i);

    if(an.getNodeType()==Node.ELEMENT_NODE) {
        NodeList nl2 = an.getChildNodes();

        for(int i2=0; i2<nl2.getLength(); i2++) {
            an2 = nl2.item(i2);

            // DEBUG PRINTS
            System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");

            if(an2.hasChildNodes())
                System.out.println(an2.getFirstChild().getTextContent());

            if(an2.hasChildNodes())
                System.out.println(an2.getFirstChild().getNodeValue());

            System.out.println(an2.getTextContent());
            System.out.println(an2.getNodeValue());
        }
    }
}

It prints:

tag type (1): 
tag1
tag1
tag1
null
#text type (3):
_blank line_
_blank line_
...

Thanks for the help.

Answer 1

Accepted answer by jsight

I'd print out the result of an2.getNodeName() as well for debugging purposes. My guess is that your tree-crawling code isn't reaching the nodes you think it is. That suspicion is reinforced by the lack of node-name checks in your code.

Other than that, the Javadoc for Node defines getNodeValue() to return null for nodes of type Element. Therefore, you really should be using getTextContent(). I'm not sure why that wouldn't give you the text you want.
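To make the difference concrete, here is a minimal sketch that calls the three accessors from the question on one tag element. The class name NodeValueDemo and the parse helper are illustrative names, not part of the original code, and the XML is the sample from the question without whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NodeValueDemo {

    // Illustrative helper: parses an XML string into a DOM Document.
    // Checked exceptions are wrapped so callers need no throws clause.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<add job=\"351\"><tag>foobar</tag><tag>foobar2</tag></add>");
        Element firstTag = (Element) doc.getElementsByTagName("tag").item(0);

        // An Element has no node value of its own, so this prints "null".
        System.out.println(firstTag.getNodeValue());
        // The text lives in a child Text node, so this prints "foobar".
        System.out.println(firstTag.getFirstChild().getNodeValue());
        // getTextContent() concatenates the text of all descendants: "foobar".
        System.out.println(firstTag.getTextContent());
    }
}
```

Only the first call returns null; the other two reach the child text node, which is why calling getNodeValue() directly on the element looks like the text is missing.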

Perhaps iterate over the children of your tag node and see what types are there?
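One way to do that inspection is sketched below. It lists the name and numeric type of each direct child of a node, assuming the default (non-validating) parser settings. ChildTypesDemo, parse, and describeChildren are hypothetical names introduced for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildTypesDemo {

    // Illustrative helper: parses an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns one "name:type" entry per direct child of the given node.
    static List<String> describeChildren(Node parent) {
        List<String> out = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            out.add(child.getNodeName() + ":" + child.getNodeType());
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<add job=\"351\">\n"
                + "    <tag>foobar</tag>\n"
                + "    <tag>foobar2</tag>\n"
                + "</add>";
        // Type 1 is Node.ELEMENT_NODE, type 3 is Node.TEXT_NODE.
        System.out.println(describeChildren(parse(xml).getDocumentElement()));
    }
}
```

With indented XML like the sample, the whitespace between elements shows up as #text children of add, which is why the loop in the question needs the ELEMENT_NODE check before descending.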

I tried this code and it works for me:

String xml = "<add job=\"351\">\n" +
             "    <tag>foobar</tag>\n" +
             "    <tag>foobar2</tag>\n" +
             "</add>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
Document doc = db.parse(bis);
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();
Node an,an2;

for (int i=0; i < nl.getLength(); i++) {
    an = nl.item(i);
    if(an.getNodeType()==Node.ELEMENT_NODE) {
        NodeList nl2 = an.getChildNodes();

        for(int i2=0; i2<nl2.getLength(); i2++) {
            an2 = nl2.item(i2);
            // DEBUG PRINTS
            System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
            if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getTextContent());
            if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getNodeValue());
            System.out.println(an2.getTextContent());
            System.out.println(an2.getNodeValue());
        }
    }
}

The output was:

#text: type (3):
foobar
foobar
#text: type (3):
foobar2
foobar2

Answer 2

Answered by toolkit

If your XML goes quite deep, you might want to consider using XPath, which comes with your JRE, so you can access the contents far more easily using:

String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()", 
    document.getDocumentElement());

Full example:

import static org.junit.Assert.assertEquals;
import java.io.StringReader;    
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;    
import org.junit.Before;
import org.junit.Test;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathTest {

    private Document document;

    @Before
    public void setup() throws Exception {
        String xml = "<add job=\"351\"><tag>foobar</tag><tag>foobar2</tag></add>";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        document = db.parse(new InputSource(new StringReader(xml)));
    }

    @Test
    public void testXPath() throws Exception {
        XPathFactory xpf = XPathFactory.newInstance();
        XPath xp = xpf.newXPath();
        String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()",
                document.getDocumentElement());
        assertEquals("foobar", text);
    }
}
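The test above reads only the first tag. As a rough sketch of collecting every tag's text in one query, the expression can be evaluated as a node set instead of a string. TagTextXPath and tagTexts are made-up names for this illustration, and the path /add/tag assumes the document shape from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagTextXPath {

    // Returns the text content of every /add/tag element, in document order.
    static List<String> tagTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // NODESET asks for all matching nodes rather than the first one as a string.
            NodeList nodes = (NodeList) xp.evaluate("/add/tag", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<add job=\"351\"><tag>foobar</tag><tag>foobar2</tag></add>";
        System.out.println(tagTexts(xml)); // [foobar, foobar2]
    }
}
```

Selecting the tag elements and calling getTextContent() on each sidesteps the whitespace text nodes that manual child traversal has to filter out.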

Answer 3

Answered by Zeus

I use a very old Java version, JDK 1.4.08, and I had the same issue. The Node class in that version did not have the getTextContent() method. I had to use Node.getFirstChild().getNodeValue() instead of Node.getNodeValue() to get the value of the node. This fixed it for me.
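Note that getFirstChild().getNodeValue() reads only the first child and fails with a NullPointerException on an empty element. A slightly more defensive variant, sketched here with the hypothetical names LegacyTextDemo and directText, concatenates all direct text and CDATA children using only DOM Level 2 calls. Unlike getTextContent(), it does not descend into nested elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LegacyTextDemo {

    // Illustrative helper: parses an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Concatenates the values of the direct Text and CDATA children of a node.
    // Uses StringBuilder for brevity; a pre-Java 5 JDK would need StringBuffer.
    static String directText(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<add><tag>foo<!-- note -->bar</tag><tag/></add>");
        NodeList tags = doc.getElementsByTagName("tag");
        for (int i = 0; i < tags.getLength(); i++) {
            // The comment is skipped, and the empty element yields "" instead of failing.
            System.out.println("[" + directText(tags.item(i)) + "]");
        }
    }
}
```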

Answer 4

Answered by vtd-xml-author

If you are open to vtd-xml, which excels at both performance and memory efficiency, below is the code to do what you are looking for, using both XPath and manual navigation. The overall code is much more concise and easier to understand.

import com.ximpleware.*;
public class queryText {
    public static void main(String[] s) throws VTDException{
        VTDGen vg = new VTDGen();
        if (!vg.parseFile("input.xml", true))
            return;
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        // first manually navigate
        if(vn.toElement(VTDNav.FC,"tag")){
            int i= vn.getText();
            if (i!=-1){
                System.out.println("text ===>"+vn.toString(i));
            }
            if (vn.toElement(VTDNav.NS,"tag")){
                i=vn.getText();
                // same guard as above: -1 means the element has no text
                if (i!=-1){
                    System.out.println("text ===>"+vn.toString(i));
                }
            }
        }

        // second version use XPath
        ap.selectXPath("/add/tag/text()");
        int i=0;
        while((i=ap.evalXPath())!= -1){
            System.out.println("text node ====>"+vn.toString(i));
        }
    }
}
