Java: How to get node contents from JDOM

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7910474/

Date: 2020-10-30 21:57:53  Source: igfitidea

Tags: java, xml, xml-parsing, jdom

Asked by jeph perro

I'm writing an application in Java using import org.jdom.*;

My XML is valid, but sometimes it contains HTML tags. For example, something like this:

  <program-title>Anatomy &amp; Physiology</program-title>
  <overview>
       <content>
              For more info click <a href="page.html">here</a>
              <p>Learn more about the human body.  Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p>
       </content>
  </overview>
  <key-information>
     <category>Health &amp; Human Services</category>

So my problem is with the <p> tags inside the overview.content node.

I was hoping that this code would work:

        Element overview = sds.getChild("overview");
        Element content = overview.getChild("content");

        System.out.println(content.getText());

but it returns blank.

How do I return all the text (nested tags and all) from the overview.content node?

Thanks

Answered by Prashant Bhate

content.getText() returns only the element's immediate text, which is useful only for leaf elements that contain text and no child elements.

The trick is to use org.jdom.output.XMLOutputter (with the CompactFormat text mode):

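import java.io.StringWriter;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class PrintContent {
    public static void main(String[] args) throws Exception {
        // Parse the XML file into a JDOM document
        SAXBuilder builder = new SAXBuilder();
        String xmlFileName = "a.xml";
        Document doc = builder.build(xmlFileName);

        // Navigate to root/overview/content
        Element root = doc.getRootElement();
        Element overview = root.getChild("overview");
        Element content = overview.getChild("content");

        XMLOutputter outp = new XMLOutputter();

        // Pick one format; the alternatives change whitespace handling only
        outp.setFormat(Format.getCompactFormat());
        //outp.setFormat(Format.getRawFormat());
        //outp.setFormat(Format.getPrettyFormat());
        //outp.getFormat().setTextMode(Format.TextMode.PRESERVE);

        // Serialize the children of <content> (text and nested tags), not <content> itself
        StringWriter sw = new StringWriter();
        outp.output(content.getContent(), sw);
        System.out.println(sw.toString());
    }
}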

Output

For more info click<a href="page.html">here</a><p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p>

Do explore the other formatting options and modify the above code to suit your needs.

"Class to encapsulate XMLOutputter format options. Typical users can use the standard format configurations obtained by getRawFormat() (no whitespace changes), getPrettyFormat() (whitespace beautification), and getCompactFormat() (whitespace normalization)."
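
For readers who want to try the idea without JDOM on the classpath, the same approach (serialize the element's child nodes instead of reading its text) can be sketched with the JDK's built-in W3C DOM and an identity Transformer. This is a stand-in for XMLOutputter, not JDOM itself: the class name InnerXml and the method of are made up for illustration, and whitespace handling will not match JDOM's compact, raw, or pretty formats.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of JDOM): returns the markup inside the
// first element named tagName, without the element's own start/end tags.
final class InnerXml {
    static String of(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node element = doc.getElementsByTagName(tagName).item(0);
            if (element == null) {
                throw new IllegalArgumentException("No element named " + tagName);
            }

            // A Transformer created without a stylesheet copies its input as-is
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // Serialize each child (text nodes and nested elements) in order
            StringWriter out = new StringWriter();
            NodeList children = element.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                identity.transform(new DOMSource(children.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize content of " + tagName, e);
        }
    }
}
```

Calling InnerXml.of(xml, "content") on the question's document should give back the text together with the nested <a> and <p> markup, which is the same effect as passing content.getContent() to XMLOutputter.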

Answered by yankee

Well, maybe that's what you need:

import java.io.StringReader;
import java.util.List;

import org.custommonkey.xmlunit.XMLTestCase;
import org.custommonkey.xmlunit.XMLUnit;
import org.jdom.Content;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import org.testng.annotations.Test;
import org.xml.sax.InputSource;

public class HowToGetNodeContentsJDOM extends XMLTestCase
{
    private static final String XML = "<root>\n" + 
            "  <program-title>Anatomy &amp; Physiology</program-title>\n" + 
            "  <overview>\n" + 
            "       <content>\n" + 
            "              For more info click <a href=\"page.html\">here</a>\n" + 
            "              <p>Learn more about the human body.  Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p>\n" + 
            "       </content>\n" + 
            "  </overview>\n" + 
            "  <key-information>\n" + 
            "     <category>Health &amp; Human Services</category>\n" + 
            "  </key-information>\n" + 
            "</root>";
    private static final String EXPECTED = "For more info click <a href=\"page.html\">here</a>\n" + 
            "<p>Learn more about the human body.  Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p>";

    @Test
    public void test() throws Exception
    {
        XMLUnit.setIgnoreWhitespace(true);
        Document document = new SAXBuilder().build(new InputSource(new StringReader(XML)));
        List<Content> content = document.getRootElement().getChild("overview").getChild("content").getContent();
        String out = new XMLOutputter().outputString(content);
        assertXMLEqual("<root>" + EXPECTED + "</root>", "<root>" + out + "</root>");
    }
}

Output:

PASSED: test on instance null(HowToGetNodeContentsJDOM)

===============================================
    Default test
    Tests run: 1, Failures: 0, Skips: 0
===============================================

I am using JDom with generics: http://www.junlu.com/list/25/883674.html

Edit: Actually that's not that much different from Prashant Bhate's answer. Maybe you need to tell us what you are missing...

Answered by G_H

You could try the getValue() method for the closest approximation, but what it does is concatenate all the text within the element and its descendants. This won't give you the <p> tag in any form. If that tag is in your XML as you've shown, it has become part of the XML markup. It would need to be included as &lt;p&gt; or embedded in a CDATA section to be treated as text.

Alternatively, if you know all elements that either may or may not appear in your XML, you could apply an XSLT transformation that turns stuff which isn't intended as markup into plain text.
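
To make the getValue() point concrete, here is a small sketch using the JDK's W3C DOM, whose getTextContent() behaves in a similar way (it concatenates the text of the element and its descendants). It is an analogue chosen so the example runs without JDOM; the class name TextOnly and the method textOf are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: returns the concatenated text of the first element
// named tagName, dropping any nested markup along the way.
final class TextOnly {
    static String textOf(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node element = doc.getElementsByTagName(tagName).item(0);
            // getTextContent() joins the text of all descendants; tags are lost
            return element.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read text of " + tagName, e);
        }
    }
}
```

With real markup, textOf("<content>click <a href=\"page.html\">here</a><p>Body</p></content>", "content") yields "click hereBody", with no tags. With escaped markup, textOf("<content>&lt;p&gt;Body&lt;/p&gt;</content>", "content") yields the literal string "<p>Body</p>", because the tag was text all along.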

Answered by aoi222

If you're also generating the XML file, you should be able to encapsulate your HTML data in <![CDATA[ ]]> so that it isn't parsed by the XML parser.
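
A minimal sketch of the producer side, assuming you build the XML as strings: the one thing to watch is that the HTML itself must not contain the sequence ]]>, which would end the section early, so the helper splits it across two sections. The class name Cdata and both method names are made up for illustration, and the round trip uses the JDK's W3C DOM parser rather than JDOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for wrapping HTML fragments in CDATA sections.
final class Cdata {
    // Wraps html in a CDATA section; any "]]>" inside is split so that
    // "]]" closes one section and ">" opens the next.
    static String wrap(String html) {
        return "<![CDATA[" + html.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Embeds the wrapped HTML in a <content> element, parses it, and
    // returns the text the parser hands back.
    static String roundTrip(String html) {
        try {
            String xml = "<content>" + wrap(html) + "</content>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```

After wrapping, the consumer sees the HTML as plain text, so in JDOM a call such as content.getText() would be expected to return the tags verbatim instead of an empty string.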

Answered by lujop

If you want to output the content of a JDOM node, just use:

System.out.println(new XMLOutputter().outputString(node));

Answered by Guillaume Serre

Not particularly pretty but works fine (using JDOM API):

public static String getRawText(Element element) {
    if (element.getContent().size() == 0) {
        return "";
    }

    StringBuffer text = new StringBuffer();
    for (int i = 0; i < element.getContent().size(); i++) {
        final Object obj = element.getContent().get(i);
        if (obj instanceof Text) {
            text.append( ((Text) obj).getText() );
        } else if (obj instanceof Element) {
            Element e = (Element) obj;
            text.append( "<" ).append( e.getName() );
            // dump all attributes
            for (Attribute attribute : (List<Attribute>)e.getAttributes()) {
                text.append(" ").append(attribute.getName()).append("=\"").append(attribute.getValue()).append("\"");
            }
            text.append(">");
            text.append( getRawText( e )).append("</").append(e.getName()).append(">");
        }
    }
    return text.toString();
}

Prashant Bhate's solution is nicer though!

Answered by duffymo

The problem is that the <content> node doesn't have a text child; it has a <p> child that happens to contain text.

Try this:

Element overview = sds.getChild("overview");
Element content = overview.getChild("content");
Element p = content.getChild("p");
System.out.println(p.getText());

If you want all the immediate child nodes, call p.getChildren(). If you want to get ALL the child nodes, you'll have to call it recursively.
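
The recursive walk could look roughly like this. The sketch uses the JDK's W3C DOM so it runs without JDOM; with JDOM the same shape would apply, recursing over getChildren() instead of getChildNodes(). The class name Descendants and its methods are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists every element name in document order by
// visiting each node and then recursing into its children.
final class Descendants {
    static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void collect(Node node, List<String> names) {
        // Record elements only; text and comment nodes are skipped
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), names);
        }
    }
}
```

For a document shaped like the one in the question, the list comes back as root, overview, content, a, p: each parent appears before its children.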
