java stax - get xml node as string

Question

Asked by Jason

xml looks like so:

<statements>
   <statement account="123">
      ...stuff...
   </statement>
   <statement account="456">
      ...stuff...
   </statement>
</statements>

I'm using stax to process one "<statement>" at a time and I got that working. I need to get that entire statement node as a string so I can create "123.xml" and "456.xml" or maybe even load it into a database table indexed by account.

using this approach: http://www.devx.com/Java/Article/30298/1954

I'm looking to do something like this:

String statementXml = staxXmlReader.getNodeByName("statement");

//load statementXml into database

Answer 1

Accepted answer by javamonkey79

Why not just use xpath for this?

You could have a fairly simple xpath to get all 'statement' nodes.

Like so:

//statement

EDIT #1: If possible, take a look at dom4j. You could read the String and get all 'statement' nodes fairly simply.

EDIT #2: Using dom4j, this is how you would do it: (from their cookbook)

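import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Node;
import java.util.List;

String text = "your xml here";
Document document = DocumentHelper.parseText(text);

public void bar(Document document) {
   // XPath lookup; dom4j needs jaxen on the classpath for this
   List<?> list = document.selectNodes( "//statement" );
   for (Object item : list) {
      String statementXml = ((Node) item).asXML(); // the node and its children as a string
   }
}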
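
If adding dom4j is not an option, the same //statement lookup can be sketched with classes that ship with the JDK (javax.xml.xpath plus an identity Transformer to turn each node back into text). This is only an illustration, not part of the original answer: the class name XPathStatements and the method statementsByAccount are made up for the example, and it loads the whole document into memory, so it suits small and medium files rather than huge ones.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class XPathStatements {

    /** Returns the XML text of every statement element, keyed by its account attribute. */
    static Map<String, String> statementsByAccount(String xml) throws Exception {
        // Parse the whole document into a DOM tree (everything is held in memory).
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Select all statement elements, wherever they are in the tree.
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//statement", doc, XPathConstants.NODESET);

        // An identity transformer serializes a DOM node back to text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element statement = (Element) nodes.item(i);
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(statement), new StreamResult(out));
            result.put(statement.getAttribute("account"), out.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<statements>"
                + "<statement account=\"123\"><amount>10</amount></statement>"
                + "<statement account=\"456\"><amount>20</amount></statement>"
                + "</statements>";
        statementsByAccount(xml).forEach(
                (account, text) -> System.out.println(account + ".xml -> " + text));
    }
}
```

Each map value is the complete statement node as a string, which could then be written to "123.xml" or stored in a database row keyed by account.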

Answer 2

Answered by t0r0X

I had a similar task, and although the original question is more than a year old, I couldn't find a satisfying answer. The most interesting answer so far was Blaise Doughan's, but I couldn't get it running on the XML I am expecting (maybe some parameters for the underlying parser could change that?). Here is the XML, very simplified:

<many-many-tags>
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
</many-many-tags>

My solution:

public static String readElementBody(XMLEventReader eventReader)
    throws XMLStreamException {
    StringWriter buf = new StringWriter(1024);

    int depth = 0;
    while (eventReader.hasNext()) {
        // peek event
        XMLEvent xmlEvent = eventReader.peek();

        if (xmlEvent.isStartElement()) {
            ++depth;
        }
        else if (xmlEvent.isEndElement()) {
            --depth;

            // reached END_ELEMENT tag?
            // break loop, leave event in stream
            if (depth < 0)
                break;
        }

        // consume event
        xmlEvent = eventReader.nextEvent();

        // print out event
        xmlEvent.writeAsEncodedUnicode(buf);
    }

    return buf.getBuffer().toString();
}

Usage example:

XMLEventReader eventReader = ...;
while (eventReader.hasNext()) {
    XMLEvent xmlEvent = eventReader.nextEvent();
    if (xmlEvent.isStartElement()) {
        StartElement elem = xmlEvent.asStartElement();
        String name = elem.getName().getLocalPart();

        // element names are case-sensitive, so this must match the name in your XML
        if ("DESCRIPTION".equals(name)) {
            String xmlFragment = readElementBody(eventReader);
            // do something with it...
            System.out.println("'" + xmlFragment + "'");
        }
    }
    else if (xmlEvent.isEndElement()) {
        // ...
    }
}

Note that the extracted XML fragment will contain the complete body content, including whitespace and comments. Filtering those on demand and making the buffer size configurable have been left out for code brevity. The output looks like this:

'
    <description>
        ...
        <p>Lorem ipsum...</p>
        Devils inside...
        ...
    </description>
    '
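
The filtering that was left out above could look roughly like the following sketch. It is the same depth-counting loop as readElementBody, but it skips comment events and whitespace-only text before writing. The class name FilteredBodyReader and the helper bodyOf are invented for this example; whitespace inside mixed text (such as the line breaks around "Devils inside...") is kept, because only events that consist entirely of whitespace are dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class name, used for this example only.
class FilteredBodyReader {

    /** Same idea as readElementBody, but drops comments and whitespace-only text. */
    static String readElementBodyFiltered(XMLEventReader eventReader)
            throws XMLStreamException {
        StringWriter buf = new StringWriter();
        int depth = 0;
        while (eventReader.hasNext()) {
            // Peek first so the enclosing end tag can be left in the stream.
            XMLEvent event = eventReader.peek();
            if (event.isStartElement()) {
                ++depth;
            } else if (event.isEndElement()) {
                if (--depth < 0) {
                    break; // end tag of the enclosing element: stop, do not consume
                }
            }

            event = eventReader.nextEvent(); // consume the peeked event

            boolean comment = event.getEventType() == XMLStreamConstants.COMMENT;
            boolean blank = event.isCharacters() && event.asCharacters().isWhiteSpace();
            if (!comment && !blank) {
                event.writeAsEncodedUnicode(buf);
            }
        }
        return buf.toString();
    }

    /** Hypothetical convenience: filtered body of the first element with the given local name. */
    static String bodyOf(String xml, String elementName) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && elementName.equals(event.asStartElement().getName().getLocalPart())) {
                return readElementBodyFiltered(reader);
            }
        }
        return null; // element not found
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<many-many-tags><description><!-- note -->\n"
                + "  <p>Lorem ipsum...</p>\n"
                + "</description></many-many-tags>";
        System.out.println("'" + bodyOf(xml, "description") + "'");
    }
}
```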

Answer 3

Answered by bdoughan

You can use StAX for this. You just need to advance the XMLStreamReader to the start element for statement. Check the account attribute to get the file name. Then use the javax.xml.transform APIs to transform the StAXSource to a StreamResult wrapping a File. The transform advances the XMLStreamReader, so you then just repeat this process.

import java.io.File;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class Demo {

    public static void main(String[] args) throws Exception  {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to statements element

        while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            File file = new File("out" + xsr.getAttributeValue(null, "account") + ".xml");
            t.transform(new StAXSource(xsr), new StreamResult(file));
        }
    }

}
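
Since the question asks for the node as a String rather than a file, the same technique can be pointed at a StringWriter instead. The sketch below is an adaptation, not the answer's own code: the class name StatementStrings and the method split are made up, and the loop checks the reader's current event instead of calling nextTag(), on the assumption that the transform may leave the reader either on the statement's end tag or one event past it (which matters when there is no whitespace between statements).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical class name, used for this example only.
class StatementStrings {

    /** Streams through the XML and returns each statement's text keyed by account. */
    static Map<String, String> split(String xml) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));

        // One identity transformer is reused for every statement.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> byAccount = new LinkedHashMap<>();
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "statement".equals(xsr.getLocalName())) {
                // Read the attribute before the transform consumes the element.
                String account = xsr.getAttributeValue(null, "account");
                StringWriter out = new StringWriter();
                t.transform(new StAXSource(xsr), new StreamResult(out));
                byAccount.put(account, out.toString());
                // No next() here: the transform has already moved the reader on.
            } else {
                xsr.next();
            }
        }
        xsr.close();
        return byAccount;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<statements>\n"
                + "   <statement account=\"123\"><amount>10</amount></statement>\n"
                + "   <statement account=\"456\"><amount>20</amount></statement>\n"
                + "</statements>";
        split(xml).forEach((account, text) -> System.out.println(account + ": " + text));
    }
}
```

Only one statement is held in memory at a time during parsing; the map is there just to make the example easy to inspect, and in practice each string could be written to the database as soon as it is produced.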

Answer 4

Answered by StaxMan

Stax is a low-level access API, and it has neither lookups nor methods that access content recursively. But what are you actually trying to do? And why are you considering Stax?

Beyond using a tree model (DOM, XOM, JDOM, Dom4j), which would work well with XPath, the best choice when dealing with data is usually a data binding library like JAXB. With it you can pass in a Stax or SAX reader and ask it to bind the XML data into Java beans, so that you process Java objects instead of messing with XML. This is often more convenient, and it usually performs quite well. The only trick with larger files is that you do not want to bind the whole thing at once, but rather bind each sub-tree (in your case, one 'statement' at a time). This is most easily done by iterating with a Stax XMLStreamReader, then using JAXB to bind.

Answer 5

Answered by Jason

I've been googling and this seems painfully difficult.

Given my XML, I think it might just be simpler to:

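// Assumes every tag sits on its own line, as in the sample XML.
// STMT_END_TAG is "</statement>"; file is the input file.
StringBuilder buffer = new StringBuilder();
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
   String line;
   while ((line = reader.readLine()) != null) {
      if (line.trim().startsWith("<statements")) {
         continue; // skip the root start tag so it does not end up in the first chunk
      }
      buffer.append(line).append('\n');
      if (line.trim().equals(STMT_END_TAG)) {
         parse(buffer.toString());
         buffer.setLength(0); // reset for the next statement
      }
   }
}

 private void parse(String statement){
    // saxParser.parse(new InputSource(new StringReader(statement)), handler);
    // do stuff
    // save string
 }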
