Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/4264650/

Date: 2020-10-30 05:36:18  Source: igfitidea

Java: use StAX to get child elements in a generic fashion

Tags: java, dom, stax, jaxp

Asked by Cratylus

I am trying to use StAX (and I already dislike it...).
It seems that the only way to use it is through a long chain of if-else conditions.
More importantly, there seems to be no way to associate an element with its children unless you know the structure of the XML document in advance. Is this correct?
I have tried the following. I have this XML in a String:

<ns1:Root xmlns:ns1="http://rootNameSpace.com/">
    <ns1:A/>
    <ns1:B>
        <Book xmlns="http://www.myNameSpace.com" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <Data>
                <Author>John</Author>
                <Edition>1</Edition>
                <PubHouse>Small Publishing House</PubHouse>
                <Price>37.8</Price>
            </Data>
        </Book>
    </ns1:B>
</ns1:Root>

I would like to use StAX to get the Book element, but it seems I can only write code that hardcodes the entire structure.
That is, use XMLEventReader and, once you reach Book, start looping over Data, Author, and so on.
Is there a generic solution for this?
To get around it, I tried going from a String to an XMLEventReader and back to a String, but I cannot get the exact String representation that I started with (the namespaces appear in brackets, there are extra colons, etc.).

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

StringBuilder xml = new StringBuilder();
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
String msg = "<ns1:Root xmlns:ns1=\"http://rootNameSpace.com/\"><ns1:A/><ns1:B><Book xmlns=\"http://www.myNameSpace.com\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><Data><Author>John</Author><Edition>1</Edition><PubHouse>Small Publishing House</PubHouse><Price>37.8</Price></Data></Book></ns1:B></ns1:Root>";
InputStream input = new ByteArrayInputStream(msg.getBytes("UTF-8"));
XMLEventReader xmlEventReader = inputFactory.createXMLEventReader(input);
while (xmlEventReader.hasNext())
{
    // Serialize every event back to text and append it to the buffer
    XMLEvent event = xmlEventReader.nextEvent();
    StringWriter sw = new StringWriter();
    event.writeAsEncodedUnicode(sw);
    xml.append(sw);
}
System.out.println(xml);

I get the following:

<?xml version="1.0" encoding='UTF-8' standalone='no'?><['http://rootNameSpace.com/']:ns1:Root xmlns:ns1='http://rootNameSpace.com/'><['http://rootNameSpace.com/']:ns1:A></ns1:A><['http://rootNameSpace.com/']:ns1:B><['http://www.myNameSpace.com']::Book xmlns:='http://www.myNameSpace.com' xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><['http://www.myNameSpace.com']::Data><['http://www.myNameSpace.com']::Author>John</Author><['http://www.myNameSpace.com']::Edition>1</Edition><['http://www.myNameSpace.com']::PubHouse>Small Publishing House</PubHouse><['http://www.myNameSpace.com']::Price>37.8</Price></Data></Book></ns1:B></ns1:Root>

Can this case be addressed with StAX, or is DOM the only solution?

Answered by gustafc

I don't really understand what you're trying to do, but if you want the local name of the tag that caused a START_ELEMENT event, you can do it like this:

// START_ELEMENT is defined in javax.xml.stream.XMLStreamConstants
if (event.getEventType() == XMLStreamConstants.START_ELEMENT) {
    QName qname = event.asStartElement().getName();
    System.out.println("Start of element " + qname.getLocalPart());
}

Likewise, asEndElement, asCharacters, etc. provide access to the other types of events.
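
As a rough illustration of those accessors, here is a minimal sketch that walks any document with XMLEventReader and prints an indented outline, using only a depth counter to relate children to their parents. The class name StaxOutline and the helper outline are made up for this example; they are not part of StAX.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxOutline {

    // Hypothetical helper: builds an indented outline of any XML document.
    static String outline(String xml) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // nesting level of the element currently open
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    // A start tag: print its local name, then go one level deeper
                    indent(out, depth++);
                    out.append(event.asStartElement().getName().getLocalPart()).append('\n');
                } else if (event.isCharacters() && !event.asCharacters().isWhiteSpace()) {
                    // Non-blank text belongs to the element that is currently open
                    indent(out, depth);
                    out.append('"').append(event.asCharacters().getData()).append("\"\n");
                } else if (event.isEndElement()) {
                    depth--;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return out.toString();
    }

    private static void indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<a><b>hi</b><c/></a>"));
    }
}
```

Nothing here depends on the element names, so the same loop works for the Root/B/Book document from the question. Note that a parser may split long text into several Characters events, which this sketch would print on separate lines.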

Personally, I usually find XMLStreamReader handier in most situations, but I suppose that depends on the use case as well as your own preferences. A pro tip: the stricter the schema, the easier the data is to parse with StAX.
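
For comparison, a minimal sketch of the cursor-style XMLStreamReader used generically: it keeps a stack of open element names and records the text of each element under its path. The class StaxPathMap and the method leafTexts are hypothetical names; the sketch assumes text-only leaf elements, and repeated paths simply overwrite each other.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPathMap {

    // Hypothetical helper: maps "parent/child/leaf" paths to the leaf's text.
    static Map<String, String> leafTexts(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        Deque<String> path = new ArrayDeque<>();  // local names of the open elements
        StringBuilder text = new StringBuilder(); // text seen since the last tag
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        path.addLast(reader.getLocalName());
                        text.setLength(0);
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT: {
                        String value = text.toString().trim();
                        if (!value.isEmpty()) {
                            // Only elements that directly contain text are recorded
                            result.put(String.join("/", path), value);
                        }
                        path.removeLast();
                        text.setLength(0);
                        break;
                    }
                    default:
                        break; // comments, processing instructions, etc. are ignored
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(leafTexts("<r><x><y>1</y><z>two</z></x></r>"));
    }
}
```

Applied to the document from the question, the author would be stored under the key Root/B/Book/Data/Author, without the code knowing any of those names in advance.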

You may also want to look at JAXB for automatic XML data binding.

Edit: Here's a naïve recursive descent StAX parser for the XML in the OP:

@Test
public void recursiveDescentStaxParser( ) throws XMLStreamException,
        FactoryConfigurationError
{
    String msg = "<ns1:Root xmlns:ns1=\"http://rootNameSpace.com/\"><ns1:A/><ns1:B><Book xmlns=\"http://www.myNameSpace.com\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><Data><Author>John</Author><Edition>1</Edition><PubHouse>Small Publishing House</PubHouse><Price>37.8</Price></Data></Book></ns1:B></ns1:Root>";
    XMLStreamReader reader = XMLInputFactory.newFactory( )
            .createXMLStreamReader( new StringReader( msg ) );

    reader.nextTag( );
    readRoot( reader );

}

private void readRoot( XMLStreamReader reader ) throws XMLStreamException
{
    while ( reader.nextTag( ) == XMLEvent.START_ELEMENT )
    {
        QName name = reader.getName( );
        if ( "B".equals( name.getLocalPart( ) ) )
            readBooks( reader );
        else
            reader.nextTag( ); // Empty <A>

    }
}

private void readBooks( XMLStreamReader reader ) throws XMLStreamException
{
    while ( reader.nextTag( ) == XMLEvent.START_ELEMENT )
    {
        QName name = reader.getName( );
        if ( !"Book".equals( name.getLocalPart( ) ) )
            throw new XMLStreamException( name.toString( ) );
        reader.nextTag( ); // Jump to <Data>
        readBook( reader );
        reader.nextTag( ); // Jump to </Book>
    }
}

private void readBook( XMLStreamReader reader ) throws XMLStreamException
{
    reader.nextTag( ); // Skip to <Author>
    System.out.println( "Author: " + reader.getElementText( ) );
    reader.nextTag( ); // Skip to <Edition>
    System.out.println( "Edition: " + reader.getElementText( ) );
    reader.nextTag( ); // Skip to <PubHouse>
    System.out.println( "Publisher: " + reader.getElementText( ) );
    reader.nextTag( ); // Skip to <Price>
    System.out.println( "Price: " + reader.getElementText( ) );
    reader.nextTag( ); // Skip to </Data>

}

Writing the parser this way not only makes the code a lot easier to read and reason about, it also makes the stack traces easier to follow when errors pop up.

Answered by StaxMan

It sounds like you may have chosen the wrong tool here: StAX is a great API for efficient handling of large content. But if convenience is more important than efficiency then yes, you should probably consider a tree model (not necessarily DOM; XOM, for example, is better) or data binding (JAXB or XStream). Specifically, StAX, like SAX, is stream-based, so you only ever see the current event or token. There are no accessors for children or parents because there is no guaranteed way to reach them from the current stream position.
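
Even with only the current event visible, a whole element can still be captured generically by counting depth and copying events as they pass by. The following is a minimal sketch, not part of the original answer: it copies the first element with a given local name, children included, to a String through an XMLEventWriter instead of writeAsEncodedUnicode. The class StaxSubtree and the method firstSubtree are hypothetical names, and the sketch assumes the target element declares the namespaces it uses itself (as Book does in the question); declarations inherited from ancestors are not re-emitted.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxSubtree {

    // Hypothetical helper: serializes the first element named localName,
    // with all of its descendants; returns "" if no such element exists.
    static String firstSubtree(String xml, String localName) {
        StringWriter out = new StringWriter();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out);
            int depth = 0; // 0 means we are still outside the target element
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (depth == 0) {
                    // Skip everything until the target start tag shows up
                    if (event.isStartElement() && localName.equals(
                            event.asStartElement().getName().getLocalPart())) {
                        writer.add(event);
                        depth = 1;
                    }
                    continue;
                }
                writer.add(event); // inside the target: copy every event as-is
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    break; // the target's own end tag has been written
                }
            }
            writer.flush();
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not copy XML subtree", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstSubtree("<r><a/><b><c>x</c></b></r>", "b"));
    }
}
```

Calling firstSubtree(msg, "Book") on the document from the question should yield the Book element as ordinary XML text. The exact quoting and attribute order depend on the StAX implementation in use.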

But if performance or memory usage is a concern, you can still consider either JAXB (which is typically more efficient than tree models like DOM) or StaxMate. StaxMate is a high-performance, low-memory extension on top of StAX, and it is a bit more convenient to use. While you still need to iterate over elements in document order, its cursor approach maps more naturally onto parent-then-children lookups, so it might work for your case.
