Java: Using SAX to parse common XML elements

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/3405702/

Time: 2020-10-30 01:41:34  Source: igfitidea

Using SAX to parse common XML elements

java xml sax

Asked by Dave

I'm currently using SAX (Java) to parse a handful of different XML documents, with each document representing different data and having a slightly different structure. For this reason, each XML document is handled by a different SAX class (subclassing DefaultHandler).

However, there are some XML structures that can appear in all these different documents. Ideally, I'd like to tell the parser "Hey, when you reach a complex_node element, just use ComplexNodeHandler to read it, and give me back the result. If you reach a some_other_node, use OtherNodeHandler to read it and give me back that result".

However, I can't see an obvious way to do this.

Should I just write one monolithic handler class that can read all my different documents (and so eliminate the duplicated code), or is there a smarter way to handle this?

Answered by bdoughan

Below is an answer I made to a similar question (Skipping nodes with sax). It demonstrates how to swap content handlers on an XMLReader.

In this example the swapped-in ContentHandler simply ignores all events until it gives up control, but you could easily adapt the concept.

You could do something like the following:

import javax.xml.parsers.SAXParser; 
import javax.xml.parsers.SAXParserFactory; 
import org.xml.sax.XMLReader; 

public class Demo { 

    public static void main(String[] args) throws Exception { 
        SAXParserFactory spf = SAXParserFactory.newInstance(); 
        SAXParser sp = spf.newSAXParser(); 
        XMLReader xr = sp.getXMLReader(); 
        xr.setContentHandler(new MyContentHandler(xr)); 
        xr.parse("input.xml"); 
    } 
} 

MyContentHandler

This class is responsible for processing your XML document. When you hit a node you want to ignore (the sodium element in this example), you can swap in the IgnoringContentHandler, which will swallow all events for that node.

import org.xml.sax.Attributes; 
import org.xml.sax.ContentHandler; 
import org.xml.sax.Locator; 
import org.xml.sax.SAXException; 
import org.xml.sax.XMLReader; 

public class MyContentHandler implements ContentHandler { 

    private XMLReader xmlReader; 

    public MyContentHandler(XMLReader xmlReader) { 
        this.xmlReader = xmlReader; 
    } 

    public void setDocumentLocator(Locator locator) { 
    } 

    public void startDocument() throws SAXException { 
    } 

    public void endDocument() throws SAXException { 
    } 

    public void startPrefixMapping(String prefix, String uri) 
            throws SAXException { 
    } 

    public void endPrefixMapping(String prefix) throws SAXException { 
    } 

    public void startElement(String uri, String localName, String qName, 
            Attributes atts) throws SAXException { 
        if("sodium".equals(qName)) { 
            xmlReader.setContentHandler(new IgnoringContentHandler(xmlReader, this)); 
        } else { 
            System.out.println("START " + qName); 
        } 
    } 

    public void endElement(String uri, String localName, String qName) 
            throws SAXException { 
        System.out.println("END " + qName); 
    } 

    public void characters(char[] ch, int start, int length) 
            throws SAXException { 
        System.out.println(new String(ch, start, length)); 
    } 

    public void ignorableWhitespace(char[] ch, int start, int length) 
            throws SAXException { 
    } 

    public void processingInstruction(String target, String data) 
            throws SAXException { 
    } 

    public void skippedEntity(String name) throws SAXException { 
    } 

} 

IgnoringContentHandler

When the IgnoringContentHandler is done swallowing events (its depth counter drops back to zero at the end tag of the ignored element), it passes control back to your main ContentHandler.

import org.xml.sax.Attributes; 
import org.xml.sax.ContentHandler; 
import org.xml.sax.Locator; 
import org.xml.sax.SAXException; 
import org.xml.sax.XMLReader; 

public class IgnoringContentHandler implements ContentHandler { 

    private int depth = 1; 
    private XMLReader xmlReader; 
    private ContentHandler contentHandler; 

    public IgnoringContentHandler(XMLReader xmlReader, ContentHandler contentHandler) { 
        this.contentHandler = contentHandler; 
        this.xmlReader = xmlReader; 
    } 

    public void setDocumentLocator(Locator locator) { 
    } 

    public void startDocument() throws SAXException { 
    } 

    public void endDocument() throws SAXException { 
    } 

    public void startPrefixMapping(String prefix, String uri) 
            throws SAXException { 
    } 

    public void endPrefixMapping(String prefix) throws SAXException { 
    } 

    public void startElement(String uri, String localName, String qName, 
            Attributes atts) throws SAXException { 
        depth++; 
    } 

    public void endElement(String uri, String localName, String qName) 
            throws SAXException { 
        depth--; 
        if(0 == depth) { 
           xmlReader.setContentHandler(contentHandler); 
        } 
    } 

    public void characters(char[] ch, int start, int length) 
            throws SAXException { 
    } 

    public void ignorableWhitespace(char[] ch, int start, int length) 
            throws SAXException { 
    } 

    public void processingInstruction(String target, String data) 
            throws SAXException { 
    } 

    public void skippedEntity(String name) throws SAXException { 
    } 

} 
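
The same swap could be adapted so that the swapped-in handler reads the node instead of discarding it. The following is only a sketch under assumptions: the names SwapDemo, OrderHandler and the Consumer callback are hypothetical, the element name complex_node is taken from the question, and the "result" is simply the concatenated text of the node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SwapDemo {

    /** Hypothetical reusable handler: collects the text inside one complex_node. */
    static class ComplexNodeHandler extends DefaultHandler {
        private final XMLReader reader;
        private final ContentHandler parent;
        private final Consumer<String> onResult;
        private final StringBuilder text = new StringBuilder();
        private int depth = 1; // swapped in right after complex_node's start tag

        ComplexNodeHandler(XMLReader reader, ContentHandler parent, Consumer<String> onResult) {
            this.reader = reader;
            this.parent = parent;
            this.onResult = onResult;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            depth++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (--depth == 0) {
                onResult.accept(text.toString()); // give the result back
                reader.setContentHandler(parent); // hand control back
            }
        }
    }

    /** Hypothetical document-specific handler; one would exist per document type. */
    static class OrderHandler extends DefaultHandler {
        final List<String> results = new ArrayList<>();
        private final XMLReader reader;

        OrderHandler(XMLReader reader) {
            this.reader = reader;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("complex_node".equals(qName)) {
                reader.setContentHandler(new ComplexNodeHandler(reader, this, results::add));
            }
        }
    }

    /** Parses the given XML text and returns the text of every complex_node. */
    public static List<String> parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            OrderHandler handler = new OrderHandler(reader);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.results;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><complex_node><a>1</a><b>2</b></complex_node><id>7</id></order>";
        System.out.println(parse(xml)); // prints [12]
    }
}
```

Because ComplexNodeHandler only knows its parent as a ContentHandler, any of the document-specific handlers could reuse it in the same way.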

Answered by COME FROM

You could have one handler (ComplexNodeHandler) that handles only some parts of a document (complex_node) and passes all other pieces to another handler. The constructor for ComplexNodeHandler would take the other handler as a parameter. I mean something like this:

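import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ComplexNodeHandler extends DefaultHandler {

    private final ContentHandler handlerForOtherNodes;

    // Assumed bookkeeping (not in the original outline):
    // nesting depth inside complex_node, 0 means we are outside it
    private int depth = 0;

    public ComplexNodeHandler(ContentHandler handlerForOtherNodes) {
        this.handlerForOtherNodes = handlerForOtherNodes;
    }

    // ...

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || "complex_node".equals(qName)) {
            // currently in the complex node
            depth++;
            // [handle complex node data]
        } else {
            // pass the event to the document-specific handler
            handlerForOtherNodes.startElement(uri, localName, qName, atts);
        }
    }

    // ... endElement (which must decrement depth while inside the complex node),
    // characters, etc. follow the same pattern

}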
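
For something that can be run end to end, the delegation idea could look roughly like the sketch below. It rests on assumptions: DelegationDemo and NameRecorder are hypothetical names, NameRecorder merely stands in for a document-specific handler, and the data kept for each complex_node is just its text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DelegationDemo {

    /** Handles complex_node itself and forwards every other element event. */
    static class ComplexNodeHandler extends DefaultHandler {
        final List<String> complexNodes = new ArrayList<>();
        private final ContentHandler handlerForOtherNodes;
        private final StringBuilder text = new StringBuilder();
        private int depth = 0; // > 0 while inside complex_node

        ComplexNodeHandler(ContentHandler handlerForOtherNodes) {
            this.handlerForOtherNodes = handlerForOtherNodes;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || "complex_node".equals(qName)) {
                depth++;
            } else {
                handlerForOtherNodes.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth > 0) {
                text.append(ch, start, length);
            } else {
                handlerForOtherNodes.characters(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (depth == 0) {
                handlerForOtherNodes.endElement(uri, localName, qName);
            } else if (--depth == 0) {
                complexNodes.add(text.toString()); // complex_node is complete
                text.setLength(0);
            }
        }
    }

    /** Stand-in for a document-specific handler: records the element names it sees. */
    static class NameRecorder extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(qName);
        }
    }

    /** Returns the forwarded element names followed by the collected complex_node texts. */
    public static String parse(String xml) {
        try {
            NameRecorder other = new NameRecorder();
            ComplexNodeHandler handler = new ComplexNodeHandler(other);
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return other.names + " " + handler.complexNodes;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><complex_node><a>1</a></complex_node><id>7</id></doc>";
        System.out.println(parse(xml)); // prints [doc, id] [1]
    }
}
```

Unlike swapping handlers on the XMLReader, the wrapper here stays registered for the whole parse and decides per event whether to forward it. Only the three element and text callbacks are forwarded in this sketch; a fuller version would forward the remaining ContentHandler callbacks too.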

There could still be better alternatives, since I'm not that familiar with SAX. Writing a base handler for the common parts and inheriting from it could work too, but I'm not sure whether using inheritance here is a good idea.
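
For comparison, the inheritance variant might be sketched as follows. This is an illustration only: InheritanceDemo, CommonHandler, InvoiceHandler, the startOther hook, and the invoice element with its id attribute are all hypothetical names, not something from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class InheritanceDemo {

    /** Base class: reads complex_node itself and leaves other elements to subclasses. */
    abstract static class CommonHandler extends DefaultHandler {
        final List<String> complexNodes = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private int depth = 0; // > 0 while inside complex_node

        @Override
        public final void startElement(String uri, String localName, String qName,
                Attributes atts) {
            if (depth > 0 || "complex_node".equals(qName)) {
                depth++;
            } else {
                startOther(qName, atts);
            }
        }

        @Override
        public final void characters(char[] ch, int start, int length) {
            if (depth > 0) {
                text.append(ch, start, length);
            }
        }

        @Override
        public final void endElement(String uri, String localName, String qName) {
            if (depth > 0 && --depth == 0) {
                complexNodes.add(text.toString());
                text.setLength(0);
            }
        }

        /** Hook for document-specific elements. */
        protected abstract void startOther(String qName, Attributes atts);
    }

    /** One document-specific subclass; each document type would get its own. */
    static class InvoiceHandler extends CommonHandler {
        final List<String> ids = new ArrayList<>();

        @Override
        protected void startOther(String qName, Attributes atts) {
            if ("invoice".equals(qName)) {
                ids.add(atts.getValue("id"));
            }
        }
    }

    /** Returns the invoice ids followed by the collected complex_node texts. */
    public static String parse(String xml) {
        try {
            InvoiceHandler handler = new InvoiceHandler();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.ids + " " + handler.complexNodes;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<invoice id='42'><complex_node>abc</complex_node></invoice>";
        System.out.println(parse(xml)); // prints [42] [abc]
    }
}
```

One trade-off that is visible even in this small sketch: the base class can hold only one fixed set of common structures, so a second shared element such as some_other_node would have to be added to the same base class, whereas separate delegating or swapped-in handlers can be combined per document.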
