Java SAX: How to get the content of an element

Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/4119870/

Time: 2020-10-30 04:50:35  Source: igfitidea

SAX: How to get the content of an element

java xml sax

Asked by Robert Strauch

I have some trouble understanding parsing XML structures with SAX. Let's say there is the following XML:

<root>
  <element1>Value1</element1>
  <element2>Value2</element2>
</root>

and a String variable myString.

Just walking through the document with the methods startElement(), endElement() and characters() is easy. But I don't understand how I can achieve the following:

If the current element is element1, store its value, Value1, in myString. As far as I understand, there is nothing like:

if (qName.equals("element1")) myString = qName.getValue();

Guess I'm just thinking too complicated :-)

Robert

Accepted answer by Cameron Skinner

With SAX you need to maintain your own stack. You can do something like this for very basic processing:

void startElement(...) {
    if (name.equals("element1")) {
        // Entering element1: start collecting its text
        inElement1 = true;
        element1Content = new StringBuffer();
    }
}

void characters(...) {
    if (inElement1) {
        // May be called several times for one element, so append
        element1Content.append(characterData);
    }
}

void endElement(...) {
    // Match the same element that was opened in startElement
    if (name.equals("element1")) {
        inElement1 = false;
        processElement1Content(element1Content.toString());
    }
}
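
For reference, here is one way the outline above might look as a complete program. It is a minimal sketch, not the answerer's own code: the class and method names (Element1Reader, Element1Handler, readElement1) are made up for illustration, and it assumes the default, non-namespace-aware parser from SAXParserFactory, so the element name arrives in qName.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class Element1Reader {

    /** Collects the text of element1 (hypothetical helper class). */
    static class Element1Handler extends DefaultHandler {
        private boolean inElement1 = false;
        private final StringBuilder element1Content = new StringBuilder();
        private String myString;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("element1".equals(qName)) {
                inElement1 = true;
                element1Content.setLength(0); // start with an empty buffer
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inElement1) {
                // The parser may deliver the text in several chunks
                element1Content.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("element1".equals(qName)) {
                inElement1 = false;
                myString = element1Content.toString(); // text is complete only here
            }
        }

        String getMyString() {
            return myString;
        }
    }

    /** Parses the XML string and returns the text of element1, or null if it is absent. */
    static String readElement1(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Element1Handler handler = new Element1Handler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getMyString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element1>Value1</element1><element2>Value2</element2></root>";
        System.out.println(readElement1(xml)); // prints Value1
    }
}
```

The value is assigned in endElement() rather than in characters() because only at the closing tag is the text known to be complete.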

If you want code as in your example then you need to use the DOM model rather than SAX. DOM is easier to code up but is generally slower and more memory expensive than SAX.
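
To illustrate the DOM alternative, here is a small sketch using the JDK's built-in javax.xml.parsers API (not Dom4J). The class name DomLookup and the helper textOf are invented for this example; the XML is the one from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomLookup {

    /** Returns the text of the first element with the given tag name, or null if there is none. */
    static String textOf(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is loaded into memory as a tree
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Node node = doc.getElementsByTagName(tagName).item(0); // item() returns null when out of range
        return node == null ? null : node.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element1>Value1</element1><element2>Value2</element2></root>";
        String myString = textOf(xml, "element1");
        System.out.println(myString); // prints Value1
    }
}
```

This is close to the one-liner from the question, at the cost of building the full tree before anything can be read.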

I recommend using a third-party library rather than the built-in Java XML libraries for DOM manipulation. Dom4J seems pretty good but there are probably other libraries out there too.

Answer by jvwilge

This solution works for a single element with text content. When element1 has sub-elements, some more work is needed. Brian's remark is a very important one. When you have multiple elements or want a more generic solution, this might help you. I tested it with a 300+ MB XML file and it's still very fast:

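import java.io.FileInputStream;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Receives element1 together with its nested tags and text.
// Attributes are not copied and text is not re-escaped.
final StringBuilder builder = new StringBuilder();
// XMLReaderFactory is deprecated since Java 9; SAXParserFactory is the newer entry point.
XMLReader saxXmlReader = XMLReaderFactory.createXMLReader();

DefaultHandler handler = new DefaultHandler() {
    // True while the parser is inside element1
    boolean isParsing = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("element1".equals(localName)) {
            isParsing = true;
        }
        if (isParsing) {
            builder.append("<" + qName + ">");
        }
    }

    @Override
    public void characters(char[] chars, int start, int length) throws SAXException {
        if (isParsing) {
            // Append only the slice of the buffer that belongs to this call
            builder.append(chars, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (isParsing) {
            builder.append("</" + qName + ">");
        }
        if ("element1".equals(localName)) {
            isParsing = false;
        }
    }
};

saxXmlReader.setContentHandler(handler);
saxXmlReader.setErrorHandler(handler);

// `input` is the path (or File) of the XML document, defined elsewhere
saxXmlReader.parse(new InputSource(new FileInputStream(input)));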

Answer by Brian Agnew

You should record the contents via characters(), appending to a StringBuilder on each invocation, and only store the concatenated value upon the endElement() call.

Why? Because characters() can be called multiple times for the element content, each call referencing a successive subsequence of that text element.
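
A small sketch of that advice follows. The names CharactersChunks, TextCollector and collect are hypothetical, and the sample text with an entity reference is my own choice: many parsers report text around an entity in separate characters() calls, but how the text is split is up to the parser, so the code counts the calls without relying on the number.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CharactersChunks {

    /** Accumulates text per element and stores it only at the closing tag. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        int calls = 0;   // how often characters() was invoked (parser-dependent)
        String result;   // complete text of element1

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // new element: discard anything collected so far
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            calls++;
            text.append(ch, start, length); // append, never overwrite
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("element1".equals(qName)) {
                result = text.toString();
            }
        }
    }

    /** Parses the XML string and returns the handler holding the collected text. */
    static TextCollector collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        TextCollector c = collect("<root><element1>Tom &amp; Jerry</element1></root>");
        System.out.println(c.result); // prints Tom & Jerry
        System.out.println("characters() calls: " + c.calls);
    }
}
```

Had characters() assigned the value directly instead of appending, any split by the parser would leave only the last chunk in the result.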
