Java: how to get only the elements that have values with StAX

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3293841/

Time: 2020-08-13 21:48:11  Source: igfitidea

How to get only elements with values using StAX

Tags: java, xml, xml-parsing, stax

Asked by ant

I'm trying to get only the elements that contain text. Example XML:

<root>
      <Item>
        <ItemID>4504216603</ItemID>
        <ListingDetails>
          <StartTime>10:00:10.000Z</StartTime>
          <EndTime>10:00:30.000Z</EndTime>
          <ViewItemURL>http://url</ViewItemURL>
            ....
      </Item>

It should print:

Element Local Name:ItemID
Text:4504216603
Element Local Name:StartTime
Text:10:00:10.000Z
Element Local Name:EndTime
Text:10:00:30.000Z
Element Local Name:ViewItemURL
Text:http://url

The code below also prints root, Item, etc. Is this even possible? It must be, but I just can't find it on Google.

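import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream input = new FileInputStream(new File("src/main/resources/file.xml"));
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);

while (xmlStreamReader.hasNext()) {
    int event = xmlStreamReader.next();

    // Prints the name of every element, including those without text
    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println("Element Local Name:" + xmlStreamReader.getLocalName());
    }

    // Prints only non-blank text
    if (event == XMLStreamConstants.CHARACTERS) {
        String text = xmlStreamReader.getText().trim();
        if (!text.isEmpty()) {
            System.out.println("Text:" + text);
        }
    }
}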

Edit: the incorrect behaviour (what the code actually prints):

    Element Local Name:root
    Element Local Name:item
    Element Local Name:ItemID
    Text:4504216603
    Element Local Name:ListingDetails
    Element Local Name:StartTime
    Text:10:00:10.000Z
    Element Local Name:EndTime
    Text:10:00:30.000Z
    Element Local Name:ViewItemURL
    Text:http://url

I don't want root and the other nodes that have no text to be printed, only the output I wrote above. Thank you.

Accepted answer by Georgy Bolyuba

Try this:

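while (xmlStreamReader.hasNext()) {
    int event = xmlStreamReader.next();

    if (event == XMLStreamConstants.START_ELEMENT) {
        // Read the name first: getElementText() moves the reader to END_ELEMENT
        String name = xmlStreamReader.getLocalName();
        try {
            String text = xmlStreamReader.getElementText();
            System.out.println("Element Local Name:" + name);
            System.out.println("Text:" + text);
        } catch (XMLStreamException e) {
            // Thrown when the element contains child elements rather than only text; skip it.
            // Caveat: the reader may by then be positioned on the child's START_ELEMENT,
            // in which case the next() call above steps past that child.
        }
    }
}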
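
One way to avoid relying on the exception from getElementText() is to remember the most recently opened element and report it only if non-blank text arrives before its end tag. The following is a minimal sketch under assumptions, not part of the original answer: the class name StaxTextElements, the extract method, and the "name=text" output format are hypothetical, and elements that mix text with child elements are deliberately not reported.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of the original answer.
final class StaxTextElements {

    /** Returns "name=text" for each element that contains non-blank text and no child elements. */
    static List<String> extract(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character events into one
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

            String currentName = null;                 // most recently opened element, if still open
            StringBuilder text = new StringBuilder();  // text collected for that element
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // A new element starts: forget any text seen for its parent
                    currentName = reader.getLocalName();
                    text.setLength(0);
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    if (currentName != null) {
                        text.append(reader.getText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String value = text.toString().trim();
                    if (currentName != null && !value.isEmpty()) {
                        result.add(currentName + "=" + value);
                    }
                    // Text that follows a closed child is ignored
                    currentName = null;
                    text.setLength(0);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<root><Item><ItemID>4504216603</ItemID></Item></root>";
        extract(xml).forEach(System.out::println);
    }
}
```

Run against the sample document from the question, extract should return entries for ItemID, StartTime, EndTime and ViewItemURL only, because root, Item and ListingDetails contain nothing but whitespace and child elements.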

SAX-based solution (works):

import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Test extends DefaultHandler {

    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("src/file.xml"), new Test());
    }

    // Name of the most recently opened element
    private String currentName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        currentName = qName;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String string = new String(ch, start, length);
        // Print the element name and text only when the text is not blank
        if (hasText(string)) {
            System.out.println(currentName);
            System.out.println(string);
        }
    }

    private boolean hasText(String string) {
        return !string.trim().isEmpty();
    }
}
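
A SAX parser is allowed to deliver the text of one element in several characters() calls, so printing inside characters() can split a value across lines. The sketch below shows one possible variation that buffers the text and reports it in endElement(). It is an illustration under assumptions: the class name BufferingTextHandler, the extract method, and the "name=text" output format are hypothetical and do not come from the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not part of the original answer.
final class BufferingTextHandler extends DefaultHandler {

    private final List<String> lines = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private String currentName; // most recently opened element, if still open

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        currentName = qName;
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text, so accumulate
        if (currentName != null) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (currentName != null && !text.isEmpty()) {
            lines.add(currentName + "=" + text);
        }
        // Text that follows a closed child element is ignored
        currentName = null;
        buffer.setLength(0);
    }

    /** Parses the XML string and returns "name=text" for each text-only element. */
    static List<String> extract(String xml) {
        try {
            BufferingTextHandler handler = new BufferingTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        extract("<root><Item><ItemID>4504216603</ItemID></Item></root>")
                .forEach(System.out::println);
    }
}
```

The trade-off is that nothing is reported until the end tag is seen, which is usually fine for short values like the ones in the question.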

Answer by ant

StAX solution:

Parse the document:

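import java.io.InputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public void parseXML(InputStream xml) {
    try {
        // Read the stream with StAX and copy it into a DOM tree
        DOMResult result = new DOMResult();
        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
        XMLEventReader reader = xmlInputFactory.createXMLEventReader(new StreamSource(xml));
        TransformerFactory transFactory = TransformerFactory.newInstance();
        Transformer transformer = transFactory.newTransformer();
        transformer.transform(new StAXSource(reader), result);
        Document document = (Document) result.getNode();

        // Top-level nodes of the document
        NodeList startlist = document.getChildNodes();

        processNodeList(startlist);

    } catch (Exception e) {
        System.err.println("Something went wrong, this might help :\n" + e.getMessage());
    }
}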

Now the document's top-level nodes are in a NodeList, so walk it recursively:

private void processNodeList(NodeList nodelist) {
    for (int i = 0; i < nodelist.getLength(); i++) {
        Node node = nodelist.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE && (hasValidAttributes(node) || hasValidText(node))) {
            getNodeNamesAndValues(node);
        }
        // Recurse into the children of every node
        processNodeList(node.getChildNodes());
    }
}

Then, for each element node with valid text, get its name and value:

public void getNodeNamesAndValues(Node n) {
    // Only called for element nodes; hasValidText(n) already rules out
    // text that is blank or starts with whitespace
    if (hasValidText(n)) {
        String nodeValue = StringUtils.strip(n.getTextContent());
        String nodeName = n.getLocalName();

        System.out.println(nodeName + " " + nodeValue);
    }
}

A few helper methods for checking nodes:

private static boolean hasValidAttributes(Node node) {
    // getAttributes() is non-null only for element nodes
    return node.getAttributes() != null && node.getAttributes().getLength() > 0;
}

private boolean hasValidText(Node node) {
    String textValue = node.getTextContent();

    // Compare string contents with isEmpty(), not with != ""
    return textValue != null && !textValue.isEmpty() && !isWhiteSpace(textValue)
            && !StringUtils.isWhitespace(textValue) && node.hasChildNodes();
}

// True if the text starts with a whitespace character (\r, \t, \n or a space)
private boolean isWhiteSpace(String nodeText) {
    return nodeText.startsWith("\r") || nodeText.startsWith("\t")
            || nodeText.startsWith("\n") || nodeText.startsWith(" ");
}

I also used StringUtils; if you're using Maven, you can get it by adding this to your pom.xml:

<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.5</version>
</dependency>
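
The checks above decide whether an element "has text" by looking at whether its text content starts with whitespace, which depends on how the file is indented. As a possible alternative that needs no extra dependency, the sketch below looks only at an element's direct text children. It is an illustration under assumptions: the class name DomTextElements, the extract and collect methods, and the "name=text" output format are hypothetical, and it parses with a plain DocumentBuilder rather than the StAX-to-DOM transform used in this answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the original answer.
final class DomTextElements {

    /** Returns "name=text" for each element that has non-blank text as a direct child. */
    static List<String> extract(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            collect(document.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void collect(Element element, List<String> out) {
        NodeList children = element.getChildNodes();

        // Gather text that belongs to this element itself, not to its descendants
        StringBuilder direct = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                direct.append(child.getNodeValue());
            }
        }
        String text = direct.toString().trim();
        if (!text.isEmpty()) {
            out.add(element.getNodeName() + "=" + text);
        }

        // Visit child elements in document order
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                collect((Element) child, out);
            }
        }
    }

    public static void main(String[] args) {
        extract("<root><Item><ItemID>4504216603</ItemID></Item></root>")
                .forEach(System.out::println);
    }
}
```

Unlike getTextContent(), which concatenates the text of all descendants, this only inspects the element's own text nodes, so container elements such as root or ListingDetails are skipped regardless of indentation.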

This is inefficient for huge files, though less so if you split them first. This is what I came up with (with Google's help). There are surely better solutions; this one is mine, and I'm an amateur (for now).