Extracting data from XML using Java

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/12837430/

Date: 2020-10-31 10:25:08  Source: igfitidea

Tags: java, xml, xml-parsing

Asked by Victoria

I have the following XML code:

<CampaignFrameResponse
  xmlns="http://Qsurv/api"
  xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Message>Success</Message>
  <Status>Success</Status>
  <FrameHeight>308</FrameHeight>   
  <FrameUrl>http://delivery.usurv.com?Key=a5018c85-222a-4444-a0ca-b85c42f3757d&amp;ReturnUrl=http%3a%2f%2flocalhost%3a8080%2feveningstar%2fhome</FrameUrl> 
</CampaignFrameResponse>

What I'm trying to do is extract the nodes and assign their values to variables. So for example, I'd have a variable called FrameHeight containing the value 308.

This is the Java code I have so far:

private void processNode(Node node) {
    NodeList nodeList = node.getChildNodes();
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node currentNode = nodeList.item(i);
        if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
            // calls this method for every child that is an Element
            LOG.warning("current node name: " + currentNode.getNodeName());
            LOG.warning("current node type: " + currentNode.getNodeType());
            LOG.warning("current node value: " + currentNode.getNodeValue());
            processNode(currentNode);
        }
    }
}

This prints out the node names, types and values, but what is the best way of assigning each of the values to an appropriately named variable, e.g. int FrameHeight = 308?

This is my updated code where the nodeValue variable keeps returning null:

private void processNode(Node node) {
    NodeList nodeList = node.getChildNodes();
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node currentNode = nodeList.item(i);
        if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
            // calls this method for every child that is an Element
            String nodeName = currentNode.getNodeName();
            String nodeValue = currentNode.getNodeValue();
            if (nodeName.equals("Message")) {
                LOG.warning("nodeName: " + nodeName);
                message = nodeValue;
                LOG.warning("Message: " + message);
            } else if (nodeName.equals("FrameHeight")) {
                LOG.warning("nodeName: " + nodeName);
                frameHeight = nodeValue;
                LOG.warning("frameHeight: " + frameHeight);
            }
            processNode(currentNode);
        }
    }
}

Accepted answer by vels4j

XStream won't help in your case; it is meant for converting an object to XML and back again. If your XML was generated from an instance of a CampaignFrameResponse class, you can use XStream.

Otherwise, you can simply check the node name like this:

String nodeName = currentNode.getNodeName();
// getNodeValue() returns null for element nodes;
// getTextContent() returns the text inside the element
String nodeValue = currentNode.getTextContent();
if (nodeName.equals("Message")) {
    message = nodeValue;
} else if (nodeName.equals("FrameHeight")) {
    frameHeight = nodeValue;
}

If you need an int value, you have to parse the string (for example with Integer.parseInt).
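As a rough sketch of this check-and-parse approach applied to the XML from the question: the class name FrameResponseReader, its fields, and the shortened SAMPLE document (FrameUrl left out) are assumptions made for illustration only. It reads the element text with getTextContent(), because getNodeValue() returns null for element nodes, and converts FrameHeight with Integer.parseInt.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical holder class; the names are not from the original post.
public class FrameResponseReader {
    // The question's document, shortened for readability (FrameUrl omitted).
    public static final String SAMPLE =
        "<CampaignFrameResponse xmlns=\"http://Qsurv/api\">"
        + "<Message>Success</Message>"
        + "<Status>Success</Status>"
        + "<FrameHeight>308</FrameHeight>"
        + "</CampaignFrameResponse>";

    public String message;
    public int frameHeight;

    public static FrameResponseReader read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            FrameResponseReader result = new FrameResponseReader();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes between elements
                }
                String text = child.getTextContent().trim();
                switch (child.getNodeName()) {
                    case "Message":
                        result.message = text;
                        break;
                    case "FrameHeight":
                        result.frameHeight = Integer.parseInt(text);
                        break;
                    default:
                        break; // elements we do not care about
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        FrameResponseReader r = read(SAMPLE);
        System.out.println(r.message + " " + r.frameHeight);
    }
}
```

Integer.parseInt throws NumberFormatException on non-numeric text, so real code may want to handle that case separately.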

Answer by Kumar Vivek Mitra

You can use DOM, SAX, or a pull parser, but it is better to go with one of the following APIs:

- JAXP & JAXB

- Castor

E.g. DOM parsing:

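import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// "xml" is the document as a String; this example looks for <response> and <user_id> elements
DocumentBuilderFactory odbf = DocumentBuilderFactory.newInstance();
DocumentBuilder odb = odbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
Document odoc = odb.parse(is);
odoc.getDocumentElement().normalize();    // normalize text representation
System.out.println("Root element of the doc is " + odoc.getDocumentElement().getNodeName());
NodeList LOP = odoc.getElementsByTagName("response");

Node FPN = LOP.item(0);
try {
    if (FPN.getNodeType() == Node.ELEMENT_NODE) {
        Element token = (Element) FPN;

        NodeList oNameList1 = token.getElementsByTagName("user_id");
        Element firstNameElement = (Element) oNameList1.item(0);
        // the first child of <user_id> is the text node that holds its value
        NodeList textNList1 = firstNameElement.getChildNodes();
        this.setUser_follower_id(Integer.parseInt(textNList1.item(0).getNodeValue().trim()));
        System.out.println("#####The Parsed data#####");
        System.out.println("user_id : " + textNList1.item(0).getNodeValue().trim());
        System.out.println("#####The Parsed data#####");
    }
} catch (Exception e) {
    // the original snippet ends before its catch block; minimal handling assumed here
    e.printStackTrace();
}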
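The answer also mentions pull parsers. For comparison, here is a minimal sketch using StAX (the pull parser in the JDK's javax.xml.stream package) to fetch the text of one element. The class name StaxLookup and the helper firstText are made up for this illustration, and the sample document is the question's XML with FrameUrl left out.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; not part of the original answer.
public class StaxLookup {
    public static final String SAMPLE =
        "<CampaignFrameResponse xmlns=\"http://Qsurv/api\">"
        + "<Message>Success</Message>"
        + "<Status>Success</Status>"
        + "<FrameHeight>308</FrameHeight>"
        + "</CampaignFrameResponse>";

    // Returns the trimmed text of the first element with the given local name,
    // or null if the document contains no such element.
    public static String firstText(String xml, String localName) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        // getElementText() only works for text-only elements
                        return reader.getElementText().trim();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        int frameHeight = Integer.parseInt(firstText(SAMPLE, "FrameHeight"));
        System.out.println("FrameHeight = " + frameHeight);
    }
}
```

Unlike DOM, the pull parser does not build a tree in memory, which matters mostly for large documents; for a response this small either approach should do.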

Answer by Fernando Miguélez

I have been working with XML in Java for a while (over ten years) and have tried many alternatives (custom text parsing, proprietary APIs, SAX, DOM, XMLBeans, JAXB, etc.). I have learnt a few things:

  • Stick to the standards. Never use a proprietary API; use a standard Java API instead (JAXP, which includes SAX, DOM, StAX, etc.). Your code will be more portable and maintainable, and it will not have to change whenever a new version of an XML library breaks compatibility (which happens very often).
  • Take your time and learn the XML technologies. I would recommend comprehensive knowledge of at least XSD, XSLT and XPath (needed for XSLT). If you do not have time, concentrate on XSD.
  • Take advantage of automatic XML code generation/parsing whenever possible. This implies knowing XSD. The initial effort pays off in the long run: the code is much more maintainable over time, parsing/marshalling is greatly optimized (usually more than if you use the "manual" JAXP APIs), and XML validation (you already have the XSD) can be carried out (less checking code, protection against malformed XML crashing your app, less integration effort). Best of all, you only write the XSD; almost all the Java code you need to handle the data (Java Beans) is generated for you.

Nowadays I tend to use code generation whenever I have to parse XML like that. The standard for that is JAXB (XMLBeans is dead, and other alternatives may not be as mature or as widely used). In your case I would define an XSD that describes your document in as fine detail as possible (e.g. if you use a String that can only take a few values, do not use the "xs:string" type but an enumerated one). It could look like this:

<xs:schema attributeFormDefault="unqualified"
    elementFormDefault="qualified" targetNamespace="http://Qsurv/api"
    xmlns="http://Qsurv/api"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:jaxb="http://java.sun.com/xml/ns/jaxb" jaxb:version="2.0">
    <xs:element name="CampaignFrameResponse">
        <xs:complexType>
            <xs:sequence>
                <xs:element type="xs:string" name="Message" />
                <xs:element type="Status" name="Status" />
                <xs:element type="xs:short" name="FrameHeight" />
                <xs:element type="xs:anyURI" name="FrameUrl" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:simpleType name="Status">
        <xs:annotation>
            <xs:appinfo>
                <jaxb:typesafeEnumClass>
                    <jaxb:typesafeEnumMember name="SUCCESS"
                        value="Success" />
                    <jaxb:typesafeEnumMember name="FAILURE"
                        value="Failure" />
                </jaxb:typesafeEnumClass>
            </xs:appinfo>
        </xs:annotation>
        <xs:restriction base="xs:string">
            <xs:enumeration value="Success" />
            <xs:enumeration value="Failure" />
        </xs:restriction>
    </xs:simpleType>
</xs:schema>

Now it is a matter of using the JAXB tools (see the xjc compiler options) to generate the code, and of looking at a couple of examples of how to marshal/unmarshal the generated Java Beans to/from XML.
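Independently of code generation, the validation benefit mentioned in this answer can be tried with the JDK's javax.xml.validation API. The sketch below is only an illustration under some assumptions: the class name FrameResponseValidator is made up, the embedded schema is a simplified version of the one above (no JAXB annotations, Status as a plain xs:string), and the sample FrameUrl is shortened.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class; not part of the original answer.
public class FrameResponseValidator {
    // Simplified variant of the schema discussed above (an assumption for this sketch).
    public static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"http://Qsurv/api\" elementFormDefault=\"qualified\">"
        + "<xs:element name=\"CampaignFrameResponse\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"Message\" type=\"xs:string\"/>"
        + "<xs:element name=\"Status\" type=\"xs:string\"/>"
        + "<xs:element name=\"FrameHeight\" type=\"xs:short\"/>"
        + "<xs:element name=\"FrameUrl\" type=\"xs:anyURI\"/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    public static final String SAMPLE =
        "<CampaignFrameResponse xmlns=\"http://Qsurv/api\">"
        + "<Message>Success</Message>"
        + "<Status>Success</Status>"
        + "<FrameHeight>308</FrameHeight>"
        + "<FrameUrl>http://delivery.usurv.com</FrameUrl>"
        + "</CampaignFrameResponse>";

    // Returns true if the document is well-formed and conforms to XSD.
    public static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself could not be compiled", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(SAMPLE));
        System.out.println(isValid(SAMPLE.replace("308", "tall")));
    }
}
```

A document whose FrameHeight is not a valid xs:short is rejected before any application code looks at it, which is the "less checking code" point made above.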

Answer by argmin

You could of course create a name-value map and update the map as you traverse the XML. At the end of the parsing you could look up a particular key in the map. Java doesn't let you create variables programmatically, so you won't be able to generate a variable whose name is based on the XML data.
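A minimal sketch of the name-value map idea, reusing the recursive traversal from the question: the class name NodeValueMap and the method names are hypothetical, the sample is the question's XML with FrameUrl left out, and only leaf elements (those without child elements) are stored.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the original answer.
public class NodeValueMap {
    public static final String SAMPLE =
        "<CampaignFrameResponse xmlns=\"http://Qsurv/api\">"
        + "<Message>Success</Message>"
        + "<Status>Success</Status>"
        + "<FrameHeight>308</FrameHeight>"
        + "</CampaignFrameResponse>";

    public static Map<String, String> toMap(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, String> values = new LinkedHashMap<>();
            collect(doc.getDocumentElement(), values);
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Walks the tree and records the text of every element that has no child elements.
    private static void collect(Node node, Map<String, String> values) {
        NodeList children = node.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                collect(child, values);
            }
        }
        if (!hasElementChild) {
            // a later element with the same name overwrites an earlier one
            values.put(node.getNodeName(), node.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        Map<String, String> values = toMap(SAMPLE);
        int frameHeight = Integer.parseInt(values.get("FrameHeight"));
        System.out.println(values.keySet() + " frameHeight=" + frameHeight);
    }
}
```

The map keeps the code independent of the exact set of elements, at the cost of string keys and untyped values that have to be parsed on lookup.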

Style and readability aside, how you populate data structures from XML depends on how well-defined the XML is and how much its schema could change in the future. You could ask yourself questions like: Can the node name change in the future? Could XML subsections be introduced that enclose this section? This might help you choose a certain parser (SAX/DOM or higher-level object-parsing APIs).

Of course, if you have no control over the XML definition, there is little you can do other than parse what you've got.

Answer by acerisara

I would not suggest parsing the XML directly (unless you are forced to do so); use an external library instead, such as http://x-stream.github.io/. The idea is that you create a class that represents your XML schema, and the library populates an object of that class for you.

Answer by urir

I suggest using XStream (x-stream.github.io): with a few annotations you can create objects from XML very quickly and with very little code.