How to split an XML file into multiple XML files using Java

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29166170/

Time: 2020-11-02 14:46:53  Source: igfitidea

java xml

Asked by Din Ionuț Valentin

I'm using XML files in Java for the first time and I need some help. I am trying to split an XML file into multiple XML files using Java. This is the input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<products>
    <product>
        <description>Sony 54.6" (Diag) Xbr Hx929 Internet Tv</description>
        <gtin>00027242816657</gtin>
        <price>2999.99</price>
        <orderId>2343</orderId>
        <supplier>Sony</supplier>
    </product>
    <product>
        <description>Apple iPad 2 with Wi-Fi 16GB - iOS 5 - Black
        </description>
        <gtin>00885909464517</gtin>
        <price>399.0</price>
        <orderId>2343</orderId>
        <supplier>Apple</supplier>
    </product>
    <product>
        <description>Sony NWZ-E464 8GB E Series Walkman Video MP3 Player Blue
        </description>
        <gtin>00027242831438</gtin>
        <price>91.99</price>
        <orderId>2343</orderId>
        <supplier>Sony</supplier>
    </product>
    <product>
        <description>Apple MacBook Air A 11.6" Mac OS X v10.7 Lion MacBook
        </description>
        <gtin>00885909464043</gtin>
        <price>1149.0</price>
        <orderId>2344</orderId>
        <supplier>Apple</supplier>
    </product>
    <product>
        <description>Panasonic TC-L47E50 47" Smart TV Viera E50 Series LED
            HDTV</description>
        <gtin>00885170076471</gtin>
        <price>999.99</price>
        <orderId>2344</orderId>
        <supplier>Panasonic</supplier>
    </product>
</products>

I'm trying to get three XML documents like this:

<?xml version="1.0" encoding="UTF-8"?>
<products>
        <product>
            <description>Sony 54.6" (Diag) Xbr Hx929 Internet Tv</description>
            <gtin>00027242816657</gtin>
            <price currency="USD">2999.99</price>
            <orderid>2343</orderid>
        </product>
        <product>
            <description>Sony NWZ-E464 8GB E Series Walkman Video MP3 Player Blue</description>
            <gtin>00027242831438</gtin>
            <price currency="USD">91.99</price>
            <orderid>2343</orderid>
        </product>
</products>

One for each supplier. How can I achieve this? Any help would be great.

Accepted answer by slux83

Make sure you change the path in "inputFile" to point to your file, and also change the output path:

StreamResult result = new StreamResult(new File("C:\\xmls\\" + supplier.trim() + ".xml"));

Here is your code.

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ExtractXml
{
    /**
     * @param args
     */
    public static void main(String[] args) throws Exception
    {
        String inputFile = "resources/products.xml";

        File xmlFile = new File(inputFile);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(xmlFile);

        XPathFactory xfactory = XPathFactory.newInstance();
        XPath xpath = xfactory.newXPath();
        XPathExpression allProductsExpression = xpath.compile("//product/supplier/text()");
        NodeList productNodes = (NodeList) allProductsExpression.evaluate(doc, XPathConstants.NODESET);

        //Collect the supplier names, skipping duplicates so each file is written once
        List<String> suppliers = new ArrayList<String>();
        for (int i=0; i<productNodes.getLength(); ++i)
        {
            Node productName = productNodes.item(i);
            String supplierName = productName.getTextContent();

            if (!suppliers.contains(supplierName))
            {
                System.out.println(supplierName);
                suppliers.add(supplierName);
            }
        }

        //Now we create the split XMLs

        for (String supplier : suppliers)
        {
            String xpathQuery = "/products/product[supplier='" + supplier + "']";

            xpath = xfactory.newXPath();
            XPathExpression query = xpath.compile(xpathQuery);
            NodeList productNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);

            System.out.println("Found " + productNodesFiltered.getLength() + 
                               " product(s) for supplier " + supplier);

            //We store the new XML file in supplierName.xml e.g. Sony.xml
            Document suppXml = dBuilder.newDocument();

            //we have to recreate the root node <products>
            Element root = suppXml.createElement("products"); 
            suppXml.appendChild(root);
            for (int i=0; i<productNodesFiltered.getLength(); ++i)
            {
                Node productNode = productNodesFiltered.item(i);

                //we append a product (cloned) to the new file
                Node clonedNode = productNode.cloneNode(true);
                suppXml.adoptNode(clonedNode); //We adopt the orphan :)
                root.appendChild(clonedNode);
            }

            //At the end, we save the file XML on disk
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            DOMSource source = new DOMSource(suppXml);

            StreamResult result =  new StreamResult(new File("resources/" + supplier.trim() + ".xml"));
            transformer.transform(source, result);

            System.out.println("Done for " + supplier);
        }
    }

}
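
As a variation on the code above, the grouping can also be done in one pass over the product elements instead of running one XPath query per supplier. The following is only a sketch under stated assumptions: the class name SinglePassSplitter and its methods are hypothetical, it works on an in-memory XML string rather than files, and it trims the supplier text before using it as the grouping key.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; names are illustrative only.
final class SinglePassSplitter {

    // Parses an XML string into a DOM document.
    static Document parse(String xml, DocumentBuilder builder) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Groups <product> elements by the trimmed text of their <supplier> child.
    static Map<String, Document> splitBySupplier(Document source, DocumentBuilder builder) {
        Map<String, Document> bySupplier = new LinkedHashMap<>();
        NodeList products = source.getElementsByTagName("product");
        for (int i = 0; i < products.getLength(); i++) {
            Element product = (Element) products.item(i);
            NodeList supplierNodes = product.getElementsByTagName("supplier");
            if (supplierNodes.getLength() == 0) {
                continue; // skip products without a supplier
            }
            String supplier = supplierNodes.item(0).getTextContent().trim();

            Document target = bySupplier.get(supplier);
            if (target == null) {
                // First product of this supplier: create a document with a <products> root
                target = builder.newDocument();
                target.appendChild(target.createElement("products"));
                bySupplier.put(supplier, target);
            }
            // importNode copies the subtree into the target document; the source is untouched
            target.getDocumentElement().appendChild(target.importNode(product, true));
        }
        return bySupplier;
    }

    // Serializes a document to an indented XML string.
    static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        String xml = "<products>"
                + "<product><gtin>1</gtin><supplier>Sony</supplier></product>"
                + "<product><gtin>2</gtin><supplier>Apple</supplier></product>"
                + "</products>";
        for (Map.Entry<String, Document> entry : splitBySupplier(parse(xml, builder), builder).entrySet()) {
            System.out.println(entry.getKey() + ":");
            System.out.println(toXml(entry.getValue()));
        }
    }
}
```

To write files instead of strings, the StreamResult in toXml could wrap a File named after the supplier, as in the answer's code.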

Answer by VikasBhat

Consider this XML:

<?xml version="1.0"?>
<SSNExportDocument xmlns="urn:com:ssn:schema:export:SSNExportFormat.xsd" Version="0.1" DocumentID="b482350d-62bb-41be-b792-8a9fe3884601-1" ExportID="b482350d-62bb-41be-b792-8a9fe3884601" JobID="464" RunID="3532468" CreationTime="2019-04-16T02:20:01.332-04:00" StartTime="2019-04-15T20:20:00.000-04:00" EndTime="2019-04-16T02:20:00.000-04:00">
    <MeterData MeterName="MUNI1-11459398" UtilDeviceID="11459398" MacID="00:12:01:fae:fe:00:d5:fc">
        <RegisterData StartTime="2019-04-15T20:00:00.000-04:00" EndTime="2019-04-15T20:00:00.000-04:00" NumberReads="1">
            <RegisterRead ReadTime="2019-04-15T20:00:00.000-04:00" GatewayCollectedTime="2019-04-16T01:40:06.214-04:00" RegisterReadSource="REG_SRC_TYPE_EO_CURR_READ" Season="-1">
                <Tier Number="0">
                    <Register Number="1" Summation="5949.1000" SummationUOM="GAL"/>
                </Tier>
            </RegisterRead>
        </RegisterData>
    </MeterData>
    <MeterData MeterName="MUNI4-11460365" UtilDeviceID="11460365" MacID="00:11:01:bc:fe:00:d3:f9">
        <RegisterData StartTime="2019-04-15T20:00:00.000-04:00" EndTime="2019-04-15T20:00:00.000-04:00" NumberReads="1">
            <RegisterRead ReadTime="2019-04-15T20:00:00.000-04:00" GatewayCollectedTime="2019-04-16T01:40:11.082-04:00" RegisterReadSource="REG_SRC_TYPE_EO_CURR_READ" Season="-1">
                <Tier Number="0">
                    <Register Number="1" Summation="136349.9000" SummationUOM="GAL"/>
                </Tier>
            </RegisterRead>
        </RegisterData>
    </MeterData>
</SSNExportDocument>

We can use JAXB, which converts your XML elements to objects that we can then work with.

// SSNExportDocument and MeterData are assumed to be JAXB-annotated classes
// (for example generated from the XSD with xjc); they are not shown here.
File xmlFile = new File("input.xml");
JAXBContext unmarshalContext = JAXBContext.newInstance(SSNExportDocument.class);
Unmarshaller jaxbUnmarshaller = unmarshalContext.createUnmarshaller();
SSNExportDocument ssnExpDoc = (SSNExportDocument) jaxbUnmarshaller.unmarshal(xmlFile);

// Group the meters by the prefix of their name (the part before the first "-")
Map<String, List<MeterData>> meterMapper = new HashMap<String, List<MeterData>>();

for (MeterData mData : ssnExpDoc.getMeterData()) {
    String meterFullName = mData.getMeterName();
    String[] splitMeterName = meterFullName.split("-");
    List<MeterData> meterDataList = meterMapper.get(splitMeterName[0]); // O(1) lookup
    if (meterDataList == null) {
        // First meter with this prefix: start a new list
        meterDataList = new ArrayList<>();
        meterMapper.put(splitMeterName[0], meterDataList);
    }
    meterDataList.add(mData);
}

meterMapper now maps each meter-name prefix (for example MUNI1) to the list of MeterData objects that share it.

Then marshal the contents using:

        JAXBContext jaxbContext = JAXBContext.newInstance(SSNExportDocument.class);

        // Create Marshaller
        Marshaller jaxbMarshaller = jaxbContext.createMarshaller();

        // Pretty-print the output; JAXB_FRAGMENT suppresses the XML declaration
        jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        jaxbMarshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
        //jaxbMarshaller.setProperty("com.sun.xml.bind.xmlDeclaration", Boolean.FALSE);

        // Write XML to a StringWriter
        StringWriter sw = new StringWriter();

        // groupDocument is assumed to be an SSNExportDocument built from
        // one entry of meterMapper (one output document per group)
        jaxbMarshaller.marshal(groupDocument, sw);

        // Verify XML content by printing it to the console
        String xmlContent = sw.toString();
        System.out.println(xmlContent);

Answer by slux83

You can have a look here to see how to parse an XML document using DOM in Java: DOM XML Parser Example

Here is how to write the new XML file(s): Create XML file using java

In addition, you could study XPath to easily select your nodes: Java Xpath expression

If performance is not your main concern, then once you have loaded the DOM and set up XPath, you can retrieve all the suppliers in your XML document using the following XPath query:

//supplier/text()

You will get something like this:

Text='Sony'
Text='Apple'
Text='Sony'
Text='Apple'
Text='Panasonic'

Then I would put those results in an ArrayList or a similar collection. The second step is to iterate over that collection and, for each item, query the XML input document to extract all the nodes for a particular supplier:

/products/product[supplier='Sony'] 

Of course, in Java you will have to build that XPath query dynamically:

String xpathQuery = "/products/product[supplier='" + currentValue + "']";

After that, you will get the list of nodes that match the supplier you specified. The next step is to construct the new XML DOM and save it to a file.
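
The two query steps described in this answer can be sketched as follows. This is an illustration under assumptions, not the answer's own code: the class name SupplierQueries and its methods are hypothetical, the supplier names are collected into a LinkedHashSet so that duplicates such as the repeated Sony and Apple entries collapse, and the per-supplier query uses normalize-space() so that it still matches when the supplier element contains surrounding whitespace. Supplier names containing an apostrophe would break the query string and are not handled here.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; names are illustrative only.
final class SupplierQueries {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Step 1: collect the distinct supplier names, in document order.
    static Set<String> distinctSuppliers(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//supplier/text()", doc, XPathConstants.NODESET);
        Set<String> suppliers = new LinkedHashSet<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            suppliers.add(nodes.item(i).getNodeValue().trim());
        }
        return suppliers;
    }

    // Step 2: build the per-supplier query dynamically.
    // Assumes the supplier name contains no apostrophe.
    static String queryFor(String supplier) {
        return "/products/product[normalize-space(supplier)='" + supplier + "']";
    }

    // Returns the <product> nodes that belong to the given supplier.
    static NodeList productsOf(Document doc, String supplier) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(queryFor(supplier), doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<products>"
                + "<product><supplier>Sony</supplier></product>"
                + "<product><supplier>Apple</supplier></product>"
                + "<product><supplier>Sony</supplier></product>"
                + "</products>");
        for (String supplier : distinctSuppliers(doc)) {
            System.out.println(supplier + ": " + productsOf(doc, supplier).getLength() + " product(s)");
        }
    }
}
```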

Answer by Selva

A DOM parser will consume more memory. I prefer to use a SAX parser to read the XML and write it out.
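
A streaming split might look like the sketch below. Note the swap: it uses the StAX event API (javax.xml.stream) rather than SAX, because StAX events read from the input can be passed directly to a writer; it is the same streaming idea, not this answer's own code. The class name StreamingSplitter is hypothetical, only one product is buffered at a time, and the output is collected in StringWriter buffers purely to keep the example self-contained. For large inputs each supplier would get a file-backed writer instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class; names are illustrative only.
final class StreamingSplitter {

    // Returns supplier name -> XML text containing that supplier's products.
    static Map<String, String> split(String xml) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        Map<String, StringWriter> buffers = new LinkedHashMap<>();
        Map<String, XMLEventWriter> writers = new LinkedHashMap<>();

        List<XMLEvent> product = null;       // events of the <product> being read, null outside one
        StringBuilder supplierText = null;   // text of the current <supplier>, null outside one
        String supplierName = null;          // supplier of the current product, once known

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                String name = event.asStartElement().getName().getLocalPart();
                if (name.equals("product")) {
                    product = new ArrayList<>();
                    supplierName = null;
                } else if (name.equals("supplier") && product != null) {
                    supplierText = new StringBuilder();
                }
            }
            if (product != null) {
                product.add(event); // buffer every event that belongs to the current product
            }
            if (event.isCharacters() && supplierText != null) {
                supplierText.append(event.asCharacters().getData());
            }
            if (event.isEndElement()) {
                String name = event.asEndElement().getName().getLocalPart();
                if (name.equals("supplier") && supplierText != null) {
                    supplierName = supplierText.toString().trim();
                    supplierText = null;
                } else if (name.equals("product") && product != null) {
                    if (supplierName != null) {
                        XMLEventWriter writer = writers.get(supplierName);
                        if (writer == null) {
                            // First product of this supplier: open a writer and a <products> root
                            StringWriter buffer = new StringWriter();
                            writer = outputFactory.createXMLEventWriter(buffer);
                            writer.add(events.createStartElement("", "", "products"));
                            buffers.put(supplierName, buffer);
                            writers.put(supplierName, writer);
                        }
                        for (XMLEvent buffered : product) {
                            writer.add(buffered);
                        }
                    }
                    product = null;
                }
            }
        }

        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, XMLEventWriter> entry : writers.entrySet()) {
            entry.getValue().add(events.createEndElement("", "", "products"));
            entry.getValue().close();
            result.put(entry.getKey(), buffers.get(entry.getKey()).toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products>"
                + "<product><gtin>1</gtin><supplier>Sony</supplier></product>"
                + "<product><gtin>2</gtin><supplier>Apple</supplier></product>"
                + "</products>";
        split(xml).forEach((supplier, content) -> System.out.println(supplier + " -> " + content));
    }
}
```

The product has to be buffered because the supplier element appears at the end of each product, so the target file is only known once the whole product has been read.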

Answer by Christoph Burmeister

I like the approach of Xmappr (https://code.google.com/p/xmappr/) where you can use simple annotations:


First, the root element Products, which simply holds a list of Product instances:

@RootElement
public class Products {

    @Element
    public List<Product> product;
}

Then the Product class:

@RootElement
public class Product {

   @Element
   public String description;

   @Element
   public String supplier;

   @Element
   public String gtin;

   @Element
   public String price;

   @Element
   public String orderId;
}

And then you simply fetch the Product-instances from the Products:


public static void main(String[] args) throws FileNotFoundException {
    Reader reader = new FileReader("test.xml");
    Xmappr xm = new Xmappr(Products.class);
    Products products = (Products) xm.fromXML(reader);

    // fetch list of products
    List<Product> listOfProducts = products.product;

    // do sth with the products in the list
    for (Product product : listOfProducts) {
        System.out.println(product.description);
    }       
}

And then you can do whatever you want with the products (e.g. group them by supplier and write each group out to an XML file).
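
The grouping step mentioned here could be sketched with the standard stream collectors. The Product class below is a minimal stand-in with only two of the fields from the mapped class above, and the class name GroupBySupplier is hypothetical; the sketch does not use Xmappr itself and stops before writing the groups to files.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class; names are illustrative only.
final class GroupBySupplier {

    // Minimal stand-in for the mapped Product class (only two fields kept).
    static final class Product {
        final String description;
        final String supplier;

        Product(String description, String supplier) {
            this.description = description;
            this.supplier = supplier;
        }
    }

    // Groups products by trimmed supplier name, keeping first-seen supplier order.
    static Map<String, List<Product>> group(List<Product> products) {
        return products.stream().collect(
                Collectors.groupingBy(p -> p.supplier.trim(), LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
                new Product("Internet Tv", "Sony"),
                new Product("iPad 2", "Apple"),
                new Product("Walkman", "Sony"));
        group(products).forEach((supplier, items) ->
                System.out.println(supplier + ": " + items.size() + " product(s)"));
    }
}
```

Each map entry then corresponds to one output file: a new Products instance holding that entry's list can be serialized with whichever mapper was used for reading.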

Answer by Florian Schaetz

An alternative to DOM, if you have the schema (XSD) for your XML dialect, would be JAXB.