Merge Two XML Files in Java

Warning: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/648471/

Time: 2020-08-11 17:22:37  Source: igfitidea

java, xml, api, parsing

Asked by Mark Davidson

I have two XML files of similar structure which I wish to merge into one file. Currently I am using EL4J XML Merge, which I came across in this tutorial. However, it does not merge as I expect it to. For instance, the main problem is that it is not merging the results from both files into one element, i.e. one that contains 1, 2, 3 and 4. Instead it just discards either 1 and 2 or 3 and 4, depending on which file is merged first.

So I would be grateful to anyone who has experience with XML Merge if they could tell me what I might be doing wrong or alternatively does anyone know of a good XML API for Java that would be capable of merging the files as I require?

Many Thanks for Your Help in Advance

Edit:

Could really do with some good suggestions on doing this so added a bounty. I've tried jdigital's suggestion but still having issues with XML merge.

Below is a sample of the type of structure of XML files that I am trying to merge.

<run xmloutputversion="1.02">
    <info type="a" />
    <debugging level="0" />
    <host starttime="1237144741" endtime="1237144751">
        <status state="up" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
            <system name="computer" />
        </target>
        <results>
            <result id="1">
                <state value="test" />
                <service value="gamma" />
            </result>
            <result id="2">
                <state value="test4" />
                <service value="gamma4" />
            </result>
        </results>
        <times something="0" />
    </host>
    <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
    </runstats>
</run>

<run xmloutputversion="1.02">
    <info type="b" />
    <debugging level="0" />
    <host starttime="1237144741" endtime="1237144751">
        <status state="down" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
            <system name="computer" />
        </target>
        <results>
            <result id="3">
                <state value="testagain" />
                <service value="gamma2" />
            </result>
            <result id="4">
                <state value="testagain4" />
                <service value="gamma4" />
            </result>
        </results>
        <times something="0" />
    </host>
    <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
    </runstats>
</run>

Expected output

<run xmloutputversion="1.02">
    <info type="a" />
    <debugging level="0" />
    <host starttime="1237144741" endtime="1237144751">
        <status state="down" reason="somereason"/>
        <status state="up" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
            <system name="computer" />
        </target>
        <results>
            <result id="1">
                <state value="test" />
                <service value="gamma" />
            </result>
            <result id="2">
                <state value="test4" />
                <service value="gamma4" />
            </result>
            <result id="3">
                <state value="testagain" />
                <service value="gamma2" />
            </result>
            <result id="4">
                <state value="testagain4" />
                <service value="gamma4" />
            </result>
        </results>
        <times something="0" />
    </host>
    <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
    </runstats>
</run>

Answer by Andy White

You might be able to write a Java app that deserializes the XML documents into objects, then "merges" the individual objects programmatically into a collection. You can then serialize the collection object back out to an XML file with everything "merged".

The JAXB API has some tools that can convert an XML document/schema into Java classes. The "xjc" tool might be able to do this, although I can't remember whether you can create classes directly from the XML doc or whether you have to generate a schema first. There are tools out there that can generate a schema from an XML doc.

Hope this helps... not sure if this is what you were looking for.

Answer by jdigital

I took a look at the referenced link; it's odd that XMLMerge would not work as expected. Your example seems straightforward. Did you read the section entitled Using XPath declarations with XmlMerge? Using the example, try to set up an XPath for results and set it to merge. If I'm reading the doc correctly, it would look something like this:

XPath.resultsNode=results
action.resultsNode=MERGE

Answer by tyler

It might help if you were explicit about the result that you're interested in achieving. Is this what you're asking for?

Doc A:

<root>
  <a/>
  <b>
    <c/>
  </b>
</root>

Doc B:

<root>
  <d/>
</root>

Merged Result:

<root>
  <a/>
  <b>
    <c/>
  </b>
  <d/>
</root>
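
A minimal DOM sketch of that merged result, assuming both documents fit in memory and that "merge" simply means appending a copy of every child of Doc B's root to Doc A's root, with no de-duplication. The class and method names (AppendRootChildren, merge, parse, serialize) are hypothetical; only standard JAXP/DOM calls are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AppendRootChildren {

  // Appends a deep copy of every child of b's root to a's root, in document order.
  static Document merge(Document a, Document b) {
    Node rootA = a.getDocumentElement();
    for (Node kid = b.getDocumentElement().getFirstChild(); kid != null;
        kid = kid.getNextSibling()) {
      // importNode copies the node into a's document; b itself is not modified
      rootA.appendChild(a.importNode(kid, true));
    }
    return a;
  }

  static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
  }

  // Serializes a DOM document to a string without an XML declaration.
  static String serialize(Document doc) throws Exception {
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    t.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    Document a = parse("<root><a/><b><c/></b></root>");
    Document b = parse("<root><d/></root>");
    System.out.println(serialize(merge(a, b)));
  }
}
```

Because the whole tree is loaded, this is the simple option when the documents are small; the streaming approach is the alternative when they are not.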

Are you worried about scaling for large documents?

The easiest way to implement this in Java is to use a streaming XML parser (google for 'java StAX'). If you use the javax.xml.stream library, you'll find that XMLEventWriter has a convenient method, XMLEventWriter#add(XMLEvent). All you have to do is loop over the top-level elements in each document and add them to your writer using this method to generate your merged result. The only funky part is implementing the reader logic that acts only on (i.e. only calls 'add' for) the top-level nodes.
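
A rough sketch of that StAX approach using only javax.xml.stream, assuming the first input's root element can stand in for all of them and that everything nested inside each root should be copied. The depth counter is one possible way to implement the "top-level only" reader logic, not necessarily the one tyler used; StaxMerge and merge are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class StaxMerge {

  // Streams each input and copies every event nested inside its root element.
  // Only the first input's root start tag is written, and a single root end tag
  // closes the output, so the result has exactly one root.
  static String merge(String... docs) throws Exception {
    XMLInputFactory inFactory = XMLInputFactory.newInstance();
    StringWriter out = new StringWriter();
    XMLEventWriter writer =
        XMLOutputFactory.newInstance().createXMLEventWriter(out);
    XMLEvent rootEnd = null;
    boolean rootWritten = false;
    for (String doc : docs) {
      XMLEventReader reader =
          inFactory.createXMLEventReader(new StringReader(doc));
      int depth = 0; // number of currently open elements in this input
      while (reader.hasNext()) {
        XMLEvent event = reader.nextEvent();
        if (event.isStartElement() && ++depth == 1) {
          if (!rootWritten) { // keep only the first document's root
            writer.add(event);
            rootWritten = true;
          }
          continue;
        }
        if (event.isEndElement() && --depth == 0) {
          rootEnd = event; // defer the root end tag until all inputs are copied
          continue;
        }
        if (depth >= 1) {
          writer.add(event); // top-level children, their subtrees and text
        }
      }
      reader.close();
    }
    if (rootEnd == null) {
      throw new IllegalArgumentException("no input documents");
    }
    writer.add(rootEnd);
    writer.flush();
    writer.close();
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(merge("<root><a/><b><c/></b></root>", "<root><d/></root>"));
  }
}
```

The inputs are read as event streams rather than trees, which is what makes this route attractive for large documents; in real use the writer would target a file instead of a StringWriter.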

I recently implemented this method if you need hints.

Answer by Neil Coffey

Have you considered just not bothering with parsing the XML "properly" and just treating the files as big long strings and using boring old things such as hash maps and regular expressions...? This could be one of those cases where the fancy acronyms with X in them just make the job fiddlier than it needs to be.

Obviously this does depend a bit on how much data you actually need to parse out while doing the merge. But by the sound of things, the answer to that is not much.
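
To make the string-based idea concrete for the question's files, here is a deliberately naive sketch with java.util.regex. It assumes each file contains exactly one attribute-free <results> element that is not nested or commented out, and it splices the inner content of the second file's <results> in front of the first file's </results>. Everything else (status elements, CDATA, comments) is ignored, which is the trade-off of skipping a real parser. StringSpliceMerge is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringSpliceMerge {

  // Captures the inner content of a <results>...</results> element. Assumes the
  // tag has no attributes, is not nested, and appears once per file.
  private static final Pattern RESULTS =
      Pattern.compile("<results>(.*?)</results>", Pattern.DOTALL);

  // Inserts the second file's <results> content just before the first file's </results>.
  static String merge(String base, String other) {
    Matcher m = RESULTS.matcher(other);
    int close = base.indexOf("</results>");
    if (!m.find() || close < 0) {
      throw new IllegalArgumentException("both inputs need a <results> element");
    }
    return base.substring(0, close) + m.group(1) + base.substring(close);
  }

  public static void main(String[] args) {
    String a = "<run><results><result id=\"1\"/></results></run>";
    String b = "<run><results><result id=\"3\"/></results></run>";
    // prints <run><results><result id="1"/><result id="3"/></results></run>
    System.out.println(merge(a, b));
  }
}
```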

Answer by StaxMan

In addition to using Stax (which does make sense), it'd probably be easier with StaxMate (http://staxmate.codehaus.org/Tutorial). Just create 2 SMInputCursors, plus child cursors if need be, and then do a typical merge-sort-style merge with the 2 cursors. It's similar to traversing DOM documents in a recursive-descent manner.
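
StaxMate is not part of the JDK, so this sketch shows only the two-cursor merge step, with plain DOM child lists standing in for SMInputCursors (an assumption, not the StaxMate API). It assumes the <result> elements inside each <results> are already sorted by a numeric id attribute, as in the question's sample; elements with equal ids are both kept. TwoCursorMerge and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TwoCursorMerge {

  // Element children of `parent` with the given tag name, in document order.
  static List<Element> children(Element parent, String name) {
    List<Element> out = new ArrayList<>();
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n instanceof Element && ((Element) n).getTagName().equals(name)) {
        out.add((Element) n);
      }
    }
    return out;
  }

  static int id(Element e) {
    return Integer.parseInt(e.getAttribute("id"));
  }

  // Merge step of merge sort: cursors i and j walk the two sorted <result> lists,
  // and the element with the smaller id is appended to resultsA each time.
  static void merge(Element resultsA, Element resultsB) {
    Document owner = resultsA.getOwnerDocument();
    List<Element> left = children(resultsA, "result");
    List<Element> right = children(resultsB, "result");
    for (Element e : left) {
      resultsA.removeChild(e); // re-appended below in merged order
    }
    int i = 0;
    int j = 0;
    while (i < left.size() || j < right.size()) {
      boolean takeLeft = j >= right.size()
          || (i < left.size() && id(left.get(i)) <= id(right.get(j)));
      Node next = takeLeft ? left.get(i++) : owner.importNode(right.get(j++), true);
      resultsA.appendChild(next);
    }
  }

  static Element parseRoot(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    return doc.getDocumentElement();
  }

  public static void main(String[] args) throws Exception {
    Element a = parseRoot("<results><result id=\"1\"/><result id=\"4\"/></results>");
    Element b = parseRoot("<results><result id=\"2\"/><result id=\"3\"/></results>");
    merge(a, b);
    for (Element e : children(a, "result")) {
      System.out.print(e.getAttribute("id") + " ");
    }
  }
}
```

With StaxMate the same loop would advance two streaming cursors instead of list indices, so neither input would need to be fully in memory.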

Answer by McDowell

Not very elegant, but you could do this with the DOM parser and XPath:

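import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class MergeXmlDemo {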

  public static void main(String[] args) throws Exception {
    // proper error/exception handling omitted for brevity
    File file1 = new File("merge1.xml");
    File file2 = new File("merge2.xml");
    Document doc = merge("/run/host/results", file1, file2);
    print(doc);
  }

  private static Document merge(String expression,
      File... files) throws Exception {
    XPathFactory xPathFactory = XPathFactory.newInstance();
    XPath xpath = xPathFactory.newXPath();
    XPathExpression compiledExpression = xpath
        .compile(expression);
    return merge(compiledExpression, files);
  }

  private static Document merge(XPathExpression expression,
      File... files) throws Exception {
    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
        .newInstance();
    docBuilderFactory
        .setIgnoringElementContentWhitespace(true);
    DocumentBuilder docBuilder = docBuilderFactory
        .newDocumentBuilder();
    Document base = docBuilder.parse(files[0]);

    Node results = (Node) expression.evaluate(base,
        XPathConstants.NODE);
    if (results == null) {
      throw new IOException(files[0]
          + ": expression does not evaluate to node");
    }

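    // move the children of each further document's target node under the base's
    for (int i = 1; i < files.length; i++) {
      Document merge = docBuilder.parse(files[i]);
      Node nextResults = (Node) expression.evaluate(merge,
          XPathConstants.NODE);
      if (nextResults == null) {
        throw new IOException(files[i]
            + ": expression does not evaluate to node");
      }
      while (nextResults.hasChildNodes()) {
        Node kid = nextResults.getFirstChild();
        nextResults.removeChild(kid);
        // importNode copies the node so it belongs to the base document
        kid = base.importNode(kid, true);
        results.appendChild(kid);
      }
    }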

    return base;
  }

  private static void print(Document doc) throws Exception {
    TransformerFactory transformerFactory = TransformerFactory
        .newInstance();
    Transformer transformer = transformerFactory
        .newTransformer();
    DOMSource source = new DOMSource(doc);
    Result result = new StreamResult(System.out);
    transformer.transform(source, result);
  }

}

This assumes that you can hold at least two of the documents in RAM simultaneously.

Answer by tyler

So, you're only interested in merging the 'results' elements? Everything else is ignored? The fact that input0 has an <info type="a"/> and input1 has an <info type="b"/> and the expected result has an <info type="a"/> seems to suggest this.

If you're not worried about scaling and you want to solve this problem quickly then I would suggest writing a problem-specific bit of code that uses a simple library like JDOM to consider the inputs and write the output result.

Attempting to write a generic tool that was 'smart' enough to handle all of the possible merge cases would be pretty time consuming - you'd have to expose a configuration capability to define merge rules. If you know exactly what your data is going to look like and you know exactly how the merge needs to be executed then I would imagine your algorithm would walk each XML input and write to a single XML output.

Answer by Prabhu R

You can try Dom4J, which provides a very good means to extract information using XPath queries and also allows you to write XML very easily. You just need to play around with the API for a while to do your job.

Answer by Mark Davidson

Thanks to everyone for their suggestions. Unfortunately, none of the methods suggested turned out to be suitable in the end, as I needed to have rules for the way in which different nodes of the structure were merged.

So what I did was take the DTD relating to the XML files I was merging and, from that, create a number of classes reflecting the structure. I then used XStream to deserialize the XML files back into instances of those classes.

I annotated my classes so that merging became a matter of combining rules assigned through annotations with some reflection, merging the objects rather than the actual XML structure.
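
A minimal sketch of annotation-plus-reflection merging in the spirit of that description. The @Merge annotation, the Rule values and the Host class are hypothetical and are not XStream API; the XStream deserialization step is omitted and the objects are built by hand. Each field's rule decides whether the first non-null value wins or the two lists are concatenated.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class AnnotatedMerge {

  // Hypothetical merge rules, attached to fields via @Merge.
  enum Rule { KEEP_FIRST, CONCAT }

  @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at runtime
  @Target(ElementType.FIELD)
  @interface Merge {
    Rule value();
  }

  // Hypothetical class mirroring part of the question's <host> element.
  static class Host {
    @Merge(Rule.KEEP_FIRST) String starttime;
    @Merge(Rule.CONCAT) List<String> status = new ArrayList<>();
    @Merge(Rule.CONCAT) List<String> results = new ArrayList<>();
  }

  // Folds `second` into `first` according to each field's @Merge rule.
  @SuppressWarnings("unchecked")
  static <T> T merge(T first, T second) throws IllegalAccessException {
    for (Field f : first.getClass().getDeclaredFields()) {
      Merge rule = f.getAnnotation(Merge.class);
      if (rule == null) {
        continue; // unannotated fields keep first's value
      }
      f.setAccessible(true);
      switch (rule.value()) {
        case KEEP_FIRST: // first non-null value wins
          if (f.get(first) == null) {
            f.set(first, f.get(second));
          }
          break;
        case CONCAT: // append second's list entries after first's
          ((List<Object>) f.get(first)).addAll((List<Object>) f.get(second));
          break;
      }
    }
    return first;
  }

  public static void main(String[] args) throws Exception {
    Host a = new Host();
    a.starttime = "1237144741";
    a.status.add("up");
    a.results.addAll(List.of("1", "2"));
    Host b = new Host();
    b.starttime = "1237144741";
    b.status.add("down");
    b.results.addAll(List.of("3", "4"));
    Host merged = merge(a, b);
    // prints [up, down] [1, 2, 3, 4]
    System.out.println(merged.status + " " + merged.results);
  }
}
```

Adding a new kind of rule (for example, merging list entries by key) would mean adding an enum constant and a matching case, without touching the data classes.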

If anyone is interested in the code, which in this case merges Nmap XML files, please see http://fluxnetworks.co.uk/NmapXMLMerge.tar.gz. The code's not perfect, and I will admit it's not massively flexible, but it definitely works. I'm planning to reimplement the system so that it parses the DTD automatically when I have some free time.

Answer by stwissel

I use XSLT to merge XML files. It allows me to adjust the merge operation to just slam the content together or to merge at a specific level. It is a little more work (and XSLT syntax is kind of special) but super flexible. A few things you need here:

a) Include an additional file
b) Copy the original file 1:1
c) Design your merge point, with or without duplication avoidance

a) At the beginning I have:

<xsl:param name="mDocName">yoursecondfile.xml</xsl:param>
<xsl:variable name="mDoc" select="document($mDocName)" />

This lets you refer to the second file as $mDoc.

b) The instructions to copy a source tree 1:1 are 2 templates:

<!-- Copy everything including attributes as default action -->
<xsl:template match="*">
    <xsl:element name="{name()}">
         <xsl:apply-templates select="@*" />
        <xsl:apply-templates />
    </xsl:element>
</xsl:template>

<xsl:template match="@*">
    <xsl:attribute name="{name()}"><xsl:value-of select="." /></xsl:attribute>
</xsl:template>

With nothing else you get a 1:1 copy of your first source file. Works with any type of XML. The merging part is file specific. Let's presume you have event elements with an event ID attribute. You do not want duplicate IDs. The template would look like this:

 <xsl:template match="events">
    <xsl:variable name="allEvents" select="descendant::*" />
    <events>
        <!-- copies all events from the first file -->
        <xsl:apply-templates />
        <!-- Merge the new events in. You need to adjust the select clause -->
        <xsl:for-each select="$mDoc/logbook/server/events/event">
            <xsl:variable name="curID" select="@id" />
            <xsl:if test="not ($allEvents[@id=$curID]/@id = $curID)">
                <xsl:element name="event">
                    <xsl:apply-templates select="@*" />
                    <xsl:apply-templates />
                </xsl:element>
            </xsl:if>
        </xsl:for-each>
    </events>
</xsl:template>

Of course you can compare other things, like tag names. It is also up to you how deep the merge goes. If you don't have a key to compare, the construct becomes simpler, e.g. for logs:

<xsl:template match="logs">
    <xsl:element name="logs">
        <xsl:apply-templates select="@*" />
        <xsl:apply-templates />
        <!-- append the second file's log entries after the first file's -->
        <xsl:apply-templates select="$mDoc/logbook/server/logs/log" />
    </xsl:element>
</xsl:template>

To run XSLT in Java use this:

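import java.io.File;
import java.util.Map;
import javax.xml.transform.*;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltRunner {
    // Applies xsltFile to xmlFile and writes the output to resultFile
    static void transform(File xmlFile, File xsltFile, File resultFile,
            Map<String, String> parameterMap) throws TransformerException {
        Source xmlSource = new StreamSource(xmlFile);
        Source xsltSource = new StreamSource(xsltFile);
        Result xmlResult = new StreamResult(resultFile);
        TransformerFactory transFact = TransformerFactory.newInstance();
        Transformer trans = transFact.newTransformer(xsltSource);
        // Load stylesheet parameters (e.g. mDocName) if we have any
        if (parameterMap != null) {
            for (Map.Entry<String, String> curParam : parameterMap.entrySet()) {
                trans.setParameter(curParam.getKey(), curParam.getValue());
            }
        }
        trans.transform(xmlSource, xmlResult);
    }
}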

or download the Saxon XSLT processor and run it from the command line (Linux shell example):

#!/bin/bash
notify-send -t 500 -u low -i gtk-dialog-info "Transforming  with  into  ..."
# That's actually the only relevant line below
java -cp saxon9he.jar net.sf.saxon.Transform -t -s: -xsl: -o:
notify-send -t 1000 -u low -i gtk-dialog-info "Extraction into  done!"

YMMV
