How to Split an XML file into multiple XML Files based on nodes

Source: Stack Overflow, http://stackoverflow.com/questions/14455639/. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors.

Tags: c#, xml, split

Asked by Nithesh Narayanan

I have an XML file as follows:

<?xml version="1.0"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt1</id>
  </CustomTextBox>

  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt2</id>
  </CustomTextBox>

  <AllControlsCount>
    <Width>0</Width>
    <id>ControlsID</id>
  </AllControlsCount>
</EMR>

I want to split the XML file into three files, according to its nodes:

File 1:

<?xml version="1.0"?>
<CustomTextBox>
  <Text>WNL</Text>
  <Type>TextBox</Type>
  <Width>500</Width>
  <id>txt1</id>
</CustomTextBox>

File 2:

<?xml version="1.0"?>
<CustomTextBox>
  <Text>WNL</Text>
  <Type>TextBox</Type>
  <Width>500</Width>
  <id>txt2</id>
</CustomTextBox>

File 3:

<?xml version="1.0"?>
<AllControlsCount>
  <Width>0</Width>
  <id>ControlsID</id>
</AllControlsCount>

Also, the nodes are dynamic and may change. How can I split this XML file into multiple files according to its nodes? If anybody knows, please share.

Accepted answer by mipe34

Try LINQ to XML:

using System.IO;
using System.Linq;
using System.Xml.Linq;

var xDoc = XDocument.Parse(Resource1.XMLFile1); // load the source XML (here from a string resource)
var xmls = xDoc.Root.Elements().ToArray(); // split into the root's child elements

for (int i = 0; i < xmls.Length; i++)
{
    // write each element into a different file: xml1.xml, xml2.xml, ...
    using (var file = File.CreateText(string.Format("xml{0}.xml", i + 1)))
    {
        file.Write(xmls[i].ToString());
    }
}

It takes every element defined directly inside the root element and writes each one into a separate file.
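One detail worth noting: calling ToString() on an XElement produces the element markup only, without the <?xml ...?> declaration shown in the files the question asks for. A minimal sketch of one way to include it is to wrap a copy of each child in its own XDocument before saving. The class name, method name, and output directory parameter below are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

static class SplitWithDeclaration
{
    // Hypothetical helper: writes each child of the root to
    // outputDir as xml1.xml, xml2.xml, ... and returns the file count.
    public static int Split(string xml, string outputDir)
    {
        XDocument source = XDocument.Parse(xml);
        int count = 0;

        foreach (XElement child in source.Root.Elements())
        {
            // new XElement(child) makes a deep copy, so the source tree is left untouched.
            var doc = new XDocument(
                new XDeclaration("1.0", "utf-8", null),
                new XElement(child));

            // XDocument.Save writes the declaration followed by the element.
            doc.Save(Path.Combine(outputDir, "xml" + (++count) + ".xml"));
        }

        return count;
    }

    static void Main()
    {
        string xml =
            "<EMR>" +
            "<CustomTextBox><id>txt1</id></CustomTextBox>" +
            "<CustomTextBox><id>txt2</id></CustomTextBox>" +
            "<AllControlsCount><id>ControlsID</id></AllControlsCount>" +
            "</EMR>";

        string dir = Directory.CreateDirectory(
            Path.Combine(Path.GetTempPath(), "xml-split-demo")).FullName;

        Console.WriteLine(Split(xml, dir)); // prints 3
    }
}
```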

Answer by Sergey Berezovskiy

With LINQ to XML it's even simpler: you can use the XElement.Save method to save any element to a separate XML file:

XDocument xdoc = XDocument.Load(path_to_xml);
int index = 0;
foreach (var element in xdoc.Root.Elements())
    element.Save(++index + ".xml");

Or as a single statement:

XDocument.Load(path_to_xml).Root.Elements()
         .Select((e, i) => new { Element = e, File = ++i + ".xml" })
         .ToList().ForEach(x => x.Element.Save(x.File));
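Since the question says the node names are dynamic, numbered file names such as 1.xml say little about what each file holds. As a hedged variation on the XElement.Save approach above, the sketch below names each file after its element plus a per-name counter. The class name, method name, and the "Name_N.xml" naming scheme are assumptions made for this example, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

static class SplitByName
{
    // Hypothetical helper: saves every child of the root as
    // <ElementName>_<n>.xml inside outputDir and returns the paths written.
    public static List<string> Split(string inputPath, string outputDir)
    {
        var seen = new Dictionary<string, int>();   // element name -> how many seen so far
        var written = new List<string>();

        foreach (XElement element in XDocument.Load(inputPath).Root.Elements())
        {
            string name = element.Name.LocalName;

            seen.TryGetValue(name, out int n);      // n stays 0 when the name is new
            seen[name] = ++n;

            string path = Path.Combine(outputDir, name + "_" + n + ".xml");
            element.Save(path);                     // XElement.Save also writes the XML declaration
            written.Add(path);
        }

        return written;
    }
}
```

For the XML in the question, this would produce CustomTextBox_1.xml, CustomTextBox_2.xml and AllControlsCount_1.xml.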

Answer by Legoless

You can use the XmlTextReader and XmlWriter classes to accomplish this, but you need to know where to start creating new XML files. Looking at your example, you want to split out each node contained in the root node.

That means that once you start reading the XML file, you need to make sure you are inside the root node, and then keep track of how deep into the XML you are, so that you can close the file when you reach the next node in the root node.

See this for example: I read XML from file.xml and open an XML writer. When I reach the first node contained in the root node, I start writing the elements.

I keep the depth in the variable "treeDepth", which represents the depth in the XML tree structure.

Based on the node currently being read, I take an action. When I reach the end element that returns me to tree depth 1, I am back in the root node, so I close the current XML file and open a new one.

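using System.Xml;

XmlTextReader reader = new XmlTextReader("file.xml");
XmlWriter writer = null;

int fileNumber = 0;
int treeDepth = 0; // 0 = outside the root, 1 = directly inside the root

while (reader.Read())
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:

            //
            // Skip the root node; each child of the root starts a new file
            //

            if (treeDepth == 1)
            {
                writer = XmlWriter.Create("file" + (++fileNumber) + ".xml");
                writer.WriteStartDocument();
            }

            if (treeDepth > 0)
                writer.WriteStartElement(reader.Name); // attributes are not copied here

            //
            // An empty element such as <a/> has no matching EndElement node
            //

            if (!reader.IsEmptyElement)
            {
                treeDepth++;
            }
            else if (treeDepth > 0)
            {
                writer.WriteEndElement();

                if (treeDepth == 1)
                    writer.Close();
            }

            break;
        case XmlNodeType.Text:

            //
            // Write text here (only while a file is open)
            //

            if (treeDepth > 1)
                writer.WriteString(reader.Value);

            break;
        case XmlNodeType.EndElement:

            treeDepth--;

            //
            // Close the element; back at depth 1, a child of the root has ended,
            // so close the file. The next child element opens a new one.
            //

            if (treeDepth > 0)
                writer.WriteEndElement();

            if (treeDepth == 1)
                writer.Close();

            break;
    }
}

reader.Close();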

Note that this code does NOT entirely solve your problem, but merely explains the logic needed to solve it completely.

For more help on XML readers and writers, read the following links:

http://support.microsoft.com/kb/307548

http://www.dotnetperls.com/xmlwriter
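
As a possible middle ground between the reader/writer logic described in this answer and LINQ to XML, the sketch below streams through the file with XmlReader and materializes only one child of the root at a time using XNode.ReadFrom, so nested elements, attributes and text are copied without tracking the depth by hand. It is an illustration under assumptions, not the answer author's code: the class name, method name, and numbered output file names are hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class StreamingSplitter
{
    // Hypothetical helper: writes each child element of the root to
    // outputDir as 1.xml, 2.xml, ... and returns the number of files.
    public static int Split(string inputPath, string outputDir)
    {
        int count = 0;

        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            reader.MoveToContent(); // position on the root element
            reader.Read();          // step to the first node inside the root

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // ReadFrom consumes the whole element (including its
                    // descendants) and leaves the reader on the node after it,
                    // so every element seen by this loop is a child of the root.
                    var element = (XElement)XNode.ReadFrom(reader);
                    element.Save(Path.Combine(outputDir, (++count) + ".xml"));
                }
                else
                {
                    // Whitespace, comments, and the root's end tag are skipped.
                    reader.Read();
                }
            }
        }

        return count;
    }
}
```

Only one child element is held in memory at a time, which may matter for large inputs.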

Answer by VinceL

I took Legoless' answer and expanded it into a version that worked for me, so I am sharing it. I needed to split on multiple entries per file, rather than just the single entry per file shown in the original question, which means the code has to preserve the higher-level elements so that the resulting XML files are valid.

So you supply the level you want to split on and the number of entries you want per file.

using System.Collections.Generic;
using System.Xml;

public class XMLFileManager
{        

    public List<string> SplitXMLFile(string fileName, int startingLevel, int numEntriesPerFile)
    {
        List<string> resultingFilesList = new List<string>();

        XmlReaderSettings readerSettings = new XmlReaderSettings();
        readerSettings.DtdProcessing = DtdProcessing.Parse;
        XmlReader reader = XmlReader.Create(fileName, readerSettings);

        XmlWriter writer = null;
        int fileNum = 1;
        int entryNum = 0;
        bool writerIsOpen = false;
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.NewLineOnAttributes = true;

        Dictionary<int, XmlNodeItem> higherLevelNodes = new Dictionary<int, XmlNodeItem>();
        int hlnCount = 0;

        string fileIncrementedName = GetIncrementedFileName(fileName, fileNum);
        resultingFilesList.Add(fileIncrementedName);
        writer = XmlWriter.Create(fileIncrementedName, settings);
        writerIsOpen = true;
        writer.WriteStartDocument();

        int treeDepth = 0;

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:                        

                    treeDepth++;

                    if (treeDepth == startingLevel)
                    {
                        entryNum++;
                        if (entryNum == 1)
                        {                                
                            if (fileNum > 1)
                            {
                                fileIncrementedName = GetIncrementedFileName(fileName, fileNum);
                                resultingFilesList.Add(fileIncrementedName);
                                writer = XmlWriter.Create(fileIncrementedName, settings);
                                writerIsOpen = true;
                                writer.WriteStartDocument();
                                for (int d = 1; d <= higherLevelNodes.Count; d++)
                                {
                                    XmlNodeItem xni = higherLevelNodes[d];
                                    switch (xni.XmlNodeType)
                                    {
                                        case XmlNodeType.Element:
                                            writer.WriteStartElement(xni.NodeValue);
                                            break;
                                        case XmlNodeType.Text:
                                            writer.WriteString(xni.NodeValue);
                                            break;
                                        case XmlNodeType.CDATA:
                                            writer.WriteCData(xni.NodeValue);
                                            break;
                                        case XmlNodeType.Comment:
                                            writer.WriteComment(xni.NodeValue);
                                            break;
                                        case XmlNodeType.EndElement:
                                            writer.WriteEndElement();
                                            break;
                                    }
                                }
                            }
                        }
                    }

                    if (writerIsOpen)
                    {
                        writer.WriteStartElement(reader.Name);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Element;
                        xni.NodeValue = reader.Name;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.Text:

                    if (writerIsOpen)
                    {
                        writer.WriteString(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Text;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.CDATA:

                    if (writerIsOpen)
                    {
                        writer.WriteCData(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.CDATA;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.Comment:

                    if (writerIsOpen)
                    {
                        writer.WriteComment(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Comment;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.EndElement:

                    if (entryNum == numEntriesPerFile && treeDepth == startingLevel || treeDepth==1)
                    {
                        if (writerIsOpen)
                        {
                            fileNum++;
                            writer.WriteEndDocument();
                            writer.Close();
                            writerIsOpen = false;
                            entryNum = 0;
                        }                            
                    }
                    else
                    {
                        if (writerIsOpen)
                        {
                            writer.WriteEndElement();
                        }

                        if (treeDepth < startingLevel)
                        {
                            hlnCount++;
                            XmlNodeItem xni = new XmlNodeItem();
                            xni.XmlNodeType = XmlNodeType.EndElement;
                            xni.NodeValue = string.Empty;
                            higherLevelNodes.Add(hlnCount, xni);
                        }
                    }

                    treeDepth--;

                    break;
            }
        }

        reader.Close(); // release the handle on the source file

        return resultingFilesList;
    }

    private string GetIncrementedFileName(string fileName, int fileNum)
    {
        return fileName.Replace(".xml", "") + "_" + fileNum + "_" + ".xml";
    }
}

public class XmlNodeItem
{        
    public XmlNodeType XmlNodeType { get; set; }
    public string NodeValue { get; set; }
}

Sample Usage:

int startingLevel = 2; //EMR is level 1, while the entries of CustomTextBox and AllControlsCount 
                       //are at Level 2. The question wants to split on those Level 2 items 
                       //and so this parameter is set to 2.
int numEntriesPerFile = 1;  //Question wants 1 entry per file which will result in 3 files,  
                            //each with one entry.

XMLFileManager xmlFileManager = new XMLFileManager();
List<string> resultingFilesList = xmlFileManager.SplitXMLFile("before_split.xml", startingLevel, numEntriesPerFile);

Results when used against the XML file in the question:

File 1:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt1</id>
  </CustomTextBox>
</EMR>

File 2:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt2</id>
  </CustomTextBox>
</EMR>

File 3:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <AllControlsCount>
    <Width>0</Width>
    <id>ControlsID</id>
  </AllControlsCount>
</EMR>

Another example, with deeper nesting and multiple entries per file:

int startingLevel = 4; //splitting on the 4th level down which is <ITEM>
int numEntriesPerFile = 2; //2 entries per file. If instead you used 3, then the result 
                           //would be 3 entries in the first file and 1 entry in the second file.

XMLFileManager xmlFileManager = new XMLFileManager();
List<string> resultingFilesList = xmlFileManager.SplitXMLFile("another_example.xml", startingLevel, numEntriesPerFile);

Original File:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>  
    <ITEM_LIST>
      <ITEM>
        <ID>1</ID>
        <ABC>Some Text 1</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>42</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>      
      <ITEM>
        <ID>2</ID>
        <ABC>Some Text 2</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>53</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>3</ID>
        <ABC>Some Text 3</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1128</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>4</ID>
        <ABC>Some Text 4</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1955</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>

Resulting Files:

First File:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>
    <ITEM_LIST>
      <ITEM>
        <ID>1</ID>
        <ABC>Some Text 1</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>42</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>2</ID>
        <ABC>Some Text 2</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>53</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>

Second File:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>
    <ITEM_LIST>
      <ITEM>
        <ID>3</ID>
        <ABC>Some Text 3</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1128</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>4</ID>
        <ABC>Some Text 4</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1955</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>
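
For documents small enough to load into memory, the same "N entries per file, keeping the higher-level elements" result can probably be reached with LINQ to XML as well. The following is a rough sketch, not part of the answer above: the class name, method name, and the idea of selecting entries by element name (instead of by level) are assumptions, and it presumes that entries with that name are not nested inside each other.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class ChunkSplitter
{
    // Hypothetical helper: returns one document per chunk. Each document is a
    // full copy of the source in which only entriesPerFile elements named
    // entryName are kept, so all ancestors and siblings of other names survive.
    public static List<XDocument> Split(XDocument source, string entryName, int entriesPerFile)
    {
        int total = source.Descendants(entryName).Count();
        var chunks = new List<XDocument>();

        for (int start = 0; start < total; start += entriesPerFile)
        {
            var copy = new XDocument(source); // deep copy, including the declaration

            // Collect the entries outside the current window first,
            // then remove them, so the tree is not modified while it is queried.
            List<XElement> unwanted = copy.Descendants(entryName)
                .Where((e, i) => i < start || i >= start + entriesPerFile)
                .ToList();

            foreach (XElement e in unwanted)
                e.Remove();

            chunks.Add(copy);
        }

        return chunks;
    }
}
```

Called with the original file above, "ITEM" as the entry name and 2 entries per file, this would return two documents, each of which can then be written out with XDocument.Save. Copying the whole document once per chunk is simple but wasteful, so for large inputs the streaming version in this answer is likely the better fit.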