C# combine multiple files into a single file

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/14524909/


Combine multiple files into single file

Tags: c#, .net, file-io, copy

Asked by Pratik

Code:

static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    string[] fileAry = Directory.GetFiles(dirPath, filePattern);

    Console.WriteLine("Total File Count : " + fileAry.Length);

    using (TextWriter tw = new StreamWriter(destFile, true))
    {
        foreach (string filePath in fileAry)
        {
            using (TextReader tr = new StreamReader(filePath))
            {
                tw.WriteLine(tr.ReadToEnd());
                tr.Close();
                tr.Dispose();
            }
            Console.WriteLine("File Processed : " + filePath);
        }

        tw.Close();
        tw.Dispose();
    }
}

I need to optimize this as it's extremely slow: it takes 3 minutes for 45 XML files with an average size of 40–50 MB.

Please note: 45 files of an average 45 MB is just one example; it could be n files of size m, where n is in the thousands and m could average 128 KB. In short, it can vary.

Could you please provide any views on optimization?

Accepted answer by Sergey Brunov

General answer

Why not just use the Stream.CopyTo(Stream destination) method?

private static void CombineMultipleFilesIntoSingleFile(string inputDirectoryPath, string inputFileNamePattern, string outputFilePath)
{
    string[] inputFilePaths = Directory.GetFiles(inputDirectoryPath, inputFileNamePattern);
    Console.WriteLine("Number of files: {0}.", inputFilePaths.Length);
    using (var outputStream = File.Create(outputFilePath))
    {
        foreach (var inputFilePath in inputFilePaths)
        {
            using (var inputStream = File.OpenRead(inputFilePath))
            {
                // Buffer size can be passed as the second argument.
                inputStream.CopyTo(outputStream);
            }
            Console.WriteLine("The file {0} has been processed.", inputFilePath);
        }
    }
}

Buffer size adjustment

Please note that the mentioned method is overloaded.

There are two method overloads:

  1. CopyTo(Stream destination).
  2. CopyTo(Stream destination, int bufferSize).

The second method overload allows the buffer size to be adjusted through the bufferSize parameter.
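For illustration, here is a minimal sketch (not part of the original answer) of the same merge loop using the second overload with an explicit buffer size. The method name and the 81920-byte value are assumptions chosen for the example, and using System.IO is assumed:

private static void CombineFilesWithBufferSize(string inputDirectoryPath, string inputFileNamePattern, string outputFilePath)
{
    // Hypothetical example value; tune it for your workload.
    const int bufferSize = 81920;

    string[] inputFilePaths = Directory.GetFiles(inputDirectoryPath, inputFileNamePattern);
    using (var outputStream = File.Create(outputFilePath))
    {
        foreach (var inputFilePath in inputFilePaths)
        {
            using (var inputStream = File.OpenRead(inputFilePath))
            {
                // Same copy as above, but with the explicit bufferSize overload.
                inputStream.CopyTo(outputStream, bufferSize);
            }
        }
    }
}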

Answered by Sten Petrov

Several things you can do:

  • In my experience, the default buffer sizes can be increased with noticeable benefit up to about 120K; I suspect setting a large buffer on all streams will be the easiest and most noticeable performance booster (see the sketch after this list):

    new System.IO.FileStream("File.txt", System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read, 150000);

  • Use the Stream class, not the StreamReader class.

  • Read contents into a large buffer and dump them into the output stream at once; this will speed up operations on small files.
  • No need for the redundant close/dispose: you have the using statement.
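As a rough sketch of the advice above (not code from the answer), the following opens both the input and output streams with large buffers and copies at the Stream level, avoiding StreamReader/StreamWriter entirely. The 150000-byte buffer size mirrors the snippet in the first bullet and is an assumption, not a measured optimum; using System.IO is assumed:

static void MergeWithLargeBuffers(string dirPath, string filePattern, string destFile)
{
    const int bufferSize = 150000; // large buffer, per the first bullet

    using (var output = new FileStream(destFile, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))
    {
        foreach (string filePath in Directory.GetFiles(dirPath, filePattern))
        {
            using (var input = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize))
            {
                // Raw Stream-to-Stream copy; no text decoding involved.
                input.CopyTo(output, bufferSize);
            }
        }
    }
}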

Answered by Eren Ersönmez

One option is to use the copy command and let it do what it does well.

Something like:

static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    var cmd = new ProcessStartInfo("cmd.exe", 
        String.Format("/c copy {0} {1}", filePattern, destFile));
    cmd.WorkingDirectory = dirPath;
    cmd.UseShellExecute = false;
    Process.Start(cmd);
}
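
For example, a call like the following (with hypothetical paths) lets cmd.exe expand the wildcard and concatenate all matching files into the destination:

MultipleFilesToSingleFile(@"C:\temp\xml", "*.xml", @"C:\temp\merged.xml");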

Answered by paparazzo

I would use a BlockingCollection to read so you can read and write concurrently.
Clearly you should write to a separate physical disk to avoid hardware contention. This code will preserve order.
Reading is going to be faster than writing, so there is no need for parallel reads.
Again, since reading is faster, limit the size of the collection so the reads do not get further ahead of the writes than necessary.
The simpler approach of reading the next file in parallel while writing the current one runs into the problem of differing file sizes: writing a small file is faster than reading a big one.

I use this pattern to read and parse text on T1 and then insert into SQL on T2.

public void WriteFiles()
{
    using (BlockingCollection<string> bc = new BlockingCollection<string>(10))
    {
        // play with 10 if you have several small files then a big file
        // write can get ahead of read if not enough are queued

        TextWriter tw = new StreamWriter(@"c:\temp\alltext.text", true);
        // clearly you want to write to a different physical disk
        // ideally write to solid state even if you move the files to regular disk when done
        // Spin up a Task to populate the BlockingCollection
        using (Task t1 = Task.Factory.StartNew(() =>
        {
            string dir = @"c:\temp\";
            string fileText;      
            int minSize = 100000; // play with this
            StringBuilder sb = new StringBuilder(minSize);
            string[] fileAry = Directory.GetFiles(dir, @"*.txt");
            foreach (string fi in fileAry)
            {
                Debug.WriteLine("Add " + fi);
                fileText = File.ReadAllText(fi);
                //bc.Add(fi);  for testing just add filepath
                if (fileText.Length > minSize)
                {
                    if (sb.Length > 0)
                    {
                        bc.Add(sb.ToString());
                        sb.Clear();
                    }
                    bc.Add(fileText);  // could be really big so don't hit sb
                }
                else
                {
                    sb.Append(fileText);
                    if (sb.Length > minSize)
                    {
                        bc.Add(sb.ToString());
                        sb.Clear();
                    }
                }
            }
            if (sb.Length > 0)
            {
                bc.Add(sb.ToString());
                sb.Clear();
            }
            bc.CompleteAdding();
        }))
        {

            // Spin up a Task to consume the BlockingCollection
            using (Task t2 = Task.Factory.StartNew(() =>
            {
                string text;
                try
                {
                    while (true)
                    {
                        text = bc.Take();
                        Debug.WriteLine("Take " + text);
                        tw.WriteLine(text);                  
                    }
                }
                catch (InvalidOperationException)
                {
                    // An InvalidOperationException means that Take() was called on a completed collection
                    Debug.WriteLine("That's All!");
                    tw.Close();
                    tw.Dispose();
                }
            }))

                Task.WaitAll(t1, t2);
        }
    }
}

BlockingCollection Class

Answered by Miguelito

    // Binary File Copy
    public static void mergeFiles(string strFileIn1, string strFileIn2, string strFileOut, out string strError)
    {
        strError = String.Empty;
        try
        {
            using (FileStream streamIn1 = File.OpenRead(strFileIn1))
            using (FileStream streamIn2 = File.OpenRead(strFileIn2))
            using (FileStream writeStream = File.OpenWrite(strFileOut))
            {
                // Create a buffer to hold the bytes; it could be made bigger.
                byte[] buffer = new byte[1024];
                int bytesRead;

                // while the read method returns bytes keep writing them to the output stream
                while ((bytesRead =
                        streamIn1.Read(buffer, 0, 1024)) > 0)
                {
                    writeStream.Write(buffer, 0, bytesRead);
                }
                while ((bytesRead =
                        streamIn2.Read(buffer, 0, 1024)) > 0)
                {
                    writeStream.Write(buffer, 0, bytesRead);
                }
            }
        }
        catch (Exception ex)
        {
            strError = ex.Message;
        }
    }

Answered by kashified

I tried the solution posted by sergey-brunov for merging a 2 GB file. The system took around 2 GB of RAM for this work. I have made some changes for further optimization, and it now takes 350 MB of RAM to merge a 2 GB file.

private static void CombineMultipleFilesIntoSingleFile(string inputDirectoryPath, string inputFileNamePattern, string outputFilePath)
{
    string[] inputFilePaths = Directory.GetFiles(inputDirectoryPath, inputFileNamePattern);
    Console.WriteLine("Number of files: {0}.", inputFilePaths.Length);
    foreach (var inputFilePath in inputFilePaths)
    {
        // Append each file's text to the output file, one file at a time.
        using (var outputStream = File.AppendText(outputFilePath))
        {
            outputStream.WriteLine(File.ReadAllText(inputFilePath));
            Console.WriteLine("The file {0} has been processed.", inputFilePath);
        }
    }
}