Multiple threads filling up their results in one DataTable (C#)

Question

Asked by Nidhi

I'm just beginning to learn the concept of threading, and I'm kind of stuck on this one problem; it's driving me crazy.

What I actually need to accomplish -

I have some 300 text files in a local directory that need to be parsed for specific values. After I find these values in each text file, I need to store them in a database. So I followed the plain approach: access each text file in the directory, parse it, and add the resulting values as a row to a local DataTable. When I'm done parsing all the files and have 300 rows in the DataTable, I do a SqlBulkCopy of the DataTable to my database. This approach works fine, except that it takes about 10 minutes to run my code!

What I'm attempting to do now -

Create a new thread for each file and keep the thread count below 4 at any given time... then each thread would parse through the file and return a row to update the local DataTable

Where I'm stuck: I don't understand how to update this single DataTable that gets rows from multiple threads.

Quite an explanation, isn't it? Hope someone here can suggest a good idea for this.

Thanks, Nidhi

Answer 1

Accepted answer by Michael Haren

This will be much easier if you just let each of your four threads write to the database themselves. In this scenario you don't have to worry about threading (except for what files each thread works on) as each worker thread could maintain their own datatable and consume 25% of the files.
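
A minimal sketch of the "each worker keeps its own DataTable" idea, assuming four workers that each take every fourth file. ParseFile, the column names, and the file names are hypothetical stand-ins, not part of the original question, and the database write is left out.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Threading;

public static class PerThreadTables
{
    // Hypothetical stand-in for the real parser: returns one value per file.
    static string ParseFile(string path) => "value-from-" + path;

    static DataTable NewTable()
    {
        var table = new DataTable();
        table.Columns.Add("FileName", typeof(string));
        table.Columns.Add("Value", typeof(string));
        return table;
    }

    public static DataTable[] ParseInSlices(string[] files, int workerCount)
    {
        var tables = new DataTable[workerCount];
        var threads = new Thread[workerCount];
        for (int w = 0; w < workerCount; w++)
        {
            int worker = w; // copy, so the lambda does not capture the loop variable
            tables[worker] = NewTable();
            threads[worker] = new Thread(() =>
            {
                // Worker w takes files w, w + workerCount, w + 2 * workerCount, ...
                // It only ever touches its own table, so no lock is needed.
                for (int i = worker; i < files.Length; i += workerCount)
                    tables[worker].Rows.Add(files[i], ParseFile(files[i]));
            });
            threads[worker].Start();
        }
        foreach (Thread t in threads) t.Join();
        return tables;
    }

    public static void Main()
    {
        string[] files = Enumerable.Range(1, 300).Select(n => "file" + n + ".txt").ToArray();
        DataTable[] tables = ParseInSlices(files, 4);
        Console.WriteLine(tables.Sum(t => t.Rows.Count)); // prints 300
    }
}
```

Each of the returned tables could then be written to the database by the worker that filled it, on that worker's own connection.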

Alternatively, you can have a single datatable that all the threads use--just make sure to wrap accesses to it with a lock like so:

lock(YourTable.Rows.SyncRoot){
  // add rows to table
}
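
As a fuller illustration of the shared-table variant, here is a small sketch in which several threads add rows to one DataTable. It uses a dedicated lock object rather than Rows.SyncRoot, which is a substitution on my part; the column names and row values are made up. DataTable is documented as safe for concurrent reads only, so every write goes through the lock.

```csharp
using System;
using System.Data;
using System.Threading;

public static class SharedTableDemo
{
    public static DataTable FillFromThreads(int threadCount, int rowsPerThread)
    {
        var table = new DataTable();
        table.Columns.Add("Worker", typeof(int));
        table.Columns.Add("Item", typeof(int));
        var tableLock = new object(); // guards every write to the table

        var threads = new Thread[threadCount];
        for (int w = 0; w < threadCount; w++)
        {
            int worker = w; // copy, so the lambda does not capture the loop variable
            threads[w] = new Thread(() =>
            {
                for (int i = 0; i < rowsPerThread; i++)
                {
                    // The expensive parsing would happen here, outside the lock,
                    // so that threads only serialize on the cheap row insert.
                    lock (tableLock)
                    {
                        table.Rows.Add(worker, i);
                    }
                }
            });
            threads[w].Start();
        }
        foreach (Thread t in threads) t.Join();
        return table;
    }

    public static void Main()
    {
        Console.WriteLine(FillFromThreads(4, 75).Rows.Count); // prints 300
    }
}
```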

Of course this is all moot if the bottleneck is the disk, as @David B notes.

Answer 2

Answered by Gary

SqlBulkCopy is a big hammer for only 300 rows.

Check out Smart Thread Pool. This is an instance thread pool that you can limit to 4 threads very easily. Since you only have 300 rows, consider posting them directly to SQL in each thread rather than aggregating them in your code.
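
Smart Thread Pool is a third-party library, so the sketch below swaps in the built-in Parallel.ForEach with MaxDegreeOfParallelism to show the same "at most 4 at a time" idea. ParseFile is a hypothetical stand-in for the real parsing, and the results are collected in a ConcurrentBag instead of being posted to SQL.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedParallelism
{
    // Hypothetical stand-in for the real parser.
    static string ParseFile(string path) => "parsed:" + path;

    public static ConcurrentBag<string> ParseAll(IEnumerable<string> files, int maxThreads)
    {
        var results = new ConcurrentBag<string>(); // thread-safe, unordered
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        // At most maxThreads files are processed concurrently.
        Parallel.ForEach(files, options, file => results.Add(ParseFile(file)));
        return results;
    }

    public static void Main()
    {
        IEnumerable<string> files = Enumerable.Range(1, 300).Select(n => "file" + n + ".txt");
        Console.WriteLine(ParseAll(files, 4).Count); // prints 300
    }
}
```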

Answer 3

Answered by Zensar

As the others have pointed out, remember to lock your table before updating. C#:

private readonly object tableLock = new object(); // must be initialized: lock(null) throws

/*
Later in code.
*/

private void UpdateDataTable(object data)
{
    lock(tableLock)
    {
          //Add or update table rows
    }
}

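As for actually controlling the threads and keeping them in line, just use the ThreadPool class, set the maximum number of threads to your limit with ThreadPool.SetMaxThreads, and the queuing can take care of things. (SetMaxThreads applies to the whole process and will not accept a value below the number of processors on the machine.) For additional control you can toss in some logic that uses an array of WaitHandle objects, keeping in mind that WaitHandle.WaitAll accepts at most 64 handles per call. In fact that might be a good idea, considering that you want to queue up 300 separate work items.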
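
A rough sketch of queuing all the files on the ThreadPool and waiting for them to finish. Because WaitHandle.WaitAll is limited to 64 handles, this version substitutes a single CountdownEvent for the WaitHandle array; that is my substitution, not something from the answer. The counter stands in for the real parsing work, and the sketch does not cap the pool at 4 threads.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class PoolWithCountdown
{
    public static int ProcessAll(string[] files)
    {
        int processed = 0;
        using (var done = new CountdownEvent(files.Length))
        {
            foreach (string file in files)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        // Parsing of (string)state would go here.
                        Interlocked.Increment(ref processed);
                    }
                    finally
                    {
                        done.Signal(); // count down even if the work item throws
                    }
                }, file);
            }
            done.Wait(); // blocks until every queued item has signalled
        }
        return processed;
    }

    public static void Main()
    {
        string[] files = Enumerable.Range(1, 300).Select(n => "file" + n + ".txt").ToArray();
        Console.WriteLine(ProcessAll(files)); // prints 300
    }
}
```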

Answer 4

Answered by John Saunders

What made you think that more threads would improve things? They probably won't.

I suggest you first get the program to work, then worry about getting it to work faster. Do it with only one thread.

Answer 5

Answered by Jonathan Mitchem

As was somewhat pointed out, you need to examine exactly where your bottleneck is and why you're using threading.

By moving to multiple threads, you do have a potential for increased performance. However, if you're updating the same DataTable with each thread, you're limited by the DataTable. Only one thread can write to the DataTable at one time (which you control with a lock), so you're still fundamentally processing in sequence.

On the other hand, most databases are designed for multiple connections, running on multiple threads, and have been highly tuned for that purpose. If you want to still use multiple threads: let each thread have its own connection to the database, and do its own processing.

Now, depending on the kind of processing going on, your bottleneck may be in opening and processing the file, and not in the database update.
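
One simple way to find out which stage is the bottleneck is to time each stage separately before adding any threads. The sketch below assumes you can call the parsing and the database insert as separate delegates; the Thread.Sleep calls are only placeholders for that work.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class StageTimer
{
    // Runs one stage and returns how long it took.
    public static TimeSpan Time(Action stage)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        stage();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        TimeSpan parse = Time(() => Thread.Sleep(50));  // placeholder for parsing the files
        TimeSpan insert = Time(() => Thread.Sleep(10)); // placeholder for the database insert
        Console.WriteLine("parse: {0:F0} ms, insert: {1:F0} ms",
            parse.TotalMilliseconds, insert.TotalMilliseconds);
    }
}
```

If the parsing dominates, extra parsing threads may help; if the time goes into reading from disk or into the insert, they probably will not.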

One way to split things up:

Put all the file names to be processed into a filename Queue.
Create a thread (or threads) to pull an item off the filename Queue, open and parse and process the file, and push the results into a result Queue.
Have another thread take the results from the result Queue, and insert them into the database.

These can run simultaneously... the database won't be updated until there's something to update, and will just wait in the meantime.

This approach lets you really know who is waiting on whom. If the read/process file part is slow, create more threads to do that. If the insert into database part is slow, create more threads to do that. The queues just need to be synchronized.

So, pseudocode:

using System;
using System.Collections.Generic;
using System.Threading;

static class Program
{
    static readonly Queue<string> _filesToProcess = new Queue<string>();
    static readonly Queue<string> _results = new Queue<string>();
    static readonly Thread _fileProcessingThread = new Thread( ProcessFiles );
    static readonly Thread _databaseUpdatingThread = new Thread( UpdateDatabase );
    static volatile bool _finished = false; // true once every file has been processed

    // GetFileNamesToProcess, ProcessFileAndGetResult and InsertIntoDatabase
    // are placeholders: implement them however you need.

    static void Main()
    {
        foreach( string fileName in GetFileNamesToProcess() )
        {
           _filesToProcess.Enqueue( fileName );
        }

        _fileProcessingThread.Start();
        _databaseUpdatingThread.Start();

        // wait for the file processing to finish, then tell the database thread
        // that no more results are coming and wait for it to drain the queue
        _fileProcessingThread.Join();
        _finished = true;
        _databaseUpdatingThread.Join();

        Console.WriteLine( "Done" );
    }

    static void ProcessFiles()
    {
       while( true )
       {
          string fileToProcess;
          lock( _filesToProcess )
          {
             // check and dequeue under one lock, so that additional
             // file-processing threads cannot race for the last item
             if( _filesToProcess.Count == 0 ) break;
             fileToProcess = _filesToProcess.Dequeue();
          }

          string resultAsString = ProcessFileAndGetResult( fileToProcess );

          lock( _results ){ _results.Enqueue( resultAsString ); }
       }
    }

    static void UpdateDatabase()
    {
       while( true )
       {
          // read the flag before looking at the queue: if it was already set,
          // every result has been enqueued, so an empty queue means we are done
          bool finished = _finished;

          string resultAsString = null;
          bool gotResult = false;
          lock( _results )
          {
             if( _results.Count > 0 )
             {
                resultAsString = _results.Dequeue();
                gotResult = true;
             }
          }

          if( gotResult )
          {
             InsertIntoDatabase( resultAsString ); // implement this however
          }
          else if( finished )
          {
             break;
          }
          else
          {
             Thread.Sleep( 1 ); // nothing to insert yet; prevents the CPU usage from being 100%
          }
       }
    }
}

I'm pretty sure there are ways to make that "better", but it should do the trick: you can read and process data while also adding completed data to the database, and take advantage of threading.
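
One possible "better" version, sketched under the assumption that .NET 4 or later is available: BlockingCollection replaces the hand-locked queues and the sleep-and-poll loops, and CompleteAdding replaces the finished flag. ParseFile is a hypothetical parser, and the List stands in for the database.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Pipeline
{
    // Hypothetical stand-in for the real parser.
    static string ParseFile(string path) => "parsed:" + path;

    public static List<string> Run(IEnumerable<string> files, int parserCount)
    {
        var fileQueue = new BlockingCollection<string>();
        var resultQueue = new BlockingCollection<string>();
        var inserted = new List<string>(); // stands in for the database; only the writer touches it

        foreach (string file in files) fileQueue.Add(file);
        fileQueue.CompleteAdding(); // no more files will arrive

        // Parsers: take file names until the queue is empty and completed.
        Task[] parsers = Enumerable.Range(0, parserCount)
            .Select(_ => Task.Run(() =>
            {
                foreach (string file in fileQueue.GetConsumingEnumerable())
                    resultQueue.Add(ParseFile(file));
            }))
            .ToArray();

        // Writer: blocks while the result queue is empty, no polling needed.
        Task writer = Task.Run(() =>
        {
            foreach (string result in resultQueue.GetConsumingEnumerable())
                inserted.Add(result);
        });

        Task.WaitAll(parsers);
        resultQueue.CompleteAdding(); // lets the writer's loop end once drained
        writer.Wait();
        return inserted;
    }

    public static void Main()
    {
        IEnumerable<string> files = Enumerable.Range(1, 300).Select(n => "file" + n + ".txt");
        Console.WriteLine(Run(files, 4).Count); // prints 300
    }
}
```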

If you want another Thread to process files, or to update the database, just create a new Thread( MethodName ), and call Start().

It's not the simplest example, but I think it's thorough. You're synchronizing two queues, and you need to make sure each is locked before accessing. You're keeping track of when each thread should finish, and you have data being marshaled between threads, but never processed more than once, using Queues.

Hope that helps.
