Speed up loop using multithreading in C# (Question)
Original question: http://stackoverflow.com/questions/100291/
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors on Stack Overflow (not to this page).
Asked by Lukas Šalkauskas
Imagine I have a function that goes through a million or a billion strings and checks something in each of them.
For example:
foreach (String item in ListOfStrings)
{
    result.Add(CalculateSmth(item));
}
It takes a lot of time, because CalculateSmth is a very time-consuming function.
I want to ask: how can I integrate multithreading into this kind of process?
For example, I want to fire up 5 threads, each of which returns some results, and this goes on for as long as the list still has items.
Maybe someone can point me to some examples or articles.
I forgot to mention that I need it in .NET 2.0.
Accepted answer by Tobi
You could try the Parallel Extensions (part of .NET 4.0).
These allow you to write something like:
Parallel.ForEach(ListOfStrings, item =>
{
    result.Add(CalculateSmth(item));
});
Of course, result.Add would need to be thread safe.
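One way to sidestep the thread-safety requirement on the result collection is to give every iteration its own slot in a preallocated array. The following is only a sketch under assumptions: it needs .NET 4.0 or later (so it does not help on .NET 2.0), and CalculateSmth here is a hypothetical stand-in that returns the string length, since the real function is not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelSketch
{
    // Hypothetical stand-in for the expensive function from the question.
    static int CalculateSmth(string item)
    {
        return item.Length;
    }

    public static int[] Run(IList<string> listOfStrings)
    {
        // Each iteration writes only to its own slot, so no lock is needed
        // and the results stay in the same order as the input.
        int[] result = new int[listOfStrings.Count];
        Parallel.For(0, listOfStrings.Count, i =>
        {
            result[i] = CalculateSmth(listOfStrings[i]);
        });
        return result;
    }

    static void Main()
    {
        int[] lengths = Run(new List<string> { "a", "bb", "ccc" });
        Console.WriteLine(string.Join(",", lengths)); // prints 1,2,3
    }
}
```

If a growing collection is really needed instead of a fixed array, wrapping result.Add in a lock statement is the simplest alternative, at the cost of some contention between threads.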
Answer by Mats Fredriksson
I don't have any good articles at hand right now, but what you want is something along the lines of Producer-Consumer with a thread pool.
The producer loops through the input and creates tasks (which in this case could simply mean queueing up the items in a List or Stack). The consumers are, say, five threads that each read one item off the stack, consume it by calculating it, and then store the result elsewhere.
This way the multithreading is limited to just those five threads, and they will all have work to do up until the stack is empty.
Things to think about:
- Put protection, such as a mutex, on the input and output lists.
- If the order is important, make sure that the output order is maintained. One option is to store the results in a SortedList or something similar.
- Make sure that CalculateSmth is thread safe, i.e. that it doesn't use any global state.
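The points above can be sketched with plain threads, which is all that .NET 2.0 offers. This is a minimal sketch and not the answerer's own code: the class name WorkerPoolSketch and the length-returning CalculateSmth are assumptions made for illustration. The input queue is protected by a lock, and each job carries its original index so that results land in a preallocated array in input order, which removes the need for a lock on the output.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class WorkerPoolSketch
{
    // Each job pairs the original index with the input string.
    private readonly Queue<KeyValuePair<int, string>> input =
        new Queue<KeyValuePair<int, string>>();
    private readonly object inputLock = new object();
    private readonly int[] output;

    public WorkerPoolSketch(IList<string> items)
    {
        // Producer side: queue up every item before the consumers start.
        output = new int[items.Count];
        for (int i = 0; i < items.Count; i++)
            input.Enqueue(new KeyValuePair<int, string>(i, items[i]));
    }

    // Hypothetical stand-in for the expensive function from the question.
    static int CalculateSmth(string item)
    {
        return item.Length;
    }

    private void Consume()
    {
        while (true)
        {
            KeyValuePair<int, string> job;
            lock (inputLock)
            {
                if (input.Count == 0) return; // no work left, thread ends
                job = input.Dequeue();
            }
            // The calculation runs outside the lock; each index is written once.
            output[job.Key] = CalculateSmth(job.Value);
        }
    }

    public int[] Run(int threadCount)
    {
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            workers[i] = new Thread(Consume);
            workers[i].Start();
        }
        foreach (Thread worker in workers) worker.Join();
        return output;
    }

    static void Main()
    {
        WorkerPoolSketch pool = new WorkerPoolSketch(new string[] { "a", "bb", "ccc" });
        foreach (int value in pool.Run(5)) Console.WriteLine(value);
    }
}
```

Keeping the lock around only the dequeue step means the threads contend briefly when fetching work and then calculate independently.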
Answer by Mats Wiklander
The first question you must answer is whether you should be using threading at all.
If your function CalculateSmth() is basically CPU-bound, i.e. heavy on CPU usage with basically no I/O, then I have a hard time seeing the point of using threads, since the threads will be competing for the same resource, in this case the CPU.
If your CalculateSmth() uses both CPU and I/O, then there might be a point in using threading.
I totally agree with the comment on my answer. I made the erroneous assumption that we were talking about a single CPU with one core, but these days we have multi-core CPUs. My bad.
Answer by noocyte
The Parallel Extensions are cool, but this can also be done just by using the thread pool, like this:
using System.Collections.Generic;
using System.Threading;

namespace noocyte.Threading
{
    // Carries one input item plus the event that signals its completion.
    class CalcState
    {
        public CalcState(ManualResetEvent reset, string input)
        {
            Reset = reset;
            Input = input;
        }
        public ManualResetEvent Reset { get; private set; }
        public string Input { get; set; }
    }

    class CalculateMT
    {
        List<string> result = new List<string>();
        List<ManualResetEvent> events = new List<ManualResetEvent>();

        private void Calc()
        {
            List<string> aList = new List<string>();
            aList.Add("test");

            // Queue one thread pool work item per input string.
            foreach (var item in aList)
            {
                CalcState cs = new CalcState(new ManualResetEvent(false), item);
                events.Add(cs.Reset);
                ThreadPool.QueueUserWorkItem(new WaitCallback(Calculate), cs);
            }

            // WaitAll accepts at most 64 handles, so long lists need batching.
            WaitHandle.WaitAll(events.ToArray());
        }

        private void Calculate(object s)
        {
            CalcState cs = (CalcState)s;
            // List<T> is not thread safe, so guard the shared result list.
            lock (result)
            {
                result.Add(cs.Input);
            }
            // Signal completion only after the result has been stored.
            cs.Reset.Set();
        }
    }
}
Answer by slim
Note that concurrency doesn't magically give you more resources. You need to establish what is slowing CalculateSmth down.
For example, if it's CPU-bound (and you're on a single core), then the same number of CPU ticks will go to the code whether you execute the calls sequentially or in parallel. Plus, you'd get some overhead from managing the threads. The same argument applies to other constraints (e.g. I/O).
You'll only get performance gains if CalculateSmth leaves a resource free during its execution that another instance could use. That's not uncommon. For example, if the task involves I/O followed by some CPU work, then process 1 could be doing the CPU work while process 2 is doing the I/O. As Mats points out, a chain of producer-consumer units can achieve this, if you have the infrastructure.
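A rough way to establish what is slowing the function down is to compare wall-clock time with the CPU time the process actually consumed. The sketch below is an assumption-laden illustration, not a profiler: CalculateSmth is a hypothetical stand-in that sleeps to imitate I/O and then spins to imitate CPU work, and the measurement covers every thread in the process. A share close to 1 suggests a CPU-bound function; a share well below 1 suggests the function spends much of its time waiting.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class BottleneckSketch
{
    // Hypothetical stand-in: sleeps to imitate I/O, then spins to imitate CPU work.
    static int CalculateSmth(string item)
    {
        Thread.Sleep(50);
        int hash = 0;
        for (int i = 0; i < 1000000; i++) hash = hash * 31 + item.Length + i;
        return hash;
    }

    // Returns the fraction of wall-clock time that the process spent on the CPU.
    public static double CpuShare(string[] sample)
    {
        Process self = Process.GetCurrentProcess();
        TimeSpan cpuBefore = self.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();

        foreach (string item in sample) CalculateSmth(item);

        wall.Stop();
        self.Refresh(); // drop cached process data before reading again
        TimeSpan cpuUsed = self.TotalProcessorTime - cpuBefore;
        return cpuUsed.TotalMilliseconds / wall.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        double share = CpuShare(new string[] { "a", "bb", "ccc" });
        Console.WriteLine("CPU share of wall time: " + share);
    }
}
```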
Answer by Hallgrim
You need to split up the work you want to do in parallel. Here is an example of how you can split the work in two:
List<string> work = (some list with lots of strings)

// Split the work in two
List<string> odd = new List<string>();
List<string> even = new List<string>();
for (int i = 0; i < work.Count; i++)
{
    if (i % 2 == 0)
    {
        even.Add(work[i]);
    }
    else
    {
        odd.Add(work[i]);
    }
}

// Set up two worker delegates, each with its own result list.
// The non-generic Action delegate arrived with .NET 3.5; on .NET 2.0 a
// parameterless delegate type such as ThreadStart can be used the same way.
List<Foo> oddResult = new List<Foo>();
Action oddWork = delegate { foreach (string item in odd) oddResult.Add(CalculateSmth(item)); };

List<Foo> evenResult = new List<Foo>();
Action evenWork = delegate { foreach (string item in even) evenResult.Add(CalculateSmth(item)); };

// Run the two delegates asynchronously
IAsyncResult evenHandle = evenWork.BeginInvoke(null, null);
IAsyncResult oddHandle = oddWork.BeginInvoke(null, null);

// Wait for both to finish
evenWork.EndInvoke(evenHandle);
oddWork.EndInvoke(oddHandle);

// Merge the results from the two jobs (odd items first, then even items)
List<Foo> allResults = new List<Foo>();
allResults.AddRange(oddResult);
allResults.AddRange(evenResult);
return allResults;
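The same splitting idea can be generalised from two halves to any number of parts. The following is a sketch rather than the answerer's code: it swaps the delegate BeginInvoke calls for explicit Thread objects, and both the SplitSketch class and the length-returning CalculateSmth are hypothetical names introduced for illustration. Each thread takes every parts-th item and fills its own private list, so no locking is needed until the merge.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class SplitSketch
{
    // Hypothetical stand-in for the expensive function from the question.
    static int CalculateSmth(string item)
    {
        return item.Length;
    }

    public static List<int> Run(IList<string> work, int parts)
    {
        List<int>[] partial = new List<int>[parts];
        Thread[] threads = new Thread[parts];

        for (int p = 0; p < parts; p++)
        {
            int part = p; // private copy so each closure sees its own value
            partial[part] = new List<int>();
            threads[part] = new Thread(delegate()
            {
                // Thread number 'part' handles items part, part + parts, ...
                for (int i = part; i < work.Count; i += parts)
                    partial[part].Add(CalculateSmth(work[i]));
            });
            threads[part].Start();
        }

        // Wait for each thread, then merge its results.
        List<int> all = new List<int>();
        for (int p = 0; p < parts; p++)
        {
            threads[p].Join();
            all.AddRange(partial[p]);
        }
        return all;
    }

    static void Main()
    {
        List<int> results = Run(new string[] { "a", "bb", "ccc", "dddd" }, 2);
        foreach (int value in results) Console.WriteLine(value);
    }
}
```

As in the two-way version, the merged list is grouped by part rather than kept in the original input order, so a caller that needs the original order would have to carry the index along with each result.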