How do I spawn threads on different CPU cores? (C#)
Original source: http://stackoverflow.com/questions/32343/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Tom Kidd
Let's say I had a program in C# that did something computationally expensive, like encoding a list of WAV files into MP3s. Ordinarily I would encode the files one at a time, but let's say I wanted the program to figure out how many CPU cores I had and spin up an encoding thread on each core. So, when I run the program on a quad core CPU, the program figures out it's a quad core CPU, figures out there are four cores to work with, then spawns four threads for the encoding, each of which is running on its own separate CPU. How would I do this?
And would this be any different if the cores were spread out across multiple physical CPUs? As in, if I had a machine with two quad core CPUs on it, are there any special considerations or are the eight cores across the two dies considered equal in Windows?
Accepted answer by Jorge Córdoba
Don't bother doing that.
Instead use the Thread Pool. The thread pool is a mechanism (actually a class) of the framework that you can query for a new thread.
When you ask for a new thread, it will either give you a new one or enqueue the work until a thread is freed. That way the framework is in charge of deciding whether it should create more threads, depending on the number of CPUs present.
Edit: In addition, as it has been already mentioned, the OS is in charge of distributing the threads among the different CPUs.
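As a rough sketch of what handing the work to the thread pool could look like: EncodeFile and the file names are placeholders that are not part of the original answer, and the CountdownEvent is just one possible way to wait for the queued items.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSketch
{
    // Hypothetical stand-in for the real WAV-to-MP3 encoder.
    static void EncodeFile(string path)
    {
        Console.WriteLine("Encoding " + path + " on pool thread " + Thread.CurrentThread.ManagedThreadId);
    }

    // Queues one work item per file and blocks until all of them have run.
    // Returns the number of files that were encoded without an exception.
    public static int EncodeAll(string[] files)
    {
        int finished = 0;
        using (var done = new CountdownEvent(files.Length))
        {
            foreach (string file in files)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        EncodeFile(file);
                        Interlocked.Increment(ref finished);
                    }
                    finally
                    {
                        done.Signal();  // always signal so Wait() cannot hang
                    }
                });
            }
            done.Wait();
        }
        return finished;
    }

    public static void Main()
    {
        string[] files = { "a.wav", "b.wav", "c.wav", "d.wav" };  // placeholder names
        Console.WriteLine(EncodeAll(files) + " file(s) encoded");
    }
}
```

The pool decides how many of the queued items actually run at the same time; the code never asks for a specific core.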
Answer by Adam Haile
Where each thread goes is generally handled by the OS itself...so generate 4 threads on a 4 core system and the OS will decide which cores to run each on, which will usually be 1 thread on each core.
Answer by wvdschel
It is the operating system's job to split threads across different cores, and it will do so automatically when your threads are using a lot of CPU time. Don't worry about that. As for finding out how many cores your user has, try Environment.ProcessorCount in C#.
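A minimal sketch of that idea, assuming a hypothetical EncodeFile method and a shared queue of file names (neither is in the original answer): start Environment.ProcessorCount worker threads and let the OS place them.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class PerCoreWorkers
{
    // Hypothetical stand-in for the real encoder.
    static void EncodeFile(string path)
    {
        Console.WriteLine("Encoding " + path);
    }

    // Starts one worker per logical processor; each worker pulls files
    // from the shared queue until it is empty. Returns the number encoded.
    public static int EncodeAll(string[] files)
    {
        var queue = new ConcurrentQueue<string>(files);
        int workerCount = Environment.ProcessorCount;
        int encoded = 0;
        var workers = new Thread[workerCount];

        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = new Thread(() =>
            {
                string file;
                while (queue.TryDequeue(out file))
                {
                    EncodeFile(file);
                    Interlocked.Increment(ref encoded);
                }
            });
            workers[i].Start();
        }

        foreach (Thread worker in workers)
        {
            worker.Join();
        }
        return encoded;
    }

    public static void Main()
    {
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);
        Console.WriteLine(EncodeAll(new[] { "a.wav", "b.wav", "c.wav" }) + " file(s) encoded");
    }
}
```

Nothing here sets affinity; which core each worker runs on is left entirely to the scheduler.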
Answer by Eric Z Beard
You shouldn't have to worry about doing this yourself. I have multithreaded .NET apps running on dual-quad machines, and no matter how the threads are started, whether via the ThreadPool or manually, I see a nice even distribution of work across all cores.
Answer by Will Dean
One of the reasons you should not (as has been said) try to allocate this sort of stuff yourself is that you just don't have enough information to do it properly, particularly into the future with NUMA, etc.
If you have a thread ready to run, and there's a core idle, the kernel will run your thread, don't worry.
Answer by Peter Meyer
In the case of managed threads, doing this is a degree more complex than with native threads, because CLR threads are not directly tied to a native OS thread. In other words, the CLR can switch a managed thread from native thread to native thread as it sees fit. The function Thread.BeginThreadAffinity is provided to place a managed thread in lock-step with a native OS thread. At that point, you could experiment with using native APIs to give the underlying native thread processor affinity. As everyone suggests here, this isn't a very good idea. In fact, there is documentation suggesting that threads can receive less processing time if they are restricted to a single processor or core.
You can also explore the System.Diagnostics.Process class. There you can find a function to enumerate a process's threads as a collection of ProcessThread objects. This class has methods to set ProcessorAffinity or even set a preferred processor -- not sure what that is.
Disclaimer: I've experienced a similar problem where I thought the CPU(s) were underutilized, and I researched a lot of this stuff; however, based on all that I read, it appeared that it wasn't a very good idea, as evidenced by the comments posted here as well. Still, it's interesting and a learning experience to experiment.
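For experimenting along those lines, here is a small sketch that enumerates the current process's ProcessThread objects. The class and method names are made up for illustration, and the Windows-only guard around ProcessorAffinity is an assumption about platform support rather than something the answer states.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class ThreadListing
{
    // Prints the OS-level thread ids of the current process and returns how many there are.
    public static int ListThreads()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            ProcessThreadCollection threads = current.Threads;
            foreach (ProcessThread thread in threads)
            {
                Console.WriteLine("OS thread id " + thread.Id);
            }
            return threads.Count;
        }
    }

    // Restricts one OS thread to the processors set in mask (bit 0 = first logical processor).
    // Assumption: setting ProcessorAffinity is only attempted on Windows.
    public static bool TryRestrict(ProcessThread thread, long mask)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            return false;
        }
        thread.ProcessorAffinity = new IntPtr(mask);
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(ListThreads() + " thread(s) in this process");
    }
}
```

Note that the collection also contains runtime-owned threads (GC, finalizer, and so on), so pinning arbitrary entries is something to try only in a throwaway test program.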
Answer by Joe Erickson
It is not necessarily as simple as using the thread pool.
By default, the thread pool allocates multiple threads for each CPU. Since every thread that gets involved in the work you are doing has a cost (task switching overhead, use of the CPU's very limited L1, L2 and maybe L3 cache, etc.), the optimal number of threads to use is <= the number of available CPUs, unless each thread is requesting services from other machines, such as a highly scalable web service. In some cases, particularly those which involve more hard disk reading and writing than CPU activity, you can actually be better off with 1 thread than with multiple threads.
For most applications, and certainly for WAV and MP3 encoding, you should limit the number of worker threads to the number of available CPUs. Here is some C# code to find the number of CPUs:
int processors = 1;
string processorsStr = System.Environment.GetEnvironmentVariable("NUMBER_OF_PROCESSORS");
if (processorsStr != null)
    processors = int.Parse(processorsStr);
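One way to apply such a limit, sketched here with the Task Parallel Library rather than anything the answer itself uses: ParallelOptions.MaxDegreeOfParallelism caps how many items run concurrently. EncodeFile is again a hypothetical placeholder, and Environment.ProcessorCount is used in place of the environment variable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedEncoding
{
    // Hypothetical stand-in for the real encoder.
    static void EncodeFile(string path)
    {
        Console.WriteLine("Encoding " + path);
    }

    // Encodes all files with at most maxWorkers running at once.
    // Returns the number of files processed.
    public static int EncodeAll(string[] files, int maxWorkers)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };
        int encoded = 0;
        Parallel.ForEach(files, options, file =>
        {
            EncodeFile(file);
            Interlocked.Increment(ref encoded);
        });
        return encoded;
    }

    public static void Main()
    {
        string[] files = { "a.wav", "b.wav", "c.wav", "d.wav" };  // placeholder names
        Console.WriteLine(EncodeAll(files, Environment.ProcessorCount) + " file(s) encoded");
    }
}
```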
Unfortunately, it is not as simple as limiting yourself to the number of CPUs. You also have to take into account the performance of the hard disk controller(s) and disk(s).
The only way you can really find the optimal number of threads is trial and error. This is particularly true when you are using hard disks, web services and such. With hard disks, you might be better off not using all four cores of your quad-core CPU. On the other hand, with some web services, you might be better off making 10 or even 100 requests per CPU.
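A trial-and-error run can be as simple as timing the same batch at several thread counts. The sketch below uses a made-up CPU-bound Work function as the workload; real measurements would need the actual encoder and real files, since disk behavior is exactly what this toy loop does not capture.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ThreadCountTrial
{
    // Hypothetical CPU-bound stand-in for one unit of real work.
    static double Work(int n)
    {
        double sum = 0;
        for (int i = 1; i <= 2000000; i++)
        {
            sum += Math.Sqrt(i + n);
        }
        return sum;
    }

    // Runs `items` units of work with at most `threads` running at once
    // and returns the elapsed wall-clock time.
    public static TimeSpan Measure(int threads, int items)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = threads };
        var results = new double[items];  // keep results so the work is not discarded
        var watch = Stopwatch.StartNew();
        Parallel.For(0, items, options, (int i) => { results[i] = Work(i); });
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        for (int threads = 1; threads <= Environment.ProcessorCount * 2; threads *= 2)
        {
            Console.WriteLine(threads + " thread(s): " + Measure(threads, 64).TotalMilliseconds.ToString("F0") + " ms");
        }
    }
}
```

Whichever thread count gives the lowest time on the target machine, with the real workload, is the one to use.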
Answer by Amit Puri
You cannot do this, as only the operating system has the privileges to do it. If you had to decide it yourself, applications would be difficult to code, because you would also need to take care of inter-processor communication and critical sections. For each application you would have to create your own semaphores or mutexes, a problem for which the operating system provides a common solution by handling it itself.
Answer by Mantosh Kumar
You can definitely do this by writing the routine inside your program.
However, you should not try to do it, since the operating system is the best candidate to manage this stuff. I mean that a user-mode program should not try to do it.
That said, it can sometimes be done (by really advanced users) to achieve load balancing, or even to track down true multi-thread, multi-core problems (data races, cache coherence, ...), since different threads would then truly be executing on different processors.
Having said that, if you still want to do it, it can be done in the following way. I am providing pseudo code for Windows, but the same could easily be done on Linux as well.
#define MAX_CORE 256
processor_mask[MAX_CORE] = {0};
core_number = 0;

Call GetLogicalProcessorInformation();
// From the returned data, calculate core_number and populate processor_mask[],
// which is used below to run different threads on different cores.

// hThread is the array of thread handles.
for (j = 0; j < THREAD_POOL_SIZE; j++)
    // If there are more threads than cores, wrap around once j reaches core_number.
    Call SetThreadAffinityMask(hThread[j], processor_mask[j % core_number]);
After the above routine is called, the threads would always be executing in the following manner:
Thread1-> Core1
Thread2-> Core2
Thread3-> Core3
Thread4-> Core4
Thread5-> Core5
Thread6-> Core6
Thread7-> Core7
Thread8-> Core8
Thread9-> Core1
Thread10-> Core2
...............
For more information, please refer to manual/MSDN to know more about these concepts.
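To make the round-robin mapping above concrete, here is a small C# sketch that only computes the affinity masks (one bit per core) without calling any Windows API. MaskFor and the 8-core, 10-thread figures are illustrative assumptions, and the single 64-bit mask ignores systems with more than 64 logical processors.

```csharp
using System;

public static class AffinityMasks
{
    // Returns the single-bit mask for the core that the thread with the given
    // zero-based index would be pinned to, wrapping around after coreCount threads.
    public static ulong MaskFor(int threadIndex, int coreCount)
    {
        if (threadIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(threadIndex));
        if (coreCount < 1 || coreCount > 64)
            throw new ArgumentOutOfRangeException(nameof(coreCount));
        return 1UL << (threadIndex % coreCount);
    }

    public static void Main()
    {
        const int coreCount = 8;  // assumed core count, matching the table above
        for (int j = 0; j < 10; j++)
        {
            int core = j % coreCount;
            Console.WriteLine("Thread" + (j + 1) + " -> Core" + (core + 1)
                + " (mask 0x" + MaskFor(j, coreCount).ToString("X2") + ")");
        }
    }
}
```

Each mask is the value that would be passed as the second argument of SetThreadAffinityMask for the corresponding thread handle.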
Answer by AlexDev
Although I agree with most of the answers here, I think it's worth it to add a new consideration: Speedstep technology.
When running a CPU-intensive, single-threaded job on a multi-core system, in my case a Xeon E5-2430 with 6 real cores (12 with HT) under Windows Server 2012, the job got spread out among all 12 cores, using around 8.33% of each core and never triggering a speed increase. The CPU remained at 1.2 GHz.
When I set the thread affinity to a specific core, it used ~100% of that core, causing the CPU to max out at 2.5 GHz, more than doubling the performance.
This is the program I used, which just loops increasing a variable. When called with -a, it will set the affinity to core 1. The affinity part was based on this post.
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading;

namespace Esquenta
{
    class Program
    {
        private static int numThreads = 1;
        static bool affinity = false;

        static void Main(string[] args)
        {
            // "-a" anywhere on the command line pins every worker to the first core.
            if (args.Contains("-a"))
            {
                affinity = true;
            }
            // The first argument, if numeric, is the number of threads to start.
            if (args.Length < 1 || !int.TryParse(args[0], out numThreads))
            {
                numThreads = 1;
            }
            Console.WriteLine("numThreads:" + numThreads);
            for (int j = 0; j < numThreads; j++)
            {
                var param = new ParameterizedThreadStart(EsquentaP);
                var thread = new Thread(param);
                thread.Start(j);
            }
        }

        static void EsquentaP(object numero_obj)
        {
            int i = 0;
            DateTime ultimo = DateTime.Now;
            if (affinity)
            {
                // Keep the managed thread on its current OS thread, then restrict
                // that OS thread to the first logical processor (mask bit 0).
                Thread.BeginThreadAffinity();
                CurrentThread.ProcessorAffinity = new IntPtr(1);
            }
            try
            {
                // Busy loop: report throughput every int.MaxValue iterations.
                while (true)
                {
                    i++;
                    if (i == int.MaxValue)
                    {
                        i = 0;
                        var lps = int.MaxValue / (DateTime.Now - ultimo).TotalSeconds / 1000000;
                        Console.WriteLine("Thread " + numero_obj + " " + lps.ToString("0.000") + " M loops/s");
                        ultimo = DateTime.Now;
                    }
                }
            }
            finally
            {
                // Only end the affinity region if one was started.
                if (affinity)
                {
                    Thread.EndThreadAffinity();
                }
            }
        }

        [DllImport("kernel32.dll")]
        public static extern int GetCurrentThreadId();

        // Declared in the original program but not called anywhere.
        [DllImport("kernel32.dll")]
        public static extern int GetCurrentProcessorNumber();

        // Finds the ProcessThread that corresponds to the calling OS thread.
        private static ProcessThread CurrentThread
        {
            get
            {
                int id = GetCurrentThreadId();
                return Process.GetCurrentProcess().Threads.Cast<ProcessThread>().Single(x => x.Id == id);
            }
        }
    }
}
And the results:
Processor speed, as shown by Task manager, similar to what CPU-Z reports: