C++: How to detect the number of physical processors/cores on Windows, Mac and Linux

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2901694/

Time: 2020-08-28 11:26:44  Source: igfitidea

How to Detect the Number of Physical Processors / Cores on Windows, Mac and Linux

Tags: c++, windows, macos, assembly, hyperthreading

Asked by HTASSCPP

I have a multi threaded c++ application that runs on Windows, Mac and a few Linux flavors.

To make a long story short: for it to run at maximum efficiency, I have to instantiate a single thread per physical processor/core. Creating more threads than there are physical processors/cores degrades the performance of my program considerably. I can already correctly detect the number of logical processors/cores on all three of these platforms. To detect the number of physical processors/cores correctly, I'll have to detect whether hyper-threading is supported AND active.

My question, therefore, is whether there is a way to detect whether Hyper-Threading is supported and enabled. If so, how exactly?

Answered by jcoffland

EDIT: This is no longer 100% correct due to Intel's ongoing befuddlement.

The way I understand the question, you are asking how to detect the number of CPU cores vs. CPU threads, which is different from detecting the number of logical and physical cores in a system. CPU cores are often not considered physical cores by the OS unless they have their own package or die. So an OS will report that a Core 2 Duo, for example, has 1 physical and 2 logical CPUs, and an Intel P4 with hyper-threads will be reported exactly the same way, even though 2 hyper-threads vs. 2 CPU cores is a very different thing performance-wise.

I struggled with this until I pieced together the solution below, which I believe works for both AMD and Intel processors. As far as I know, and I could be wrong, AMD does not yet have CPU threads but they have provided a way to detect them that I assume will work on future AMD processors which may have CPU threads.

In short here are the steps using the CPUID instruction:

  1. Detect CPU vendor using CPUID function 0
  2. Check for HTT bit 28 in CPU features EDX from CPUID function 1
  3. Get the logical core count from EBX[23:16] from CPUID function 1
  4. Get actual non-threaded CPU core count
    1. If vendor == 'GenuineIntel' this is 1 plus EAX[31:26] from CPUID function 4
    2. If vendor == 'AuthenticAMD' this is 1 plus ECX[7:0] from CPUID function 0x80000008
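
The bit-field notation in the steps above, such as EBX[23:16], means bits 23 down to 16 of the register. A minimal sketch of that extraction follows. The helper names (bits, hasHttFlag and so on) are hypothetical and are not part of the answer's program, and the functions take raw register values rather than executing CPUID themselves.

```cpp
#include <cstdint>

// Extract the inclusive bit range [hi:lo] from a 32-bit register value.
// Hypothetical helper for illustration only.
inline std::uint32_t bits(std::uint32_t reg, unsigned hi, unsigned lo) {
    const unsigned width = hi - lo + 1;
    const std::uint32_t mask =
        (width >= 32) ? 0xFFFFFFFFu : ((std::uint32_t{1} << width) - 1u);
    return (reg >> lo) & mask;
}

// Step 2: HTT flag, EDX bit 28 of CPUID function 1.
inline bool hasHttFlag(std::uint32_t edx1) { return bits(edx1, 28, 28) != 0; }

// Step 3: logical processor count field, EBX[23:16] of CPUID function 1.
inline std::uint32_t logicalCount(std::uint32_t ebx1) { return bits(ebx1, 23, 16); }

// Step 4.1 (Intel): EAX[31:26] of CPUID function 4, plus 1.
inline std::uint32_t intelCoreCount(std::uint32_t eax4) { return bits(eax4, 31, 26) + 1; }

// Step 4.2 (AMD): ECX[7:0] of CPUID function 0x80000008, plus 1.
inline std::uint32_t amdCoreCount(std::uint32_t ecx8) { return bits(ecx8, 7, 0) + 1; }
```

Keeping the field extraction separate from the CPUID call makes each step easy to check against sample register values.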

Sounds difficult but here is a, hopefully, platform independent C++ program that does the trick:

#include <iostream>
#include <string>

using namespace std;


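#ifdef _WIN32
#include <intrin.h> // __cpuidex
#endif

void cpuID(unsigned i, unsigned regs[4]) {
#ifdef _WIN32
  // __cpuidex also sets ECX (the subleaf) to zero, which CPUID function 4 needs
  __cpuidex((int *)regs, (int)i, 0);

#else
  asm volatile
    ("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
     : "a" (i), "c" (0));
  // ECX is set to zero for CPUID function 4
#endif
}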


int main(int argc, char *argv[]) {
  unsigned regs[4];

  // Get vendor
  char vendor[12];
  cpuID(0, regs);
  ((unsigned *)vendor)[0] = regs[1]; // EBX
  ((unsigned *)vendor)[1] = regs[3]; // EDX
  ((unsigned *)vendor)[2] = regs[2]; // ECX
  string cpuVendor = string(vendor, 12);

  // Get CPU features
  cpuID(1, regs);
  unsigned cpuFeatures = regs[3]; // EDX

  // Logical core count per CPU
  cpuID(1, regs);
  unsigned logical = (regs[1] >> 16) & 0xff; // EBX[23:16]
  cout << " logical cpus: " << logical << endl;
  unsigned cores = logical;

  if (cpuVendor == "GenuineIntel") {
    // Get DCP cache info
    cpuID(4, regs);
    cores = ((regs[0] >> 26) & 0x3f) + 1; // EAX[31:26] + 1

  } else if (cpuVendor == "AuthenticAMD") {
    // Get NC: Number of CPU cores - 1
    cpuID(0x80000008, regs);
    cores = ((unsigned)(regs[2] & 0xff)) + 1; // ECX[7:0] + 1
  }

  cout << "    cpu cores: " << cores << endl;

  // Detect hyper-threads  
  bool hyperThreads = (cpuFeatures & (1u << 28)) && (cores < logical);

  cout << "hyper-threads: " << (hyperThreads ? "true" : "false") << endl;

  return 0;
}

I haven't actually tested this on Windows or OSX yet, but it should work as the CPUID instruction is valid on i686 machines. Obviously, this won't work for PowerPC, but then they don't have hyper-threads either.

Here is the output on a few different Intel machines:

Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz:

 logical cpus: 2
    cpu cores: 2
hyper-threads: false

Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz:

 logical cpus: 4
    cpu cores: 4
hyper-threads: false

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (w/ x2 physical CPU packages):

 logical cpus: 16
    cpu cores: 8
hyper-threads: true

Intel(R) Pentium(R) 4 CPU 3.00GHz:

 logical cpus: 2
    cpu cores: 1
hyper-threads: true

Answered by math

Note that this does not give the number of physical cores as intended, but the number of logical cores.

If you can use C++11 (thanks to alfC's comment beneath):

#include <iostream>
#include <thread>

int main() {
    std::cout << std::thread::hardware_concurrency() << std::endl;
    return 0;
}

Otherwise, maybe the Boost library is an option for you. The code is the same as above apart from the include and the namespace: include <boost/thread.hpp> instead of <thread>, and call boost::thread::hardware_concurrency().
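
One caveat for std::thread::hardware_concurrency(): the standard allows it to return 0 when the value cannot be determined, so a guard is worth having before sizing a thread pool with it. A minimal sketch, where logicalThreadCount and its fallback parameter are hypothetical names:

```cpp
#include <thread>

// hardware_concurrency() may return 0 when the count is not computable,
// so fall back to a caller-supplied default (hypothetical helper).
inline unsigned logicalThreadCount(unsigned fallback = 1) {
    const unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : fallback;
}
```

As with the call it wraps, this still reports logical processors, not physical cores.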

Answered by rados

A Windows-only solution is described here:

GetLogicalProcessorInformation

For Linux, use the /proc/cpuinfo file. I am not running Linux now, so I can't give you more detail. You can count physical/logical processor instances. If the logical count is twice the physical count, then you have HT enabled (true only for x86).

Answered by Z boson

The current highest-voted answer using CPUID appears to be obsolete. It reports the wrong number of both logical and physical processors. This appears to be confirmed by this answer: cpuid-on-intel-i7-processors.

Specifically, using CPUID.1.EBX[23:16] to get the logical processors or CPUID.4.EAX[31:26]+1 to get the physical ones with Intel processors does not give the correct result on any Intel processor I have.

For Intel, CPUID leaf Bh should be used; see Intel_thread/Fcore and cache topology. The solution does not appear to be trivial. For AMD a different solution is necessary.

Here is source code by Intel which reports the correct number of physical and logical cores as well as the correct number of sockets: https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/. I tested this on an 80 logical core, 40 physical core, 4 socket Intel system.

Here is source code for AMD: http://developer.amd.com/resources/documentation-articles/articles-whitepapers/processor-and-core-enumeration-using-cpuid/. It gave the correct result on my single socket Intel system but not on my four socket system. I don't have an AMD system to test.

I have not dissected the source code yet to find a simple answer (if one exists) with CPUID. If the solution can change (as it seems to have), the best solution seems to be to use a library or OS call.

Edit:

Here is a solution for Intel processors with CPUID leaf 11 (Bh). The way to do this is to loop over the logical processors, get the x2APIC ID for each logical processor from CPUID, and count the number of x2APIC IDs where the least significant bit is zero. For systems without hyper-threading the x2APIC ID will always be even. For systems with hyper-threading each x2APIC ID will have an even and an odd version.

// input:  eax = functionnumber, ecx = 0
// output: eax = output[0], ebx = output[1], ecx = output[2], edx = output[3]
//static inline void cpuid (int output[4], int functionnumber)  

int getNumCores(void) {
    //Assuming an Intel processor with CPUID leaf 11
    int cores = 0;
    #pragma omp parallel reduction(+:cores)
    {
        int regs[4];
        cpuid(regs,11);
        if(!(regs[3]&1)) cores++; 
    }
    return cores;
}

The threads must be bound for this to work. OpenMP by default does not bind threads. Setting export OMP_PROC_BIND=true will bind them, or they can be bound in code as shown at thread-affinity-with-windows-msvc-and-openmp.

I tested this on my 4 core/8 HT system and it returned 4 both with and without hyper-threading disabled in the BIOS. I also tested it on a 4 socket system with each socket having 10 cores / 20 HT and it returned 40 cores.

AMD processors, and older Intel processors without CPUID leaf 11, need a different approach.
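
The counting rule from the edit above can be isolated from CPUID and OpenMP. The sketch below takes x2APIC IDs that have already been collected, one per logical processor, and counts distinct core IDs after dropping the low thread-selector bits. The function name and the smtShift parameter are hypothetical. A shift of 1 matches the answer's assumption that only the least significant bit distinguishes hyper-threads, and to my understanding CPUID leaf Bh, subleaf 0, reports the actual shift in EAX[4:0].

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Count cores from x2APIC IDs gathered one per logical processor.
// smtShift (assumed < 32) is the number of low ID bits that select the
// thread within a core; 1 mirrors the even/odd rule used in the answer.
inline std::size_t countCoresFromApicIds(const std::vector<std::uint32_t>& ids,
                                         unsigned smtShift = 1) {
    std::set<std::uint32_t> coreIds;
    for (std::uint32_t id : ids) coreIds.insert(id >> smtShift);
    return coreIds.size();
}
```

With IDs 0 through 7 (hyper-threading on) and with IDs 0, 2, 4, 6 (hyper-threading off) this yields the same core count, which is the behavior the answer reports for its 4 core system.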

Answered by Pnelego

After gathering ideas and concepts from some of the answers above, I have come up with this solution. Please critique.

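//EDIT INCLUDES

#include <cstdint> // uint32_t
#include <cstdio>  // fprintf

#ifdef _WIN32
    #include <windows.h>
#elif defined(__APPLE__) // predefined by Apple compilers; MACOS is not a standard macro
    #include <sys/param.h>
    #include <sys/sysctl.h>
    #include <unistd.h> // sysconf, used by the non-Windows branch below
#else
    #include <unistd.h>
#endif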

For almost every OS, the standard "Get core count" feature returns the logical core count. But in order to get the physical core count, we must first detect if the CPU has hyper threading or not.

uint32_t registers[4];
unsigned logicalcpucount;
unsigned physicalcpucount;
#ifdef _WIN32
SYSTEM_INFO systeminfo;
GetSystemInfo( &systeminfo );

logicalcpucount = systeminfo.dwNumberOfProcessors;

#else
logicalcpucount = sysconf( _SC_NPROCESSORS_ONLN );
#endif

We now have the logical core count. To get the intended result, we must first check whether hyper threading is being used, or whether it is even available.

__asm__ __volatile__ ("cpuid " :
                      "=a" (registers[0]),
                      "=b" (registers[1]),
                      "=c" (registers[2]),
                      "=d" (registers[3])
                      : "a" (1), "c" (0));

unsigned CPUFeatureSet = registers[3];
bool hyperthreading = CPUFeatureSet & (1 << 28);

As far as I have read, there is no Intel CPU with hyper threading that hyper-threads only one core. This lets us find the physical count in a really painless way. If hyper threading is available, the logical processors will be exactly double the physical processors. Otherwise, the operating system will detect a logical processor for every single core, meaning the logical and the physical core count will be identical.

if (hyperthreading){
    physicalcpucount = logicalcpucount / 2;
} else {
    physicalcpucount = logicalcpucount;
}

fprintf (stdout, "LOGICAL: %u\n", logicalcpucount);   // %u: the counts are unsigned
fprintf (stdout, "PHYSICAL: %u\n", physicalcpucount);

Answered by Harry

To follow on from math's answer, as of Boost 1.56 there is a physical_concurrency function which does exactly what you want.

From the documentation - http://www.boost.org/doc/libs/1_56_0/doc/html/thread/thread_management.html#thread.thread_management.thread.physical_concurrency

The number of physical cores available on the current system. In contrast to hardware_concurrency() it does not return the number of virtual cores, but it counts only physical cores.

So an example would be

    #include <iostream>
    #include <boost/thread.hpp>

    int main()
    {
        std::cout << boost::thread::physical_concurrency();
        return 0;
    }

Answered by pneveu

I know this is an old thread, but no one mentioned hwloc. The hwloc library is available on most Linux distributions and can also be compiled on Windows. The following code will return the number of physical processors: 4 in the case of an i7 CPU.

#include <hwloc.h>
#ifdef _OPENMP
#include <omp.h> // omp_get_num_procs, used by the fallback below
#endif

int nPhysicalProcessorCount = 0;

hwloc_topology_t sTopology;

if (hwloc_topology_init(&sTopology) == 0 &&
    hwloc_topology_load(sTopology) == 0)
{
    nPhysicalProcessorCount =
        hwloc_get_nbobjs_by_type(sTopology, HWLOC_OBJ_CORE);

    hwloc_topology_destroy(sTopology);
}

if (nPhysicalProcessorCount < 1)
{
#ifdef _OPENMP
    nPhysicalProcessorCount = omp_get_num_procs();
#else
    nPhysicalProcessorCount = 1;
#endif
}

Answered by A Fog

It is not sufficient to test whether an Intel CPU has hyperthreading; you also need to test whether hyperthreading is enabled or disabled. There is no documented way to check this. An Intel guy came up with this trick to check if hyperthreading is enabled: check the number of programmable performance counters using CPUID[0xa].eax[15:8] and assume that if the value is 8, HT is disabled, and if the value is 4, HT is enabled (https://software.intel.com/en-us/forums/intel-isa-extensions/topic/831551).
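
The performance-counter heuristic described above could be expressed as a small decoding function. This sketch only interprets an EAX value already obtained from CPUID leaf 0xA. The enum and function names are hypothetical, and treating any value other than 4 or 8 as "unknown" is my assumption, since the answer only covers those two cases.

```cpp
#include <cstdint>

enum class HtState { Enabled, Disabled, Unknown };

// Interpret EAX from CPUID leaf 0xA using the undocumented heuristic:
// EAX[15:8] == 8 -> HT disabled, EAX[15:8] == 4 -> HT enabled.
// Any other counter count is reported as Unknown (an assumption).
inline HtState htStateFromLeafA(std::uint32_t eax) {
    const std::uint32_t counters = (eax >> 8) & 0xFFu;
    if (counters == 8) return HtState::Disabled;
    if (counters == 4) return HtState::Enabled;
    return HtState::Unknown;
}
```

Returning a three-state result avoids silently guessing on processors where the heuristic does not apply.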

There is no problem on AMD chips: The CPUID reports 1 or 2 threads per core depending on whether simultaneous multithreading is disabled or enabled.

You also have to compare the thread count from CPUID with the thread count reported by the operating system to see whether there are multiple CPU chips.
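
One way to read that comparison is as plain arithmetic on three numbers. The sketch below is not the code from the repository linked further down. It assumes the OS count covers the whole machine while the CPUID-derived count covers a single chip, and the names Topology and combineCounts are hypothetical.

```cpp
struct Topology {
    unsigned packages;       // number of CPU chips
    unsigned physicalCores;  // cores across all chips
};

// osLogical: logical processors reported by the OS for the whole system.
// cpuidLogicalPerPackage: logical processors per chip, derived from CPUID.
// threadsPerCore: 1 or 2, depending on whether SMT/HT is active.
inline Topology combineCounts(unsigned osLogical,
                              unsigned cpuidLogicalPerPackage,
                              unsigned threadsPerCore) {
    Topology t{1, osLogical};
    if (cpuidLogicalPerPackage != 0 && osLogical > cpuidLogicalPerPackage)
        t.packages = osLogical / cpuidLogicalPerPackage;
    if (threadsPerCore != 0)
        t.physicalCores = osLogical / threadsPerCore;
    return t;
}
```

For example, 16 logical processors from the OS, 8 per chip from CPUID and 2 threads per core gives 2 packages and 8 physical cores.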

I have made a function that implements all of this. It reports both the number of physical processors and the number of logical processors. I have tested it on Intel and AMD processors in Windows and Linux. It should work on Mac as well. I have published this code at https://github.com/vectorclass/add-on/tree/master/physical_processors

Answered by Variable Length Coder

On OS X, you can read these values from sysctl(3) (the C API, or the command line utility of the same name). The man page should give you usage information. The following keys may be of interest:

$ sysctl hw
hw.ncpu: 24
hw.activecpu: 24
hw.physicalcpu: 12  <-- number of cores
hw.physicalcpu_max: 12
hw.logicalcpu: 24   <-- number of cores including hyper-threaded cores
hw.logicalcpu_max: 24
hw.packages: 2      <-- number of CPU packages
hw.ncpu = 24
hw.availcpu = 24

Answered by chahuistle

OpenMP should do the trick:

// test.cpp
#include <omp.h>
#include <iostream>

using namespace std;

int main(int argc, char** argv) {
  int nThreads = omp_get_max_threads();
  cout << "Can run as many as: " << nThreads << " threads." << endl;
}

Most compilers support OpenMP. If you are using a gcc-based compiler (*nix, MacOS), you need to compile using:

$ g++ -fopenmp -o test.o test.cpp

(you might also need to tell your compiler to use the stdc++ library):

$ g++ -fopenmp -o test.o -lstdc++ test.cpp

As far as I know, OpenMP was designed to solve this kind of problem.