Linux find out Hyper-threaded core id
Original question: http://stackoverflow.com/questions/7274585/
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow.
Asked by Patrick
I spent this morning trying to find out how to determine which processor id is the hyper-threaded core, but without luck.
I wish to find out this information and use set_affinity() to bind a process to a hyper-threaded thread or a non-hyper-threaded thread to profile its performance.
Accepted answer by Patrick
I discovered a simple trick to do what I need.
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
If the first number is equal to the CPU number (0 in this example), then it is a real core; if not, it is a hyperthreading core.
Real core example:
# cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
1,13
Hyperthreading core example:
# cat /sys/devices/system/cpu/cpu13/topology/thread_siblings_list
1,13
The output of the second example is exactly the same as the first one. However, we are checking cpu13, and the first number is 1, so CPU 13 is a hyperthreading core.
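The check above can be applied to every CPU at once. The following is only a minimal sketch of that rule, not part of the original answer: the helper names first_sibling and classify_cpu are made up for illustration, and it assumes the kernel may print the list either as "1,13" or as a range such as "0-1".

```shell
#!/usr/bin/env bash
# Sketch: classify each logical CPU by comparing its number with the first
# entry of its thread_siblings_list (the rule described in the answer).
# first_sibling and classify_cpu are hypothetical helper names.

# first_sibling LIST: print the first CPU number of a siblings list.
# Accepts comma lists ("1,13") and ranges ("0-1").
first_sibling() {
    local list=$1
    list=${list%%,*}   # drop everything after the first comma
    list=${list%%-*}   # drop the end of a range
    printf '%s\n' "$list"
}

# classify_cpu CPU LIST: print "first" if CPU is the first sibling in LIST,
# otherwise print "sibling".
classify_cpu() {
    if [ "$1" -eq "$(first_sibling "$2")" ]; then
        echo "first"
    else
        echo "sibling"
    fi
}

# Walk every CPU that exposes topology information (nothing is printed if
# the sysfs files are not present or not readable).
for f in /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list; do
    [ -r "$f" ] || continue
    cpu=${f#/sys/devices/system/cpu/cpu}   # "13/topology/thread_siblings_list"
    cpu=${cpu%%/*}                         # "13"
    list=$(cat "$f")
    echo "cpu$cpu: $(classify_cpu "$cpu" "$list") (siblings: $list)"
done
```

With the two examples from the answer, classify_cpu 1 "1,13" prints "first" and classify_cpu 13 "1,13" prints "sibling".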
Answer by osgx
HT is symmetric (in terms of basic resources; the system mode may be asymmetric).
So, if HT is turned on, most resources of the physical core are shared between two threads. Some additional hardware is enabled to hold the state of both threads. Both threads have symmetric access to the physical core.
There is a difference between an HT-disabled core and an HT-enabled core, but no difference between the first half and the second half of an HT-enabled core.
At any single moment, one HT thread may use more resources than the other, but this resource balancing is dynamic. The CPU balances the threads as it can and as it wants when both threads want to use the same resource. The only thing you can do is execute rep nop or pause in one thread to let the CPU give more resources to the other thread.
I wish to find out this information and use set_affinity() to bind a process to hyper-threaded thread or non-hyper-threaded thread to profile its performance.
Okay, you can actually measure the performance without knowing this. Just run the profile when the only thread in the system is bound to CPU0, and repeat it when it is bound to CPU1. I think the results will be almost the same (the OS can generate noise if it routes some interrupts to CPU0, so try to lower the number of interrupts while testing, and try to use CPU2 and CPU3 if you have them).
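One way to run that comparison from the shell is with taskset from util-linux, which sets the CPU affinity before starting a command. This is only a sketch: run_pinned and compare_cpus are hypothetical helper names, and ./my_benchmark in the usage note stands in for whatever workload you want to profile.

```shell
#!/usr/bin/env bash
# Sketch: run the same command pinned to one logical CPU at a time so the
# timings can be compared. run_pinned and compare_cpus are hypothetical names.

# run_pinned CPU CMD [ARGS...]: run CMD restricted to logical CPU number CPU.
run_pinned() {
    local cpu=$1
    shift
    taskset -c "$cpu" "$@"
}

# compare_cpus "CPU CPU ..." CMD [ARGS...]: time CMD once on each listed CPU.
# The timing report of the shell's time keyword goes to stderr.
compare_cpus() {
    local cpus=$1 cpu
    shift
    for cpu in $cpus; do
        echo "== cpu$cpu =="
        time run_pinned "$cpu" "$@"
    done
}
```

For example, compare_cpus "0 1" ./my_benchmark runs the workload once on CPU0 and once on CPU1; use "2 3" instead if CPU0 is busy handling interrupts.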
PS
Agner (he is the guru of x86) recommends using the even-numbered cores when you do not want to use HT but it is enabled in the BIOS:
If hyperthreading is detected then lock the process to use the even-numbered logical processors only. This will make one of the two threads in each processor core idle so that there is no contention for resources.
PPS: About the new incarnation of HT (not the P4 one, but Nehalem and Sandy Bridge), based on Agner's research on microarchitecture:
The new bottlenecks that require attention in the Sandy Bridge are the following: ... 5. Sharing of resources between threads. Many of the critical resources are shared between the two threads of a core when hyperthreading is on. It may be wise to turn off hyperthreading when multiple threads depend on the same execution resources.
...
A half-way solution was introduced in the NetBurst and again in the Nehalem and Sandy Bridge with the so-called hyperthreading technology. The hyperthreading processor has two logical processors sharing the same execution core. The advantage of this is limited if the two threads compete for the same resources, but hyperthreading can be quite advantageous if the performance is limited by something else, such as memory access.
...
Both Intel and AMD are making hybrid solutions where some or all of the execution units are shared between two processor cores (hyperthreading in Intel terminology).
PPPS: The Intel Optimization book lists the resource sharing in second-generation HT (page 93; this list is for Nehalem, but the Sandy Bridge section makes no changes to it):
Deeper buffering and enhanced resource sharing/partition policies:
- Replicated resource for HT operation: register state, renamed return stack buffer, large-page ITLB. // comment by me: there are 2 sets of this hardware
- Partitioned resources for HT operation: load buffers, store buffers, re-order buffers, small-page ITLB are statically allocated between two logical processors. // comment by me: there is a single set of this hardware; it is statically split in two halves between the two HT virtual cores
- Competitively-shared resource during HT operation: the reservation station, cache hierarchy, fill buffers, both DTLB0 and STLB. // comment: a single set, but not divided in half; the CPU dynamically redivides the resources
- Alternating during HT operation: front-end operation generally alternates between two logical processors to ensure fairness. // comment: there is a single front end (instruction decoder), so the threads are decoded in order: 1, 2, 1, 2
- HT unaware resources: execution units. // comment: these are the actual hardware units that do the computations and memory accesses. There is only a single set. If one thread is capable of using a lot of execution units and has few memory waits, it will consume all execution units and the second thread's performance will be low (but HT will sometimes switch to the second thread; how often?). If both threads are not heavily optimized and/or have memory waits, the execution units will be split between the two threads.
There is also a picture on page 112 (Figure 2-13), which shows that both logical cores are symmetric.
The performance potential due to HT Technology is due to:
- The fact that operating systems and user programs can schedule processes or threads to execute simultaneously on the logical processors in each physical processor
- The ability to use on-chip execution resources at a higher level than when only a single thread is consuming the execution resources; higher level of resource utilization can lead to higher system throughput
Although instructions originating from two programs or two threads execute simultaneously and not necessarily in program order in the execution core and memory hierarchy, the front end and back end contain several selection points to select between instructions from the two logical processors. All selection points alternate between the two logical processors unless one logical processor cannot make use of a pipeline stage. In this case, the other logical processor has full use of every cycle of the pipeline stage. Reasons why a logical processor may not use a pipeline stage include cache misses, branch mispredictions, and instruction dependencies.
Answer by Tomas Kubes
I tried to verify the information by comparing the temperature of the core and load on the HT core.
Answer by osgx
There is a universal (Linux/Windows) and portable hardware topology detector (cores, HT, caches, south bridges and disk/network connection locality): hwloc, by the Open MPI project. It is useful here because Linux may use different HT core numbering rules, and we cannot know in advance whether it will be the even/odd rule or the y and y+8 rule.
Home page of hwloc: http://www.open-mpi.org/projects/hwloc/
Download page: http://www.open-mpi.org/software/hwloc/v1.10/
Description:
The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs. It primarily aims at helping applications with gathering information about modern computing hardware so as to exploit it accordingly and efficiently.
It has the lstopo command, which shows the hardware topology in graphical form:
ubuntu$ sudo apt-get install hwloc
ubuntu$ lstopo
or in text form:
ubuntu$ sudo apt-get install hwloc-nox
ubuntu$ lstopo --of console
We can see the physical cores as Core L#x, each having two logical cores (PU) whose OS indexes are P#y and P#y+8.
Machine (16GB)
  Socket L#0 + L3 L#0 (4096KB)
    L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1
      PU L#2 (P#4)
      PU L#3 (P#12)
  Socket L#1 + L3 L#1 (4096KB)
    L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2
      PU L#4 (P#1)
      PU L#5 (P#9)
    L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3
      PU L#6 (P#5)
      PU L#7 (P#13)
  Socket L#2 + L3 L#2 (4096KB)
    L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4
      PU L#8 (P#2)
      PU L#9 (P#10)
    L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5
      PU L#10 (P#6)
      PU L#11 (P#14)
  Socket L#3 + L3 L#3 (4096KB)
    L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6
      PU L#12 (P#3)
      PU L#13 (P#11)
    L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
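If you want the sibling pairs as plain text, the console output can be post-processed. The snippet below is a rough sketch that assumes the output looks like the listing above (a line containing "Core L#n" followed by its "PU L#n (P#m)" lines); pus_per_core is a hypothetical helper name, and other hwloc versions may format the output differently.

```shell
#!/usr/bin/env bash
# Sketch: group the OS CPU numbers (P#) under each core in lstopo console
# output read from stdin. pus_per_core is a hypothetical helper name.

pus_per_core() {
    awk '
        /Core L#/ {
            # A new core starts: flush the previous one first.
            if (core != "") print core ":" pus
            match($0, /Core L#[0-9]+/)
            core = substr($0, RSTART, RLENGTH)
            pus = ""
        }
        /PU L#/ {
            # Keep only the number that follows "P#".
            match($0, /P#[0-9]+/)
            pus = pus " " substr($0, RSTART + 2, RLENGTH - 2)
        }
        END { if (core != "") print core ":" pus }
    '
}
```

Usage would be lstopo --of console | pus_per_core; for the listing above the first line of the result is "Core L#0: 0 8".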
Answer by Connor Doyle
I'm surprised nobody has mentioned lscpu yet. Here's an example on a single-socket system with four physical cores and hyper-threading enabled:
$ lscpu -p
# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID
# starting from zero.
# CPU,Core,Socket,Node,,L1d,L1i,L2,L3
0,0,0,0,,0,0,0,0
1,1,0,0,,1,1,1,0
2,2,0,0,,2,2,2,0
3,3,0,0,,3,3,3,0
4,0,0,0,,0,0,0,0
5,1,0,0,,1,1,1,0
6,2,0,0,,2,2,2,0
7,3,0,0,,3,3,3,0
The output explains how to interpret the table of IDs; logical CPU IDs with the same Core ID are siblings.
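Grouping by the Core column can be automated. This is a small sketch, assuming the first two columns of the parsable output are CPU and Core as in the example; siblings_by_core is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read "lscpu -p" output on stdin and print the logical CPUs that
# share each core ID. siblings_by_core is a hypothetical helper name.

siblings_by_core() {
    awk -F, '
        /^#/   { next }            # skip the comment header
        NF < 2 { next }            # skip blank or malformed lines
        {
            if (!($2 in cpus)) {   # first time this core ID is seen
                order[n++] = $2
                cpus[$2] = $1
            } else {
                cpus[$2] = cpus[$2] "," $1
            }
        }
        END {
            for (i = 0; i < n; i++)
                print "core " order[i] ": " cpus[order[i]]
        }
    '
}
```

Running lscpu -p | siblings_by_core on the system above would print "core 0: 0,4", "core 1: 1,5", "core 2: 2,6" and "core 3: 3,7".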
Answer by Orsiris de Jong
Simple way to get hyperthreading siblings of cpu cores in bash:
cat $(find /sys/devices/system/cpu -regex ".*cpu[0-9]+/topology/thread_siblings_list") | sort -n | uniq
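Building on that one-liner, the sibling lists can be reduced to one logical CPU per physical core, which gives a CPU list that leaves the second hardware thread of every core idle. This is a sketch only: one_cpu_per_core is a hypothetical helper name, and it assumes each input line is a siblings list such as "0,4" or "0-1".

```shell
#!/usr/bin/env bash
# Sketch: read thread_siblings_list lines on stdin and print a comma-separated
# list with the first logical CPU of each core.
# one_cpu_per_core is a hypothetical helper name.

one_cpu_per_core() {
    sort -n |                 # order by the first CPU number
        uniq |                # every core appears once per sibling: dedupe
        sed -e 's/[,-].*//' | # keep only the first CPU of each list
        paste -sd, -          # join the lines with commas
}
```

For example, cat /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list | one_cpu_per_core produces a list such as "0,1,2,3" that can be passed to taskset -c.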