How to shield a CPU from the Linux scheduler (prevent it scheduling threads onto that CPU)?
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/11111852/
Asked by Steve Lorimer
It is possible to use sched_setaffinity to pin a thread to a CPU, increasing performance (in some situations).
From the Linux man page:
Restricting a process to run on a single CPU also avoids the performance cost caused by the cache invalidation that occurs when a process ceases to execute on one CPU and then recommences execution on a different CPU.
Further, if I desire a more real-time response, I can change the scheduler policy for that thread to SCHED_FIFO and raise the priority to some high value (up to sched_get_priority_max), meaning the thread in question should always pre-empt any other thread running on its CPU when it becomes ready.
However, at this point, the thread that was running on the CPU and that the real-time thread has just pre-empted may already have evicted many of the real-time thread's level-1 cache entries.
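As a rough command-line counterpart to the sched_setaffinity and SCHED_FIFO steps described above, the util-linux tools taskset and chrt can apply the same settings to an existing task. The sketch below is only an illustration: the function name pin_and_promote, the DRY_RUN switch, and the example PID, CPU and priority are assumptions, not part of the original question. By default it only prints the commands; actually running them generally requires root.

```shell
# Hypothetical helper: pin task $1 to CPU $2 and give it SCHED_FIFO priority $3.
# With DRY_RUN=1 (the default) the commands are only printed, not executed.
pin_and_promote() {
    pid="$1"; cpu="$2"; prio="$3"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "taskset -cp $cpu $pid"
        echo "chrt -f -p $prio $pid"
    else
        taskset -cp "$cpu" "$pid"     # restrict the task to a single CPU
        chrt -f -p "$prio" "$pid"     # switch it to SCHED_FIFO at that priority
    fi
}

# Example (dry run): pin PID 1234 to CPU 3 with FIFO priority 80
pin_and_promote 1234 3 80
```

Running chrt -m lists the valid priority range for each policy, which is the command-line view of sched_get_priority_max.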
My questions are as follows:
- Is it possible to prevent the scheduler from scheduling any threads onto a given cpu? (eg: either hide the cpu completely from the scheduler, or some other way)
- Are there some threads which absolutely have to be able to run on that cpu? (eg: kernel threads / interrupt threads)
- If I need to have kernel threads running on that cpu, what is a reasonable maximum priority value to use such that I don't starve out the kernel threads?
Accepted answer by Steve Lorimer
The answer is to use cpusets. The Python cpuset utility (cset) makes it easy to configure them.
Basic concepts
3 cpusets
- root: present in all configurations and contains all cpus (unshielded)
- system: contains cpus used for system tasks - the ones which need to run but aren't "important" (unshielded)
- user: contains cpus used for "important" tasks - the ones we want to run in "realtime" mode (shielded)
The shield command manages these 3 cpusets.
During setup it moves all movable tasks into the unshielded cpuset (system) and during teardown it moves all movable tasks into the root cpuset.
After setup, the subcommand lets you move tasks into the shield (user) cpuset, and additionally, to move special tasks (kernel threads) from root to system (and therefore out of the user cpuset).
Commands:
First we create a shield. Naturally the layout of the shield will be machine/task dependent. For example, say we have a 4-core non-NUMA machine: we want to dedicate 3 cores to the shield, and leave 1 core for unimportant tasks; since it is non-NUMA we don't need to specify any memory node parameters, and we leave the kernel threads running in the root cpuset (i.e. across all cpus):
$ cset shield --cpu 1-3
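To double-check a shield layout before applying it, it can help to work out which CPUs are left outside the shield. The following is a minimal sketch in plain shell; the function names expand_cpu_list and unshielded_cpus are hypothetical (they are not part of cset), and it assumes CPUs are numbered 0 to total-1 and that the list is well formed.

```shell
# Expand a CPU list such as "0,2-4" into one CPU number per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        last="${last:-$first}"        # a single CPU is a range of length one
        i="$first"
        while [ "$i" -le "$last" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# Print the CPUs in 0..total-1 that are NOT in the shield list, comma separated.
# $1 = total number of CPUs, $2 = shield list as passed to --cpu
unshielded_cpus() {
    total="$1"
    shielded=" $(expand_cpu_list "$2" | tr '\n' ' ')"
    out=""
    cpu=0
    while [ "$cpu" -lt "$total" ]; do
        case "$shielded" in
            *" $cpu "*) ;;                        # inside the shield, skip
            *) out="${out:+$out,}$cpu" ;;         # outside the shield, keep
        esac
        cpu=$((cpu + 1))
    done
    echo "$out"
}

# The 4-core example from the text: shielding 1-3 leaves only CPU 0
unshielded_cpus 4 1-3    # prints: 0
```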
Some kernel threads (those which aren't bound to specific cpus) can be moved into the system cpuset. (In general it is not a good idea to move kernel threads which have been bound to a specific cpu.)
$ cset shield --kthread on
Now let's list what's running in the shield (user) or unshielded (system) cpusets (-v for verbose, which will list the process names; add a 2nd -v to display more than 80 characters):
$ cset shield --shield -v
$ cset shield --unshield -v -v
If we want to stop the shield (teardown):
$ cset shield --reset
Now let's execute a process in the shield (arguments following '--' are passed to the command to be executed, not to cset):
$ cset shield --exec mycommand -- -arg1 -arg2
If we already have a running process which we want to move into the shield (note that we can move multiple processes by passing a comma-separated list, or ranges; any process in the range will be moved, even if there are gaps):
$ cset shield --shield --pid 1234
$ cset shield --shield --pid 1234,1236
$ cset shield --shield --pid 1234,1237,1238-1240
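Since --pid takes a comma-separated list, a small helper can turn a one-PID-per-line listing into that form. This is only a sketch: the function name pid_csv is hypothetical, and myapp in the comment is a placeholder process name.

```shell
# Hypothetical helper: join PIDs read from stdin (one per line) with commas.
pid_csv() {
    paste -sd, -
}

# Example with fixed input; in practice the input could come from `pgrep myapp`
printf '1234\n1236\n1240\n' | pid_csv    # prints: 1234,1236,1240

# The result could then be handed to cset (requires root and an active shield):
#   cset shield --shield --pid "$(pgrep myapp | pid_csv)"
```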
Advanced concepts
cset set/proc - these give you finer control of cpusets
Set
Create, adjust, rename, move and destroy cpusets
Commands
Create a cpuset, using cpus 1-3, use NUMA node 1 and call it "my_cpuset1"
$ cset set --cpu=1-3 --mem=1 --set=my_cpuset1
Change "my_cpuset1" to only use cpus 1 and 3
$ cset set --cpu=1,3 --mem=1 --set=my_cpuset1
Destroy a cpuset
$ cset set --destroy --set=my_cpuset1
Rename an existing cpuset
$ cset set --set=my_cpuset1 --newname=your_cpuset1
Create a hierarchical cpuset
$ cset set --cpu=3 --mem=1 --set=my_cpuset1/my_subset1
List existing cpusets (depth of level 1)
$ cset set --list
List existing cpuset and its children
$ cset set --list --set=my_cpuset1
List all existing cpusets
$ cset set --list --recurse
Proc
Manage threads and processes
Commands
List tasks running in a cpuset
$ cset proc --list --set=my_cpuset1 --verbose
Execute a task in a cpuset
$ cset proc --set=my_cpuset1 --exec myApp -- --arg1 --arg2
Moving a task
$ cset proc --toset=my_cpuset1 --move --pid 1234
$ cset proc --toset=my_cpuset1 --move --pid 1234,1236
$ cset proc --toset=my_cpuset1 --move --pid 1238-1340
Moving a task and all its siblings
$ cset proc --move --toset=my_cpuset1 --pid 1234 --threads
Move all tasks from one cpuset to another
$ cset proc --move --fromset=my_cpuset1 --toset=system
Move unpinned kernel threads into a cpuset
$ cset proc --kthread --fromset=root --toset=system
Forcibly move kernel threads (including those that are pinned to a specific cpu) into a cpuset (note: this may have dire consequences for the system - make sure you know what you're doing)
$ cset proc --kthread --fromset=root --toset=system --force
Hierarchy example
We can use hierarchical cpusets to create prioritised groupings
- Create a system cpuset with 1 cpu (0)
- Create a prio_low cpuset with 1 cpu (1)
- Create a prio_med cpuset with 2 cpus (1-2)
- Create a prio_high cpuset with 3 cpus (1-3)
- Create a prio_all cpuset with all 4 cpus (0-3) (note this is the same as root; it is considered good practice to keep a separation from root)
To achieve the above you create prio_all, and then create subset prio_high under prio_all, etc.
$ cset set --cpu=0 --set=system
$ cset set --cpu=0-3 --set=prio_all
$ cset set --cpu=1-3 --set=/prio_all/prio_high
$ cset set --cpu=1-2 --set=/prio_all/prio_high/prio_med
$ cset set --cpu=1 --set=/prio_all/prio_high/prio_med/prio_low
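The nesting pattern in the prio_* commands above (each cpuset created under the previous one) can be generated rather than typed by hand. The sketch below only prints the commands, so it is safe to run without root. The function name nested_cpuset_commands and its name=cpus argument format are hypothetical conveniences, not part of cset.

```shell
# Hypothetical helper: given name=cpus pairs ordered from outermost to
# innermost, print one "cset set" command per level of the hierarchy.
nested_cpuset_commands() {
    path=""
    for spec in "$@"; do
        name="${spec%%=*}"      # text before the first '='
        cpus="${spec#*=}"       # text after the first '='
        if [ -z "$path" ]; then
            path="$name"                                   # top-level cpuset
            echo "cset set --cpu=$cpus --set=$path"
        else
            path="$path/$name"                             # nested cpuset
            echo "cset set --cpu=$cpus --set=/$path"
        fi
    done
}

# Reproduces the four prio_* commands from the hierarchy example
nested_cpuset_commands prio_all=0-3 prio_high=1-3 prio_med=1-2 prio_low=1
```

Each level's CPU list should be a subset of its parent's, which is why the list is ordered from the widest set (prio_all) to the narrowest (prio_low).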
Answer by Leo
There are two other ways I can think of doing this (though not as elegant as cset, which doesn't seem to have a fantastic level of support from Red Hat):
1) Taskset everything including PID 1 - nice and easy (but allegedly, though I've never seen any issues myself, it may cause inefficiencies in the scheduler). The script below (which must be run as root) runs taskset on all the running processes, including init (pid 1); this will pin all the running processes to one or more 'junk cores', and by also pinning init, it will ensure that any future processes are also started in the list of 'junk cores':
#!/bin/bash
# Pin every running thread (including init, PID 1) to the given "junk" cores.
# Must be run as root; the first argument is the CSV list of junk cores.
if [[ -z "$1" ]]; then
    printf "Usage: %s '<csv list of cores to set as junk in double quotes>'\n" "$0"
    exit 1
fi
# "ps -eLo tid=" prints one thread ID per line, without a header line.
for tid in $(ps -eLo tid=); do
    taskset -pc "$1" "$tid"
done
2) use the isolcpus kernel parameter (here's the documentation from https://www.kernel.org/doc/Documentation/kernel-parameters.txt):
isolcpus= [KNL,SMP] Isolate CPUs from the general scheduler.
Format:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number>
(must be a positive range in ascending order)
or a mixture
<cpu number>,...,<cpu number>-<cpu number>
This option can be used to specify one or more CPUs
to isolate from the general SMP balancing and scheduling
algorithms. You can move a process onto or off an
"isolated" CPU via the CPU affinity syscalls or cpuset.
<cpu number> begins at 0 and the maximum value is
"number of CPUs in system - 1".
This option is the preferred way to isolate CPUs. The
alternative -- manually setting the CPU mask of all
tasks in the system -- can cause problems and
suboptimal load balancer performance.
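Because isolcpus is a boot parameter, one way to confirm whether it is in effect on a running machine is to look for it on the kernel command line. The helper below is a small sketch under that assumption; the function name isolcpus_from_cmdline is hypothetical, and it prints the raw value exactly as written on the command line (nothing if the parameter is absent).

```shell
# Hypothetical helper: extract the value of isolcpus= from a kernel
# command line passed as a single string in $1.
isolcpus_from_cmdline() {
    for word in $1; do
        case "$word" in
            isolcpus=*) echo "${word#isolcpus=}" ;;
        esac
    done
}

# Example with a fixed string; on a live system one would pass
# "$(cat /proc/cmdline)" instead.
isolcpus_from_cmdline 'BOOT_IMAGE=/vmlinuz root=/dev/sda1 isolcpus=1-3 quiet'    # prints: 1-3
```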
I've used these two plus the cset mechanisms for several projects. (Incidentally, please pardon the blatant self-promotion :-)) I've just filed a patent for a tool called Pontus Vision ThreadManager that comes up with optimal pinning strategies for any given x86 platform and any given software workload; after testing it at a customer site, I got really good results (270% reduction in peak latencies), so it's well worth doing pinning and CPU isolation.
Answer by Mike S
Here's how to do it the old-fashioned way, using cgroups. I have a Fedora 28 machine and RedHat/Fedora want you to use systemd-run, but I wasn't able to find this functionality in there. I would love to know how to do it using systemd-run, if anyone would care to enlighten me.
Let's say I want to exclude my fourth CPU (of CPUs 0-3) from scheduling, and move all existing processes to CPUs 0-2. Then I want to put a process on CPU 3 all by itself.
sudo su -
cgcreate -g cpuset:not_cpu_3
echo 0-2 > /sys/fs/cgroup/cpuset/not_cpu_3/cpuset.cpus
# This "0" is the memory node. See https://utcc.utoronto.ca/~cks/space/blog/linux/NUMAMemoryInfo
# for more information *
echo 0 > /sys/fs/cgroup/cpuset/not_cpu_3/cpuset.mems
* Specifically, on your machine you'll want to review /proc/zoneinfo and the /sys/devices/system/node hierarchy. Getting the proper node information is left as an exercise for the reader.
Now that we have our cgroup, we need to create our isolated CPU 3 cgroup:
cgcreate -g cpuset:cpu_3
echo 3 > /sys/fs/cgroup/cpuset/cpu_3/cpuset.cpus
# Again, the memory node(s) you want to specify.
echo 0 > /sys/fs/cgroup/cpuset/cpu_3/cpuset.mems
Put all processes/threads on the not_cpu_3 cgroup:
for pid in $(ps -eLo pid) ; do cgclassify -g cpuset:not_cpu_3 $pid; done
Review:
ps -eL k psr o psr,pid,tid,args | sort | cut -c -80
NOTE! Processes currently in sleep will not move. They must be awakened so that the scheduler will put them on a different CPU. To see this, choose your favorite sleeping process in the above list (a process, say a web browser, that you thought should be on CPU 0-2 but is still on 3). Using its thread ID from the above list, perform:
kill -CONT <thread_id>
example
kill -CONT 9812
Rerun the ps command, and note that it's moved to another CPU.
DOUBLE NOTE! Some kernel threads cannot and will not move! For example, you may note that every CPU has a kernel thread [kthreadd] on it. Assigning processes to cgroups works for userspace processes, not for kernel threads. This is life in the multitasking world.
Now to move a process and all its children to control group cpu_3:
pid=12566 # for example
cgclassify -g cpuset:cpu_3 $pid
taskset -c -p 3 $pid
Again, if $pid is sleeping, you'll need to wake it up for the CPU move to actually take place.
To undo all of this, simply delete the cgroups you've created. Everybody will be stuck back into the root cgroup:
cgdelete -r cpuset:cpu_3
cgdelete -r cpuset:not_cpu_3
No need to reboot.
(Sorry, I don't understand the 3rd question from the original poster. I can't comment on that.)
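The review step in this answer (the ps listing sorted by processor) can be condensed into a per-CPU thread count, which makes it easier to see whether anything is still sitting on CPU 3. This is a sketch only; the function name threads_per_cpu is hypothetical, and it assumes the input has a header line followed by rows whose first column is the processor number.

```shell
# Hypothetical helper: read "PSR PID TID COMMAND" rows from stdin and
# print "<cpu> <number of threads>" per CPU, sorted by CPU number.
threads_per_cpu() {
    awk 'NR > 1 { count[$1]++ }                        # skip the header row
         END { for (cpu in count) print cpu, count[cpu] }' | sort -n
}

# Example with fixed input
printf 'PSR PID TID COMMAND\n0 1 1 init\n3 5 5 foo\n0 7 8 bar\n' | threads_per_cpu

# On a live system the input could come from ps, for instance:
#   ps -eL -o psr,pid,tid,comm | threads_per_cpu
```

Note that the processor column shows where a thread last ran, so sleeping threads may still be listed on CPU 3 until they wake up, as described in the answer.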
Answer by root
If you are using a RHEL instance you can use Tuna for this (it may be available for other Linux distros as well, but I'm not sure about that). It can easily be installed with yum. Tuna can be used to isolate a CPU core, and it dynamically moves the processes running on that CPU to neighboring CPUs. The command to isolate a CPU core is as follows:
# tuna --cpus=CPU-LIST --isolate
You can use htop to see how tuna isolates the CPU cores in real time.