Linux: How to get 100% CPU usage from a C program
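Disclaimer: This page is a translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow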
Original URL: http://stackoverflow.com/questions/9244481/
Question by bag-man
This is quite an interesting question, so let me set the scene. I work at The National Museum of Computing, and we have just managed to get a Cray Y-MP EL supercomputer from 1992 running, and we really want to see how fast it can go!
We decided the best way to do this was to write a simple C program that would calculate prime numbers and show how long it took to do so, then run the program on a fast modern desktop PC and compare the results.
We quickly came up with this code to count prime numbers:
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <time.h>

int main(void) {
    clock_t start, end;
    double runTime;
    start = clock();
    int i, num = 1, primes = 0;
    while (num <= 1000) {
        i = 2;
        while (i <= num) {
            if (num % i == 0)
                break;
            i++;
        }
        if (i == num)   /* first divisor found is num itself: prime */
            primes++;
        system("clear");
        printf("%d prime numbers calculated\n", primes);
        num++;
    }
    end = clock();
    runTime = (end - start) / (double) CLOCKS_PER_SEC;
    printf("This machine calculated all %d prime numbers under 1000 in %g seconds\n", primes, runTime);
    return 0;
}
On our dual-core laptop running Ubuntu (the Cray runs UNICOS), this worked perfectly, getting 100% CPU usage and taking about 10 minutes or so. When I got home I decided to try it on my modern hex-core gaming PC, and this is where we hit our first issues.
I first adapted the code to run on Windows since that is what the gaming PC was using, but was saddened to find that the process was only getting about 15% of the CPU's power. I figured that must be Windows being Windows, so I booted into a Live CD of Ubuntu thinking that Ubuntu would allow the process to run with its full potential as it had done earlier on my laptop.
However, I only got 5% usage! So my question is: how can I adapt the program to run at 100% CPU utilisation on my gaming machine, in either Windows 7 or live Linux? Another thing that would be great, but not necessary, is if the end product could be a single .exe that could be easily distributed and run on Windows machines.
Thanks a lot!
P.S. Of course this program didn't really work with the Cray's 8 specialist processors, and that is a whole other issue... If you know anything about optimising code to work on 90's Cray supercomputers, give us a shout too!
Accepted answer by Mysticial
If you want 100% CPU, you need to use more than 1 core. To do that, you need multiple threads.
Here's a parallel version using OpenMP:
I had to increase the limit to 1000000 to make it take more than 1 second on my machine.
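#include <stdio.h>
#include <time.h>
#include <omp.h>   /* compile with OpenMP enabled, e.g. gcc -fopenmp */

int main() {
    double start, end;
    double runTime;
    start = omp_get_wtime();   /* wall-clock time, not CPU time */
    int num = 1, primes = 0;
    int limit = 1000000;

    /* Split the loop across threads. schedule(dynamic) hands out iterations
       on demand because larger numbers take longer to test; reduction(+)
       gives each thread a private "primes" count and sums them at the end. */
    #pragma omp parallel for schedule(dynamic) reduction(+ : primes)
    for (num = 1; num <= limit; num++) {
        int i = 2;
        while (i <= num) {
            if (num % i == 0)
                break;
            i++;
        }
        if (i == num)
            primes++;
        // printf("%d prime numbers calculated\n",primes);
    }

    end = omp_get_wtime();
    runTime = end - start;
    printf("This machine calculated all %d prime numbers under %d in %g seconds\n", primes, limit, runTime);
    return 0;
}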
Output:
This machine calculated all 78498 prime numbers under 1000000 in 29.753 seconds
Here's your 100% CPU: [screenshot of CPU usage not preserved]
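A side note on timing, since the question's program uses clock(): on POSIX systems clock() reports the processor time consumed by the whole process, summed over all of its threads, so once several cores are busy it can report a much larger figure than the time that actually elapsed. The OpenMP answer avoids this by timing with omp_get_wtime(), which returns wall-clock time. A minimal sketch of both kinds of timer in plain C, assuming a POSIX system that provides clock_gettime(); the function names cpu_seconds and wall_seconds are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() under strict -std=c99 */
#include <time.h>

/* Processor time used so far by the whole process, in seconds.
 * On POSIX systems clock() adds up the CPU time of every thread,
 * so with N busy threads it advances about N times faster than
 * a wall clock. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Elapsed wall-clock time in seconds, read from a monotonic clock
 * that is not affected by changes to the system date. This is the
 * same kind of measurement that omp_get_wtime() provides. */
double wall_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Reading both before and after the work, (CPU time used) / (wall time elapsed) gives a rough average of how many cores were kept busy, which makes readings like the 15% and 5% in the question easier to interpret.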
Answer by cha0site
You're running one process on a multi-core machine - so it only runs on one core.
The solution is easy enough, since you're just trying to peg the processor - if you have N cores, run your program N times (in parallel, of course).
Example
Here is some code that runs your program NUM_OF_CORES times in parallel. It's POSIXy code (it uses fork), so you should run it under Linux. If what I'm reading about the Cray is correct, it might be easier to port this code than the OpenMP code in the other answer.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid() */

#define NUM_OF_CORES 8
#define MAX_PRIME 100000

/* Count primes up to MAX_PRIME by naive trial division (deliberately slow). */
void do_primes(void)
{
    unsigned long i, num, primes = 0;
    for (num = 1; num <= MAX_PRIME; ++num) {
        for (i = 2; (i <= num) && (num % i != 0); ++i)
            ;
        if (i == num)
            ++primes;
    }
    printf("Calculated %lu primes.\n", primes);
}

int main(int argc, char **argv)
{
    time_t start, end;
    time_t run_time;
    unsigned long i;
    pid_t pids[NUM_OF_CORES];

    /* start of test */
    start = time(NULL);
    for (i = 0; i < NUM_OF_CORES; ++i) {
        if (!(pids[i] = fork())) {
            /* child: do the work, then exit */
            do_primes();
            exit(0);
        }
        if (pids[i] < 0) {
            perror("Fork");
            exit(1);
        }
    }
    /* parent: wait for every child to finish */
    for (i = 0; i < NUM_OF_CORES; ++i) {
        waitpid(pids[i], NULL, 0);
    }
    end = time(NULL);
    run_time = (end - start);
    printf("This machine calculated all prime numbers under %d %d times "
           "in %ld seconds\n", MAX_PRIME, NUM_OF_CORES, (long)run_time);
    return 0;
}
Output
$ ./primes
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
Calculated 9592 primes.
This machine calculated all prime numbers under 100000 8 times in 8 seconds
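The program above runs the whole job NUM_OF_CORES times, which is fine for pegging the CPU. If you would rather have N processes cooperate on a single job, one possible approach (a sketch, not part of the answer) is to give each forked child a slice of the range and collect the partial counts through a pipe. The helper name count_primes_forked, the contiguous slicing, and the switch to √n-bounded trial division (to keep the example fast) are all assumptions:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Trial division up to sqrt(n); 1 if n is prime, else 0. */
static int is_prime(unsigned long n)
{
    if (n < 2)
        return 0;
    for (unsigned long d = 2; d * d <= n; ++d)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Hypothetical helper: count primes in [1, limit] using nprocs child
 * processes. Child p counts one contiguous slice and writes its count
 * into a shared pipe; the parent sums the counts and reaps the children.
 * Returns -1 on error. */
long count_primes_forked(unsigned long limit, int nprocs)
{
    int fds[2];
    if (nprocs < 1 || pipe(fds) == -1)
        return -1;

    unsigned long chunk = limit / (unsigned long)nprocs + 1;
    int started = 0, failed = 0;
    for (int p = 0; p < nprocs; ++p) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            failed = 1;
            break;
        }
        if (pid == 0) {                          /* child */
            close(fds[0]);
            unsigned long lo = (unsigned long)p * chunk + 1;
            unsigned long hi = lo + chunk - 1;
            if (hi > limit)
                hi = limit;
            long count = 0;
            for (unsigned long n = lo; n <= hi; ++n)
                count += is_prime(n);
            /* sizeof(long) is far below PIPE_BUF, so this write is atomic
             * and counts from different children cannot interleave. */
            ssize_t w = write(fds[1], &count, sizeof count);
            _exit(w == (ssize_t)sizeof count ? 0 : 1);  /* _exit: skip stdio flush */
        }
        ++started;
    }

    close(fds[1]);                               /* parent keeps only the read end */
    long total = 0, count;
    for (int p = 0; p < started; ++p) {
        if (read(fds[0], &count, sizeof count) != (ssize_t)sizeof count) {
            failed = 1;
            break;
        }
        total += count;
    }
    close(fds[0]);
    for (int p = 0; p < started; ++p)
        wait(NULL);                              /* reap every child */
    return failed ? -1 : total;
}
```

count_primes_forked(100000, 4) should return 9592, the same count each child prints above. Equal-width slices are not equal amounts of work, since larger numbers need more divisions, so the last child tends to finish last; handing out numbers round-robin would balance the load better.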
Answer by mikithskegg
Try to parallelize your program using, e.g., OpenMP. It is a very simple and effective framework for writing parallel programs.
Answer by Carl
The reason you're getting 15% on a hex-core processor is that your code uses 1 core at 100%. 100/6 = 16.67%, which, given a moving average and process scheduling (your process would be running under normal priority), could easily be reported as 15%.
Therefore, in order to use 100% of the CPU, you would need to use all of its cores: launch 6 parallel execution paths for a hex-core CPU, and have this scale right up to however many processors your Cray machine has :)
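Rather than hard-coding 6 (or NUM_OF_CORES 8 as in cha0site's answer), the number of parallel paths can be read at run time. A minimal sketch, assuming a Linux, macOS, or BSD system where sysconf(_SC_NPROCESSORS_ONLN) is available (a common extension rather than strict POSIX); it also reproduces the 100/6 arithmetic above. The function names are hypothetical:

```c
#include <unistd.h>

/* Number of processors currently online, via sysconf().
 * _SC_NPROCESSORS_ONLN is a widespread extension (glibc, macOS, BSDs)
 * rather than strict POSIX. Falls back to 1 if the value is unavailable. */
long online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Percentage of total CPU that one fully busy, single-threaded process
 * can show on a machine with `cores` processors: 100 / cores. */
double single_thread_share(long cores)
{
    return cores > 0 ? 100.0 / (double)cores : 0.0;
}
```

single_thread_share(6) gives about 16.67, the figure above. Note that sysconf() counts logical processors, so on a CPU with simultaneous multithreading (e.g. Intel Hyper-Threading) a 6-core chip can report 12, and one busy thread then shows only about 8%.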
Answer by jfs
we really want to see how fast it can go!
Your algorithm to generate prime numbers is very inefficient. Compare it to primegen, which generates the 50847534 primes up to 1000000000 in just 8 seconds on a Pentium II-350.
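To get a feel for how much the algorithm matters, here is a Sieve of Eratosthenes in C. This is a swapped-in textbook technique, not what primegen does (primegen is built on the Sieve of Atkin); the function name is hypothetical, and the one-byte-per-number table is the simplest layout rather than the most memory-efficient one:

```c
#include <stdlib.h>

/* Count primes <= limit with a Sieve of Eratosthenes.
 * composite[n] is set to 1 once n is known to have a factor.
 * Returns -1 if the table cannot be allocated. */
long count_primes_sieve(unsigned long limit)
{
    if (limit < 2)
        return 0;
    unsigned char *composite = calloc(limit + 1, 1);
    if (!composite)
        return -1;
    /* Each prime p only needs to cross out multiples from p*p upward;
     * smaller multiples were already crossed out by smaller primes. */
    for (unsigned long p = 2; p * p <= limit; ++p)
        if (!composite[p])
            for (unsigned long m = p * p; m <= limit; m += p)
                composite[m] = 1;
    long count = 0;
    for (unsigned long n = 2; n <= limit; ++n)
        count += !composite[n];
    free(composite);
    return count;
}
```

count_primes_sieve(1000000) returns 78498, the same count as the OpenMP answer, without performing a single division.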
To consume all CPUs easily, you could solve an embarrassingly parallel problem, e.g., compute the Mandelbrot set or use genetic programming to paint the Mona Lisa, in multiple threads (processes).
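As an illustration of the embarrassingly parallel route, here is a sketch that renders a Mandelbrot escape-time image with POSIX threads. Every pixel is independent, so each thread takes every nthreads-th row and no locking is needed. The image region [-2, 1] x [-1.5, 1.5], the row-interleaving scheme, and all names are assumptions, not taken from any particular benchmark:

```c
#include <pthread.h>
#include <stdlib.h>

/* Escape-time count for the point c = cr + ci*i: the number of iterations
 * of z = z*z + c (from z = 0) before |z| > 2, or max_iter if it never escapes. */
int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    for (int n = 0; n < max_iter; ++n) {
        if (zr * zr + zi * zi > 4.0)
            return n;
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
    }
    return max_iter;
}

struct mandel_job {
    int *out;                  /* width*height counts, row-major */
    int width, height, max_iter;
    int first_row, row_step;   /* rows first_row, first_row+row_step, ... */
};

static void *mandel_rows(void *arg)
{
    struct mandel_job *j = arg;
    for (int y = j->first_row; y < j->height; y += j->row_step)
        for (int x = 0; x < j->width; ++x) {
            /* map the pixel grid onto [-2, 1] x [-1.5, 1.5] */
            double cr = -2.0 + 3.0 * x / j->width;
            double ci = -1.5 + 3.0 * y / j->height;
            j->out[y * j->width + x] = mandel_iters(cr, ci, j->max_iter);
        }
    return NULL;
}

/* Render with nthreads threads. Returns 0 on success, -1 on error. */
int mandelbrot_parallel(int *out, int width, int height, int max_iter, int nthreads)
{
    if (nthreads < 1)
        return -1;
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    struct mandel_job *jobs = malloc((size_t)nthreads * sizeof *jobs);
    if (!tids || !jobs) {
        free(tids);
        free(jobs);
        return -1;
    }
    int started = 0, rc = 0;
    for (int t = 0; t < nthreads; ++t) {
        jobs[t] = (struct mandel_job){ .out = out, .width = width, .height = height,
                                       .max_iter = max_iter, .first_row = t,
                                       .row_step = nthreads };
        if (pthread_create(&tids[t], NULL, mandel_rows, &jobs[t]) != 0) {
            rc = -1;
            break;
        }
        ++started;
    }
    for (int t = 0; t < started; ++t)
        pthread_join(tids[t], NULL);
    free(tids);
    free(jobs);
    return rc;
}
```

Passing the number of online processors as nthreads should keep every core busy for the duration of the render; interleaving rows rather than giving each thread one contiguous band spreads the expensive rows near the set across all threads.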
Another approach is to take an existing benchmark program for the Cray supercomputer and port it to a modern PC.
Answer by Joel
For a quick improvement on one core, remove system calls to reduce context-switching. Remove these lines:
system("clear");
printf("%d prime numbers calculated\n",primes);
The first is particularly bad, as it will spawn a new process every iteration.
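A sketch of the question's loop with both lines removed, plus an optional, infrequent progress line that overwrites itself with '\r' instead of spawning a shell through system("clear"). The function name and the report_every parameter are hypothetical:

```c
#include <stdio.h>

/* The question's trial-division loop without system("clear") and without
 * a printf() on every iteration. If report_every > 0, a progress line is
 * printed only every report_every numbers and overwritten in place with
 * '\r' (no child process, far fewer writes); report_every == 0 disables it. */
int count_primes_trial(int limit, int report_every)
{
    int primes = 0;
    for (int num = 1; num <= limit; ++num) {
        int i = 2;
        while (i <= num && num % i != 0)  /* stop at the first divisor */
            ++i;
        if (i == num)                     /* first divisor was num itself */
            ++primes;
        if (report_every > 0 && num % report_every == 0) {
            printf("\r%d prime numbers calculated", primes);
            fflush(stdout);
        }
    }
    if (report_every > 0)
        putchar('\n');
    return primes;
}
```

count_primes_trial(1000, 0) returns 168, the count the original program reports for numbers up to 1000.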
Answer by Steen Schmidt
Also be very aware of how you're loading the CPU. A CPU can do a lot of different tasks, and while many of them will be reported as "loading the CPU 100%", they may each use 100% of different parts of the CPU. In other words, it's very hard to compare two different CPUs for performance, and especially two different CPU architectures. Executing task A may favor one CPU over another, while for task B it can easily be the other way around (since the two CPUs may have different resources internally and may execute code very differently).
This is why software is just as important as hardware for making computers perform optimally. This is very true for "supercomputers" as well.
One measure for CPU performance could be instructions per second, but then again instructions aren't created equal on different CPU architectures. Another measure could be cache IO performance, but cache infrastructure is not equal either. Then a measure could be number of instructions per watt used, as power delivery and dissipation is often a limiting factor when designing a cluster computer.
So your first question should be: Which performance parameter is important to you? What do you want to measure? If you want to see which machine gets the most FPS out of Quake 4, the answer is easy; your gaming rig will, as the Cray can't run that program at all ;-)
Cheers, Steen
Answer by sapy
TL;DR: The accepted answer is both inefficient and incompatible. The following algorithm works 100x faster.
The gcc compiler available on macOS can't compile the OpenMP (omp) code, so I had to install llvm (brew install llvm). But I didn't see CPU idle time going down while running the OpenMP version.
Here is a screenshot while the OpenMP version was running: [screenshot not preserved]
Alternatively, I used basic POSIX threads, which can be compiled with any C compiler, and saw almost the entire CPU used up when the number of threads = the number of cores = 4 (MacBook Pro, 2.3 GHz Intel Core i5). Here is the program:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>   /* clock_gettime() */

#define NUM_THREADS 10
#define THREAD_LOAD 100000

/* Work unit for one thread: count the primes in [min, max]. */
struct prime_range {
    int min;
    int max;
    int total;
};

void *findPrime(void *threadarg)
{
    struct prime_range *this_range = (struct prime_range *) threadarg;
    int primes = 0;

    for (int n = this_range->min; n <= this_range->max; n++) {
        if (n < 2)
            continue;                 /* 0 and 1 are not prime */
        int is_prime = 1;
        /* Trial division only up to sqrt(n): i * i <= n. */
        for (int i = 2; i * i <= n; i++) {
            if (n % i == 0) {
                is_prime = 0;
                break;
            }
        }
        primes += is_prime;
    }
    this_range->total = primes;
    return NULL;
}

int main(void)
{
    struct timespec start, finish;
    double elapsed;
    pthread_t threads[NUM_THREADS];
    struct prime_range pr[NUM_THREADS];
    int rc;

    clock_gettime(CLOCK_MONOTONIC, &start);

    /* Thread t (0-based) handles the t-th block of THREAD_LOAD numbers. */
    for (int t = 0; t < NUM_THREADS; t++) {
        pr[t].min = t * THREAD_LOAD + 1;
        pr[t].max = (t + 1) * THREAD_LOAD;
        rc = pthread_create(&threads[t], NULL, findPrime, (void *)&pr[t]);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    int totalPrimesFound = 0;
    /* Wait for all threads and sum their partial counts. */
    for (int t = 0; t < NUM_THREADS; t++) {
        rc = pthread_join(threads[t], NULL);
        if (rc) {
            printf("Error: unable to join, %d\n", rc);
            exit(-1);
        }
        totalPrimesFound += pr[t].total;
    }

    clock_gettime(CLOCK_MONOTONIC, &finish);
    elapsed = (finish.tv_sec - start.tv_sec);
    elapsed += (finish.tv_nsec - start.tv_nsec) / 1000000000.0;
    printf("This machine calculated all %d prime numbers under %d in %lf seconds\n",
           totalPrimesFound, NUM_THREADS * THREAD_LOAD, elapsed);
    return 0;
}
Notice how the entire CPU is used up: [screenshot not preserved]
P.S. If you increase the number of threads, actual CPU usage goes down (try setting the number of threads to 20), as the system spends more time context switching than actually computing.
By the way, my machine is not as beefy as @Mysticial's (accepted answer), but my version with basic POSIX threading runs much faster than the OpenMP one. Here is the result: [screenshot not preserved]
P.S. Increase THREAD_LOAD to 2.5 million to see the CPU usage, since with the default value the program completes in less than a second.
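Most of this answer's speedup over the accepted answer probably comes from the algorithm rather than from pthreads versus OpenMP: when trial division stops at √n, a prime n costs about √n divisions, against about n in the accepted answer's loop. A minimal sketch of that bound on its own, assuming unsigned long inputs; it compares d * d <= n instead of calling ceil(sqrt(n)), so no libm is needed, and it handles 1 and 2 explicitly. The names are hypothetical:

```c
#include <stdbool.h>

/* Trial division bounded by sqrt(n): if n = a*b with 1 < a <= b, then
 * a*a <= n, so a divisor, if any, shows up before d*d exceeds n.
 * Comparing d*d <= n avoids sqrt()/ceil() and linking libm. */
bool is_prime_sqrt(unsigned long n)
{
    if (n < 2)
        return false;                  /* 0 and 1 are not prime */
    for (unsigned long d = 2; d * d <= n; ++d)
        if (n % d == 0)
            return false;
    return true;
}

/* Count primes in [lo, hi]: the per-thread work unit in the answer above. */
unsigned long count_primes_range(unsigned long lo, unsigned long hi)
{
    unsigned long count = 0;
    for (unsigned long n = lo; n <= hi; ++n)
        count += is_prime_sqrt(n);
    return count;
}
```

Plugging this bound into the OpenMP version would make the two answers directly comparable on threading alone.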
Answer by Nima Mohammadi
Simply try to zip and unzip a big file; nothing uses the CPU like heavy I/O operations.