List of common C++ optimization techniques

Question

Asked by yoitsfrancis

Can I have a great list of common C++ optimization practices?

What I mean by optimization is that you have to modify the source code to be able to run a program faster, not changing the compiler settings.

Answer 1

Accepted answer by Sesh

Two ways to write better programs:

1. Make best use of the language
   - Code Complete by Steve McConnell
   - Effective C++
   - Exceptional C++

2. Profile your application
   - Identify what areas of code are taking how much time
   - See if you can use better data structures/algorithms to make things faster

There is not much language specific optimization one can do - it is limited to using language constructs (learn from #1). The main benefit comes from #2 above.
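
As a minimal sketch of the second point (measure, then try a better data structure), the snippet below times a quadratic duplicate check against a hash-set version. The names time_call, has_duplicate_slow and has_duplicate_fast are hypothetical and chosen only for illustration. A real profiler gives far better information than a hand-rolled timer like this.

```cpp
#include <chrono>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: runs `f` once and returns the elapsed time.
template <class F>
std::chrono::nanoseconds time_call(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0);
}

// Quadratic version: compares every pair of elements.
bool has_duplicate_slow(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// Expected linear time: one hash-set insertion per element.
bool has_duplicate_fast(const std::vector<int>& v) {
    std::unordered_set<int> seen;
    seen.reserve(v.size());
    for (int x : v)
        if (!seen.insert(x).second) return true;  // insertion failed: already seen
    return false;
}
```

Wrapping each call in time_call on realistic input sizes shows whether the change is worth keeping. Use the returned value so the compiler cannot discard the work being measured.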

Answer 2

Answer by plinth

I will echo what others have said: a better algorithm is going to win in terms of performance gains.

That said, I work in image processing, which as a problem domain can be stickier. For example, many years ago I had a chunk of code that looked like this:

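void FlipBuffer(unsigned char *start, unsigned char *end)
{
    unsigned char temp;

    // Walk inward from both ends; `end` points at the last byte (inclusive).
    while (start <= end) {
        temp = _bitRev[*start];      // bit-reverse the front byte
        *start++ = _bitRev[*end];    // reversed back byte moves to the front
        *end-- = temp;               // reversed front byte moves to the back
    }
}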

which rotates a 1-bit frame buffer 180 degrees. _bitRev is a 256 byte table of reversed bits. This code is about as tight as you can get it. It ran on an 8MHz 68K laser printer controller and took roughly 2.5 seconds for a legal sized piece of paper. To spare you the details, the customer couldn't bear 2.5 seconds. The solution was an identical algorithm to this. The difference was that

- I used a 128K table and operated on words instead of bytes (the 68K is much happier on words)
- I used Duff's device to unroll the loop as much as would fit within a short branch
- I put in an optimization to skip blank words
- I finally rewrote it in assembly to take advantage of the sobgtr instruction (subtract one and branch on greater) and have "free" post increment and pre-decrements in the right places.
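
The word-sized table and the blank-word skip can be sketched in portable C++ roughly as follows. This is an illustration only, not the author's 68K code. The names make_bit_reverse_table and flip_words are hypothetical, and the Duff's device unrolling and the assembly rewrite are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a 65536-entry table of bit-reversed 16-bit words
// (65536 entries * 2 bytes = 128 KiB).
std::vector<std::uint16_t> make_bit_reverse_table() {
    std::vector<std::uint16_t> table(65536);
    for (std::uint32_t v = 0; v < 65536; ++v) {
        std::uint16_t r = 0;
        for (int bit = 0; bit < 16; ++bit)
            if (v & (1u << bit))
                r = static_cast<std::uint16_t>(r | (0x8000u >> bit));
        table[v] = r;
    }
    return table;
}

// Rotates a 1-bit buffer of `count` 16-bit words by 180 degrees in place.
void flip_words(std::uint16_t* data, std::size_t count,
                const std::vector<std::uint16_t>& table) {
    assert(table.size() == 65536);
    std::size_t i = 0;
    std::size_t j = count;
    while (i < j) {
        --j;
        const std::uint16_t front = data[i];
        const std::uint16_t back = data[j];
        // Skip blank words: reversing and swapping two zero words changes nothing.
        if ((front | back) != 0) {
            data[i] = table[back];
            data[j] = table[front];
        }
        ++i;
    }
}
```

Whether the larger table actually helps on a given machine depends on its cache behaviour, so a change like this should be measured rather than assumed.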

The result: roughly 5x faster, with no algorithm change.

The point is that you also need to understand your problem domain and what its bottlenecks mean. In image processing, the algorithm is still king, but if your loops are doing extra work, multiply that work by several million and that's the price you pay.

Answer 3

Answer by bayda

Don't forget a few things:
- "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (c) Donald Knuth
- We gain more by optimizing algorithms than by optimizing code.
- We should optimize only the slow parts of existing code, as identified by a profiler or another dedicated tool.

Answer 4

Answer by Raphaël Saint-Pierre

Agner Fog has done a great job analyzing the output of several compilers regarding C++ constructs. You will find his work here: http://www.agner.org/optimize/.

Intel offers a great document too - the "Intel® 64 and IA-32 Architectures Optimization Reference Manual", which you will find at http://www.intel.com/products/processor/manuals/index.htm. Although it mainly targets IA-32 architectures, it contains general advice that can be applied on most platforms. Obviously, it and Agner Fog's guide do overlap a bit.

As mentioned in other answers, micro-optimization is the last step you want to take to make your program faster, after profiling and algorithm selection.

Answer 5

Answer by sivabudh

You might be interested in this: Optimizing C++ Wikibook

Answer 6

Answer by Jordan Parmer

I don't have a site off the top of my head, but the book "Exceptional C++" by Sutter is superb for C/C++ development. I highly recommend that every C++ programmer read this book, as it gives great insight into not only optimization but also smart usage of the language, so that your programs will be truly exceptional.

Answer 7

Answer by Pete Kirkham

It is common in other engineering disciplines to assign budgets to the components of the system. For example, the engines of a VTOL aircraft are designed to provide a certain amount of lift, so the weight must be within a limit. At a high level, each part of the aircraft is given a portion of the weight budget which it should meet.

The reason this is done top down, rather than waiting until it's too bloated to get off the deck and then weighing each part and filing a bit off of the heaviest bit, is partly due to the cost of changing fabricated components. But a large part of it is that if you create a system where everything is a bit over budget everywhere, you can't just fix it in one place.

The classic software example is the SGI Indy Irix 5.1, which partly is why graphics intensive users have Macs and Windows machines now rather than SGI boxes.

"What's most frightening about the 5.1 performance is that nobody knows exactly where it went. If you start asking around, you get plenty of finger-pointing and theories, but few facts. In the May report, I proposed a "5% theory", which states that each little thing we add (Motif, internationalization, drag-and-drop, DSOs, multiple fonts, and so on) costs roughly 5% of the machine. After 15 or 20 of these, most of the performance is gone."

Frequently in discussions of performance, 5% is said to be insignificant, and the advice is to wait until there is a problem and then look for a single bottleneck. For a large system, waiting until you have a problem may just lose you your main business.

Answer 8

Answer by Mike Dunlavey

You asked for sites/sources containing optimization wisdom.

Some good ones have been suggested.

I might add that they will nearly all say that profiling is the best if not the only way to locate performance problems.

I'm not sure where this folk-wisdom originated or how it was justified, but there is a better way.

ADDED:

It is true that the "wrong algorithm" can kill performance, but that's certainly not the only way.

I do a lot of performance tuning. On large software, what usually kills performance is too much data structure and too many layers of abstraction.

What seem like innocent one-liner method calls to abstract objects tempt you to forget what that call could cost you. Multiply this tendency over several layers of abstraction, and you find things like, for example, spending all your time allocating and collecting things like iterators and collection classes when simple arrays with indexing would have been sufficient (and no less maintainable) but less "proper".
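
A small, contrived sketch of that contrast, assuming hypothetical helpers squares_of, sum_squares_layered and sum_squares_direct: the layered version allocates a temporary container on every call, while the direct version computes the same result with a plain indexed loop. Whether the difference matters in a given program is something to measure, not assume.

```cpp
#include <cstddef>
#include <vector>

// Convenient but allocation-heavy: each call builds and returns a new vector.
std::vector<long long> squares_of(const std::vector<int>& in) {
    std::vector<long long> out;
    for (int x : in)
        out.push_back(static_cast<long long>(x) * x);
    return out;
}

// Looks like an innocent one-liner at the call site, but hides the allocation.
long long sum_squares_layered(const std::vector<int>& in) {
    long long total = 0;
    for (long long sq : squares_of(in))
        total += sq;
    return total;
}

// Plain array with indexing: same result, no temporary container.
long long sum_squares_direct(const int* in, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += static_cast<long long>(in[i]) * in[i];
    return total;
}
```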

That's the problem with "common wisdom". It often is exactly the opposite of wisdom.

Answer 9

Answer by dmityugov

++p is usually at least as fast as p++, and --p at least as fast as p--, especially for objects of types with overloaded prefix and postfix increment and decrement operators. The prefix form just increments or decrements the object and returns the new value, whereas the postfix form also has to keep the old value somewhere in order to return it. For built-in types such as int, compilers typically generate the same code for both forms when the result is unused, so the difference matters mainly for class types. That is, instead of (replace int with your favorite class here)

for ( int i ( 0); i < x; i++)

always write

for ( int i ( 0); i < x; ++i)
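
To make the extra work of the postfix form visible, here is a minimal sketch with a hypothetical Counter class that counts how often it is copied. The class and its copies member are assumptions made for illustration. For a trivial type like this an optimizer may remove the copy entirely, so the cost shows up mainly with heavier iterator-like objects.

```cpp
// Hypothetical counter that records how many copies of itself are made.
struct Counter {
    int value = 0;
    static int copies;

    Counter() = default;
    Counter(const Counter& other) : value(other.value) { ++copies; }
    Counter& operator=(const Counter&) = default;

    // Prefix: modify in place and return a reference to *this.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix: must save the old state so it can be returned by value.
    Counter operator++(int) {
        Counter old(*this);  // at least one copy happens here
        ++value;
        return old;
    }
};

int Counter::copies = 0;
```

With this class, ++c makes no copies at all, while c++ makes at least one each time it runs.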

Answer 10

Answer by T.E.D.

Most techniques are compiler-specific, as different compilers optimize differently.

If you want some optimization tips that are compiler agnostic, here are two for you:

Don't do it.
(for experts only!): Don't do it yet.

(apologies to Michael A. Jackson)
