Disclaimer: this is a translated copy of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/5069489/

Date: 2020-08-28 17:19:11 | Source: igfitidea

Performance of built-in types : char vs short vs int vs. float vs. double

Tags: c++, c, performance, built-in

Question by Nawaz

This may appear to be a bit of a stupid question, but seeing Alexandre C's reply in the other topic, I'm curious to know whether there is any performance difference among the built-in types:

char vs. short vs. int vs. float vs. double.

Usually we don't consider such performance differences (if any) in our real-life projects, but I would like to know for educational purposes. The general questions that can be asked are:

  • Is there any performance difference between integer arithmetic and floating-point arithmetic?

  • Which is faster? What is the reason for being faster? Please explain this.

Answer by Stephen Canon

Float vs. integer:

Historically, floating-point could be much slower than integer arithmetic. On modern computers, this is no longer really the case (it is somewhat slower on some platforms, but unless you write perfect code and optimize for every cycle, the difference will be swamped by the other inefficiencies in your code).

On somewhat limited processors, like those in high-end cell phones, floating-point may be somewhat slower than integer, but it's generally within an order of magnitude (or better), so long as there is hardware floating-point available. It's worth noting that this gap is closing pretty rapidly as cell phones are called on to run more and more general computing workloads.

On very limited processors (cheap cell phones and your toaster), there is generally no floating-point hardware, so floating-point operations need to be emulated in software. This is slow -- a couple of orders of magnitude slower than integer arithmetic.

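When there is no FPU, the classic alternative to full software emulation is fixed-point arithmetic, which stays entirely in the integer unit. Here is a minimal sketch of a Q16.16 format; the names and the format choice are illustrative, not from the answer above:

```cpp
#include <cstdint>

// Q16.16 fixed-point: 16 integer bits, 16 fractional bits, stored in an int32_t.
using fix16 = std::int32_t;

constexpr fix16 to_fix(double d)    { return static_cast<fix16>(d * 65536.0); }
constexpr double to_double(fix16 f) { return f / 65536.0; }

// Addition works directly on the underlying integers.
constexpr fix16 fix_add(fix16 a, fix16 b) { return a + b; }

// Multiplication widens to 64 bits, then shifts back into Q16.16.
constexpr fix16 fix_mul(fix16 a, fix16 b)
{
    return static_cast<fix16>((static_cast<std::int64_t>(a) * b) >> 16);
}
```

Each operation compiles to a handful of integer instructions, which is why fixed-point was the standard trick on FPU-less hardware.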

As I said though, people are expecting their phones and other devices to behave more and more like "real computers", and hardware designers are rapidly beefing up FPUs to meet that demand. Unless you're chasing every last cycle, or you're writing code for very limited CPUs that have little or no floating-point support, the performance distinction doesn't matter to you.

Different size integer types:

Typically, CPUs are fastest at operating on integers of their native word size (with some caveats about 64-bit systems). 32-bit operations are often faster than 8- or 16-bit operations on modern CPUs, but this varies quite a bit between architectures. Also, remember that you can't consider the speed of a CPU in isolation; it's part of a complex system. Even if operating on 16-bit numbers is 2x slower than operating on 32-bit numbers, you can fit twice as much data into the cache hierarchy when you represent it with 16-bit numbers instead of 32. If that makes the difference between having all your data come from cache instead of taking frequent cache misses, then the faster memory access will trump the slower operation of the CPU.

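The footprint effect is easy to see: halving the element width halves the number of bytes the cache has to hold for the same data. A small sketch (the sizes shown in the comments assume the usual 16/32-bit widths, which are implementation-defined in general):

```cpp
#include <cstdint>
#include <vector>

// One million samples stored narrow vs. wide: the 16-bit version occupies
// ~2 MB, the 32-bit version ~4 MB -- twice the cache pressure for the same
// data, even if each individual add is no faster on the narrow type.
long long sum16(const std::vector<std::int16_t>& v)
{
    long long s = 0;
    for (std::int16_t x : v) s += x;  // a 64-byte cache line feeds 32 elements
    return s;
}

long long sum32(const std::vector<std::int32_t>& v)
{
    long long s = 0;
    for (std::int32_t x : v) s += x;  // ...versus only 16 elements per line here
    return s;
}
```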

Other notes:

Vectorization tips the balance further in favor of narrower types (float and 8- and 16-bit integers) -- you can do more operations in a vector of the same width. However, good vector code is hard to write, so it's not as though you get this benefit without a lot of careful work.

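As a rough illustration: a 128-bit SSE register holds four 32-bit values but sixteen 8-bit ones, so a loop like the following can process 4x as many elements per vector instruction at the narrower width -- assuming the compiler auto-vectorizes it (typically at -O2/-O3):

```cpp
#include <cstddef>
#include <cstdint>

// Simple elementwise add; modern compilers usually auto-vectorize this loop.
// With int8_t elements, one 128-bit vector instruction covers 16 lanes;
// the same loop over int32_t data covers only 4 lanes per instruction.
void add_arrays(std::int8_t* dst, const std::int8_t* a,
                const std::int8_t* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::int8_t>(a[i] + b[i]);  // wraps on overflow
}
```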

Why are there performance differences?

There are really only two factors that affect whether or not an operation is fast on a CPU: the circuit complexity of the operation, and user demand for the operation to be fast.

(Within reason) any operation can be made fast, if the chip designers are willing to throw enough transistors at the problem. But transistors cost money (or rather, using lots of transistors makes your chip larger, which means you get fewer chips per wafer and lower yields, which costs money), so chip designers have to balance how much complexity to use for which operations, and they do this based on (perceived) user demand. Roughly, you might think of breaking operations into four categories:

                 high demand            low demand
high complexity  FP add, multiply       division
low complexity   integer add            popcount, hcf
                 boolean ops, shifts

high-demand, low-complexity operations will be fast on nearly any CPU: they're the low-hanging fruit, and confer maximum user benefit per transistor.

high-demand, high-complexity operations will be fast on expensive CPUs (like those used in computers), because users are willing to pay for them. You're probably not willing to pay an extra $3 for your toaster to have a fast FP multiply, however, so cheap CPUs will skimp on these instructions.

low-demand, high-complexity operations will generally be slow on nearly all processors; there just isn't enough benefit to justify the cost.

low-demand, low-complexity operations will be fast if someone bothers to think about them, and non-existent otherwise.

Further reading:

  • Agner Fog maintains a nice website with lots of discussion of low-level performance details (and has a very scientific data-collection methodology to back it up).
  • The Intel 64 and IA-32 Architectures Optimization Reference Manual (the PDF download link is partway down the page) covers a lot of these issues as well, though it is focused on one specific family of architectures.
Answer by jalf

Absolutely.

First, of course, it depends entirely on the CPU architecture in question.

However, integral and floating-point types are handled very differently, so the following is nearly always the case:

  • for simple operations, integral types are fast. For example, integer addition often has only a single cycle's latency, and integer multiplication is typically around 2-4 cycles, IIRC.
  • Floating-point types used to perform much slower. On today's CPUs, however, they have excellent throughput, and each floating-point unit can usually retire an operation per cycle, leading to the same (or similar) throughput as for integer operations. However, latency is generally worse. Floating-point addition often has a latency around 4 cycles (vs. 1 for ints).
  • for some complex operations, the situation is different, or even reversed. For example, division on FP may have less latency than for integers, simply because the operation is complex to implement in both cases, but it is more commonly useful on FP values, so more effort (and transistors) may be spent optimizing that case.

On some CPUs, doubles may be significantly slower than floats. On some architectures, there is no dedicated hardware for doubles, and so they are handled by passing two float-sized chunks through, giving you a worse throughput and twice the latency. On others (the x86 FPU, for example), both types are converted to the same internal format (80-bit extended-precision floating point, in the case of x86), so performance is identical. On yet others, both float and double have proper hardware support, but because float has fewer bits, it can be done a bit faster, typically reducing the latency a bit relative to double operations.

Disclaimer: all the mentioned timings and characteristics are just pulled from memory. I didn't look any of it up, so it may be wrong. ;)

For different integer types, the answer varies wildly depending on CPU architecture. The x86 architecture, due to its long convoluted history, has to support 8-, 16-, 32- (and today 64-) bit operations natively, and in general they're all equally fast (they use basically the same hardware, and just zero out the upper bits as needed).

However, on other CPUs, datatypes smaller than an int may be more costly to load/store (writing a byte to memory might have to be done by loading the entire 32-bit word it is located in, doing bit masking to update the single byte in a register, and then writing the whole word back). Likewise, for datatypes larger than int, some CPUs may have to split the operation into two, loading/storing/computing the lower and upper halves separately.

But on x86, the answer is that it mostly doesn't matter. For historical reasons, the CPU is required to have pretty robust support for each and every data type. So the only difference you're likely to notice is that floating-point ops have more latency (but similar throughput, so they're not slower per se, at least if you write your code correctly).

Answer by Lundin

I don't think anyone mentioned the integer promotion rules. In standard C/C++, no operation can be performed on a type smaller than int. If char or short happen to be smaller than int on the current platform, they are implicitly promoted to int (which is a major source of bugs). The compiler is required to do this implicit promotion; there's no way around it without violating the standard.

The integer promotions mean that no operation (addition, bitwise, logical, etc.) in the language can occur on a smaller integer type than int. Thus, operations on char/short/int are generally equally fast, as the former are promoted to the latter.

And on top of the integer promotions, there's the "usual arithmetic conversions", meaning that C strives to make both operands the same type, converting one of them to the larger of the two, should they be different.

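Both rules are easy to verify at compile time with decltype (C++11 or later); the asserts below assume the common case where int is wider than char and short:

```cpp
#include <type_traits>

char  c = 1;
short s = 2;

// Integer promotions: both operands of + are promoted to int first, so
// char + char and short + short both have type int, not char or short.
static_assert(std::is_same<decltype(c + c), int>::value, "char + char -> int");
static_assert(std::is_same<decltype(s + s), int>::value, "short + short -> int");

// Usual arithmetic conversions: mixed operands convert to the larger type.
static_assert(std::is_same<decltype(1 + 1L), long>::value, "int + long -> long");
static_assert(std::is_same<decltype(1 + 1.0), double>::value, "int + double -> double");
```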

However, the CPU can perform various load/store operations at the 8-, 16-, 32-bit (etc.) level. On 8- and 16-bit architectures, this often means that 8- and 16-bit types are faster despite the integer promotions. On a 32-bit CPU it might actually mean that the smaller types are slower, because it wants to have everything neatly aligned in 32-bit chunks. 32-bit compilers typically optimize for speed and allocate smaller integer types in larger space than specified.

Generally, though, the smaller integer types of course take less space than the larger ones, so if you intend to optimize for RAM size, they are preferable.

Answer by Researcher

The first answer above is great, and I copied a small block of it across to the following duplicate (as this is where I ended up first).

Are "char" and "small int" slower than "int"?

I'd like to offer the following code which profiles allocating, initializing and doing some arithmetic on the various integer sizes:

#include <iostream>
#include <cstdint>   // fixed-width integer types (int8_t .. int64_t)
#include <cstdio>    // sprintf_s (MSVC)

#include <windows.h>

using std::cout; using std::cin; using std::endl;

LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
LARGE_INTEGER Frequency;

inline void showElapsed(const char activity[])
{
    QueryPerformanceCounter(&EndingTime);
    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedMicroseconds.QuadPart *= 1000000;
    ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
    cout << activity << " took: " << ElapsedMicroseconds.QuadPart << "us" << endl;
}

int main()
{
    cout << "Hallo!" << endl << endl;

    QueryPerformanceFrequency(&Frequency);

    const int32_t count = 1100100;
    char activity[200];

    //-----------------------------------------------------------------------------------------//
    sprintf_s(activity, "Initialise & Set %d 8 bit integers", count);
    QueryPerformanceCounter(&StartingTime);

    int8_t *data8 = new int8_t[count];
    for (int i = 0; i < count; i++)
    {
        data8[i] = i;
    }
    showElapsed(activity);

    sprintf_s(activity, "Add 5 to %d 8 bit integers", count);
    QueryPerformanceCounter(&StartingTime);

    for (int i = 0; i < count; i++)
    {
        data8[i] = i + 5;
    }
    showElapsed(activity);
    cout << endl;
    //-----------------------------------------------------------------------------------------//

    //-----------------------------------------------------------------------------------------//
    sprintf_s(activity, "Initialise & Set %d 16 bit integers", count);
    QueryPerformanceCounter(&StartingTime);

    int16_t *data16 = new int16_t[count];
    for (int i = 0; i < count; i++)
    {
        data16[i] = i;
    }
    showElapsed(activity);

    sprintf_s(activity, "Add 5 to %d 16 bit integers", count);
    QueryPerformanceCounter(&StartingTime);

    for (int i = 0; i < count; i++)
    {
        data16[i] = i + 5;
    }
    showElapsed(activity);
    cout << endl;
    //-----------------------------------------------------------------------------------------//

    //-----------------------------------------------------------------------------------------//    
    sprintf_s(activity, "Initialise & Set %d 32 bit integers", count);
    QueryPerformanceCounter(&StartingTime);

    int32_t *data32 = new int32_t[count];
    for (int i = 0; i < count; i++)
    {
        data32[i] = i;
    }
    showElapsed(activity);

    sprintf_s(activity, "Add 5 to %d 32 bit integers", count);
    QueryPerformanceCounter(&StartingTime);

    for (int i = 0; i < count; i++)
    {
        data32[i] = i + 5;
    }
    showElapsed(activity);
    cout << endl;
    //-----------------------------------------------------------------------------------------//

    //-----------------------------------------------------------------------------------------//
    sprintf_s(activity, "Initialise & Set %d 64 bit integers", count);
    QueryPerformanceCounter(&StartingTime);

    int64_t *data64 = new int64_t[count];
    for (int i = 0; i < count; i++)
    {
        data64[i] = i;
    }
    showElapsed(activity);

    sprintf_s(activity, "Add 5 to %d 64 bit integers", count);
    QueryPerformanceCounter(&StartingTime);

    for (int i = 0; i < count; i++)
    {
        data64[i] = i + 5;
    }
    showElapsed(activity);
    cout << endl;
    //-----------------------------------------------------------------------------------------//

    getchar();
}



My results in MSVC on i7 4790k:

Initialise & Set 1100100 8 bit integers took: 444us
Add 5 to 1100100 8 bit integers took: 358us

Initialise & Set 1100100 16 bit integers took: 666us
Add 5 to 1100100 16 bit integers took: 359us

Initialise & Set 1100100 32 bit integers took: 870us
Add 5 to 1100100 32 bit integers took: 276us

Initialise & Set 1100100 64 bit integers took: 2201us
Add 5 to 1100100 64 bit integers took: 659us

Answer by Reed Copsey

Is there any performance difference between integer arithmetic and floating-point arithmetic?

Yes. However, this is very much platform and CPU specific. Different platforms can do different arithmetic operations at different speeds.

That being said, the reply in question was a bit more specific. pow() is a general-purpose routine that works on double values. By feeding it integer values, it is still doing all of the work that would be required to handle non-integer exponents. Using direct multiplication bypasses a lot of the complexity, which is where the speed comes into play. This is really not an issue (so much) of different types, but rather of bypassing a large amount of complex code required to make pow work with any exponent.

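The point is easy to illustrate: pow must handle any real exponent (typically via exp/log-style machinery), while a known small integer power is just repeated multiplication:

```cpp
#include <cmath>

// General routine: works for any real exponent, at far greater cost.
double cube_pow(double x) { return std::pow(x, 3.0); }

// Special case: two multiplies -- the "direct multiplication"
// the reply above refers to.
double cube_mul(double x) { return x * x * x; }
```

Modern compilers will often do this strength reduction themselves when the exponent is a small integer constant, but relying on that is compiler- and flag-dependent.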

Answer by Thomas Matthews

Depends on the composition of the processor and platform.

Platforms that have a floating point coprocessor may be slower than integral arithmetic due to the fact that values have to be transferred to and from the coprocessor.

If floating point processing is within the core of the processor, the execution time may be negligible.

If the floating point calculations are emulated by software, then integral arithmetic will be faster.

When in doubt, profile.

Get the program working correctly and robustly before optimizing.

Answer by Puppy

No, not really. This of course depends on CPU and compiler, but the performance difference is typically negligible, if there even is any.

Answer by rubenvb

There is certainly a difference between floating point and integer arithmetic. Depending on the CPU's specific hardware and micro-instructions, you get different performance and/or precision. Good google terms for the precise descriptions (I don't know exactly either):

FPU x87 MMX SSE

With regards to the size of the integers, it is best to use the platform/architecture word size (or double that), which comes down to an int32_t on x86 and an int64_t on x86_64. Some processors might have intrinsic instructions that handle several of these values at once (like SSE (floating point) and MMX), which will speed up parallel additions or multiplications.

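The fixed-width names come from <cstdint>, which also provides "fastest" variants that let the implementation pick whatever width the platform prefers:

```cpp
#include <cstdint>

// Exact-width types: guaranteed sizes (where the platform provides them).
static_assert(sizeof(std::int32_t) == 4, "int32_t is exactly 32 bits");
static_assert(sizeof(std::int64_t) == 8, "int64_t is exactly 64 bits");

// "Fast" types: at least the stated width, but whatever is quickest here --
// often plain int on x86, possibly a 64-bit type elsewhere. A reasonable
// default when you want speed rather than an exact memory layout.
std::int_fast32_t counter = 0;
```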

Answer by KeithS

Generally, integer math is faster than floating-point math. This is because integer math involves simpler computations. However, in most operations we're talking about less than a dozen clocks. Not millis, micros, nanos, or ticks; clocks. The ones that happen between 2 and 3 billion times per second in modern cores. Also, since the 486, a lot of cores have had a set of floating-point processing units, or FPUs, which are hard-wired to perform floating-point arithmetic efficiently, and often in parallel with the CPU.

As a result, though floating-point is technically slower, floating-point calculations are still so fast that any attempt to time the difference would have more error inherent in the timing mechanism and thread scheduling than in the calculation itself. Use ints when you can, but understand when you can't, and don't worry too much about relative calculation speed.
