Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/20884405/

Date: 2020-10-21 01:15:07  Source: igfitidea

Is there a performance hit using decimal data types (MySQL / Postgres)

Tags: mysql, database, postgresql, types, type-conversion

Asked by wobbily_col

I understand how integer and floating point data types are stored, and I am guessing that the variable length of decimal data types means it is stored more like a string.

Does that imply a performance overhead when using a decimal data type and searching against it?

Answered by Craig Ringer

Pavel has it quite right, I'd just like to explain a little.

Presuming that you mean a performance impact as compared to floating point, or fixed-point-offset integer (i.e. storing thousandths of a cent as an integer): Yes, there is very much a performance impact. PostgreSQL, and by the sounds of things MySQL, store DECIMAL / NUMERIC in binary-coded decimal. This format is more compact than storing the digits as text, but it's still not very efficient to work with.

If you're not doing many calculations in the database, the impact is limited to the greater storage space required for BCD as compared to integer or floating point, and thus the wider rows and slower scans, bigger indexes, etc. Comparison operations in b-tree index searches are also slower, but not enough to matter unless you're already CPU-bound for some other reason.

If you're doing lots of calculations with the DECIMAL / NUMERIC values in the database, then performance can really suffer. This is particularly noticeable, at least in PostgreSQL, because Pg can't use more than one CPU for any given query. If you're doing a huge bunch of division and multiplication, more complex maths, aggregation, etc. on numerics, you can start to find yourself CPU-bound in situations where you would never be when using a float or integer data type. This is particularly noticeable in OLAP-like (analytics) workloads, and in reporting or data transformation during loading or extraction (ETL).
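
As a rough illustration of the CPU cost of arbitrary-precision decimal arithmetic, here is a minimal Python sketch that sums the same values as floats and as Decimal objects. This is only an analogy: Python's decimal module is not PostgreSQL's NUMERIC implementation, the sample data and the helper name total are made up for the example, and the timings will differ from machine to machine and from what a database would show.

```python
import timeit
from decimal import Decimal

# Hypothetical sample data: the same 100,000 prices as floats and as Decimals.
N = 100_000
float_values = [i / 100 for i in range(N)]
decimal_values = [Decimal(i) / Decimal(100) for i in range(N)]

def total(values, zero):
    """Sum a column of values, similar in spirit to SELECT sum(col)."""
    acc = zero
    for v in values:
        acc += v
    return acc

# Time five full passes over each "column".
float_time = timeit.timeit(lambda: total(float_values, 0.0), number=5)
decimal_time = timeit.timeit(lambda: total(decimal_values, Decimal(0)), number=5)

print(f"float sum:   {float_time:.3f} s")
print(f"decimal sum: {decimal_time:.3f} s")
```

The only point of the sketch is that each decimal addition does more work than a hardware float addition; to measure the effect in a real database, run the same aggregate on a numeric column and a double precision column.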

Despite the fact that there is a performance impact (which varies with workload from negligible to quite big), you should generally use numeric / decimal when it is the most appropriate type for your task, i.e. when very high range values must be stored and/or rounding error isn't acceptable.

Occasionally it's worth the hassle of using a bigint and fixed-point offset, but that is clumsy and inflexible. Using floating point instead is very rarely the right answer due to all the challenges of working reliably with floating point values for things like currency.
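
The trade-off between the three representations can be sketched in Python. This is an illustration under assumptions, not database code: SCALE, to_units and to_dollars are hypothetical names, and the scale of thousandths of a cent is borrowed from the example earlier in this answer.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly.
float_total = 0.10 + 0.20
print(float_total == 0.30)                # False

# Decimal arithmetic keeps the values exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))   # True

# Fixed-point offset: store thousandths of a cent in a plain integer.
SCALE = 100_000  # hypothetical: units per dollar (100 cents * 1000)

def to_units(amount: str) -> int:
    """Convert a decimal string in dollars to integer thousandths of a cent.

    Digits beyond the chosen scale are truncated.
    """
    return int(Decimal(amount) * SCALE)

def to_dollars(units: int) -> Decimal:
    """Convert integer units back to an exact Decimal amount in dollars."""
    return Decimal(units) / SCALE

units_total = to_units("0.10") + to_units("0.20")
print(units_total)              # 30000
print(to_dollars(units_total))  # 0.3
```

The integer version is exact and fast, but every reader and writer of the column has to know the scale, which is the clumsiness and inflexibility mentioned above.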

(BTW, I'm quite excited that some new Intel CPUs, and IBM's Power 7 range of CPUs, include hardware support for IEEE 754 decimal floating point. If this ever becomes available in lower end CPUs it'll be a huge win for databases.)

Answered by Pavel Stehule

The impact of the decimal type (the Numeric type in Postgres) depends on usage. For typical OLTP the impact is unlikely to be significant; for OLAP it can be relatively high. In our application, aggregation over large numeric columns is several times slower than over double precision columns.

Although current CPUs are fast, the rule still holds: use Numeric only when you need exact numbers or very large numbers. Otherwise use the float or double precision type.

Answered by eggyal

You are correct: fixed-point data is stored as a (packed BCD) string.

To what extent this impacts performance depends on a range of factors, which include:

  1. Do queries utilise an index upon the column?

  2. Can the CPU perform BCD operations in hardware, such as through Intel's BCD opcodes?

  3. Does the OS harness hardware support through library functions?

Overall, any performance impact is likely to be pretty negligible relative to other factors that you may face, so don't worry about it. Remember Knuth's maxim, "premature optimisation is the root of all evil".

Answered by Rahul

I am guessing that the variable length of decimal data types means it is stored more like a string.

The MySQL documentation says:

as of MySQL 5.0.3 Values for DECIMAL columns no longer are represented as strings that require 1 byte per digit or sign character. Instead, a binary format is used that packs nine decimal digits into 4 bytes. This change to DECIMAL storage format changes the storage requirements as well. The storage requirements for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires 4 bytes, and any remaining digits require some fraction of 4 bytes.
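
The storage rule quoted above can be turned into a small calculator. This is a sketch: the function names are hypothetical, and the byte counts for leftover digits (1-2 digits: 1 byte, 3-4: 2 bytes, 5-6: 3 bytes, 7-8: 4 bytes) come from the table in the MySQL documentation rather than from the passage quoted here.

```python
# Bytes needed for the 0-8 leftover digits that remain after packing
# full groups of nine digits, per the MySQL documentation table.
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]

def part_bytes(digits: int) -> int:
    """Bytes for one part (integer or fractional) of a DECIMAL value."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]

def decimal_storage_bytes(precision: int, scale: int) -> int:
    """Storage for DECIMAL(precision, scale); the two parts are sized separately."""
    if not 0 <= scale <= precision:
        raise ValueError("scale must be between 0 and precision")
    return part_bytes(precision - scale) + part_bytes(scale)

print(decimal_storage_bytes(18, 9))  # 8
print(decimal_storage_bytes(20, 6))  # 10
```

For comparison, a BIGINT takes 8 bytes, so a DECIMAL(18,9) column is no wider than a 64-bit integer under this format.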
