UUID 作为 PostgreSQL 中的主键会导致索引性能不佳吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/13145988/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Will UUID as primary key in PostgreSQL give bad index performance?
提问by thejaz
I have created an app in Rails on Heroku using a PostgreSQL database.
我使用 PostgreSQL 数据库在 Heroku 上的 Rails 中创建了一个应用程序。
It has a couple of tables designed to be able to sync with mobile devices where data can be created on different places. Therefor I have a uuid field that is a string storing a GUID in addition to an auto increment primary key. The uuid is the one that is communicated between the server and the clients.
它有几个表,旨在能够与可以在不同位置创建数据的移动设备同步。因此,我有一个 uuid 字段,它是一个字符串,除了自动增量主键外,还存储 GUID。uuid 是在服务器和客户端之间进行通信的那个。
I realised after implementing the sync engine on the server side that this leads to performance issues when needing to map between uuid<->id all the time (when writing objects, I need to query for the uuid to get the id before saving and the opposite when sending back data).
在服务器端实现同步引擎后,我意识到这会导致性能问题,当需要一直在 uuid<->id 之间进行映射时(写入对象时,我需要在保存之前查询 uuid 以获取 id 和发回数据时相反)。
I'm now thinking about switching to only using UUID as primary key making the writing and reading much simpler and faster.
我现在正在考虑切换到仅使用 UUID 作为主键,使写入和读取变得更加简单和快捷。
I've read that UUID as primary key can sometimes give bad index performance (index fragmentation) when using clustered primary key index. Does PostgreSQL suffer from this problem or is it OK to use UUID as primary key?
我读过 UUID 作为主键在使用聚集主键索引时有时会导致索引性能不佳(索引碎片)。PostgreSQL 是否存在这个问题,还是可以使用 UUID 作为主键?
I already have a UUID column today so storage wise it will be better because I drop the regular id column.
我今天已经有了一个 UUID 列,所以在存储方面它会更好,因为我删除了常规的 id 列。
采纳答案by hgmnz
(I work on Heroku Postgres)
(我在 Heroku Postgres 上工作)
We use UUIDs as primary keys on a few systems and it works great.
我们在一些系统上使用 UUID 作为主键,效果很好。
I recommend you use the uuid-ossp
extension, and even have postgres generate UUIDs for you:
我建议你使用这个uuid-ossp
扩展,甚至让 postgres 为你生成 UUID:
heroku pg:psql
psql (9.1.4, server 9.1.6)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.
dcvgo3fvfmbl44=> CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION
dcvgo3fvfmbl44=> CREATE TABLE test (id uuid primary key default uuid_generate_v4(), name text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE
dcvgo3fvfmbl44=> \d test
Table "public.test"
Column | Type | Modifiers
--------+------+-------------------------------------
id | uuid | not null default uuid_generate_v4() name | text |
Indexes:
"test_pkey" PRIMARY KEY, btree (id)
dcvgo3fvfmbl44=> insert into test (name) values ('hgmnz');
INSERT 0 1
dcvgo3fvfmbl44=> select * from test;
id | name
--------------------------------------+-------
e535d271-91be-4291-832f-f7883a2d374f | hgmnz
(1 row)
EDIT performance implications
编辑性能影响
It will alwaysdepend on your workload.
这将始终取决于您的工作量。
The integer primary key has the advantage of locality where like-data sits closer together. This can be helpful for eg: range type queries such as WHERE id between 1 and 10000
although lock contention is worse.
整数主键具有局部性的优势,其中类似的数据更靠近在一起。这对于例如范围类型的查询很有帮助,例如WHERE id between 1 and 10000
虽然锁争用更糟。
If your read workload is totally random in that you always make primary key lookups, there shouldn't be any measurable performance degradation: you only pay for the larger data type.
如果您的读取工作负载是完全随机的,因为您总是进行主键查找,则不应有任何可测量的性能下降:您只需为较大的数据类型付费。
Do you write a lot to this table, and is this table very big? It's possible, although I haven't measured this, that there are implications in maintaining that index. For lots of datasets UUIDs are just fine though, and using UUIDs as identifiers has some nice properties.
你在这张桌子上写了很多东西,这张桌子很大吗?尽管我尚未对此进行测量,但有可能维持该指数存在影响。不过对于很多数据集来说,UUID 就可以了,使用 UUID 作为标识符有一些很好的特性。
Finally, I may not be the most qualified person to discuss or advice on this, as I have never run a table large enough with a UUID PK where it has become a problem. YMMV. (Having said that, I'd love to hear of people who run into problems with the approach!)
最后,我可能不是最有资格对此进行讨论或提出建议的人,因为我从未运行过足够大的带有 UUID PK 的表,但它已成为问题。天啊。(话虽如此,我很想听听遇到这种方法问题的人!)
回答by catamphetamine
As the accepted answer states, range queries may be slow in this case, but not only on id
.
正如接受的答案所述,在这种情况下,范围查询可能会很慢,但不仅限于id
。
Autoincrement is naturally sorted by date, so when autoincrement is used the data is stored chronologically on disk (see B-Tree) which speeds up reads (no seeking for HDDs). For example, if one lists all the users the natural order would be by date created which is the same as autoincrement and so range queries execute faster on HDDs while on SSD, i guess, the difference would be nonexistent since SSDs are by design always random access (no head seeking, no mechanical parts involved, just pure electricity)
自动增量自然是按日期排序的,所以当使用自动增量时,数据按时间顺序存储在磁盘上(参见 B 树),这加快了读取速度(不寻找 HDD)。例如,如果列出所有用户,自然顺序将按创建日期排序,这与自动增量相同,因此范围查询在 HDD 上执行得更快,而在 SSD 上,我猜,差异将不存在,因为 SSD 设计总是随机的访问(不寻头,不涉及机械部件,只是纯电力)