Original URL: http://stackoverflow.com/questions/43056220/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Store UUID v4 in MySQL
Question by Stephen R
I'm generating UUIDs using PHP, per the function found here
Now I want to store that in a MySQL database. What is the best/most efficient MySQL field format for storing UUID v4?
I currently have varchar(256), but I'm pretty sure that's much larger than necessary. I've found lots of almost-answers, but they're generally ambiguous about what form of UUID they're referring to, so I'm asking for the specific format.
Answer by tadman
Store it as VARCHAR(36) if you're looking to have an exact fit, or VARCHAR(255), which is going to work out with the same storage cost anyway. There's no reason to fuss over bytes here.
Remember that VARCHAR fields are variable length, so the storage cost is proportional to how much data is actually in them, not how much data could be in them.
Storing it as BINARY is extremely annoying: the values are unprintable and can show up as garbage when running queries. There's rarely a reason to use the literal binary representation. Human-readable values can be copy-pasted and worked with easily.
Some other platforms, like Postgres, have a proper UUID column which stores it internally in a more compact format, but displays it as human-readable, so you get the best of both approaches.
Answer by Mathieu Rey
If you always have a UUID for each row, you could store it as CHAR(36) and save 1 byte per row over VARCHAR(36).
uuid CHAR(36) CHARACTER SET ascii
In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes. https://dev.mysql.com/doc/refman/5.7/en/char.html
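To make the quoted prefix rule concrete, here is a rough Python sketch of the per-value arithmetic. It assumes a pure-ASCII UUID string and counts only the data bytes plus the length prefix (no row, page or index overhead); varchar_bytes and char_ascii_bytes are hypothetical helpers, not MySQL APIs.

```python
def varchar_bytes(value: str, declared_chars: int, max_bytes_per_char: int) -> int:
    """Approximate stored size of one VARCHAR value: length prefix + data.

    The prefix is 1 byte if the column's maximum possible size is at most
    255 bytes, otherwise 2 bytes (the rule quoted from the MySQL manual).
    """
    max_column_bytes = declared_chars * max_bytes_per_char
    prefix = 1 if max_column_bytes <= 255 else 2
    # A UUID string is pure ASCII, so it takes 1 byte per character
    # in ascii, utf8mb3 and utf8mb4 alike.
    return prefix + len(value.encode("utf-8"))


def char_ascii_bytes(declared_chars: int) -> int:
    """CHAR(n) CHARACTER SET ascii: always exactly n bytes, no length prefix."""
    return declared_chars


uuid_text = "f47ac10b-58cc-4372-a567-0e02b2c3d479"  # 36 characters

print(varchar_bytes(uuid_text, 36, 1))   # VARCHAR(36)  ascii   -> 37
print(varchar_bytes(uuid_text, 255, 1))  # VARCHAR(255) ascii   -> 37
print(varchar_bytes(uuid_text, 255, 4))  # VARCHAR(255) utf8mb4 -> 38
print(char_ascii_bytes(36))              # CHAR(36)     ascii   -> 36
```

Under these assumptions CHAR(36) in ascii saves the one prefix byte, and VARCHAR(255) only costs the same as VARCHAR(36) when the character set keeps the column's maximum size at or below 255 bytes.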
Be careful with CHAR, though: it always consumes the full defined length, even if the field is left empty. Also, make sure to use ASCII as the character set, as CHAR would otherwise plan for the worst-case scenario (i.e. 3 bytes per character in utf8, 4 in utf8mb4).
[...] MySQL must reserve four bytes for each character in a CHAR CHARACTER SET utf8mb4 column because that is the maximum possible length. For example, MySQL must reserve 40 bytes for a CHAR(10) CHARACTER SET utf8mb4 column. https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
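The reservation rule in that quote is a simple multiplication. This Python sketch follows the quote literally and ignores storage-engine row-format details; the table lists the maximum bytes per character for a few common MySQL character sets, and char_reserved_bytes is a hypothetical name.

```python
# Maximum bytes per character for a few MySQL character sets
# ("utf8" has historically been an alias for utf8mb3).
MAX_BYTES_PER_CHAR = {"ascii": 1, "latin1": 1, "utf8": 3, "utf8mb3": 3, "utf8mb4": 4}


def char_reserved_bytes(declared_chars: int, charset: str) -> int:
    """Worst-case bytes reserved for CHAR(declared_chars) in the given charset."""
    return declared_chars * MAX_BYTES_PER_CHAR[charset]


for cs in ("ascii", "utf8", "utf8mb4"):
    print(f"CHAR(36) CHARACTER SET {cs}: {char_reserved_bytes(36, cs)} bytes")
# ascii: 36, utf8: 108, utf8mb4: 144
```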
Answer by Karsten R.
The question is about storing a UUID in MySQL.
Since MySQL 8.0 you can use binary(16) with conversion via the UUID_TO_BIN/BIN_TO_UUID functions: https://mysqlserverteam.com/mysql-8-0-uuid-support/
Be aware that MySQL also has a fast way to generate UUIDs as primary keys:
INSERT INTO t VALUES(UUID_TO_BIN(UUID(), true))
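To show what the swap flag actually does, here is a Python sketch that mimics UUID_TO_BIN / BIN_TO_UUID on the client side. It assumes the byte layout documented for MySQL 8.0, where swapping moves time-high and time-mid ahead of time-low. The helper names are hypothetical and this is not a MySQL client API. Note that the reordering only helps time-based (version 1) UUIDs such as those produced by MySQL's UUID(); a random v4 UUID has no timestamp to bring forward.

```python
import uuid


def uuid_to_bin(text: str, swap: bool = False) -> bytes:
    """Client-side imitation of MySQL 8.0 UUID_TO_BIN(text, swap)."""
    b = uuid.UUID(text).bytes  # time_low(4) time_mid(2) time_hi(2) rest(8)
    if swap:
        # Put time_hi_and_version and time_mid before time_low so that,
        # for v1 UUIDs, byte order follows timestamp order.
        b = b[6:8] + b[4:6] + b[0:4] + b[8:]
    return b


def bin_to_uuid(raw: bytes, swap: bool = False) -> str:
    """Inverse of uuid_to_bin (imitates BIN_TO_UUID(raw, swap))."""
    b = raw[4:8] + raw[2:4] + raw[0:2] + raw[8:] if swap else raw
    return str(uuid.UUID(bytes=b))


def v1_with_timestamp(ts: int) -> uuid.UUID:
    """Build a version-1 UUID with a chosen 60-bit timestamp (illustration only)."""
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                 # time_low
        (ts >> 32) & 0xFFFF,             # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,  # time_hi with version 1
        0x80, 0x00, 0x000000000001,      # RFC 4122 variant, clock_seq, node
    ))


s = "6ccd780c-baba-1026-9564-5b8c656024db"
print(uuid_to_bin(s, swap=True).hex())  # 1026baba6ccd780c95645b8c656024db

earlier, later = v1_with_timestamp(0xFFFFFFFF), v1_with_timestamp(0x100000000)
print(uuid_to_bin(str(earlier)) < uuid_to_bin(str(later)))              # False
print(uuid_to_bin(str(earlier), True) < uuid_to_bin(str(later), True))  # True
```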
Answer by Walf
Most efficient is definitely BINARY(16). Storing the human-readable characters uses more than double the storage space, which means bigger indices and slower lookups. If your data is small enough that storing the values as text doesn't hurt performance, you probably don't need UUIDs over boring integer keys. Storing raw is really not as painful as others suggest, because any decent DB admin tool will display/dump the octets as hexadecimal rather than as literal bytes of "text". You shouldn't need to look up UUIDs manually in the database; if you have to, HEX() and x'deadbeef01' literals are your friends. It is trivial to write a function in your app (like the one you referenced) to deal with this for you. You could probably even do it in the database with virtual columns and stored procedures, so the app never bothers with the raw data.
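Here is a minimal Python sketch of the size difference and of building an x'...' literal for a manual query. It assumes the UUID is handled client-side with Python's standard uuid module; to_hex_literal is a hypothetical helper.

```python
import uuid


def to_hex_literal(u: uuid.UUID) -> str:
    """Render a UUID as a MySQL hexadecimal literal for a manual query,
    e.g. SELECT ... WHERE id = x'f47ac10b...'."""
    return "x'" + u.hex + "'"


u = uuid.UUID("f47ac10b-58cc-4372-a567-0e02b2c3d479")

print(len(u.bytes))       # 16: what a BINARY(16) column stores
print(len(str(u)))        # 36: the dashed text form for CHAR(36)
print(to_hex_literal(u))  # x'f47ac10b58cc4372a5670e02b2c3d479'
print(uuid.UUID(bytes=u.bytes) == u)  # True: raw bytes round-trip losslessly
```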
I would separate the UUID generation logic from the display logic to ensure that existing data are never changed and errors are detectable:
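function guidv4($prettify = false)
{
    // random_bytes() exists since PHP 7; fall back to OpenSSL otherwise.
    // (Before PHP 8.3 a function call is not allowed in a static initializer.)
    $data = function_exists('random_bytes') ? random_bytes(16) : openssl_random_pseudo_bytes(16);
    $data[6] = chr((ord($data[6]) & 0x0f) | 0x40); // set version to 0100
    $data[8] = chr((ord($data[8]) & 0x3f) | 0x80); // set bits 6-7 to 10 (RFC 4122 variant)
    if ($prettify) {
        return guidv4_pretty($data);
    }
    return $data; // 16 raw bytes, ready for a BINARY(16) column
}
function guidv4_pretty($data)
{
    // 16 raw bytes -> 32 hex digits -> 8-4-4-4-12 canonical text form
    return strlen($data) == 16 ?
        vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($data), 4)) :
        false;
}
function guidv4_ugly($data)
{
    // strip dashes/braces, then turn 32 hex digits back into 16 raw bytes
    $data = preg_replace('/[^\dA-F]+/i', '', $data);
    return strlen($data) == 32 ? hex2bin($data) : false;
}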
Edit: If you only need the column to be pretty when reading the database, a statement like the following is sufficient:
ALTER TABLE test ADD uuid_pretty CHAR(36) GENERATED ALWAYS AS (CONCAT_WS('-', LEFT(HEX(uuid_ugly), 8), SUBSTR(HEX(uuid_ugly), 9, 4), SUBSTR(HEX(uuid_ugly), 13, 4), SUBSTR(HEX(uuid_ugly), 17, 4), RIGHT(HEX(uuid_ugly), 12))) VIRTUAL;
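To check that those offsets are right, here is a Python sketch that mirrors the generated-column expression with 0-based slices (MySQL's LEFT/SUBSTR/RIGHT are 1-based). It assumes the column holds the raw 16 bytes. pretty_from_hex is a hypothetical name.

```python
import uuid


def pretty_from_hex(hex32: str) -> str:
    """Mirror of the generated-column expression using 0-based Python slices."""
    return "-".join([
        hex32[:8],    # LEFT(HEX(uuid_ugly), 8)
        hex32[8:12],  # SUBSTR(HEX(uuid_ugly), 9, 4)
        hex32[12:16], # SUBSTR(HEX(uuid_ugly), 13, 4)
        hex32[16:20], # SUBSTR(HEX(uuid_ugly), 17, 4)
        hex32[-12:],  # RIGHT(HEX(uuid_ugly), 12)
    ])


u = uuid.UUID("f47ac10b-58cc-4372-a567-0e02b2c3d479")
hex_upper = u.bytes.hex().upper()  # MySQL's HEX() returns uppercase digits
print(pretty_from_hex(hex_upper))  # F47AC10B-58CC-4372-A567-0E02B2C3D479
```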
Answer by Timo
The most space-efficient would be BINARY(16) or two BIGINT UNSIGNED columns.
The former might give you headaches because manual queries do not (in a straightforward way) give you readable/copyable values. The latter might give you headaches because of having to map between one value and two columns.
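The two-column mapping is mostly a matter of shifting and masking. This Python sketch assumes that the high 64 bits go in one BIGINT UNSIGNED column and the low 64 bits in the other. The column split and the helper names are hypothetical. A lookup then needs both columns in the WHERE clause.

```python
import uuid

U64_MASK = (1 << 64) - 1


def split_uuid(u: uuid.UUID) -> tuple[int, int]:
    """Split a UUID into (high, low) unsigned 64-bit ints for two BIGINT UNSIGNED columns."""
    return u.int >> 64, u.int & U64_MASK


def join_uuid(high: int, low: int) -> uuid.UUID:
    """Rebuild the UUID from the two column values."""
    return uuid.UUID(int=(high << 64) | low)


u = uuid.UUID("f47ac10b-58cc-4372-a567-0e02b2c3d479")
hi, lo = split_uuid(u)
print(hex(hi), hex(lo))        # 0xf47ac10b58cc4372 0xa5670e02b2c3d479
print(join_uuid(hi, lo) == u)  # True
```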
If this is a primary key, I would definitely not waste any space on it, as it becomes part of every secondary index as well. In other words, I would choose one of these types.
For performance, the randomness of random UUIDs (i.e. UUID v4, which is randomized) will hurt severely. This applies when the UUID is your primary key or if you do a lot of range queries on it. Your insertions into the primary index will be all over the place rather than all at (or near) the end. Your data loses temporal locality, which was a helpful property in various cases.
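Here is a toy Python model of that effect, built on heavily simplified assumptions. A sorted list stands in for the primary key's order, and pages, page splits and the buffer pool are ignored. The model only counts how many inserts land at the very end of the key order, comparing sequential keys with random v4 UUIDs. appends_at_end is a hypothetical name.

```python
import bisect
import random
import uuid


def appends_at_end(keys) -> int:
    """Insert keys one by one into a sorted list (standing in for index order)
    and count how many land at the very end."""
    ordered: list = []
    at_end = 0
    for k in keys:
        pos = bisect.bisect_left(ordered, k)
        if pos == len(ordered):
            at_end += 1
        ordered.insert(pos, k)
    return at_end


rng = random.Random(42)  # seeded so the run is repeatable
n = 2000
random_v4 = [uuid.UUID(int=rng.getrandbits(128), version=4).bytes for _ in range(n)]
sequential = list(range(n))  # like AUTO_INCREMENT

print(appends_at_end(sequential), "of", n)  # every sequential key is an append
print(appends_at_end(random_v4), "of", n)   # only a handful of random keys are
```

With sequential keys every insert is an append. With random v4 keys almost none are, which is the loss of locality described above.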
My main improvement would be to use something similar to a UUID v1, which uses a timestamp as part of its data, and ensure that the timestamp is in the highest bits. For example, the UUID might be composed something like this:
Timestamp | Machine Identifier | Counter
This way, we get a locality similar to auto-increment values.
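This Python sketch shows one possible way to pack such an identifier into 16 bytes. The field widths (48-bit millisecond timestamp, 32-bit machine identifier, 48-bit counter) and make_id are assumptions for illustration, not a standard format. The key points are the timestamp in the highest bits and big-endian byte order, so sorting the bytes sorts by time.

```python
# Hypothetical field widths that add up to 128 bits, so the result fits BINARY(16).
TS_BITS, MACHINE_BITS, COUNTER_BITS = 48, 32, 48


def make_id(ts_ms: int, machine_id: int, counter: int) -> bytes:
    """Pack timestamp | machine id | counter with the timestamp in the highest
    bits, as 16 big-endian bytes so that byte order follows time order."""
    for value, bits in ((ts_ms, TS_BITS), (machine_id, MACHINE_BITS), (counter, COUNTER_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    packed = (ts_ms << (MACHINE_BITS + COUNTER_BITS)) | (machine_id << COUNTER_BITS) | counter
    return packed.to_bytes(16, "big")


ts = 1_700_000_000_000  # an arbitrary millisecond timestamp
a = make_id(ts, machine_id=7, counter=1)
b = make_id(ts, machine_id=3, counter=2)      # same millisecond, another machine
c = make_id(ts + 1, machine_id=3, counter=0)  # one millisecond later
print(sorted([c, b, a]) == [b, a, c])  # True: time first, then machine, then counter
```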
Answer by Mathieu Rey
I just found a nice article going in more depth on these topics: https://www.xaprb.com/blog/2009/02/12/5-ways-to-make-hexadecimal-identifiers-perform-better-on-mysql/
It covers the storage of values, with the same options already expressed in the different answers on this page:
- One: watch out for character set
- Two: use fixed-length, non-nullable values
- Three: Make it BINARY
But it also adds some interesting insights about indexes:
- Four: use prefix indexes
In many but not all cases, you don't need to index the full length of the value. I usually find that the first 8 to 10 characters are unique. If it's a secondary index, this is generally good enough. The beauty of this approach is that you can apply it to existing applications without any need to modify the column to BINARY or anything else—it's an indexing-only change and doesn't require the application or the queries to change.
Note that the article doesn't tell you how to create such a "prefix" index. Looking at the MySQL documentation for Column Indexes, we find:
[...] you can create an index that uses only the first N characters of the column. Indexing only a prefix of column values in this way can make the index file much smaller. When you index a BLOB or TEXT column, you must specify a prefix length for the index. For example:
CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));
[...] the prefix length in CREATE TABLE, ALTER TABLE, and CREATE INDEX statements is interpreted as number of characters for nonbinary string types (CHAR, VARCHAR, TEXT) and number of bytes for binary string types (BINARY, VARBINARY, BLOB).
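Before picking N, you can measure how long a prefix has to be for your data. This Python sketch assumes you can pull a representative sample of the UUID strings into the client. shortest_unique_prefix is a hypothetical helper. In SQL, comparing COUNT(DISTINCT LEFT(col, n)) with COUNT(DISTINCT col) measures the same thing.

```python
import random
import uuid


def shortest_unique_prefix(values: list[str]) -> int:
    """Smallest prefix length (in characters) at which all distinct values
    stay distinct, i.e. the shortest prefix index that is unique for this data."""
    distinct = len(set(values))
    for n in range(1, max(len(v) for v in values) + 1):
        if len({v[:n] for v in values}) == distinct:
            return n
    return max(len(v) for v in values)


rng = random.Random(1)  # seeded sample of random v4 UUID strings
sample = [str(uuid.UUID(int=rng.getrandbits(128), version=4)) for _ in range(10_000)]
print(shortest_unique_prefix(sample))
```

For CHAR/VARCHAR the prefix length counts characters, as the quote above says, so the result maps directly onto the N in col(N).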
- Five: build hash indexes
What you can do is generate a checksum of the values and index that. That's right, a hash-of-a-hash. For most cases, CRC32() works pretty well (if not, you can use a 64-bit hash function). Create another column. [...] The CRC column isn't guaranteed to be unique, so you need both criteria in the WHERE clause or this technique won't work. Hash collisions happen quickly; you will probably get a collision with about 100k values, which is much sooner than you might think—don't assume that a 32-bit hash means you can put 4 billion rows in your table before you get a collision.
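This Python sketch covers both halves of that advice. It assumes zlib.crc32 computes the same CRC-32 as MySQL's CRC32() for identical input bytes. The index narrows the search by CRC, the lookup still compares the full value, and a birthday-bound estimate shows why 100k values is already risky for a 32-bit hash. build_index, lookup and the uuid_crc column name are hypothetical.

```python
import math
import uuid
import zlib


def crc32_of(text: str) -> int:
    """Unsigned 32-bit CRC of the text (assumed to match MySQL's CRC32() on the same bytes)."""
    return zlib.crc32(text.encode("ascii"))


def build_index(values):
    """Map CRC -> list of full values, like a secondary index on a CRC column."""
    index = {}
    for v in values:
        index.setdefault(crc32_of(v), []).append(v)
    return index


def lookup(index, wanted: str) -> bool:
    # Both criteria, as in WHERE uuid_crc = CRC32(?) AND uuid = ?:
    # the CRC picks the bucket, the full comparison rejects CRC collisions.
    return wanted in index.get(crc32_of(wanted), [])


def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday-bound estimate of P(at least one collision) among n random hashes."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * 2 ** bits))


sample = [str(uuid.UUID(int=i, version=4)) for i in range(1000)]
idx = build_index(sample)
print(lookup(idx, sample[123]))                  # True
print(round(collision_probability(100_000), 2))  # 0.69
```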