Original question: http://stackoverflow.com/questions/5544749/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
What column type should be used to store serialized data in a mysql db?
Question by djburdick
What column type should be used to store serialized data in a MySQL DB? I know you can use VARBINARY, BLOB, or TEXT. What's considered the best, and why?
Edit: I understand it is not "good" to store serialized data. I need to do it in this one case though. Please just trust me on this and focus on the question if you have an answer. Thanks!
Answer by gaborous
To answer: TEXT seems to be deprecated in a number of DBMSs (SQL Server, for example, deprecates its legacy text type in favor of varchar(max)), so you had better use either a BLOB or a VARCHAR with a high limit. With a BLOB you also won't run into any encoding issues, which are a major hassle with VARCHAR and TEXT.
Also, as pointed out in this thread on the MySQL forums, hard drives are cheaper than software, so you'd better design your software and make it work first; only if space becomes an issue should you optimize that aspect. So don't try to over-optimize the size of your column too early; it's better to set the size larger at first (this also helps avoid security issues).
About the various comments: there is too much SQL fanaticism here. Although I am greatly fond of SQL and relational models, they also have their pitfalls.
Storing serialized data into the database as-is (such as storing JSON or XML formatted data) has a few advantages:
- You can have a more flexible format for your data: adding and removing fields on the fly, changing the specification of the fields on the fly, etc.
- Less impedance mismatch with the object model: you store and fetch the data exactly as it exists in your program, instead of fetching it and then having to convert it between your program objects' structures and your relational database's structures.
And there are many other advantages, so please, no fanboyism: relational databases are a great tool, but let's not dismiss the other tools available to us. The more tools, the better.
As for a concrete example, I tend to add a JSON field to my database to store extra parameters of a record, where the columns (properties) of the JSON data will never be SELECTed individually but are only used once the right record has already been selected. In this case, I can still discriminate between my records using the relational columns, and once the right record is selected, I can use the extra parameters for whatever purpose I want.
So my advice, to retain the best of both worlds (speed, serializability and structural flexibility), is to use a few standard relational columns as unique keys to discriminate between your rows, and then a BLOB/VARCHAR column into which your serialized data is inserted. Usually only two or three columns are required for a unique key, so this won't be a major overhead.
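Here is a minimal sketch of that layout, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server (an assumption made purely for illustration). The table `records`, its columns, and the `save`/`load` helpers are hypothetical; in MySQL the `extra` column would be a BLOB or a large VARCHAR. Rows are located through the key columns, and the serialized blob is only decoded once the right row is in hand.

```python
import json
import sqlite3
from typing import Optional

# sqlite3 stands in for MySQL here; only the table layout matters for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (
        owner_id INTEGER NOT NULL,   -- relational key column
        name     TEXT    NOT NULL,   -- relational key column
        extra    BLOB,               -- serialized extra parameters (JSON bytes)
        PRIMARY KEY (owner_id, name)
    )
    """
)


def save(owner_id: int, name: str, extra: dict) -> None:
    """Serialize the extra parameters and store them next to the key columns."""
    payload = json.dumps(extra).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO records (owner_id, name, extra) VALUES (?, ?, ?)",
        (owner_id, name, payload),
    )


def load(owner_id: int, name: str) -> Optional[dict]:
    """Select the row by its key columns, then deserialize the blob in the app."""
    row = conn.execute(
        "SELECT extra FROM records WHERE owner_id = ? AND name = ?",
        (owner_id, name),
    ).fetchone()
    return None if row is None else json.loads(row[0].decode("utf-8"))


# Rows can carry different sets of extra fields without any schema change.
save(1, "widget", {"color": "red", "size": 3})
save(1, "gadget", {"color": "blue", "tags": ["new", "sale"]})
print(load(1, "gadget"))  # {'color': 'blue', 'tags': ['new', 'sale']}
```

The two rows carry different extra fields with no ALTER TABLE, which is the flexibility argued for above; the trade-off is that SQL cannot filter on `color` or `tags`.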
Also, you may be interested in PostgreSQL, which now has a JSON data type, and in the PostSQL project, which processes JSON fields directly, just like relational columns.
Answer by Joseph Lust
How much do you plan to store? Check out the specs for the string types in the MySQL docs, along with their sizes. The key here is that you don't care about indexing this column, but you also never want it to overflow and get truncated, since then your JSON becomes unreadable.
- TINYTEXT: L < 2^8
- TEXT: L < 2^16
- MEDIUMTEXT: L < 2^24
- LONGTEXT: L < 2^32
Where L is the length in bytes, not characters, so a string containing multibyte characters reaches the limit sooner.
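Because the limits count bytes, the same number of characters can land in different types depending on the encoding. A small Python sketch, assuming a utf8mb4 column (so the byte count equals the UTF-8 encoded length); `smallest_text_type` is a hypothetical helper, not part of any MySQL API:

```python
# Maximum byte lengths implied by the limits above (L < 2^n, so the max is 2^n - 1).
TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]


def smallest_text_type(value: str, encoding: str = "utf-8") -> str:
    """Return the smallest TEXT type whose byte limit fits `value` once encoded."""
    size = len(value.encode(encoding))
    for name, max_bytes in TEXT_TYPES:
        if size <= max_bytes:
            return name
    raise ValueError(f"{size} bytes does not fit in any TEXT type")


ascii_value = "a" * 200
accented_value = "é" * 200  # 200 characters, but 400 bytes in UTF-8

print(len(accented_value), len(accented_value.encode("utf-8")))  # 200 400
print(smallest_text_type(ascii_value))     # TINYTEXT
print(smallest_text_type(accented_value))  # TEXT
```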
Plain TEXT should be enough, but go bigger if you are storing more. Though in that case, you might not want to store it in the DB at all.
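To illustrate why an overflow is fatal for serialized data, this Python sketch cuts a JSON document at an arbitrary 32-byte limit, roughly what a too-small column does to the value. The limit and the `is_readable` helper are made up for illustration. Depending on the SQL mode, MySQL either truncates the value with a warning or, in strict mode, rejects the INSERT with an error.

```python
import json


def is_readable(data: bytes) -> bool:
    """Return True if `data` still parses as JSON."""
    try:
        json.loads(data)
    except ValueError:  # covers json.JSONDecodeError and UnicodeDecodeError
        return False
    return True


doc = json.dumps({"user": 42, "prefs": {"theme": "dark", "lang": "en"}}).encode("utf-8")
stored = doc[:32]  # simulate a column that only holds 32 bytes

print(is_readable(doc))     # True
print(is_readable(stored))  # False: the rest of the object was cut off
```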
Answer by Bill Karwin
The length limits that @Twisted Pear mentions are good reasons.
Also consider that TEXT and its ilk have a charset associated with them, whereas BLOB data types do not. If you're just storing raw bytes of data, you might as well use BLOB instead of TEXT.
Note that you can still store textual data in a BLOB; you just can't perform any SQL operations on it that take a charset into account, because to SQL it's just bytes. But that's probably not an issue in your case, since it's serialized data whose structure SQL doesn't know anyway. All you need to do is store bytes and fetch bytes. Interpreting the bytes is up to your app.
I have also had trouble using LONGBLOB or LONGTEXT with certain client libraries (e.g. PHP), because the client tries to allocate a buffer as large as the largest possible value of the data type, not knowing how large the content of any given row will be until it's fetched. This caused PHP to burst into flames as it tried to allocate a 4GB buffer. I don't know what client you're using, or whether it suffers from the same behavior.
The workaround: use MEDIUMBLOB or just BLOB, as long as those types are sufficient to store your serialized data.
On the issue of people telling you not to do this, I'm not going to tell you that (in spite of the fact that I'm an SQL advocate). It's true you can't use SQL expressions to perform operations on individual elements within the serialized data, but that's not your purpose. What you do gain by putting that data into the database includes:
- Associating serialized data with other, more relational data.
- The ability to store and fetch serialized data according to transaction scope (COMMIT, ROLLBACK).
- Keeping all your relational and non-relational data in one place, which makes it easier to replicate to slaves, back up, restore, etc.
Answer by superherogeek
LONGTEXT
WordPress stores serialized data in its postmeta table as LONGTEXT. I find the WordPress database to be a good place to research data types for columns.
Answer by daneczech
I might be late to the party, but the php.net documentation for serialize() states the following:
Note that this is a binary string which may include null bytes, and needs to be stored and handled as such. For example, serialize() output should generally be stored in a BLOB field in a database, rather than a CHAR or TEXT field.
Source: http://php.net/manual/en/function.serialize.php
Hope that helps!
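The same concern applies to binary serialization formats in general. As a Python analogue (an assumption; pickle is not PHP's format), a protocol-2 pickle contains NUL bytes, and binding it as `bytes` through sqlite3 (standing in for MySQL) stores it as a BLOB that round-trips intact. The `cache` table is hypothetical.

```python
import pickle
import sqlite3

# pickle plays the role of PHP's serialize() in this sketch.
payload = pickle.dumps({"count": 0}, protocol=2)
print(b"\x00" in payload)  # True: binary serialization can contain NUL bytes

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (k TEXT PRIMARY KEY, v BLOB NOT NULL)")
# Passing `bytes` binds the value as a BLOB, so every byte (including NULs) is kept.
conn.execute("INSERT INTO cache (k, v) VALUES (?, ?)", ("counter", payload))

stored = conn.execute("SELECT v FROM cache WHERE k = ?", ("counter",)).fetchone()[0]
# Only unpickle data you wrote yourself; pickle is unsafe on untrusted input.
print(stored == payload, pickle.loads(stored))  # True {'count': 0}
```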
Answer by igaster
As of MySQL 5.7.8, MySQL supports a native JSON data type (see the MySQL Manual).
Answer by wallyk
Unless the serialized data has no other use than to be saved and restored from the database, you probably don't want to do it that way.
Typically, serialized data has several fields that should be stored in the database as separate columns. It is common for every item of serialized data to be a separate column, and some of those columns would naturally be key fields. Additional columns might plausibly be added alongside the data to indicate the date and time of the insertion, the responsible user, etc.
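For contrast, a sketch of the layout this answer recommends: each field of the record gets its own column, plus bookkeeping columns for when and by whom the row was inserted. It uses sqlite3 as a stand-in for MySQL; the `contacts` table and the `insert_contact` helper are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# One column per field of the record, plus bookkeeping columns.
conn.execute(
    """
    CREATE TABLE contacts (
        email       TEXT PRIMARY KEY,   -- natural key field
        name        TEXT NOT NULL,
        phone       TEXT,
        inserted_at TEXT NOT NULL,      -- when the row was written (UTC, ISO 8601)
        inserted_by TEXT NOT NULL       -- which user wrote it
    )
    """
)


def insert_contact(record: dict, user: str) -> None:
    """Spread the record's fields across columns instead of serializing it."""
    conn.execute(
        "INSERT INTO contacts (email, name, phone, inserted_at, inserted_by) "
        "VALUES (?, ?, ?, ?, ?)",
        (record["email"], record["name"], record.get("phone"),
         datetime.now(timezone.utc).isoformat(), user),
    )


insert_contact({"email": "ann@example.com", "name": "Ann"}, user="admin")
# Individual fields are now queryable with plain SQL.
row = conn.execute(
    "SELECT name, phone FROM contacts WHERE email = ?", ("ann@example.com",)
).fetchone()
print(row)  # ('Ann', None)
```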
Answer by djburdick
I found varchar(5000) to be the best balance of size and speed for us. It also works with Rails 3 serialized data, whereas varbinary was throwing serialization errors intermittently.