MYSQL 5.7 中的原生 JSON 支持:MYSQL 中 JSON 数据类型的优缺点是什么?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/33660866/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Native JSON support in MYSQL 5.7 : what are the pros and cons of JSON data type in MYSQL?
提问by Imran
In MySQL 5.7 a new data type for storing JSON data in MySQLtables has been added. It will obviously be a great change in MySQL. They listed some benefits
在 MySQL 5.7 中,添加了用于在 MySQL表中存储JSON 数据的新数据类型。显然,这将是 MySQL 的一个巨大变化。他们列出了一些好处
Document Validation- Only valid JSON documents can be stored in a JSON column, so you get automatic validation of your data.
Efficient Access- More importantly, when you store a JSON document in a JSON column, it is not stored as a plain text value. Instead, it is stored in an optimized binary format that allows for quicker access to object members and array elements.
Performance- Improve your query performance by creating indexes on values within the JSON columns. This can be achieved with “functional indexes” on virtual columns.
Convenience- The additional inline syntax for JSON columns makes it very natural to integrate Document queries within your SQL. For example (features.feature is a JSON column):
SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254;
文档验证- 只有有效的 JSON 文档才能存储在 JSON 列中,因此您可以自动验证数据。
高效访问- 更重要的是,当您将 JSON 文档存储在 JSON 列中时,它不会存储为纯文本值。相反,它以优化的二进制格式存储,允许更快地访问对象成员和数组元素。
性能- 通过为 JSON 列中的值创建索引来提高查询性能。这可以通过虚拟列上的“功能索引”来实现。
方便- JSON 列的附加内联语法使得在 SQL 中集成文档查询变得非常自然。例如(features.feature 是一个 JSON 列):
SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254;
WOW ! they include some great features. Now it is easier to manipulate data. Now it is possible to store more complex data in column. So MySQL is now flavored with NoSQL.
哇 !它们包括一些很棒的功能。现在更容易操作数据。现在可以在列中存储更复杂的数据。所以 MySQL 现在加入了 NoSQL。
Now I can imagine a query for JSON data something like
现在我可以想象对 JSON 数据的查询类似于
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN
(
SELECT JSON_EXTRACT(data,"$.inverted")
FROM t1 | {"series": 3, "inverted": 8}
WHERE JSON_EXTRACT(data,"$.inverted")<4 );
So can I store huge small relations in few json colum? Is it good? Does it break normalization. If this is possible then I guess it will act like NoSQL in a MySQL column. I really want to know more about this feature. Pros and cons of MySQL JSON data type.
那么我可以在几个 json 列中存储巨大的小关系吗?好吗?它是否打破了规范化。如果这是可能的,那么我想它会像 MySQL 列中的 NoSQL 一样。我真的很想了解更多有关此功能的信息。MySQL JSON 数据类型的优缺点。
采纳答案by Bill Karwin
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Using a column inside an expression or function like this spoils any chance of the query using an index to help optimize the query. The query shown above is forced to do a table-scan.
在像这样的表达式或函数中使用列会破坏使用索引来帮助优化查询的查询机会。上面显示的查询被强制执行表扫描。
The claim about "efficient access" is misleading. It means that after the query examines a row with a JSON document, it can extract a field without having to parse the text of the JSON syntax. But it still takes a table-scan to search for rows. In other words, the query must examine every row.
关于“有效访问”的说法具有误导性。这意味着在查询检查带有 JSON 文档的行后,它可以提取字段而无需解析 JSON 语法的文本。但是仍然需要进行表扫描来搜索行。换句话说,查询必须检查每一行。
By analogy, if I'm searching a telephone book for people with first name "Bill", I still have to read every page in the phone book, even if the first names have been highlighted to make it slightly quicker to spot them.
以此类推,如果我在电话簿中搜索名字为“Bill”的人,我仍然需要阅读电话簿中的每一页,即使已突出显示名字以便更快地找到它们。
MySQL 5.7 allows you to define a virtual column in the table, and then create an index on the virtual column.
MySQL 5.7 允许您在表中定义一个虚拟列,然后在虚拟列上创建索引。
ALTER TABLE t1
ADD COLUMN series AS (JSON_EXTRACT(data, '$.series')),
ADD INDEX (series);
Then if you query the virtual column, it can use the index and avoid the table-scan.
然后如果查询虚拟列,它可以使用索引并避免表扫描。
SELECT * FROM t1
WHERE series IN ...
This is nice, but it kind of misses the point of using JSON. The attractive part of using JSON is that it allows you to add new attributes without having to do ALTER TABLE. But it turns out you have to define an extra (virtual) column anyway, if you want to search JSON fields with the help of an index.
这很好,但它有点错过了使用 JSON 的意义。使用 JSON 的吸引力在于它允许您添加新属性而无需执行 ALTER TABLE。但事实证明,如果您想在索引的帮助下搜索 JSON 字段,无论如何您都必须定义一个额外的(虚拟)列。
But you don't have to define virtual columns and indexes for everyfield in the JSON document—only those you want to search or sort on. There could be other attributes in the JSON that you only need to extract in the select-list like the following:
但是您不必为JSON 文档中的每个字段定义虚拟列和索引——只需为您想要搜索或排序的字段定义。JSON 中可能还有其他属性,您只需在选择列表中提取,如下所示:
SELECT JSON_EXTRACT(data, '$.series') AS series FROM t1
WHERE <other conditions>
I would generally say that this is the best way to use JSON in MySQL. Only in the select-list.
我通常会说这是在 MySQL 中使用 JSON 的最佳方式。仅在选择列表中。
When you reference columns in other clauses (JOIN, WHERE, GROUP BY, HAVING, ORDER BY), it's more efficient to use conventional columns, not fields within JSON documents.
当您在其他子句(JOIN、WHERE、GROUP BY、HAVING、ORDER BY)中引用列时,使用常规列而不是 JSON 文档中的字段更有效。
I presented a talk called How to Use JSON in MySQL Wrongat the Percona Live conference in April 2018. I'll update and repeat the talk at Oracle Code One in the fall.
我在 2018 年 4 月的 Percona Live 会议上发表了一个名为How to Use JSON in MySQL Wrong 的演讲。我将在秋季的 Oracle Code One 上更新并重复演讲。
There are other issues with JSON. For example, in my tests it required 2-3 times as much storage space for JSON documents compared to conventional columns storing the same data.
JSON 还存在其他问题。例如,在我的测试中,与存储相同数据的传统列相比,它需要 2-3 倍的 JSON 文档存储空间。
MySQL is promoting their new JSON capabilities aggressively, largely to dissuade people against migrating to MongoDB. But document-oriented data storage like MongoDB is fundamentally a non-relational way of organizing data. It's different from relational. I'm not saying one is better than the other, it's just a different technique, suited to different types of queries.
MySQL 正在积极推广其新的 JSON 功能,主要是为了劝阻人们不要迁移到 MongoDB。但是像 MongoDB 这样的面向文档的数据存储从根本上说是一种非关系的数据组织方式。它与关系不同。我并不是说一个比另一个更好,它只是一种不同的技术,适用于不同类型的查询。
You should choose to use JSON when JSON makes your queries more efficient.
当 JSON 使您的查询更高效时,您应该选择使用 JSON。
Don't choose a technology just because it's new, or for the sake of fashion.
不要仅仅因为它是新的或为了时尚而选择一项技术。
Edit: The virtual column implementation in MySQL is supposed to use the index if your WHERE clause uses exactly the same expression as the definition of the virtual column. That is, the following shoulduse the index on the virtual column, since the virtual column is defined AS (JSON_EXTRACT(data,"$.series"))
编辑:如果您的 WHERE 子句使用与虚拟列的定义完全相同的表达式,则 MySQL 中的虚拟列实现应该使用索引。也就是说,下面应该使用虚拟列上的索引,因为虚拟列是定义的AS (JSON_EXTRACT(data,"$.series"))
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Except I have found by testing this feature that it does NOT work for some reason if the expression is a JSON-extraction function. It works for other types of expressions, just not JSON functions.
除了我通过测试这个功能发现,如果表达式是一个 JSON 提取函数,它由于某种原因不起作用。它适用于其他类型的表达式,但不适用于 JSON 函数。
回答by Drew
The following from MySQL 5.7 brings sexy back with JSONsounds good to me:
MySQL 5.7 中的以下内容通过 JSON 恢复了性感,对我来说听起来不错:
Using the JSON Data Type in MySQL comes with two advantages over storing JSON strings in a text field:
Data validation. JSON documents will be automatically validated and invalid documents will produce an error. Improved internal storage format. The JSON data is converted to a format that allows quick read access to the data in a structured format. The server is able to lookup subobjects or nested values by key or index, allowing added flexibility and performance.
与在文本字段中存储 JSON 字符串相比,在 MySQL 中使用 JSON 数据类型有两个优势:
数据验证。JSON 文档将被自动验证,无效的文档将产生错误。改进的内部存储格式。JSON 数据被转换为允许快速读取访问结构化格式数据的格式。服务器能够通过键或索引查找子对象或嵌套值,从而增加灵活性和性能。
...
...
Specialised flavours of NoSQL stores (Document DBs, Key-value stores and Graph DBs) are probably better options for their specific use cases, but the addition of this datatype might allow you to reduce complexity of your technology stack. The price is coupling to MySQL (or compatible) databases. But that is a non-issue for many users.
特定类型的 NoSQL 存储(文档数据库、键值存储和图形数据库)可能是它们特定用例的更好选择,但添加这种数据类型可能会让您降低技术堆栈的复杂性。代价是耦合到 MySQL(或兼容)数据库。但这对许多用户来说不是问题。
Note the language about document validationas it is an important factor. I guess a battery of tests need to be performed for comparisons of the two approaches. Those two being:
请注意有关文档验证的语言,因为它是一个重要因素。我想需要进行一系列测试来比较这两种方法。那两个是:
- Mysql with JSON datatypes
- Mysql without
- 带有 JSON 数据类型的 Mysql
- mysql 没有
The net has but shallow slideshares as of now on the topic of mysql / json / performance from what I am seeing.
从我所看到的情况来看,截至目前,网络上关于 mysql / json / 性能的主题只有浅薄的幻灯片。
Perhaps your post can be a hub for it. Or perhaps performance is an after thought, not sure, and you are just excited to not create a bunch of tables.
也许您的帖子可以成为它的中心。或者也许性能是事后的想法,不确定,你只是很高兴没有创建一堆表。
回答by Bruce
I got into this problem recently, and I sum up the following experiences:
最近陷入了这个问题,总结了以下经验:
1, There isn't a way to solve all questions. 2, You should use the JSON properly.
1、没有办法解决所有问题。2,你应该正确使用JSON。
One case:
一种情况:
I have a table named: CustomField
, and it must two columns: name
, fields
.
name
is a localized string, it content should like:
我有一个名为: 的表CustomField
,它必须有两列:name
, fields
。
name
是一个本地化的字符串,它的内容应该是:
{
"en":"this is English name",
"zh":"this is Chinese name"
...(other languages)
}
And fields
should be like this:
而且fields
应该是这样的:
[
{
"filed1":"value",
"filed2":"value"
...
},
{
"filed1":"value",
"filed2":"value"
...
}
...
]
As you can see, both the name
and the fields
can be saved as JSON, and it works!
如您所见, thename
和 thefields
都可以保存为 JSON,并且可以正常工作!
However, if I use the name
to search this table very frequently, what should I do? Use the JSON_CONTAINS
,JSON_EXTRACT
...? Obviously, it's not a good idea to save it as JSON anymore, we should save it to an independent table:CustomFieldName
.
但是,如果我name
经常使用来搜索此表,我该怎么办?使用JSON_CONTAINS
, JSON_EXTRACT
...? 显然,再将其保存为 JSON 并不是一个好主意,我们应该将其保存到一个独立的表中:CustomFieldName
.
From the above case, I think you should keep these ideas in mind:
从上面的案例来看,我认为你应该牢记这些想法:
- Why MYSQL support JSON?
- Why you want to use JSON? Did your business logic just need this? Or there is something else?
- Never be lazy
- 为什么 MYSQL 支持 JSON?
- 为什么要使用 JSON?您的业务逻辑是否只需要这个?或者还有其他事情?
- 永远不要偷懒
Thanks
谢谢
回答by Vitalii
From my experience, JSON implementation at least in MySql 5.7 is not very useful due to its poor performance. Well, it is not so bad for reading data and validation. However, JSON modification is 10-20 times slower with MySql that with Python or PHP. Lets imagine very simple JSON:
根据我的经验,JSON 实现至少在 MySql 5.7 中由于其性能不佳而不是很有用。嗯,读取数据和验证并没有那么糟糕。但是,使用 MySql 修改 JSON 比使用 Python 或 PHP 慢 10-20 倍。让我们想象一个非常简单的 JSON:
{ "name": "value" }
Lets suppose we have to convert it to something like that:
假设我们必须将其转换为类似的内容:
{ "name": "value", "newName": "value" }
You can create simple script with Python or PHP that will select all rows and update them one by one. You are not forced to make one huge transaction for it, so other applications will can use the table in parallel. Of course, you can also make one huge transaction if you want, so you'll get guarantee that MySql will perform "all or nothing", but other applications will most probably not be able to use database during transaction execution.
您可以使用 Python 或 PHP 创建简单的脚本,该脚本将选择所有行并一一更新它们。您不会被迫为它进行一笔巨大的交易,因此其他应用程序可以并行使用该表。当然,如果你愿意,你也可以做一个巨大的事务,这样你就可以保证 MySql 将执行“全有或全无”,但其他应用程序很可能无法在事务执行期间使用数据库。
I have 40 millions rows table, and Python script updates it in 3-4 hours.
我有 4000 万行表,Python 脚本在 3-4 小时内更新它。
Now we have MySql JSON, so we don't need Python or PHP anymore, we can do something like that:
现在我们有了 MySql JSON,所以我们不再需要 Python 或 PHP,我们可以这样做:
UPDATE `JsonTable` SET `JsonColumn` = JSON_SET(`JsonColumn`, "newName", JSON_EXTRACT(`JsonColumn`, "name"))
It looks simple and excellent. However, its speed is 10-20 times slower than Python version, and it is single transaction, so other applications can not modify the table data in parallel.
它看起来简单而出色。但是它的速度比Python版本慢10-20倍,而且是单事务,所以其他应用无法并行修改表数据。
So, if we want to just duplicate JSON key in 40 millions rows table, we need to not use table at all during 30-40 hours. It has no sence.
所以,如果我们只想在 4000 万行表中复制 JSON 键,我们需要在 30-40 小时内根本不使用表。它没有任何意义。
About reading data, from my experience direct access to JSON field via JSON_EXTRACT
in WHERE
is also extremelly slow (much slower that TEXT
with LIKE
on not indexed column). Virtual generated columns perform much faster, however, if we know our data structure beforehand, we don't need JSON, we can use traditional columns instead. When we use JSON where it is really useful, i. e. when data structure is unknown or changes often (for example, custom plugin settings), virtual column creation on regular basis for any possible new columns doesn't look like good idea.
关于通过读取数据,从我的经验直接访问JSON领域JSON_EXTRACT
中WHERE
也extremelly慢(慢得多的是TEXT
与LIKE
上没有索引列)。虚拟生成列的执行速度要快得多,但是,如果我们事先知道我们的数据结构,我们就不需要 JSON,我们可以使用传统列来代替。当我们在 JSON 真正有用的地方使用它时,即当数据结构未知或经常更改时(例如,自定义插件设置),为任何可能的新列定期创建虚拟列看起来不是一个好主意。
Python and PHP make JSON validation like a charm, so it is questionable do we need JSON validation on MySql side at all. Why not also validate XML, Microsoft Office documents or check spelling? ;)
Python 和 PHP 使 JSON 验证成为一种魅力,因此我们是否需要在 MySql 端进行 JSON 验证值得怀疑。为什么不验证 XML、Microsoft Office 文档或检查拼写?;)