Notice: this page is a translated copy of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/15367696/

Storing JSON in database vs. having a new column for each key

Tags: mysql, sql, sql-server, database, optimization

Asked by ShuklaSannidhya

I am implementing the following model for storing user-related data in my table. I have 2 columns: uid (primary key) and a meta column which stores other data about the user in JSON format.

 uid   | meta
--------------------------------------------------
 1     | {name:['foo'], 
       |  emailid:['[email protected]','[email protected]']}
--------------------------------------------------
 2     | {name:['sann'], 
       |  emailid:['[email protected]','[email protected]']}
--------------------------------------------------

Is this a better way (performance-wise, design-wise) than the one-column-per-property model, where the table will have many columns like uid, name, emailid?

What I like about the first model is that you can add as many fields as you like; there is no limitation.

Also, now that I have implemented the first model, I was wondering how to perform a query on it. For example, how do I fetch all the users who have a name like 'foo'?

Question: Which is the better way to store user-related data (keeping in mind that the number of fields is not fixed) in a database: JSON or column-per-field? Also, if the first model is implemented, how do I query the database as described above? Should I use both models, by storing all the data which may be searched by a query in a separate row and the other data in JSON (in a different row)?



Update

Since there won't be too many columns on which I need to perform search, is it wise to use both the models? Key-per-column for the data I need to search and JSON for others (in the same MySQL database)?

Answered by Colin M

Updated 4 June 2017

Given that this question/answer have gained some popularity, I figured it was worth an update.

When this question was originally posted, MySQL had no support for JSON data types and the support in PostgreSQL was in its infancy. Since 5.7, MySQL supports a JSON data type (in a binary storage format), and PostgreSQL JSONB has matured significantly. Both products provide performant JSON types that can store arbitrary documents, including support for indexing specific keys of the JSON object.
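
As a rough illustration of querying and indexing one key of a JSON document, here is a minimal sketch. It is not MySQL itself: SQLite (via Python's sqlite3 module) stands in so the example can run anywhere, and it assumes a SQLite build with the JSON functions available. The table name, index name, helper function, and the example.com addresses are all hypothetical. In MySQL 5.7+ the comparable pieces would be JSON_EXTRACT with the same path syntax and an index on a generated column.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption, chosen so the sketch is runnable).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, meta TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (uid, meta) VALUES (?, ?)",
    [
        # Placeholder addresses; the real ones are not shown in the question.
        (1, '{"name": ["foo"], "emailid": ["foo@example.com"]}'),
        (2, '{"name": ["sann"], "emailid": ["sann@example.com"]}'),
    ],
)

# Index one specific key of the JSON document. SQLite does this with an
# expression index; MySQL 5.7+ would index a generated column instead.
conn.execute(
    "CREATE INDEX idx_users_name ON users (json_extract(meta, '$.name[0]'))"
)


def find_uids_by_name(name):
    """Return the uids whose first JSON name entry equals `name`."""
    rows = conn.execute(
        "SELECT uid FROM users "
        "WHERE json_extract(meta, '$.name[0]') = ? ORDER BY uid",
        (name,),
    )
    return [uid for (uid,) in rows]


print(find_uids_by_name("foo"))
```

The WHERE clause repeats the indexed expression exactly, which is what lets the engine consider the index instead of parsing every document.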

However, I still stand by my original statement that your default preference, when using a relational database, should still be column-per-value. Relational databases are still built on the assumption that the data within them will be fairly well normalized. The query planner has better optimization information when looking at columns than when looking at keys in a JSON document. Foreign keys can be created between columns (but not between keys in JSON documents). Importantly: if the majority of your schema is volatile enough to justify using JSON, you might want to at least consider whether a relational database is the right choice.

That said, few applications are perfectly relational or document-oriented. Most applications have some mix of both. Here are some examples where I personally have found JSON useful in a relational database:

  • When storing email addresses and phone numbers for a contact, where storing them as values in a JSON array is much easier to manage than multiple separate tables

  • Saving arbitrary key/value user preferences (where the value can be boolean, textual, or numeric, and you don't want to have separate columns for different data types)

  • Storing configuration data that has no defined schema (if you're building Zapier, or IFTTT and need to store configuration data for each integration)

I'm sure there are others as well, but these are just a few quick examples.
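
The first use case above (a contact's email addresses kept in a JSON array next to ordinary columns) might look roughly like the sketch below. SQLite via Python's sqlite3 is used as a runnable stand-in for MySQL or PostgreSQL, and it assumes a build with the JSON functions available. The contacts table, both helper functions, and the addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed, searchable attributes are ordinary columns; the variable-length
# list of email addresses lives in a single JSON array column.
conn.execute(
    "CREATE TABLE contacts ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " emails TEXT NOT NULL DEFAULT '[]')"
)
conn.executemany(
    "INSERT INTO contacts (id, name, emails) VALUES (?, ?, ?)",
    [
        (1, "Frank", '["frank@example.com", "f.work@example.com"]'),
        (2, "Jeremiah", '["jeremiah@example.com"]'),
    ],
)


def contacts_with_email(address):
    """Names of contacts whose JSON array contains `address`."""
    # json_each expands the array into one row per element.
    rows = conn.execute(
        "SELECT c.name FROM contacts AS c, json_each(c.emails) AS e "
        "WHERE e.value = ? ORDER BY c.id",
        (address,),
    )
    return [name for (name,) in rows]


def email_count(contact_id):
    """Number of addresses stored for one contact."""
    row = conn.execute(
        "SELECT json_array_length(emails) FROM contacts WHERE id = ?",
        (contact_id,),
    ).fetchone()
    return row[0]


print(contacts_with_email("f.work@example.com"), email_count(1))
```

The trade-off is the one described above: no extra table to manage, at the cost of having to unpack the array whenever you search inside it.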

Original Answer

If you really want to be able to add as many fields as you want with no limitation (other than an arbitrary document size limit), consider a NoSQL solution such as MongoDB.

For relational databases: use one column per value. Putting a JSON blob in a column makes it virtually impossible to query (and painfully slow when you actually find a query that works).

Relational databases take advantage of data types when indexing, and are intended to be implemented with a normalized structure.

As a side note: this isn't to say you should never store JSON in a relational database. If you're adding true metadata, or if your JSON is describing information that does not need to be queried and is only used for display, it may be overkill to create a separate column for all of the data points.

Answered by Homan

Like most things, "it depends". Storing data in columns or in JSON is not right or wrong, good or bad, in and of itself. It depends on what you need to do with it later. What is your predicted way of accessing this data? Will you need to cross-reference other data?

Other people have answered pretty well what the technical trade-offs are.

Not many people have discussed that your app and features evolve over time and how this data storage decision impacts your team.

One of the temptations of using JSON is to avoid migrating the schema, so if the team is not disciplined, it's very easy to stick yet another key/value pair into a JSON field. There's no migration for it, and no one remembers what it's for. There is no validation on it.

My team used JSON alongside traditional columns in Postgres, and at first it was the best thing since sliced bread. JSON was attractive and powerful, until one day we realized that the flexibility came at a cost and it was suddenly a real pain point. Sometimes that point creeps up really quickly, and then it becomes hard to change because we've built so many other things on top of this design decision.

Over time, as we added new features, having the data in JSON led to more complicated-looking queries than we would have written had we stuck to traditional columns. So we started fishing certain key values back out into columns so that we could make joins and comparisons between values. Bad idea. Now we had duplication. A new developer would come on board and be confused: which is the value I should be saving back into? The JSON one or the column?

The JSON fields became junk drawers for little pieces of this and that. No data validation on the database level, no consistency or integrity between documents. That pushed all that responsibility into the app instead of getting hard type and constraint checking from traditional columns.
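
Some of that checking can be pulled back into the database, at least for a few required keys. The following is only a sketch under stated assumptions: SQLite (through Python's sqlite3, with the JSON functions available) stands in for Postgres or MySQL, and the users table, the rule "name must be a JSON string", and the try_insert helper are all hypothetical. It does not restore the full type and constraint checking that real columns give you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        uid  INTEGER PRIMARY KEY,
        meta TEXT NOT NULL
             -- COALESCE turns a missing key (NULL) into a failing comparison;
             -- a bare NULL result would let the CHECK pass.
             CHECK (json_valid(meta)
                    AND COALESCE(json_type(meta, '$.name'), '') = 'text')
    )
    """
)


def try_insert(uid, meta):
    """Insert a row; return False if the database rejects the document."""
    try:
        conn.execute("INSERT INTO users (uid, meta) VALUES (?, ?)", (uid, meta))
        return True
    except sqlite3.DatabaseError:
        # Covers constraint violations as well as JSON parse errors.
        return False


print(try_insert(1, '{"name": "foo"}'))   # well-formed, name is a string
print(try_insert(2, '{"name": 42}'))      # name has the wrong type
print(try_insert(3, '{"age": 12}'))       # name is missing
```

Every rule written this way is another thing to migrate when the document shape changes, which is part of the cost being described here.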

Looking back, JSON allowed us to iterate very quickly and get something out the door. It was great. However, after we reached a certain team size, its flexibility also allowed us to hang ourselves with a long rope of technical debt, which then slowed down subsequent feature development. Use with caution.

Think long and hard about what the nature of your data is. It's the foundation of your app. How will the data be used over time? And how is it likely TO CHANGE?

Answered by Adam

Just tossing it out there, but WordPress has a structure for this kind of stuff (at least, WordPress was the first place I observed it; it probably originated elsewhere).

It allows limitless keys, and is faster to search than using a JSON blob, but not as fast as some of the NoSQL solutions.

uid   |   meta_key    |   meta_val
----------------------------------
1         name            Frank
1         age             12
2         name            Jeremiah
3         fav_food        pizza
.................

EDIT

For storing history/multiple keys

uid   | meta_id    |   meta_key    |   meta_val
----------------------------------------------------
1        1             name            Frank
1        2             name            John
1        3             age             12
2        4             name            Jeremiah
3        5             fav_food        pizza
.................

and query via something like this:

select meta_val from `table` where meta_key = 'name' and uid = 1 order by meta_id desc
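
To make the key/value layout concrete, here is a minimal runnable sketch using the sample rows from the second table above. SQLite via Python's sqlite3 stands in for MySQL, and the table name user_meta, the index, and the two helper functions are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_meta (
        meta_id  INTEGER PRIMARY KEY,
        uid      INTEGER NOT NULL,
        meta_key TEXT NOT NULL,
        meta_val TEXT NOT NULL
    )
    """
)
# Lookups filter on (uid, meta_key), so index that pair.
conn.execute("CREATE INDEX idx_user_meta_lookup ON user_meta (uid, meta_key)")
conn.executemany(
    "INSERT INTO user_meta (meta_id, uid, meta_key, meta_val) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "name", "Frank"),
        (2, 1, "name", "John"),
        (3, 1, "age", "12"),
        (4, 2, "name", "Jeremiah"),
        (5, 3, "fav_food", "pizza"),
    ],
)


def meta_history(uid, key):
    """All values stored for one key of one user, newest first."""
    rows = conn.execute(
        "SELECT meta_val FROM user_meta "
        "WHERE meta_key = ? AND uid = ? ORDER BY meta_id DESC",
        (key, uid),
    )
    return [val for (val,) in rows]


def uids_with(key, value):
    """Users that have `value` stored under `key`."""
    rows = conn.execute(
        "SELECT DISTINCT uid FROM user_meta "
        "WHERE meta_key = ? AND meta_val = ? ORDER BY uid",
        (key, value),
    )
    return [uid for (uid,) in rows]


print(meta_history(1, "name"))
print(uids_with("fav_food", "pizza"))
```

Note that every value is stored as text in this layout, so numeric comparisons on something like age need a cast.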

Answered by Nick Andriopoulos

The drawback of the approach is exactly what you mentioned:

It makes it VERY slow to find things, since each time you need to perform a text search on it.

Value-per-column instead matches the whole string.

Your approach (JSON based data) is fine for data you don't need to search by, and just need to display along with your normal data.

Edit: Just to clarify, the above goes for classic relational databases. NoSQL databases use JSON internally, and are probably a better option if that is the desired behavior.

Answered by Girish

Basically, the first model you are using is called document-based storage. You should have a look at popular NoSQL document-based databases like MongoDB and CouchDB. In document-based databases, you store data in JSON files and then query those JSON files.

The second model is the popular relational database structure.

If you want to use a relational database like MySQL, then I would suggest you only use the second model. There is no point in using MySQL and storing data as in the first model.

To answer your second question, there is no way to query for a name like 'foo' if you use the first model.

Answered by Bruno

It seems that you're mainly hesitating whether to use a relational model or not.

As it stands, your example would fit a relational model reasonably well, but the problem may come of course when you need to make this model evolve.

If you only have one (or a few pre-determined) levels of attributes for your main entity (user), you could still use an Entity Attribute Value (EAV) model in a relational database. (This also has its pros and cons.)

If you anticipate that you'll get less structured values that you'll want to search using your application, MySQL might not be the best choice here.

If you were using PostgreSQL, you could potentially get the best of both worlds. (This really depends on the actual structure of the data here... MySQL isn't necessarily the wrong choice either, and the NoSQL options can be of interest; I'm just suggesting alternatives.)

Indeed, PostgreSQL can build indexes on (immutable) functions (which MySQL can't, as far as I know), and in recent versions you could use PLV8 directly on the JSON data to build indexes on specific JSON elements of interest, which would improve the speed of your queries when searching for that data.

EDIT:

Since there won't be too many columns on which I need to perform search, is it wise to use both the models? Key-per-column for the data I need to search and JSON for others (in the same MySQL database)?

Mixing the two models isn't necessarily wrong (assuming the extra space is negligible), but it may cause problems if you don't make sure the two data sets are kept in sync: your application must never change one without also updating the other.

A good way to achieve this would be to have a trigger perform the automatic update, by running a stored procedure within the database server whenever an update or insert is made. As far as I'm aware, the MySQL stored procedure language probably lacks support for any sort of JSON processing. Again, PostgreSQL with PLV8 support (and possibly other RDBMSs with more flexible stored procedure languages) should be more useful (updating your relational column automatically using a trigger is quite similar to updating an index in the same way).
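
The trigger idea can be sketched as follows. This is an illustration only, not PLV8 and not MySQL: SQLite triggers (driven from Python's sqlite3, assuming the JSON functions are available) stand in so the example runs as-is. The users table, the trigger names, the name column, and the name_column helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        uid  INTEGER PRIMARY KEY,
        meta TEXT NOT NULL,   -- full JSON document
        name TEXT             -- searchable copy, maintained by triggers
    );
    CREATE INDEX idx_users_name ON users (name);

    -- Copy the key into the relational column whenever a row is inserted.
    CREATE TRIGGER users_sync_name_insert AFTER INSERT ON users
    BEGIN
        UPDATE users SET name = json_extract(NEW.meta, '$.name[0]')
        WHERE uid = NEW.uid;
    END;

    -- Do the same whenever the JSON document itself is changed.
    CREATE TRIGGER users_sync_name_update AFTER UPDATE OF meta ON users
    BEGIN
        UPDATE users SET name = json_extract(NEW.meta, '$.name[0]')
        WHERE uid = NEW.uid;
    END;
    """
)


def name_column(uid):
    """Read the relational copy of the name for one user."""
    return conn.execute(
        "SELECT name FROM users WHERE uid = ?", (uid,)
    ).fetchone()[0]


conn.execute("INSERT INTO users (uid, meta) VALUES (1, ?)", ('{"name": ["foo"]}',))
after_insert = name_column(1)

conn.execute("UPDATE users SET meta = ? WHERE uid = 1", ('{"name": ["bar"]}',))
after_update = name_column(1)

print(after_insert, after_update)
```

With this in place the application only ever writes the JSON document, so the two copies cannot drift apart through application code. In MySQL 5.7 and later, a generated column defined over the JSON value is another way to get a similar effect without triggers.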

Answered by Ravindra

Sometimes joins on tables are an overhead, say for OLAP. Suppose I have two tables, ORDERS and ORDER_DETAILS. To get all the order details we have to join the two tables, which makes the query slower as the number of rows in the tables grows into, say, the millions. A left/right join is also much slower than an inner join. I think that if we add a JSON string/object to the respective ORDERS entry, the JOIN can be avoided and report generation will be faster.

Answered by Ahmedfraije Aa

Short answer: you have to mix the two. Use JSON for data that you are not going to build relations on, such as contact data, addresses, and product variables.

Answered by Pants

As others have pointed out, queries will be slower. I'd suggest adding at least an '_ID' column and querying by that instead.

Answered by Chris L

You are trying to fit a non-relational model into a relational database. I think you would be better served by a NoSQL database such as MongoDB. There is no predefined schema, which fits your requirement of having no limit on the number of fields (see the typical MongoDB collection example). Check out the MongoDB documentation to get an idea of how you'd query your documents, e.g.

db.mycollection.find(
    {
      name: 'sann'
    }
)