我什么时候可以在 SQL 表中保存 JSON 或 XML 数据

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/43494824/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 05:12:40  来源:igfitidea点击:

When can I save JSON or XML data in an SQL Table

sqljsonxmlnormalizationstructured-data

提问by Shnugo

When using SQLor MySQL(or any relational DB for that matter) - I understand that saving the data in regular columns is better for indexing sake and other purposes...

当使用SQLMySQL(或任何关系数据库)时 - 我知道将数据保存在常规列中更适合索引和其他目的......

The thing is loading and saving JSONdata is sometimes a lot more simple - and makes the development easier.

事情是加载和保存JSON数据有时要简单得多 - 并使开发更容易。

Are there any "golden rules" for saving raw JSONdata in the DB?

JSON在数据库中保存原始数据是否有任何“黄金法则” ?

Is it absolutely a bad practice to do so?

这样做绝对是一种不好的做法吗?

回答by Shnugo

The main questions are

主要问题是

  • What are you going to do with this data? and
  • How are you filtering/sorting/joining/manipulating this data?
  • 你打算用这些数据做什么?和
  • 你是如何过滤/排序/加入/操作这些数据的?

JSON (like XML) is great for data exchange, small storage and generically defined structures, but it cannot participate in typical actions you run within your RDBMS. In most cases it will be better to transfer your JSON data into normal tablesand re-create the JSON when you need it.

JSON(如 XML)非常适合数据交换、小型存储和通用定义的结构,但它不能参与您在 RDBMS 中运行的典型操作。在大多数情况下,最好将 JSON 数据传输到普通表中,并在需要时重新创建 JSON。

XML / JSON and 1.NF

XML / JSON 和1.NF

The first rule of normalisation dictates, never to store more than one bit of information into one column. You see a column "PersonName" with a value like "Mickey Mouse"? You point to this and cry: Change that immediately!

规范化的第一条规则规定,永远不要将多于一位的信息存储到一列中。您看到“PersonName”列的值类似于“Mickey Mouse”吗?你指着这个哭:立即改变!

What about XML or JSON? Are these types breaking 1.NF? Well, yes and no... 

XML 或 JSON 怎么样?这些类型是否破坏了 1.NF?嗯,是和不是...... 

It is perfectly okay to store a complete structure as one bit of informationif it is one bit of informationactually. You get a SOAP response and want to store it because you might need this for future reference (but you will not use this data for your own processes)? Just store it as is!

如果它实际上是一位信息,那么将一个完整的结构存储为一位信息是完全可以的。您收到一个 SOAP 响应并想要存储它,因为您可能需要它以供将来参考(但您不会将这些数据用于您自己的流程)?只需按原样存储它!

Now imagine a complex structure (XML or JSON) representing a person(with its address, further details...). Now you put this into one column as PersonInCharge. Is this wrong? Shouldn't this rather live in properly designed related tables with a foreign key reference instead of the XML/JSON? Especially if the same person might occur in many different rows it is definitely wrong to use an XML/JSON approach.

现在想象一个复杂的结构(XML 或 JSON)代表一个人(包括他的地址、更多细节......)。现在你把它作为PersonInCharge. 这是错误的吗?这不应该存在于正确设计的相关表中,使用外键引用而不是 XML/JSON?特别是如果同一个人可能出现在许多不同的行中,使用 XML/JSON 方法绝对是错误的。

But now imagine the need to store historical data. You want to persistthe person's data for a given moment in time. Some days later the person tells you a new address? No problem! The old address lives in an XML/JSON if you ever need it...

但是现在想象一下需要存储历史数据。您希望在给定的时间保留此人的数据。几天后,这个人告诉你一个新地址?没问题!如果您需要,旧地址位于 XML/JSON 中...

Conclusion:If you store the data just to keep it, it's okay. If this data is a uniqueportion, it's okay...
But if you need the internal partsregularly or if this would mean redundant duplicate storage it's not okay...

结论:如果您存储数据只是为了保留它,那没关系。如果这个数据是一个独特的部分,那没关系......
但是如果你经常需要内部部件或者如果这意味着冗余的重复存储那就不行......

Physical storage

物理存储

The following is for SQL Server and might be different on other RDBMs.

以下内容适用于 SQL Server,在其他 RDBM 上可能有所不同。

XML is not stored as the text you see, but as a hierarchy tree. Querying this is astonishingly well performing! This structure is not parsed on string level!
JSON in SQL Server (2016+) lives in a string and must be parsed. There is no real native JSON type (like there is a native XML type). This might come later, but for now I'd assume, that JSON will not be as performant as XML on SQL Server (see section UPDATE 2). Any need to read a value out of JSON will need a hell of lot of hidden string method calls...

XML 不是存储为您看到的文本,而是存储为层次结构树。查询这一点的表现令人惊讶!这个结构不是在字符串级别解析的!
SQL Server (2016+) 中的 JSON 存在于字符串中并且必须被解析。没有真正的原生 JSON 类型(就像有原生 XML 类型一样)。这可能会稍后出现,但现在我假设,JSON 在 SQL Server 上的性能不如 XML(请参阅更新 2部分)。任何需要从 JSON 中读取值都需要大量隐藏的字符串方法调用......

What does this mean for you?

这对你意味着什么?

your lovable DB artist:-D knows, that storing JSON as is, is against common principles of RDBMs. He knows,

可爱的 DB 艺术家:-D 知道,按原样存储JSON违反了 RDBM 的通用原则。他知道,

  • that a JSON is quite probably breaking 1.NF
  • that a JSON might change in time (same column, differing content).
  • that a JSON is not easy to read, and it is very hard to filter/search/join or sort by it.
  • that such operations will shift quite some extra load onto poor little DB server
  • JSON 很可能会破坏 1.NF
  • JSON 可能会随时间变化(同一列,不同的内容)。
  • JSON 不容易阅读,并且很难过滤/搜索/加入或排序。
  • 这样的操作会将相当多的额外负载转移到可怜的小型数据库服务器上

There are some workarounds (depending on the RDBMS you are using), but most of them don't work the way you'd like it...

有一些变通方法(取决于您使用的 RDBMS),但其中大部分都无法按照您希望的方式工作...

The answer to your question in short

简短回答您的问题

YES

是的

  • If you do not want to use data, which is stored withinyour JSONfor expensive operations (filter/join/sort).
    You can store this just as any other exists onlycontent. We are storing many pictures as BLOBs, but we would not try to filter for all images with a flower...
  • If you do not bother at all what's inside (just store it and read it as one bit of information)
  • If the structures are variable, which would make it harder to create physical tables then to work with JSON data.
  • If the structure is deeply nested, that the storage in physical tables is to much overhead
  • 如果您不想使用数据,这些数据存储您的 JSON 中以进行昂贵的操作(过滤器/连接/排序)。
    您可以像任何其他仅存在的内容一样存储它。我们将许多图片存储为 BLOB,但我们不会尝试过滤所有带有花的图像...
  • 如果您根本不关心里面的内容(只需将其存储并阅读为一点信息)
  • 如果结构是可变的,这将使创建物理表然后使用 JSON 数据变得更加困难。
  • 如果结构嵌套很深,那么物理表中的存储开销很大

NO

  • If you want to use the internal data like you'd use a relational table's data (filter, indexes, joins...)
  • If you would store duplicates (create redundancy)
  • In general: If you face performance problems (for sure you will face them in many typical scenarios!)
  • 如果您想像使用关系表的数据一样使用内部数据(过滤器、索引、连接...)
  • 如果您要存储重复项(创建冗余)
  • 一般而言:如果您遇到性能问题(在许多典型场景中您肯定会遇到它们!)

You might start with the JSON within a string column or as BLOB and change this to physical tables when you need it. My magic crystal ball tells me, this might be tomorrow :-D

您可以从字符串列中的 JSON 或 BLOB 开始,并在需要时将其更改为物理表。我的魔法水晶球告诉我,这可能是明天 :-D

UPDATE

更新

Find some ideas about performance and disc space here: https://stackoverflow.com/a/47408528/5089204

在此处查找有关性能和磁盘空间的一些想法:https: //stackoverflow.com/a/47408528/5089204

UPDATE 2: More about performance...

更新 2:有关性能的更多信息...

The following addresses JSON and XML support in SQL-Server 2016

以下解决了 SQL-Server 2016 中的 JSON 和 XML 支持

User @mike123 pointed to an article on an official microsoft blogwhich seems to proof in an experiment, that querying a JSON is 10 x fasterthen querying an XMLin SQL-Server.

用户@mike123 指出微软官方博客上一篇文章似乎在实验中证明,查询 JSON比在 SQL-Server 中查询 XML快 10 倍

Some thoughts about that:

对此的一些想法:

Some cross-checks with the "experiment":

与“实验”的一些交叉检查:

  • the "experiment" measures a lot, but not the performance of XML vs. JSON. Doing the same action agaist the same (unchanged) string repeatedly is not a realistic scenario
  • The tested examples are far to simple for a general statement!
  • The value read is always the same and not even used. The optimizer will see this...
  • Not a single word about the mighty XQuerysupport! Find a product with a given ID within an array? JSON needs to read the whole lot and use a filter afterwards using WHERE, while XMLwould allow an internal XQuery predicate. Not to speak about FLWOR...
  • the "experiments" code as ison my system brings up: JSON seems to be 3x faster (but not 10x).
  • Adding /text()to the XPathreduces this to less than 2x. In the related article user "Mister Magoo" pointed this out already, but the click-baittitle is still unchanged...
  • With such an easy JSON as given in the "experiment" the fastest pure T-SQL approach was a combination of SUBSTRINGand CHARINDEX:-D
  • “实验”的措施很多,但不是XML与JSON的性能。对相同(未更改)的字符串重复执行相同的操作是不现实的场景
  • 经过测试的示例对于一般陈述来说非常简单
  • 读取的值始终相同,甚至未被使用。优化器会看到这个...
  • 对强大的XQuery支持一言不发!在数组中查找具有给定 ID 的产品?JSON 需要读取全部内容并在使用之后使用过滤器WHERE,而XML将允许内部XQuery predicate. 不谈FLWOR...
  • 在“实验”的代码作为是我的系统上带来了:JSON似乎更快(但不是10倍)为3倍。
  • 添加/text()XPath减少到小于 2x。在相关文章中用户“Mister Magoo”已经指出了这一点,但点击诱饵标题仍然没有改变......
  • 用如在“实验”给出了这样一个简单的JSON最快纯T-SQL方法是的组合SUBSTRINGCHARINDEX:-D

The following code will show a more realistic experiment

下面的代码将展示一个更真实的实验

  • Using a JSON and an identical XML with more than one Product(a JSON array vs. sibling nodes)
  • JSON and XML are slightly changing (10000 running numbers) and inserted into tables.
  • There is an initial call agaist both tables to avoid first-call-bias
  • All 10000 entries are read and the values retrieved are inserted to another table.
  • Using GO 10will run through this block ten times to avoid first-call-bias
  • 使用 JSON 和具有多个相同的 XML Product(JSON 数组与同级节点)
  • JSON 和 XML 略有变化(10000 个运行数字)并插入到表中。
  • 两个表都有一个初始呼叫,以避免首次呼叫偏差
  • 读取所有 10000 个条目并将检索到的值插入到另一个表中。
  • 使用GO 10将运行此块十次以避免首次调用偏差

The final result shows clearly, that JSON is slower than XML(not that much, about 1.5x on a still very simple example).

最终结果清楚地表明,JSON 比 XML 慢(没有那么多,在一个仍然非常简单的例子中大约是 1.5 倍)。

The final statement:

最后声明:

  • With an overly simplified example under undue circumstances JSON can be faster than XML
  • Dealing with JSON is pure string action, while XML is parsed and transformed. This is rather expensive in the first action, but will speed up everything, once this is done.
  • JSON might be better in a one-timeaction (avoids the overhead of creating an internal hierarchical representation of an XML)
  • With a still very simple but more realistic example XML will be faster in simple reading
  • Whenever there is any need to read a specific element out of an array, to filter all entries where a given ProductID is included in the array, or to navigate up and down the path, JSON cannot hold up. It must be parsed out of a string completely - each time you have to grab into it...
  • 在不适当的情况下使用过于简化的示例 JSON 可能比 XML 快
  • 处理 JSON 是纯字符串操作,而 XML 是解析和转换的。这在第一个动作中相当昂贵,但一旦完成,就会加速一切。
  • JSON 在一次性操作中可能更好(避免创建 XML 的内部分层表示的开销)
  • 使用仍然非常简单但更现实的示例 XML 将在简单阅读中更快
  • 每当需要从数组中读取特定元素、过滤数组中包含给定 ProductID 的所有条目或在路径上上下导航时,JSON 都无法支持。它必须完全从字符串中解析出来 - 每次你必须抓住它......

The test code

测试代码

USE master;
GO
--create a clean database
CREATE DATABASE TestJsonXml;
GO
USE TestJsonXml;
GO
--create tables
CREATE TABLE TestTbl1(ID INT IDENTITY,SomeXml XML);
CREATE TABLE TestTbl2(ID INT IDENTITY,SomeJson NVARCHAR(MAX));
CREATE TABLE Target1(SomeString NVARCHAR(MAX));
CREATE TABLE Target2(SomeString NVARCHAR(MAX));
CREATE TABLE Times(Test VARCHAR(10),Diff INT)
GO
--insert 10000 XMLs into TestTbl1
WITH Tally AS(SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL))*2 AS Nmbr FROM master..spt_values AS v1 CROSS APPLY master..spt_values AS v2)
INSERT INTO TestTbl1(SomeXml)
SELECT 
N'<Root>
    <Products>
    <ProductDescription>
        <Features>
            <Maintenance>' + CAST(Nmbr AS NVARCHAR(10)) + ' year parts and labor extended maintenance is available</Maintenance>
            <Warranty>1 year parts and labor</Warranty>
        </Features>
        <ProductID>' + CAST(Nmbr AS NVARCHAR(10)) + '</ProductID>
        <ProductName>Road Bike</ProductName>
    </ProductDescription>
    <ProductDescription>
        <Features>
            <Maintenance>' + CAST(Nmbr + 1 AS NVARCHAR(10)) + ' blah</Maintenance>
            <Warranty>1 year parts and labor</Warranty>
        </Features>
        <ProductID>' + CAST(Nmbr + 1 AS NVARCHAR(10)) + '</ProductID>
        <ProductName>Cross Bike</ProductName>
    </ProductDescription>
    </Products>
</Root>'
FROM Tally;

--insert 10000 JSONs into TestTbl2
WITH Tally AS(SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr FROM master..spt_values AS v1 CROSS APPLY master..spt_values AS v2)
INSERT INTO TestTbl2(SomeJson)
SELECT 
N'{
    "Root": {
        "Products": {
            "ProductDescription": [
                {
                    "Features": {
                        "Maintenance": "' + CAST(Nmbr AS NVARCHAR(10)) + ' year parts and labor extended maintenance is available",
                        "Warranty": "1 year parts and labor"
                    },
                    "ProductID": "' + CAST(Nmbr AS NVARCHAR(10)) + '",
                    "ProductName": "Road Bike"
                },
                {
                    "Features": {
                        "Maintenance": "' + CAST(Nmbr + 1 AS NVARCHAR(10)) + ' blah",
                        "Warranty": "1 year parts and labor"
                    },
                    "ProductID": "' + CAST(Nmbr + 1 AS NVARCHAR(10)) + '",
                    "ProductName": "Cross Bike"
                }
            ]
        }
    }
}'
FROM Tally;
GO

--Do some initial action to avoid first-call-bias
INSERT INTO Target1(SomeString)
SELECT SomeXml.value('(/Root/Products/ProductDescription/Features/Maintenance/text())[1]', 'nvarchar(4000)')
FROM TestTbl1;
INSERT INTO Target2(SomeString)
SELECT JSON_VALUE(SomeJson, N'$.Root.Products.ProductDescription[0].Features.Maintenance')
FROM TestTbl2;
GO

--Start the test
DECLARE @StartDt DATETIME2(7), @EndXml DATETIME2(7), @EndJson DATETIME2(7);

--Read all ProductNames of the second product and insert them to Target1
SET @StartDt = SYSDATETIME();
INSERT INTO Target1(SomeString)
SELECT SomeXml.value('(/Root/Products/ProductDescription/ProductName/text())[2]', 'nvarchar(4000)')
FROM TestTbl1
ORDER BY NEWID();
--remember the time spent
INSERT INTO Times(Test,Diff)
SELECT 'xml',DATEDIFF(millisecond,@StartDt,SYSDATETIME());

--Same with JSON into Target2
SET @StartDt = SYSDATETIME();
INSERT INTO Target2(SomeString)
SELECT JSON_VALUE(SomeJson, N'$.Root.Products.ProductDescription[1].ProductName')
FROM TestTbl2
ORDER BY NEWID();
--remember the time spent
INSERT INTO Times(Test,Diff)
SELECT 'json',DATEDIFF(millisecond,@StartDt,SYSDATETIME());

GO 10 --do the block above 10 times

--Show the result
SELECT Test,SUM(Diff) AS SumTime, COUNT(Diff) AS CountTime
FROM Times
GROUP BY Test;
GO
--clean up
USE master;
GO
DROP DATABASE TestJsonXml;
GO

The result (SQL Server 2016 Express on an Acer Aspire v17 Nitro Intel i7, 8GB Ram)

结果(Acer Aspire v17 Nitro Intel i7、8GB Ram 上的 SQL Server 2016 Express)

Test    SumTime 
------------------
json    2706    
xml     1604    

回答by Gordon Linoff

This is too long for a comment.

评论太长了。

If it were "absolutely wrong", then most databases would not support it. Okay, most databases support commas in the FROMclause and I view that as "absolutely wrong". But support for JSON is new development, not a backward-compatible "feature".

如果它是“绝对错误的”,那么大多数数据库都不会支持它。好的,大多数数据库都支持FROM子句中的逗号,我认为这是“绝对错误的”。但是对 JSON 的支持是新开发的,而不是向后兼容的“功能”。

One obvious case is when the JSON struct is simply a BLOB that is passed back to the application. Then there is no debate -- other then the overhead of storing JSON, which is unnecessarily verbose for structured data with common fields in every record.

一个明显的情况是 JSON 结构只是一个传递回应用程序的 BLOB。然后就没有争论了——除了存储 JSON 的开销,对于在每条记录中具有公共字段的结构化数据来说,这是不必要的冗长。

Another case is the "sparse" columns case. You have rows with many possible columns, but these vary from row to row.

另一种情况是“稀疏”列的情况。您的行有许多可能的列,但这些列因行而异。

Another case is when you want to store "nested" records in a record. JSON is powerful.

另一种情况是当您想在记录中存储“嵌套”记录时。JSON 很强大。

If the JSON has common fields across records that you want to query on, then you are usually better off putting these in proper database columns. However, data is complicated and there is a place for formats such as JSON.

如果 JSON 在要查询的记录中具有公共字段,那么通常最好将它们放在适当的数据库列中。但是,数据是复杂的,并且有 JSON 之类的格式。

回答by Rick James

I'll wave my magic wand. Poof! Golden Rules on use of JSON:

我会挥动我的魔杖。噗!使用 JSON 的黄金法则:

  • If MySQL does not need to look insidethe JSON, and the application simply needs a collection of stuff, then JSON is fine, possibly even better.

  • If you will be searching on data that is inside andyou have MariaDB 10.0.1 or MySQL 5.7 (with a JSON datatype and functions), then JSON mightbe practical. MariaDB 5.3's "Dynamic" columns is a variant on this.

  • If you are doing "Entity-Attribute-Value" stuff, then JSON is not good, but it is the least of several evils. http://mysql.rjweb.org/doc.php/eav

  • For searching by an indexed column, not having the value buried inside JSON is a big plus.

  • For searching by a range on an indexed column, or a FULLTEXTsearch or SPATIAL, JSON is not possible.

  • For WHERE a=1 AND b=2the "composite" index INDEX(a,b)is great; probably can't come close with JSON.

  • JSON works well with "sparse" data; INDEXing works, but not as well, with such. (I am referring to values that are 'missing' or NULL for many of the rows.)

  • JSON can give you "arrays" and "trees" without resorting to extra table(s). But dig into such arrays/trees onlyin the app, notin SQL.

  • JSON is worlds better than XML. (My opinion)

  • If you do not want to get into the JSON string except from the app, then I recommend compressing (in the client) it an storing into a BLOB. Think of it like a .jpg -- there's stuff in there, but SQL does not care.

  • 如果 MySQL 不需要查看JSON内部,而应用程序只需要一些东西的集合,那么 JSON 就可以了,甚至可能更好。

  • 如果您要搜索内部数据并且您拥有 MariaDB 10.0.1 或 MySQL 5.7(具有 JSON 数据类型和函数),那么 JSON可能是实用的。MariaDB 5.3 的“动态”列是这方面的一个变体。

  • 如果你正在做“实体-属性-值”的事情,那么 JSON 并不好,但它是几个缺点中最少的。 http://mysql.rjweb.org/doc.php/eav

  • 对于按索引列搜索,没有将值埋在 JSON 中是一个很大的优势。

  • 对于按索引列上的范围进行FULLTEXT搜索,或搜索 or SPATIAL,JSON 是不可能的。

  • 因为WHERE a=1 AND b=2“复合”指数INDEX(a,b)很棒;可能无法接近 JSON。

  • JSON 适用于“稀疏”数据;INDEXing 工作,但不是很好,与这样的。(我指的是许多行“缺失”或 NULL 的值。)

  • JSON 可以为您提供“数组”和“树”,而无需求助于额外的表。但是只能在应用程序中挖掘这样的数组/树,而不是在 SQL 中。

  • JSON 比 XML 好得多。(我的看法)

  • 如果您不想进入除应用程序之外的 JSON 字符串,那么我建议将其压缩(在客户端中)并存储到BLOB. 把它想象成一个 .jpg —— 里面有东西,但 SQL 不在乎。

State your application; maybe we can be more specific.

陈述您的申请;也许我们可以更具体。

回答by AMRESH PANDEY

New SQL Server provides functions for processing JSON text. Information formatted as JSON can be stored as text in standard SQL Server columns and SQL Server provides functions that can retrieve values from these JSON objects.

新的 SQL Server 提供了处理 JSON 文本的函数。格式化为 JSON 的信息可以作为文本存储在标准 SQL Server 列中,并且 SQL Server 提供了可以从这些 JSON 对象中检索值的函数。

    DROP TABLE IF EXISTS Person

 CREATE TABLE Person 
 ( _id int identity constraint PK_JSON_ID primary key,
 value nvarchar(max)
 CONSTRAINT [Content should be formatted as JSON]
 CHECK ( ISJSON(value)>0 )
 )

This simple structure is similar to the standard NoSQL collection that you can create in NoSQL databases (e.g. Azure DocumentDB or MongoDB) where you just have key that represents ID and value that represents JSON.

这种简单的结构类似于您可以在 NoSQL 数据库(例如 Azure DocumentDB 或 MongoDB)中创建的标准 NoSQL 集合,其中只有表示 ID 的键和表示 JSON 的值。

Note that NVARCHAR is not just a plain text. SQL Server has built-in text compressions mechanism that can transparently compress data stored on disk. Compression depends on language and can go up to 50% depending on your data (see UNICODE compression ).

请注意, NVARCHAR 不仅仅是纯文本。SQL Server 具有内置的文本压缩机制,可以透明地压缩存储在磁盘上的数据。压缩取决于语言,并且可以达到 50%,具体取决于您的数据(请参阅 UNICODE 压缩)。

The key difference between SQL server and other plain NoSQL databases is that SQL Server enables you to use hybrid data model where you can store several JSON objects in the same “collection” and combine them with regular relational columns.

SQL Server 与其他普通 NoSQL 数据库之间的主要区别在于,SQL Server 使您能够使用混合数据模型,您可以将多个 JSON 对象存储在同一个“集合”中,并将它们与常规关系列组合在一起。

As an example, imagine that we know that every person in your collection will have FirstName and LastName, and that you can store general information about the person as one JSON object, and phone numbers/email addresses as separate objects. In SQL Server 2016 we can easily create this structure without any additional syntax:

例如,假设我们知道您集合中的每个人都有 FirstName 和 LastName,并且您可以将有关此人的一般信息存储为一个 JSON 对象,并将电话号码/电子邮件地址存储为单独的对象。在 SQL Server 2016 中,我们可以轻松创建此结构,而无需任何其他语法:

DROP TABLE IF EXISTS Person

CREATE TABLE Person (

 PersonID int IDENTITY PRIMARY KEY,

 FirstName nvarchar(100) NOT NULL,

 LastName nvarchar(100) NOT NULL,

 AdditionalInfo nvarchar(max) NULL,

 PhoneNumbers nvarchar(max) NULL,

 EmailAddresses nvarchar(max) NULL
 CONSTRAINT [Email addresses must be formatted as JSON array]
 CHECK ( ISJSON(EmailAddresses)>0 )

 )

Instead of single JSON object you can organize your data in this “collection”. If you do not want to explicitly check structure of each JSON column, you don't need to add JSON check constraint on every column (in this example I have added CHECK constraint only on EmailAddresses column).

您可以在此“集合”中组织数据,而不是单个 JSON 对象。如果您不想显式检查每个 JSON 列的结构,则不需要在每一列上添加 JSON 检查约束(在本例中,我仅在 EmailAddresses 列上添加了 CHECK 约束)。

If you compare this structure to the standard NoSQL collection, you might notice that you will have faster access to strongly typed data (FirstName and LastName). Therefore, this solution is good choice for hybrid models where you can identify some information that are repeated across all objects, and other variable information can be stored as JSON. This way, you can combine flexibility and performance.

如果将此结构与标准 NoSQL 集合进行比较,您可能会注意到您可以更快地访问强类型数据(名字和姓氏)。因此,对于混合模型,此解决方案是不错的选择,您可以识别在所有对象中重复的一些信息,而其他变量信息可以存储为 JSON。这样,您就可以将灵活性和性能结合起来。

If you compare this structure with the schema of Person table AdventureWorks database, you might notice that we have removed many related tables.

如果将此结构与 Person 表 AdventureWorks 数据库的架构进行比较,您可能会注意到我们删除了许多相关表。

Beside simplicity of schema, your data access operations will be simpler compared to complex relational structure. Now you can read single table instead of joining several tables. When you need to insert new person with related information (email addresses, phone numbers) you can insert a single record in one table instead of inserting one record in AdventureWorks Person table, taking identity column to find foreign key that will be used to store phones, email addresses, etc. In addition, in this model you can easily delete single person row without cascade deletes using foreign key relationships.

除了模式简单之外,与复杂的关系结构相比,您的数据访问操作将更加简单。现在您可以读取单个表而不是连接多个表。当您需要插入具有相关信息(电子邮件地址、电话号码)的新人员时,您可以在一个表中插入一条记录,而不是在 AdventureWorks Person 表中插入一条记录,使用标识列查找将用于存储电话的外键,电子邮件地址等。此外,在此模型中,您可以使用外键关系轻松删除单人行而无需级联删除。

NoSQL databases are optimized for simple, read, insert, and delete operations – SQL Server 2016 enables you to apply the same logic in relational database.

NoSQL 数据库针对简单、读取、插入和删除操作进行了优化——SQL Server 2016 使您能够在关系数据库中应用相同的逻辑。

JSON constraints In the previous examples, we have seen how to add simple constraint that validates that text stored in the column is properly formatted. Although JSON do not have strong schema, you can also add complex constraints by combining functions that read values from JSON and standard T-SQL functions:

JSON 约束 在前面的示例中,我们已经看到如何添加简单的约束来验证列中存储的文本格式是否正确。虽然 JSON 没有强模式,但您也可以通过组合从 JSON 读取值的函数和标准 T-SQL 函数来添加复杂的约束:

ALTER TABLE Person
 ADD CONSTRAINT [Age should be number]
 CHECK ( ISNUMERIC(JSON_VALUE(value, '$.age'))>0 )

 ALTER TABLE Person
 ADD CONSTRAINT [Person should have skills]
 CHECK ( JSON_QUERY(value, '$.skills') IS NOT NULL)
First constraint will take the value of $.age property and check is this numeric value. Second constraint will try to find JSON object in $.skills property and verify that it exists. The following INSERT statements will fail due to the violation of constraints:



INSERT INTO Person(value)
 VALUES ('{"age": "not a number", "skills":[]}')

 INSERT INTO Person(value)
 VALUES ('{"age": 35}')

Note that CHECK constraints might slow down your insert/update processes so you might avoid them if you need faster write performance.

请注意 CHECK 约束可能会减慢您的插入/更新过程,因此如果您需要更快的写入性能,您可以避免它们。

Compressed JSON storage If you have large JSON text you can explicitly compress JSON text using built-in COMPRESS function. In the following example compressed JSON content is stored as binary data, and we have computed column that decompress JSON as original text using DECOMPRESS function:

压缩的 JSON 存储如果您有大型 JSON 文本,您可以使用内置的 COMPRESS 函数显式压缩 JSON 文本。在以下示例中,压缩的 JSON 内容存储为二进制数据,我们计算列使用 DECOMPRESS 函数将 JSON 解压缩为原始文本:

CREATE TABLE Person

 ( _id int identity constraint PK_JSON_ID primary key,

 data varbinary(max),

 value AS CAST(DECOMPRESS(data) AS nvarchar(max))

 )



 INSERT INTO Person(data)

 VALUES (COMPRESS(@json))

COMPRESS and DECOMPRESS functions use standard GZip compression. If your client can handle GZip compression (e.g browser that understands gzip content), you can directly return compressed content. Note that this is performance/storage trade-off. If you frequently query compressed data you mig have slower performance because text must be decompressed each time.

COMPRESS 和 DECOMPRESS 函数使用标准 GZip 压缩。如果您的客户端可以处理 GZip 压缩(例如理解 gzip 内容的浏览器),您可以直接返回压缩内容。请注意,这是性能/存储权衡。如果您经常查询压缩数据,则迁移性能会变慢,因为每次都必须解压缩文本。

Note: JSON functions are available only in SQL Server 2016+ and Azure SQL Database.

注意:JSON 函数仅在 SQL Server 2016+ 和 Azure SQL 数据库中可用。

More can be read from the source of this article

更多内容可从本文来源阅读

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/11/23/storing-json-in-sql-server/

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/11/23/storing-json-in-sql-server/

回答by piisexactly3

The "golden rule" I use, in a hand-wavey sort of way, is that if I need JSON in its raw format, it's okay to store. If I have to make a special point of parsing it, then it's not.

我使用的“黄金法则”,在某种程度上,是如果我需要原始格式的 JSON,则可以存储。如果我必须特别强调解析它,那么它不是。

For instance, if I'm creating an API that sends out raw JSON, and for whatever reason this value isn't going to change, then it's okayto store it as raw JSON. If I have to parse it, change it, update it, etc... then not so much.

例如,如果我正在创建一个发送原始 JSON 的 API,并且无论出于何种原因该值不会改变,那么可以将其存储为原始 JSON。如果我必须解析它,更改它,更新它等等......那么不是那么多。

回答by Anand

The question you have to ask is:

你要问的问题是:

Am I tied to using only this database?

我是否只能使用此数据库?

DO

  1. If you can use a different database to store JSON, use a document storage solution such as CouchDB, DynamoDB or MongoDB.
  2. Use these document storage DB's ability to index and search hierarchical data.
  3. Use a relational database for your relational data.
  4. Use a relational database for reporting, data warehousing and data mining.
  1. 如果您可以使用不同的数据库来存储 JSON,请使用文档存储解决方案,例如 CouchDB、DynamoDB 或 MongoDB。
  2. 使用这些文档存储 DB 的能力来索引和搜索分层数据。
  3. 将关系数据库用于关系数据。
  4. 使用关系数据库进行报告、数据仓库和数据挖掘。

DON'T

  1. Store JSON as string if possible.
  2. Try and come up with max length of JSON data.
  3. Use varchar to store JSON (use text/blob if you must).
  4. Try and search through stored JSON for values.
  5. Worry about escaping JSON to store as string.
  1. 如果可能,将 JSON 存储为字符串。
  2. 尝试提出 JSON 数据的最大长度。
  3. 使用 varchar 存储 JSON(如果必须,请使用 text/blob)。
  4. 尝试在存储的 JSON 中搜索值。
  5. 担心转义 JSON 以存储为字符串。

回答by Satyadev

Json's are not great in relationional db's. If you unfold the json into columns and store in a db , it's great but storing a json as a blob is next to using it as data archival system.

Json 在关系数据库中并不是很好。如果您将 json 展开成列并存储在 db 中,这很好,但是将 json 存储为 blob 紧接着将其用作数据存档系统。

There could be several reasons for not unfolding a json and storing it in a single column but the decision would have been taken as the values in that json field would not be used for any querying (or the values have been already unfolded into columns).

不展开 json 并将其存储在单个列中可能有多种原因,但会做出决定,因为该 json 字段中的值不会用于任何查询(或者值已经展开到列中)。

Also , most of the json processing if at all the field was queried would be outside the sql environment as sql is just not meant for json processing. The real question then becomes , where do i store this json, do i just let it be as flat files and when required query them via some other system (spark/hive/etc).

此外,大多数 json 处理(如果所有字段都被查询)将在 sql 环境之外,因为 sql 不适合 json 处理。那么真正的问题就变成了,我应该在哪里存储这个 json,我是否只是让它作为平面文件,并在需要时通过其他一些系统(spark/hive/etc)查询它们。

I would agree with your DB artist , don't use RDBMS for archival. There are cheaper options. Also json blobs can get huge and can start bogging down the DB disk space with time.

我会同意你的数据库艺术家,不要使用 RDBMS 进行存档。有更便宜的选择。此外,json blob 可能会变得很大,并且随着时间的推移可能会开始使 DB 磁盘空间陷入困境。

回答by Galina Alperovich

PostgreSQL has a built-in jsonand jsonbdata type

PostgreSQL 有一个内置的jsonjsonb数据类型

These are a few examples:

以下是一些示例:

CREATE TABLE orders (
 ID serial NOT NULL PRIMARY KEY,
 info json NOT NULL
);

INSERT INTO orders (info)
VALUES
 (
 '{ "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}'
 ),
 (
 '{ "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}'
 ),
 (
 '{ "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}'
 );

PostgreSQL provides two native operators ->and ->>to query JSON data.

PostgreSQL 提供了两个原生操作符->->>查询 JSON 数据。

The operator ->returns JSON object field by key.

运算符->按键返回 JSON 对象字段。

The operator ->>returns JSON object field by text.

运算符->>按文本返回 JSON 对象字段。

SELECT
 info -> 'customer' AS customer
FROM
 orders;

SELECT
 info ->> 'customer' AS customer
FROM
 orders
WHERE
 info -> 'items' ->> 'product' = 'Diaper'