Is it "bad" to store XML in a database?

Disclaimer: This page is a Chinese-English translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise comply with the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow original URL: http://stackoverflow.com/questions/4403523/

Time: 2020-09-06 13:25:59 Source: igfitidea

Tags: xml, database

Asked by Ender

I have heard from several sources that storing XML in a database is "bad", but I have never seen/heard an actual explanation of why that is. Is it true? If it is true, can you explain why? Moreover, can you tell me what a "good" case for storing XML in a database is?

Accepted answer by Randy Minder

It's not bad at all. Microsoft SQL Server has an XML data type. One use case for storing XML is a situation we found ourselves in: for each row in a particular table, we needed to store a variable number of attributes related to that row, and the number of these attributes can change over time and from row to row. We found it more efficient to store these attributes and their values in XML format. Now, whenever we adjust the number of attributes, we don't need to make schema changes.
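A minimal sketch of this approach, assuming a hypothetical `items` table whose per-row attributes live in a single XML column. SQLite has no XML type, so a TEXT column stands in for SQL Server's `xml` type here. Note that SQL Server's type can also be queried in place with its XQuery methods such as `value()` and `nodes()`, which this sketch does not attempt.

```python
import sqlite3
import xml.etree.ElementTree as ET

def attrs_to_xml(attrs: dict) -> str:
    """Serialize a variable set of name/value pairs into one XML string."""
    root = ET.Element("attributes")
    for name, value in attrs.items():
        el = ET.SubElement(root, "attr", name=name)
        el.text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_attrs(text: str) -> dict:
    """Parse the XML string back into a dict (values come back as strings)."""
    root = ET.fromstring(text)
    return {el.get("name"): el.text for el in root.findall("attr")}

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per item, all of its attributes in one XML column.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, attrs TEXT)")
conn.execute("INSERT INTO items VALUES (?, ?)", (1, attrs_to_xml({"color": "red", "size": "L"})))
# A brand-new attribute name (weight_kg) needs no ALTER TABLE.
conn.execute("INSERT INTO items VALUES (?, ?)", (2, attrs_to_xml({"color": "blue", "weight_kg": 3})))

row = conn.execute("SELECT attrs FROM items WHERE id = 2").fetchone()
print(xml_to_attrs(row[0]))  # {'color': 'blue', 'weight_kg': '3'}
```

The trade-off is visible in the last line. Every read has to parse the document, and typed values such as the integer 3 come back as strings.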

Answer by Jon

There are some really stupid answers here. Just because a database supports a data type does not mean you should be using it. These things are invariably added as features because the competition has them, not because they are the right thing to do. Global variables? Triggers? Would anyone like to defend them too, just because they're there and you can use them?

If you have multiple attributes, the best way to handle them in a relational database is with a one-to-many relationship. Parse your useful data out of the XML overhead. Then store the ID (primary key) of the parent record with each row in a second table, one row per attribute. You can have any number of attributes per parent record. It's Database Design 101, nothing clever. Storing the data as unstructured XML just to hold a variable number of attributes is not the way to go; it's using a sledgehammer to crack a peanut. A one-to-many relationship between two tables is simpler, easier to understand, much faster to query, much less coding effort, and uses less storage (which means faster queries). Everyone wins, apart from the storage vendors.
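A minimal sketch of the one-to-many design described above, using SQLite. The table and column names (`items`, `item_attributes`) are hypothetical. Each attribute becomes one child row keyed by the parent's ID, so filtering on an attribute is an ordinary join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per attribute; item_id points back at the parent record.
    CREATE TABLE item_attributes (
        item_id INTEGER NOT NULL REFERENCES items(id),
        name    TEXT    NOT NULL,
        value   TEXT,
        PRIMARY KEY (item_id, name)
    );
""")
conn.execute("INSERT INTO items VALUES (1, 'widget'), (2, 'gadget')")
conn.executemany(
    "INSERT INTO item_attributes VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "size", "L"), (2, "color", "blue"), (2, "weight_kg", "3")],
)

# Any number of attributes per parent, filtered with plain SQL (no parsing).
red_items = conn.execute("""
    SELECT i.name
    FROM items AS i
    JOIN item_attributes AS a ON a.item_id = i.id
    WHERE a.name = 'color' AND a.value = 'red'
""").fetchall()
print(red_items)  # [('widget',)]
```

This generic name/value layout keeps every value as TEXT. Attributes that turn out to be common to all rows are often better promoted to real typed columns.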

XML is a data transfer protocol; as GolezTrol rightly said, "It is a way to export (and import) data." That is, it is simply overhead used to communicate the structure of the data between different systems. Once the data is received, the tags should be stripped out and the data (and only the data) stored in your database engine of choice, whatever that might be, not the XML itself. The overhead of XML is roughly 10x the data it describes. Want to tell your boss why 100 GB of data is occupying 1 TB of space on your hyper-expensive SAN? Or why it takes all night to back up over a saturated network link? Or why it is causing performance problems in production? If you don't parse the data out of the now-pointless tags, you will just push the problem, and its ongoing daily support costs, onto operational support for the next ten years. Sloppy, sloppy, sloppy. This keeps vendors like EMC in business.
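The "~10x" figure is the answerer's estimate. The real ratio depends on how long the tag names are relative to the values they wrap. A quick way to check it for your own data is to serialize a record and compare byte counts. The record below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; tag names and values are made up for illustration.
record = {"customer_id": "42", "first_name": "Ann", "country": "NZ"}

root = ET.Element("customer")
for tag, value in record.items():
    ET.SubElement(root, tag).text = value
xml_text = ET.tostring(root, encoding="unicode")

# Bytes of actual data vs. bytes of the XML document that carries it.
payload_bytes = sum(len(v.encode("utf-8")) for v in record.values())
xml_bytes = len(xml_text.encode("utf-8"))
ratio = xml_bytes / payload_bytes
print(xml_text)
print(f"payload={payload_bytes} B, xml={xml_bytes} B, ratio={ratio:.1f}x")
```

Short values under long tag names push the ratio up, while large text values push it toward 1. Measuring representative records is more reliable than any rule of thumb.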

XML is metadata. Nothing clever, just a schema descriptor. Once it's transferred and parsed, it has lost its usefulness and is just clutter that clogs up whatever database you use. Get rid of it, unless you're compulsively addicted to hoarding yesterday's pointless, crappy descriptive metadata, stored many times over. Wake up. It's the typical "Emperor's New Clothes" syndrome; stop being conned by something simple and disposable. It's only metadata, and it should not be stored or worshipped; it's junk once it's parsed. And which is better: to parse it once, or to uselessly parse it every time you need data from it? The answer's pretty darned obvious to me.
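"Parse it once" can be sketched as a function that runs when a message arrives and turns it into plain rows. Those rows are ready to insert into a one-to-many attribute table of the kind this answer recommends. The message layout (`<items>`/`<item id=...>`) and the row shape are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def shred(xml_text: str) -> list[tuple[str, str, str]]:
    """Parse an incoming XML message once and return (item_id, name, value) rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for item in root.iter("item"):
        item_id = item.get("id")
        # Each child element of <item> becomes one attribute row; the tags are discarded.
        for child in item:
            rows.append((item_id, child.tag, (child.text or "").strip()))
    return rows

incoming = """
<items>
  <item id="1"><color>red</color><size>L</size></item>
  <item id="2"><color>blue</color></item>
</items>
"""
rows = shred(incoming)
print(rows)
```

After this step, queries work on the rows and never touch the original XML again.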

Answer by Sean Vieira

Storing XML, JSON, YAML, comma-separated lists, binary blobs, or anything else in a database is not bad... per se.

It can indicate a lack of understanding of what a database is for (storing data that is related to other data), and it conjures up visions of databases with single-column tables called data1, data2, etc., with each table row holding a 5+ MB entry of XML-encoded relational data.

On the other hand, there are many valid cases that can be made for such a structure -- rapidly changing configurations might be represented in JSON and stored in a two column table structured like this:

dbo.good_table
ApplicationID (bigint)
Configuration (varchar(max))

The difference between the above table and a table like this:

dbo.bad_table
ApplicationID (bigint)
ApplicationMembers (xml)

The difference is that good_table enables rapid access to a piece of data (the configuration), while bad_table uses the database as an often expensive (and slow) hard disk.
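A minimal sketch of the good_table pattern in SQLite, assuming the configuration is a JSON object. The helper names `save_config` and `load_config` are hypothetical. The key point is that the document is always read and written as a unit by its key, and nothing inside it is queried relationally.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Mirrors dbo.good_table: one opaque configuration document per application.
conn.execute("CREATE TABLE good_table (ApplicationID INTEGER PRIMARY KEY, Configuration TEXT)")

def save_config(app_id: int, config: dict) -> None:
    # INSERT OR REPLACE keeps exactly one configuration row per application.
    conn.execute(
        "INSERT OR REPLACE INTO good_table (ApplicationID, Configuration) VALUES (?, ?)",
        (app_id, json.dumps(config)),
    )

def load_config(app_id: int) -> dict:
    # Fetch the whole document by primary key; return {} if none is stored.
    row = conn.execute(
        "SELECT Configuration FROM good_table WHERE ApplicationID = ?", (app_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}

save_config(7, {"theme": "dark", "retries": 3})
# The configuration's shape changes; the table schema does not.
save_config(7, {"theme": "light", "retries": 5, "beta": True})
print(load_config(7))  # {'theme': 'light', 'retries': 5, 'beta': True}
```

If code starts filtering or joining on fields inside the document, the data is no longer opaque configuration. At that point it is drifting toward the bad_table case.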

Answer by GolezTrol

XML is itself a kind of storage format. It is most practically used for transporting data, because it provides a common mechanism for structuring data. There are fixed rules for reading and writing XML that allow anyone to read XML data. Validation and transformation into other output formats are also relatively easy (using XSLT). XML, however, is not the best format to store data in. Reading XML files is time-consuming, and they take up a relatively large amount of space. It is best to store your data in a structured manner in your database, and to export the results of specific queries to XML if you need them in reports, on a website, or to pass them to other parties.
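A minimal sketch of "store structured, export to XML on demand", assuming a hypothetical `products` table that keeps prices in cents. The XML is built from a query result only when it is needed and is never stored back in the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, price_cents INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("A-1", "Teapot", 1999), ("B-2", "Mug", 650)],
)

def export_products_xml(conn: sqlite3.Connection) -> str:
    """Build an XML document from a query result at export time."""
    root = ET.Element("products")
    for sku, name, price_cents in conn.execute(
        "SELECT sku, name, price_cents FROM products ORDER BY sku"
    ):
        product = ET.SubElement(root, "product", sku=sku)
        ET.SubElement(product, "name").text = name
        # Format the price for the feed; the database keeps integer cents.
        ET.SubElement(product, "price").text = f"{price_cents / 100:.2f}"
    return ET.tostring(root, encoding="unicode")

print(export_products_xml(conn))
```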

There are XML databases, but they don't store their data as XML either. They merely provide a way to save and load hierarchical data (XML is a hierarchical structure) instead of using the standard table structure.

So it is fair to say that storing XML content in a blob in a database is generally not the right way to go, but there are always exceptions, of course.

XML is, in contrast to what others say here, not a way to display data. It is a way to export (and import) data, and a logical choice for transporting data, because you have complete flexibility in how you export it and it can easily be transformed into other formats. For example, if you have a webshop and you want to export prices and product information to other parties, you could choose XML. Those parties can write simple rules to transform the data to suit their needs. Neither party has to know how the prices are stored on the other side, and neither party has to write a complex tool to parse some hard-to-read binary format that someone else has made up.
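On the receiving side, such a "simple rule" can be a few lines that map the sender's elements onto the receiver's own structure. The answer mentions XSLT for this. The sketch below swaps in Python's standard-library ElementTree instead, and the feed layout and element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A price feed as a webshop might publish it (element names are hypothetical).
feed = (
    '<products>'
    '<product sku="A-1"><name>Teapot</name><price>19.99</price></product>'
    '<product sku="B-2"><name>Mug</name><price>6.50</price></product>'
    '</products>'
)

def import_prices(xml_text: str) -> dict[str, Decimal]:
    """Map the sender's XML onto the receiver's own structure: SKU -> price."""
    root = ET.fromstring(xml_text)
    return {
        p.get("sku"): Decimal(p.findtext("price"))
        for p in root.findall("product")
    }

prices = import_prices(feed)
print(prices)
```

The receiver only depends on the agreed element names, not on how the webshop stores its prices.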

Answer by user454322

No, it is not.

Actually, several databases already have data types for storing XML documents.

Answer by Ljdawson

I think storing XML in a database could be bad for speed reasons (parsing, etc.). However, a good case would be data that fits the semi-structured model; there are some advantages of this listed here.

Answer by James Noyes

It's neither bad nor good to store XML in a DB. You just have to consider your requirements and how the data is used.

If your data is machine-produced and machine-consumed, and it is in XML only to transport it between apps, then a DB makes sense. At this point, you may also want to look at JSON instead of XML, because it is somewhat better (IMO) at encapsulated data transport between two applications.
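One concrete difference between the two formats for app-to-app transport is typing. This is offered as an illustration, not as the answerer's stated reason. A JSON round trip keeps numbers and booleans as numbers and booleans. A plain XML round trip with ElementTree returns every value as text, unless a schema or convention tells the receiver how to convert it. The message below is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

message = {"order_id": 1001, "paid": True, "total": 24.5}

# JSON round trip: numbers and booleans come back with their types.
via_json = json.loads(json.dumps(message))

# XML round trip with ElementTree: every value comes back as a string.
root = ET.Element("order")
for key, value in message.items():
    ET.SubElement(root, key).text = str(value)
parsed = ET.fromstring(ET.tostring(root, encoding="unicode"))
via_xml = {child.tag: child.text for child in parsed}

print(via_json)  # {'order_id': 1001, 'paid': True, 'total': 24.5}
print(via_xml)   # {'order_id': '1001', 'paid': 'True', 'total': '24.5'}
```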

If your data is human-produced or document-centric, or may be subject to periodic schema change, or is consumed by being read as a document, then it may make more sense to keep in XML. You may also consider some sort of version control if the XML is sufficiently mission-critical or you need a record of changes.
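A minimal sketch of keeping document-centric XML in the database with a simple change history, assuming a hypothetical append-only `documents` table. Every save adds a new version row instead of overwriting the old one. This version-numbering scheme is not safe under concurrent writers without a transaction or retry on the primary-key conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical append-only table: each save adds a version instead of overwriting.
conn.execute("""
    CREATE TABLE documents (
        doc_id  TEXT    NOT NULL,
        version INTEGER NOT NULL,
        body    TEXT    NOT NULL,  -- the XML document, kept as-is
        PRIMARY KEY (doc_id, version)
    )
""")

def save_version(doc_id: str, xml_body: str) -> int:
    """Store xml_body as the next version of doc_id and return its version number."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    conn.execute("INSERT INTO documents VALUES (?, ?, ?)", (doc_id, latest + 1, xml_body))
    return latest + 1

def history(doc_id: str) -> list[tuple[int, str]]:
    """Return every stored version of a document, oldest first."""
    return conn.execute(
        "SELECT version, body FROM documents WHERE doc_id = ? ORDER BY version", (doc_id,)
    ).fetchall()

save_version("policy", "<policy><limit>100</limit></policy>")
save_version("policy", "<policy><limit>250</limit><note>raised</note></policy>")
for version, body in history("policy"):
    print(version, body)
```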
