SQL 动态数据库架构
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/66385/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Dynamic Database Schema
提问by Fake Jim
What is a recommended architecture for providing storage for a dynamic logical database schema?
为动态逻辑数据库模式提供存储的推荐架构是什么?
To clarify: Where a system is required to provide storage for a model whose schema may be extended or altered by its users once in production, what are some good technologies, database models or storage engines that will allow this?
澄清:如果系统需要为模型提供存储,该模型的模式可能会在生产中被用户扩展或更改,有哪些好的技术、数据库模型或存储引擎可以实现这一点?
A few possibilities to illustrate:
几种可能性来说明:
- Creating/altering database objects via dynamically generated DML
- Creating tables with large numbers of sparse physical columns and using only those required for the 'overlaid' logical schema
- Creating a 'long, narrow' table that stores dynamic column values as rows that then need to be pivoted to create a 'short, wide' rowset containing all the values for a specific entity
- Using a BigTable/SimpleDB PropertyBag type system
- 通过动态生成的 DML 创建/更改数据库对象
- 创建具有大量稀疏物理列的表并仅使用“重叠”逻辑架构所需的表
- 创建一个“长而窄”的表,将动态列值存储为行,然后需要对其进行透视以创建一个“短、宽”行集,其中包含特定实体的所有值
- 使用 BigTable/SimpleDB PropertyBag 类型系统
Any answers based on real world experience would be greatly appreciated
任何基于现实世界经验的答案将不胜感激
回答by Matt Rogish
What you are proposing is not new. Plenty of people have tried it... most have found that they chase "infinite" flexibility and instead end up with much, much less than that. It's the "roach motel" of database designs -- data goes in, but it's almost impossible to get it out. Try and conceptualize writing the code for ANY sort of constraint and you'll see what I mean.
你的提议并不新鲜。很多人都尝试过……大多数人发现他们追求“无限”的灵活性,结果却比这少得多。它是数据库设计的“蟑螂汽车旅馆”——数据进入,但几乎不可能将其取出。尝试并概念化为任何类型的约束编写代码,你就会明白我的意思。
The end result typically is a system that is MUCH more difficult to debug, maintain, and full of data consistency problems. This is not alwaysthe case, but more often than not, that is how it ends up. Mostly because the programmer(s) don't see this train wreck coming and fail to defensively code against it. Also, often ends up the case that the "infinite" flexibility really isn't that necessary; it's a very bad "smell" when the dev team gets a spec that says "Gosh I have no clue what sort of data they are going to put here, so let 'em put WHATEVER"... and the end users are just fine having pre-defined attribute types that they can use (code up a generic phone #, and let them create any # of them -- this is trivial in a nicely normalized system and maintains flexibility and integrity!)
最终结果通常是一个更难调试、维护和充满数据一致性问题的系统。情况并非总是如此,但通常情况下,这就是结果。主要是因为程序员没有看到这列火车残骸的到来,也没有防御性地针对它编写代码。此外,通常最终的情况是“无限”的灵活性确实没有那么必要;这是一种非常糟糕的“气味”,当开发团队得到一个规范说“天哪,我不知道他们要放什么样的数据,所以让他们放什么”......最终用户就好了具有他们可以使用的预定义属性类型(编码通用电话号码,
If you have a very good development team and are intimately awareof the problems you'll have to overcome with this design, you can successfully code up a well designed, not terribly buggy system. Most of the time.
如果您有一个非常优秀的开发团队,并且非常清楚您必须通过这种设计克服的问题,那么您就可以成功地编写出一个设计良好、没有严重错误的系统。大多数时候。
Why start out with the odds stacked so much against you, though?
不过,为什么一开始就对你不利呢?
Don't believe me? Google "One True Lookup Table" or "single table design". Some good results: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
不相信我?谷歌“一个真正的查找表”或“单表设计”。一些不错的结果:http: //asktom.oracle.com/pls/asktom/f?p=100:11:0 ::::P11_QUESTION_ID: 10678084117056
http://thedailywtf.com/Comments/Tom_Kyte_on_The_Ultimate_Extensibility.aspx?pg=3
http://thedailywtf.com/Comments/Tom_Kyte_on_The_Ultimate_Extensibility.aspx?pg=3
http://www.dbazine.com/ofinterest/oi-articles/celko22
http://www.dbazine.com/ofinterest/oi-articles/celko22
http://thedailywtf.com/Comments/The_Inner-Platform_Effect.aspx?pg=2
http://thedailywtf.com/Comments/The_Inner-Platform_Effect.aspx?pg=2
回答by Bloodhound
A strongly typed xml field in MSSQL has worked for us.
MSSQL 中的强类型 xml 字段对我们有用。
回答by Josh Yeager
Like some others have said, don't do this unless you have no other choice. One case where this is required is if you are selling an off-the-shelf product that must allow users to record custom data. My company's product falls into this category.
就像其他人所说的那样,除非别无选择,否则不要这样做。需要这样做的一种情况是,如果您销售的是必须允许用户记录自定义数据的现成产品。我公司的产品属于这一类。
If you do need to allow your customers to do this, here are a few tips:
- Create a robustadministrative tool to perform the schema changes, and do not allow these changes to be made any other way.
- Make it an administrative feature; don't allow normal users to access it.
- Log every detail about every schema change. This will help you debug problems, and it will also give you CYA data if a customer does something stupid.
如果您确实需要允许您的客户这样做,这里有一些提示:
- 创建一个强大的管理工具来执行架构更改,并且不允许以任何其他方式进行这些更改。
- 使其成为管理功能;不允许普通用户访问。
- 记录有关每个架构更改的每个细节。这将帮助您调试问题,并且如果客户做了一些愚蠢的事情,它还将为您提供 CYA 数据。
If you can do those things successfully (especially the first one), then any of the architectures you mentioned will work. My preference is to dynamically change the database objects, because that allows you to take advantage of your DBMS's query features when you access the data stored in the custom fields. The other three options require you load large chunks of data and then do most of your data processing in code.
如果你能成功地完成这些事情(尤其是第一个),那么你提到的任何架构都可以工作。我的偏好是动态更改数据库对象,因为这允许您在访问存储在自定义字段中的数据时利用 DBMS 的查询功能。其他三个选项要求您加载大量数据,然后在代码中完成大部分数据处理。
回答by clyfe
I have a similar requirement and decided to use the schema-less MongoDB.
我有一个类似的要求,并决定使用无模式的MongoDB。
MongoDB (from "humongous") is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language. (Wikipedia)
MongoDB(来自“humongous”)是一种用 C++ 编程语言编写的开源、可扩展、高性能、无模式、面向文档的数据库。(维基百科)
Highlights:
强调:
- has rich query functionality (maybe the closest to SQL DBs)
- production ready (foursquare, sourceforge use it)
- 具有丰富的查询功能(可能是最接近 SQL DB 的)
- 生产就绪(foursquare,sourceforge 使用它)
Lowdarks (stuff you need to understand, so you can use mongo correctly):
Lowdarks(您需要了解的内容,以便您可以正确使用 mongo):
- no transactions (actually it has transactions but only on atomic operations)
- this stuff here: http://ethangunderson.com/blog/two-reasons-to-not-use-mongodb/
- durability .. mostly ACID related stuff
- 没有交易(实际上它有交易但仅限于原子操作)
- 这里的东西:http: //etangunderson.com/blog/two-reasons-to-not-use-mongodb/
- 耐用性 .. 主要是酸相关的东西
回答by Thevs
I did it ones in a real project:
我在一个真实的项目中做到了:
The database consisted of one table with one field which was an array of 50. It had a 'word' index set on it. All the data was typeless so the 'word index' worked as expected. Numeric fields were represented as characters and the actual sorting had been done at client side. (It still possible to have several array fields for each data type if needed).
数据库由一个表和一个字段组成,该字段是一个 50 的数组。它上面设置了一个“单词”索引。所有数据都是无类型的,因此“单词索引”按预期工作。数字字段表示为字符,实际排序已在客户端完成。(如果需要,每个数据类型仍然可以有多个数组字段)。
The logical data schema for logical tables was held within the same database with different table row 'type' (the first array element). It also supported simple versioning in copy-on-write style using same 'type' field.
逻辑表的逻辑数据模式保存在具有不同表行“类型”(第一个数组元素)的同一数据库中。它还支持使用相同“类型”字段的写时复制样式的简单版本控制。
Advantages:
好处:
- You can rearrange and add/delete your columns dynamically, no need for dump/reload of database. Any new column data may be set to initial value (virtually) in zero time.
- Fragmentation is minimal, since all records and tables are same size, sometimes it gives better performance.
- All table schema is virtual. Any logical schema stucture is possible (even recursive, or object-oriented).
- It is good for "write-once, read-mostly, no-delete/mark-as-deleted" data (most Web apps actually are like that).
- 您可以动态重新排列和添加/删除您的列,无需转储/重新加载数据库。任何新的列数据都可以在零时间内设置为初始值(实际上)。
- 碎片是最小的,因为所有记录和表的大小相同,有时它会提供更好的性能。
- 所有表模式都是虚拟的。任何逻辑模式结构都是可能的(甚至是递归的或面向对象的)。
- 它适用于“一次写入、主要读取、不删除/标记为已删除”的数据(大多数 Web 应用程序实际上都是这样)。
Disadvantages:
缺点:
- Indexing only by full words, no abbreviation,
- Complex queries are possible, but with slight performance degradation.
- Depends on whether your preferred database system supports arrays and word indexes (it was inplemented in PROGRESS RDBMS).
- Relational model is only in programmer's mind (i.e. only at run-time).
- 仅按完整词索引,没有缩写,
- 复杂的查询是可能的,但性能略有下降。
- 取决于您首选的数据库系统是否支持数组和单词索引(它已在 PROGRESS RDBMS 中实现)。
- 关系模型只存在于程序员的脑海中(即仅在运行时)。
And now I'm thinking the next step could be - to implement such a database on the file system level. That might be relatively easy.
现在我想下一步可能是 - 在文件系统级别实现这样的数据库。那可能相对容易。
回答by Sklivvz
The whole point of having a relational DB is keeping your data safe and consistent. The moment you allow users to alter the schema, there goes your data integrity...
拥有关系数据库的全部意义在于保持您的数据安全和一致。当您允许用户更改架构时,您的数据完整性就会...
If your need is to store heterogeneous data, for example like a CMS scenario, I would suggest storing XML validated by an XSD in a row. Of course you lose performance and easy search capabilities, but it's a good trade off IMHO.
如果您需要存储异构数据,例如像 CMS 场景,我建议将 XSD 验证的 XML 存储在一行中。当然,您会失去性能和简单的搜索功能,但恕我直言,这是一个很好的权衡。
Since it's 2016, forget XML! Use JSON to store the non-relational data bag, with an appropriately typed column as backend. You shouldn't normally need to query by value inside the bag, which will be slow even though many contemporary SQL databases understand JSON natively.
因为是 2016 年,忘记 XML!使用 JSON 存储非关系数据包,并使用适当类型的列作为后端。您通常不需要按bag 内的值进行查询,即使许多现代 SQL 数据库本机理解 JSON,这也会很慢。
回答by Daniel Spiewak
Sounds to me like what you really want is some sort of "meta-schema", a database schema which is capable of describing a flexible schema for storing the actual data. Dynamic schema changes are touchy and not something you want to mess with, especially not if users are allowed to make the change.
在我看来,您真正想要的是某种“元模式”,一种能够描述用于存储实际数据的灵活模式的数据库模式。动态架构更改是敏感的,不是您想要弄乱的,尤其是在允许用户进行更改的情况下。
You're not going to find a database which is more suited to this task than any other, so your best bet is just to select one based on other criteria. For example, what platform are you using to host the DB? What language is the app written in? etc
您不会找到比任何其他数据库都更适合此任务的数据库,因此最好的办法就是根据其他标准选择一个。例如,您使用什么平台来托管数据库?应用程序是用什么语言编写的?等等
To clarify what I mean by "meta-schema":
澄清我所说的“元模式”的意思:
CREATE TABLE data (
id INTEGER NOT NULL AUTO_INCREMENT,
key VARCHAR(255),
data TEXT,
PRIMARY KEY (id)
);
This is a very simple example, you would likely have something more specific to your needs (and hopefully a little easier to work with), but it does serve to illustrate my point. You should consider the database schema itself to be immutable at the application level; any structural changes should be reflected in the data (that-is, the instantiation of that schema).
这是一个非常简单的示例,您可能会有更具体的需求(希望更容易使用),但它确实有助于说明我的观点。您应该考虑数据库模式本身在应用程序级别是不可变的;任何结构更改都应反映在数据中(即该模式的实例化)。
回答by AJ.
Create 2 databases
创建2个数据库
- DB1 contains static tables, and represents the "real" state of the data.
- DB2 is free for users to do with as they wish - they (or you) will have to write code to populate their odd-shaped tables from DB1.
- DB1 包含静态表,代表数据的“真实”状态。
- DB2 对用户来说是免费的——他们(或您)将不得不编写代码来从 DB1 填充他们的奇形表。
回答by AJ.
I know that models indicated in the question are used in production systems all over. A rather large one is in use at a large university/teaching institution that I work for. They specifically use the long narrow table approach to map data gathered by many varied data acquisition systems.
我知道问题中指出的模型在整个生产系统中都使用。我工作的一所大型大学/教学机构正在使用相当大的一个。他们专门使用狭长表格方法来绘制由许多不同数据采集系统收集的数据。
Also, Google recently released their internal data sharing protocol, protocol buffer, as open source via their code site. A database system modeled on this approach would be quite interesting.
此外,谷歌最近通过他们的代码站点发布了他们的内部数据共享协议协议缓冲区作为开源。以这种方法为模型的数据库系统将非常有趣。
Check the following:
检查以下内容:
回答by kamal
EAV approach i believe is the best approach, but comes with a heavy cost
我认为 EAV 方法是最好的方法,但成本很高