Is it good practice to have foreign keys in a data warehouse (relationships)?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/2690818/
Asked by Lieven Cardoen
I think the question is clear enough. Some of the columns in my data warehouse table could have a relationship to a primary key. But is it good practice? It is denormalized, so the data in the data warehouse should never be deleted again. I hope the question is clear enough.
Accepted answer by Dave Archer
I have no idea. But nobody is answering, so I googled and found a best practices paper which seems to say the very helpful "it depends" :-)
While foreign key constraints help data integrity, they have an associated cost on all insert, update and delete statements. Give careful attention to the use of constraints in your warehouse or ODS when you wish to ensure data integrity and validation.
Answer by Damir Sudarevic
I presume that you refer to FKs in fact tables. During DW loading, indexes and any foreign keys are dropped to speed up the loading; the ETL process takes care of keys.
A foreign key constraint "activates" during inserts and updates (this is when it needs to check that the key value exists in the parent table) and during deletes of primary keys in parent tables. It does not play a part during reads. Deleting records in a DW is (or should be) a controlled process which scans for any existing relationships before deleting from dimension tables.
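The behavior described above can be tried out in a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for a real warehouse engine; the table names (`dim_product`, `fact_sales`) and the helper `is_rejected` are hypothetical, and other engines differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'widget');
    INSERT INTO fact_sales VALUES (100, 1, 9.5);
""")

def is_rejected(sql):
    """Run one statement; return True if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Writes to the child table are checked against the parent table.
orphan_insert = is_rejected("INSERT INTO fact_sales VALUES (101, 999, 1.0)")
bad_update = is_rejected("UPDATE fact_sales SET product_id = 999 WHERE sale_id = 100")
# Deleting a parent key that is still referenced is checked as well.
parent_delete = is_rejected("DELETE FROM dim_product WHERE product_id = 1")
# Reads involve no constraint check at all.
rows = conn.execute("SELECT sale_id, product_id FROM fact_sales").fetchall()
```

All three writes are rejected, while the `SELECT` simply returns the one valid fact row.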
So, most DWs do not have foreign keys implemented as constraints.
Answer by Cade Roux
FK constraints work well in Kimball dimensional models on SQL Server.
Typically, your ETL will need to look up into the dimension table (usually on the business key, to handle slowly changing dimensions) to determine dimension surrogate IDs. The dimension surrogate ID is usually an identity column, and the PK on the dimension is usually the dimension surrogate ID, which is already an index (probably clustered).
Having RI at this point is not a huge amount of overhead on the writes, and it can also help catch ETL defects during development. Also, making the PK of the fact table a combination of all the FKs can help trap potential data modeling problems and double-loading.
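As a rough illustration of the two points above (surrogate key lookup on the business key, and a composite fact PK that traps double-loading), here is a small sketch using Python's `sqlite3`. The schema, the `product_code` business key, and the `load_batch` helper are all hypothetical; a real SQL Server ETL would look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,   -- surrogate key
        product_code TEXT NOT NULL UNIQUE   -- business key
    );
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY);
    CREATE TABLE fact_sales (
        date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        amount     REAL NOT NULL,
        PRIMARY KEY (date_id, product_id)   -- combination of all the FKs
    );
    INSERT INTO dim_product (product_code) VALUES ('SKU-A'), ('SKU-B');
    INSERT INTO dim_date VALUES (20240101);
""")

def load_batch(batch):
    """Resolve business keys to surrogate ids, then insert the facts atomically."""
    lookup = dict(conn.execute("SELECT product_code, product_id FROM dim_product"))
    rows = [(date_id, lookup[code], amount) for date_id, code, amount in batch]
    with conn:  # one transaction per batch; rolled back on any error
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

batch = [(20240101, "SKU-A", 10.0), (20240101, "SKU-B", 4.5)]
load_batch(batch)
try:
    load_batch(batch)  # accidental second run of the same load
    double_load_rejected = False
except sqlite3.IntegrityError:
    double_load_rejected = True
fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

The second call fails on the composite primary key, so the fact table still holds only the two original rows.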
It can actually reduce overhead on selects if you like to make general-use flattened views or table-valued functions of your star models. Extra inner joins to dimensions are guaranteed to produce one and only one row, so the optimizer can use these constraints very effectively to eliminate the need to look up into the table. Without FK constraints, these lookups may have to be done to eliminate facts where the dimension does not exist.
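To see why the lookup cannot be skipped without a trusted constraint, consider this small sketch (Python `sqlite3`, hypothetical tables). It does not demonstrate the optimization itself, and it does not assume SQLite performs it; it only shows that, when orphan facts are possible, the inner join changes the result and therefore has to be executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # no FK declared, so nothing is checked
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'widget');
    -- product 999 has no dimension row: an orphan fact
    INSERT INTO fact_sales VALUES (100, 1, 10.0), (101, 999, 5.0);
""")

# Total computed from the fact table alone.
total_facts_only = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]

# Same total through a flattened star join; no dimension column is selected.
total_via_join = conn.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_id = f.product_id
""").fetchone()[0]
```

Here the two totals differ because the join silently drops the orphan fact. With an enforced (trusted) FK the two queries would be guaranteed to agree, which is what lets an optimizer consider dropping the join.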
Answer by peterchen
The question is clear, but "good practice" seems the wrong question.
"Could have FK's"?
Foreign keys are a mechanism to preserve integrity constraints during database modifications.
If your DW is read-only (accumulating data sources without writing back), there is no need for FK's.
If your DW supports writes, integrity constraints typically need to be coordinated across the participating data sources by the ETL (or rather, its Store equivalent). This process may or may not rely on FK's in the database.
So the right question would be: do you need them?
(The only other reason I can think of would be documentation of the relationships. However, this can be done on paper or in a separate document, too.)
Answer by Bill Anton
Using FK constraints in a DW is like wearing a bicycle helmet. If the ETL is designed correctly, you technically don't need them. That said, if I had a million dollars for every time I've seen bug-free ETL, I'd have zero dollars.
Until you're at a point where FK constraints are causing performance issues, I say leave 'em. Cleaning up referential integrity problems can be much harder than adding the constraints from the get-go ;-)
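For a feel of what that cleanup involves, here is a minimal sketch of hunting for orphaned facts after loading without enforcement. It uses Python's `sqlite3` and SQLite's `PRAGMA foreign_key_check`; the tables and data are hypothetical, and other engines need their own equivalent (for example an anti-join query).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # FK enforcement is off: SQLite's default
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        total       REAL
    );
    INSERT INTO dim_customer VALUES (1, 'Acme');
    -- customers 42 and 43 were never loaded into the dimension
    INSERT INTO fact_orders VALUES (10, 1, 20.0), (11, 42, 7.0), (12, 43, 3.0);
""")

# Each violation row is (child table, child rowid, parent table, FK index).
violations = conn.execute("PRAGMA foreign_key_check(fact_orders)").fetchall()
orphan_order_ids = sorted(rowid for _table, rowid, _parent, _fk in violations)
```

Finding the orphans is the easy part; deciding whether to delete them, remap them to an "unknown" dimension member, or reload the dimension is where the real effort goes.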
Answer by T. W.
Yes, as a best practice, implement the FK constraints on your fact tables. In SQL Server, use NOCHECK. In Oracle, always use RELY DISABLE NOVALIDATE. This allows the warehouse or mart to know about the relationship, but not check it on INSERT, UPDATE, or DELETE operations. Star transformations, optimizations, etc. may not rely on the FK constraints to improve queries like they used to, but one never knows what BI or OLAP tools will be used on the front side of your warehouse or mart. Some of these tools can make use of knowing that the relationships are defined. Plus, how many ugly-looking warehouses have you seen with little or no external documentation, and had to try to reverse engineer them? Defining the FKs always helps with that.
As designers, we NEVER seem to make our data warehouses or marts as self-documenting as we should. Defining FKs certainly helps with that. Now, having said this, if star schemas are properly designed, they are easy to read and understand even without FKs being defined.
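The "known but not checked" idea can be sketched with Python's `sqlite3`. This is only an analogy: SQLite's default of recording FK declarations without enforcing them stands in for NOCHECK or RELY DISABLE NOVALIDATE, which are different features with their own semantics. The tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # enforcement stays off, declarations are still stored
conn.executescript("""
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY);
    CREATE TABLE fact_sales (
        date_id  INTEGER REFERENCES dim_date(date_id),
        store_id INTEGER REFERENCES dim_store(store_id),
        amount   REAL
    );
    -- accepted even though no dimension rows exist, because nothing is checked
    INSERT INTO fact_sales VALUES (20240101, 7, 12.0);
""")

# A tool (or a newcomer) can still discover the star schema from metadata.
# foreign_key_list columns: id, seq, table, from, to, on_update, on_delete, match
relationships = sorted(
    (row[3], row[2], row[4])  # (fact column, dimension table, dimension column)
    for row in conn.execute("PRAGMA foreign_key_list(fact_sales)")
)
```

The relationships are readable from the catalog even though the unchecked insert went through, which is the self-documenting benefit described above.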
And for Oracle fact tables, always define a LOCAL BITMAP index on every FK to a dimension. Just do it. The indexing is actually more important than the FK being defined.
Answer by nvogel
The reason for using a foreign key constraint in a data warehouse is the same as for any other database: to ensure data integrity.
It is also possible that query performance will benefit, because foreign keys permit certain types of query rewrite that are not normally possible without them. Data integrity is still the main reason to use foreign keys, however.
Answer by user4125724
There is a very good reason to create FK constraints even in a read-only DW/DM. Yes, they are not really required from the point of view of the read-only DW itself, if your ETL is bulletproof, and so on. But guess what: life doesn't stop at loading data into the DW. Most BI analytical/reporting tools use information about your DW relationships to automatically build their model (for example, the SSAS Tabular model). In my humble opinion, this alone outweighs the small overhead of dropping and recreating FK constraints during the ETL process.

