Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14112839/

Time: 2020-10-31 15:06:52. Source: igfitidea

Why are composite keys discouraged in Hibernate?

Tags: java, database, hibernate, orm, composite-key

Asked by Isaac

This is from the official Hibernate tutorial:

There is an alternative <composite-id> declaration that allows access to legacy data with composite keys. Its use is strongly discouraged for anything else.

Why are composite keys discouraged? I am considering using a 3-column table where all of the columns are foreign keys and together form a primary key that is a meaningful relationship in my model. I don't see why this is a bad idea, especially since I will be using an index on them.

What's the alternative? Create an additional automatically generated column and use it as a primary key? I still need to query my 3 columns anyway!

In short, why is this statement true, and what's the better alternative?

Answered by JB Nizet

They discourage them for several reasons:

  • they're cumbersome to use. Each time you need to reference an object (or row), for example in your web application, you need to pass 3 parameters instead of just one.
  • they're inefficient. Instead of simply hashing an integer, the database needs to hash a composite of 3 columns.
  • they lead to bugs: developers inevitably implement the equals and hashCode methods of the primary key class incorrectly. Or they make it mutable, and modify its value once it is stored in a HashSet or HashMap.
  • they pollute the schema. If another table needs to reference this 3-column table, it will need 3 columns instead of just one as a foreign key. Now suppose you follow the same design and make this 3-column foreign key part of the primary key of this new table: you'll quickly have a 4-column primary key, and then a 5-column PK in the next table, and so on, leading to duplication of data and a dirty schema.
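
The equals/hashCode pitfall above is easier to see with a concrete key class. Here is a minimal sketch in plain Java, with no Hibernate mapping involved; `EnrollmentKey` and its three field names are hypothetical, made up only for illustration. All fields are final, so the key cannot change after it has been placed in a HashSet or HashMap.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical composite key for a 3-column link table (names are illustrative only).
final class EnrollmentKey {
    // Final fields: the key cannot be modified once it sits in a hash-based collection.
    private final long studentId;
    private final long courseId;
    private final long termId;

    EnrollmentKey(long studentId, long courseId, long termId) {
        this.studentId = studentId;
        this.courseId = courseId;
        this.termId = termId;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof EnrollmentKey)) {
            return false;
        }
        EnrollmentKey that = (EnrollmentKey) other;
        // Every column of the key takes part in the comparison.
        return studentId == that.studentId
                && courseId == that.courseId
                && termId == that.termId;
    }

    @Override
    public int hashCode() {
        // Uses exactly the same fields as equals, so equal keys share a hash code.
        return Objects.hash(studentId, courseId, termId);
    }
}

class EnrollmentKeyDemo {
    public static void main(String[] args) {
        Set<EnrollmentKey> seen = new HashSet<>();
        seen.add(new EnrollmentKey(1L, 2L, 3L));
        System.out.println(seen.contains(new EnrollmentKey(1L, 2L, 3L)));
        System.out.println(seen.contains(new EnrollmentKey(1L, 2L, 4L)));
    }
}
```

A key class that is actually mapped by Hibernate has further requirements (for example being Serializable) that this sketch leaves out.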

The alternative is to have a single-column, auto-generated primary key, in addition to the other three columns. If you want to make the tuple of three columns unique, then use a unique constraint.
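
As a rough illustration of that alternative, the sketch below models a table in memory: a generated id acts as the primary key, and a set of tuples plays the role of the unique constraint. `LinkTable` and its method names are assumptions made for this example; in a real schema the constraint would be declared in the database, not in application code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for a table with a generated id plus a unique constraint
// on three columns. All names here are hypothetical.
final class LinkTable {
    private long nextId = 1;                                       // plays the auto-generated column
    private final Map<Long, List<Long>> rows = new HashMap<>();    // id -> (a, b, c)
    private final Set<List<Long>> uniqueTuples = new HashSet<>();  // plays the unique constraint

    // Inserts a row and returns its generated id; rejects a repeated (a, b, c) tuple.
    long insert(long a, long b, long c) {
        List<Long> tuple = List.of(a, b, c);
        if (!uniqueTuples.add(tuple)) {
            throw new IllegalStateException("unique constraint violated: " + tuple);
        }
        long id = nextId++;
        rows.put(id, tuple);
        return id;
    }

    // Other tables would reference a row through this single value.
    List<Long> findById(long id) {
        return rows.get(id);
    }
}
```

Rows are referenced by one value, while the three-column tuple still cannot be stored twice.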

Answered by mwnsiri

Even if it is perhaps too late to answer your question, I want to offer another (hopefully more moderate) point of view on Hibernate's need (is it really advice?) to use surrogate keys.

First of all, I want to be clear that both surrogate keys (artificial, auto-generated ones) and natural keys (composed of one or more columns with domain meaning) have pros and cons. I am not trying to say that one key type is better than the other. I am trying to say that, depending on your requirements, natural keys might be a better choice than surrogate ones, and vice versa.

Myths on natural keys

  1. Composite keys are less efficient than surrogate keys. No! It depends on the database engine used.
  2. Natural keys don't exist in real life. Sorry, but they do exist! In the aviation industry, for example, the following tuple will always be unique for a given scheduled flight: (airline, departureDate, flightNumber, operationalSuffix). More generally, when a set of business data is guaranteed to be unique by a given standard, then this set of data is a [good] natural key candidate.
  3. Natural keys "pollute the schema" of child tables. For me this is more a feeling than a real problem. A 4-column primary key of 2 bytes each might be more efficient than a single column of 11 bytes. Besides, the 4 columns can be used to query the child table directly (by using the 4 columns in a where clause) without joining to the parent table.

Disadvantages of surrogate keys

Surrogate keys are:

  1. Source of performance problems:
    • They are usually implemented using auto-incremented columns, which means:
      • A round-trip to the database each time you want to get a new Id (I know that this can be improved using caching or [seq]hilo-like algorithms, but those methods still have their own drawbacks).
      • If one day you need to move your data from one schema to another (it happens quite regularly in my company, at least), then you might encounter Id collision problems. And yes, I know that you can use UUIDs, but those require 32 hexadecimal digits! (If you care about database size, this can be an issue.)
      • If you are using one sequence for all your surrogate keys then, for sure, you will end up with contention on your database.
  2. Error-prone. A sequence has a max_value limit, so as a developer you have to pay attention to the following:
    • You must cycle your sequence (when the max-value is reached it goes back to 1, 2, ...).
    • If you are using the sequence as an ordering (over time) of your data, then you must handle the case of cycling (the row with Id 1 might be newer than the row with Id max-value - 1).
    • Make sure that your code (and even your client interfaces, which should not happen since it is supposed to be an internal Id) supports the 32-bit/64-bit integers that you use to store your sequence values.
  3. They don't guarantee non-duplicated data. You can always have 2 rows with all the same column values but with a different generated value. For me this is THE problem of surrogate keys from a database design point of view.
  4. More in Wikipedia...
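
The hi/lo idea mentioned in the first point can be sketched as follows: one database round trip reserves a whole block of ids, and the ids inside the block are handed out locally. This is a simplified illustration, not Hibernate's actual generator; `HiLoIdGenerator`, `blockSize`, and the `nextHi` supplier (which stands in for the database call) are all assumptions.

```java
import java.util.function.LongSupplier;

// Simplified hi/lo id allocator (hypothetical names, not a library API).
final class HiLoIdGenerator {
    private final LongSupplier nextHi; // stands in for the round trip to the database
    private final int blockSize;       // how many ids one round trip reserves
    private long hi;
    private int lo;

    HiLoIdGenerator(LongSupplier nextHi, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.nextHi = nextHi;
        this.blockSize = blockSize;
        this.lo = blockSize; // block counts as exhausted, so the first call fetches a hi value
    }

    synchronized long nextId() {
        if (lo == blockSize) {
            // Only here do we pay for a "database" call.
            hi = nextHi.getAsLong();
            lo = 0;
        }
        return hi * blockSize + lo++;
    }
}
```

With a block size of 3 and a supplier that returns 0, 1, 2, ..., the generator yields the ids 0 through 6 using three supplier calls instead of seven. One drawback is also visible: ids that were reserved but never handed out before the application stops leave gaps.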

Why does Hibernate prefer/need surrogate keys?

As stated in the Java Persistence with Hibernate reference:

More experienced Hibernate users use saveOrUpdate() exclusively; it's much easier to let Hibernate decide what is new and what is old, especially in a more complex network of objects with mixed state. The only (not really serious) disadvantage of exclusive saveOrUpdate() is that it sometimes can't guess whether an instance is old or new without firing a SELECT at the database—for example, when a class is mapped with a natural composite key and no version or timestamp property.
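
The decision described in this passage can be sketched roughly as below. This is not Hibernate's code or API; `Guess`, `NewOrOldCheck`, and the parameters are hypothetical and only model the reasoning: a generated id that is still null signals a new instance, while an assigned (natural) key says nothing by itself, so a version property or a SELECT is needed.

```java
// Possible outcomes when guessing whether an instance is new (hypothetical names).
enum Guess { NEW, EXISTING, NEEDS_SELECT }

final class NewOrOldCheck {
    // Models the reasoning only; the real logic in Hibernate is more involved.
    static Guess guess(boolean idIsGenerated, Object id,
                       boolean hasVersionProperty, Object version) {
        if (idIsGenerated) {
            // A surrogate id is assigned on save, so null means "never saved".
            return id == null ? Guess.NEW : Guess.EXISTING;
        }
        if (hasVersionProperty) {
            // With an assigned key, an unset version can play the same role.
            return version == null ? Guess.NEW : Guess.EXISTING;
        }
        // Natural key and no version: nothing in memory tells old from new.
        return Guess.NEEDS_SELECT;
    }
}
```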

Some manifestations of the limitation (this is, I think, what we should call it) can be found here.

Conclusion

Please don't be too rigid in your opinions. Use natural keys when it is relevant to do so, and use surrogate keys when they are the better choice.

Hope that this helped someone!

Answered by Luigi R. Viggiano

I would consider the problem from a design point of view. It's not just a matter of whether Hibernate considers them good or bad. The real question is: are natural keys good candidates for identifying my data?

In your business model, it may be convenient today to identify a record by some of its data, but business models evolve over time. And when this happens, you'll find that your natural key no longer fits to uniquely identify your data. And with referential integrity in other tables, this will make things MUCH harder to change.

Having a surrogate PK is convenient because it doesn't tie how your data is identified in storage to the structure of your business model.

Natural keys cannot be generated from a sequence, and the case of data that cannot be identified by its own content is much more frequent. This is evidence that natural keys differ from storage keys, and that they cannot be taken as a general (and good) approach.

Using surrogate keys simplifies the design of the application and database. They are easier to use, are more performant, and do a perfect job.

Natural keys bring only disadvantages: I cannot think of a single advantage for using natural keys.

That said, I think Hibernate has no real issues with natural (composite) keys. But you'll probably run into some problems (or bugs) from time to time, and into issues with the documentation or when trying to get help, because the Hibernate community widely acknowledges the benefits of surrogate keys. So, prepare a good answer for why you chose a composite key.

Answered by Sinvaldo Pacheco

If the Hibernate documentation is properly understood:

"There is an alternative <composite-id> declaration that allows access to legacy data with composite keys. Its use is strongly discouraged for anything else."

in section 5.1.4, on the <id> XML tag that defines the primary key mapping, we can conclude that the Hibernate documentation discourages using <composite-id> instead of the <id> XML tag for composite primary key mapping, and does NOT say anything negative about using composite primary keys.

Answered by Nathan Teague

Applications that treat the database as a tool definitely benefit more from keeping their workflow on surrogate keys, using clustered indices for query optimization.

Special care is needed, however, for data warehousing and OLAP-style systems that use a massive fact table to tie the surrogate keys of dimensions together. In this case the data dictates the dashboard/application that can be used to maintain records.

So rather than one method being preferable to the other, perhaps each approach to key construction is advantageous in its own setting: you won't very easily develop a Hibernate app that harnesses direct access to an SSAS system instance.

I develop using a mix of both kinds of key, and I feel that for implementing a solid star or snowflake pattern, a surrogate key with a clustered index is typically my first choice.

So, for the OP and others passing by: if you want to stay database-independent in your development (which Hibernate specializes in), use the surrogate method, and when data reads start to slow down, or you notice that certain queries drain performance, turn to your specific database and add composite, clustered indices that optimize query order.