Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/59482/

Time: 2020-09-08 06:55:22  Source: igfitidea

What's the better database design: more tables or more columns?

Tags: database, database-design, normalizing

Asked by raven

A former coworker insisted that a database with more tables with fewer columns each is better than one with fewer tables with more columns each. For example, rather than a customer table with name, address, city, state, zip, etc. columns, you would have a name table, an address table, a city table, etc.

He argued this design was more efficient and flexible. Perhaps it is more flexible, but I am not qualified to comment on its efficiency. Even if it is more efficient, I think those gains may be outweighed by the added complexity.

So, are there any significant benefits to more tables with fewer columns over fewer tables with more columns?

Accepted answer by Chris Ammerman

I have a few fairly simple rules of thumb I follow when designing databases, which I think can be used to help make decisions like this....

  1. Favor normalization. Denormalization is a form of optimization, with all the requisite tradeoffs, and as such it should be approached with a YAGNI attitude.
  2. Make sure that client code referencing the database is decoupled enough from the schema that reworking it doesn't necessitate a major redesign of the client(s).
  3. Don't be afraid to denormalize when it provides a clear benefit to performance or query complexity.
  4. Use views or downstream tables to implement denormalization rather than denormalizing the core of the schema, when data volume and usage scenarios allow for it.

The usual result of these rules is that the initial design will favor tables over columns, with a focus on eliminating redundancy. As the project progresses and denormalization points are identified, the overall structure will evolve toward a balance that compromises with limited redundancy and column proliferation in exchange for other valuable benefits.
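
As a rough illustration of rule 4, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer`, `address`, and `customer_flat` names are hypothetical and chosen only for this example; the point is that the base tables stay normalized while a view exposes the flattened shape to readers.

```python
import sqlite3

# Hypothetical schema for illustration; all table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE address (
        address_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        street      TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    -- The core schema stays normalized; the view offers a flat shape for reads.
    CREATE VIEW customer_flat AS
        SELECT c.customer_id, c.name, a.street, a.city
        FROM customer AS c
        JOIN address AS a ON a.customer_id = c.customer_id;
""")
conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Ada')")
conn.execute(
    "INSERT INTO address (customer_id, street, city) "
    "VALUES (1, '1 Main St', 'Springfield')"
)

# Client code queries the view and never spells out the join itself.
flat_rows = conn.execute(
    "SELECT name, street, city FROM customer_flat ORDER BY customer_id"
).fetchall()
print(flat_rows)
```

If the view later proves too slow, it could be swapped for a downstream table with the same columns without touching the normalized core.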

Answer by Bill the Lizard

I would argue in favor of more tables, but only up to a certain point. Using your example, if you separated your user's information into two tables, say USERS and ADDRESS, this gives you the flexibility to have multiple addresses per user. One obvious application of this is a user who has separate billing and shipping addresses.
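
A small sketch of that two-table split, using Python's built-in `sqlite3` module. The column names and the `kind` column are assumptions made for illustration and are not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- One row per address, so a user can have as many as needed.
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users(user_id),
        kind       TEXT NOT NULL CHECK (kind IN ('billing', 'shipping')),
        street     TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO address (user_id, kind, street) VALUES (?, ?, ?)",
    [(1, "billing", "1 Main St"), (1, "shipping", "9 Dock Rd")],
)

addresses = conn.execute(
    "SELECT kind, street FROM address WHERE user_id = ? ORDER BY kind", (1,)
).fetchall()
print(addresses)
```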

The argument in favor of having a separate CITY table would be that you only have to store each city's name once, then refer to it when you need it. That does reduce duplication, but in this example I think it's overkill. It may be more space efficient, but you'll pay the price in joins when you select data from your database.
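
To make the tradeoff concrete, here is a hedged sketch with a hypothetical `city` lookup table (names are assumptions, via Python's `sqlite3`). The city name is stored once, but reading a complete address now requires a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (
        city_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL,
        city_id    INTEGER NOT NULL REFERENCES city(city_id)
    );
    INSERT INTO city (city_id, name) VALUES (1, 'Springfield');
    INSERT INTO address (street, city_id) VALUES ('1 Main St', 1), ('2 Oak Ave', 1);
""")

# Two addresses share a single stored copy of the city name...
city_name_copies = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]

# ...but every read of a full address has to join back to the city table.
rows = conn.execute("""
    SELECT a.street, c.name
    FROM address AS a
    JOIN city AS c ON c.city_id = a.city_id
    ORDER BY a.address_id
""").fetchall()
print(city_name_copies, rows)
```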

Answer by swilliams

It doesn't sound so much like a question about tables/columns, but about normalization. In some situations, having a high degree of normalization ("more tables" in this case) is good and clean, but it typically takes a high number of JOINs to get relevant results. And with a large enough dataset, this can bog down performance.

Jeff wrote a little about it regarding the design of StackOverflow. See also the post by Dare Obasanjo that Jeff links to.

Answer by Joel Coehoorn

It depends on your database flavor. MS SQL Server, for example, tends to prefer narrower tables. That's also the more 'normalized' approach. Other engines might prefer it the other way around. Mainframes tend to fall in that category.

Answer by JosephStyons

A fully normalized design (i.e., "more tables") is more flexible, easier to maintain, and avoids duplication of data, which means your data integrity is going to be a lot easier to enforce.

Those are powerful reasons to normalize. I would choose to normalize first, and then only denormalize specific tables after you see that performance is becoming an issue.

My experience is that in the real world, you won't reach the point where denormalization is necessary, even with very large data sets.

Answer by Mark Cidade

Each table should only include columns that pertain to the entity that's uniquely identified by the primary key. If all the columns in the database are attributes of the same entity, then you'd only need one table with all the columns.

If any of the columns may be null, though, you would need to put each nullable column into its own table with a foreign key to the main table in order to normalize it. This is a common scenario, so for a cleaner design, you're likely to be adding more tables than columns to existing tables. Also, by moving these optional attributes into their own tables, they would no longer need to allow nulls and you avoid a slew of NULL-related issues.
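
One way this could look, sketched with Python's `sqlite3`. The `customer_phone` table and its columns are hypothetical names for illustration: the optional attribute gets its own table keyed by the main table's primary key, and a row exists only when the value is known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Optional attribute: no row means "no phone", so the column can be NOT NULL.
    CREATE TABLE customer_phone (
        customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
        phone       TEXT NOT NULL
    );
    INSERT INTO customer (customer_id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO customer_phone (customer_id, phone) VALUES (1, '555-0100');
""")

# Inner join: only customers that actually have a phone on record.
with_phone = conn.execute("""
    SELECT c.name, p.phone
    FROM customer AS c
    JOIN customer_phone AS p ON p.customer_id = c.customer_id
""").fetchall()

# Outer join: every customer; the missing value shows up as None in the result.
all_customers = conn.execute("""
    SELECT c.name, p.phone
    FROM customer AS c
    LEFT JOIN customer_phone AS p ON p.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(with_phone, all_customers)
```

Note that the stored data contains no nulls, but an outer join can still produce them in query results.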

Answer by Michael Haren

Like everything else: it depends.

There is no hard and fast rule regarding column count vs table count.

If your customers need to have multiple addresses, then a separate table for that makes sense. If you have a really good reason to normalize the City column into its own table, then that can go, too, but I haven't seen that before because it's a free form field (usually).

A table-heavy, normalized design is efficient in terms of space and looks "textbook-good" but can get extremely complex. It looks nice until you have to do 12 joins to get a customer's name and address. These designs are not automatically fantastic in terms of the performance that matters most: queries.

Avoid complexity if possible. For example, if a customer can have only two addresses (not arbitrarily many), then it might make sense to just keep them all in a single table (CustomerID, Name, ShipToAddress, BillingAddress, ShipToCity, BillingCity, etc.).
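
For comparison, a minimal sketch of that single-table layout in Python's `sqlite3`, using the column names from the list above. The column types and sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Exactly two address slots per customer, all in one row.
    CREATE TABLE Customer (
        CustomerID     INTEGER PRIMARY KEY,
        Name           TEXT NOT NULL,
        ShipToAddress  TEXT,
        ShipToCity     TEXT,
        BillingAddress TEXT,
        BillingCity    TEXT
    );
    INSERT INTO Customer
        (CustomerID, Name, ShipToAddress, ShipToCity, BillingAddress, BillingCity)
    VALUES
        (1, 'Ada', '9 Dock Rd', 'Shelbyville', '1 Main St', 'Springfield');
""")

# Both addresses come back from a single-row lookup with no joins.
row = conn.execute(
    "SELECT Name, ShipToCity, BillingCity FROM Customer WHERE CustomerID = ?", (1,)
).fetchone()
print(row)
```

The cost is that a third kind of address would mean adding columns rather than rows.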

Here's Jeff's post on the topic.

Answer by Chris Upchurch

The multi-table database is a lot more flexible if any of these one to one relationships may become one to many or many to many in the future. For example, if you need to store multiple addresses for some customers, it's a lot easier if you have a customer table and an address table. I can't really see a situation where you might need to duplicate some parts of an address but not others, so separate address, city, state, and zip tables may be a bit over the top.

Answer by Dillie-O

There are advantages to having tables with fewer columns, but you also need to look at your scenario above and answer these questions:

Will the customer be allowed to have more than 1 address? If not, then a separate table for address is not necessary. If so, then a separate table becomes helpful because you can easily add more addresses as needed down the road, where it becomes more difficult to add more columns to the table.

Answer by kemiller2002

There are many sides to this, but from an application efficiency perspective, more tables can be more efficient at times. If you have a few tables with a bunch of columns, then every time the db has to do an operation that takes a lock, more data is made unavailable for the duration of the lock. If locks get escalated to pages and tables (well, hopefully not tables :) ), you can see how this can slow down the system.
