Database: how many columns is too many columns?
Notice: this page is based on a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/3184478/
How many columns is too many columns?
Asked by Stephen Collins
I've noticed that a lot of folks here cite tables with 20+ columns (I've seen as many as 55) in one table. Now I don't pretend to be a database design expert, but I've always heard that this is a horrible practice. When I see this, I usually suggest splitting into two tables with a one-to-one relationship: one containing the most frequently used data, the other with the least often used data. At the same time, though, there's the possible issue of performance (fewer JOINs and such). So my question is this:
When it comes to really LARGE scale databases, is there actually an advantage to having a large number of columns, despite the fact that this usually leads to many NULL values?
Which is more of a performance hit: lots of columns with lots of NULLs, or fewer columns with lots of JOINs?
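To make the one-to-one split described in the question concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `customer_extra` tables and their columns are invented for illustration and do not come from the original question; other database engines will behave differently in storage and performance.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Frequently read columns stay in the main table.
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL
    );
    -- Rarely used, mostly-NULL columns move to a one-to-one side table.
    CREATE TABLE customer_extra (
        customer_id INTEGER PRIMARY KEY REFERENCES customer(id),
        fax         TEXT,
        notes       TEXT
    );
    INSERT INTO customer VALUES (1, 'Ann', 'ann@example.com'),
                                (2, 'Bob', 'bob@example.com');
    INSERT INTO customer_extra VALUES (1, '555-0100', 'prefers fax');
""")

# Common case: the hot columns are read without any join.
hot_rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()

# Rare case: a LEFT JOIN brings the extra columns back. Customers with no
# side row get NULL (None in Python) for the extra columns.
full_rows = conn.execute("""
    SELECT c.id, c.name, e.fax
    FROM customer AS c
    LEFT JOIN customer_extra AS e ON e.customer_id = c.id
    ORDER BY c.id
""").fetchall()

print(hot_rows)
print(full_rows)
```

The trade-off the question asks about shows up directly: the split removes NULL-heavy columns from the main table, but any query that needs even one of them pays for a join.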
Answered by Oded
The design of the table depends on the entity it needs to store. If all the data belongs together, then 50 columns (or even 100) might be the correct thing to do.
So long as the table is normalized, there is no rule of thumb regarding size, apart from database capabilities and the need to optimize.
Answered by Brian Hooper
I agree with Oded. I have seen tables with 500 columns in them, and all the columns in them were in the correct place. Just consider the number of facts one might wish to store about an everyday object, and you'll soon see why.
If it proves inconvenient to select all those columns, or to specify which columns to select when you are only interested in a small proportion of them, you may find it worthwhile to define a view.
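As a rough illustration of the view suggestion, the sketch below defines a narrow view over a wider table using Python's `sqlite3` module. The `product` table, its columns, and the `product_summary` view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A (modestly) wide table; imagine many more columns here.
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT,
        price REAL,
        weight_kg REAL,
        colour TEXT,
        supplier TEXT,
        warranty_months INTEGER
    );
    -- The view exposes only the columns most queries care about.
    CREATE VIEW product_summary AS
        SELECT id, name, price FROM product;
    INSERT INTO product VALUES (1, 'Kettle', 24.5, 1.2, 'red', 'Acme', 12);
""")

# SELECT * against the view returns just the three chosen columns.
cur = conn.execute("SELECT * FROM product_summary")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()

print(view_columns)
print(view_rows)
```

The underlying table keeps every column in one place, while callers that only need the summary never have to spell out a column list.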
Answered by graham.reeds
How many columns is too many columns?
When you feel it no longer makes sense or is right to add another column.
It generally depends on the application.
Answered by John Nicholas
ODBC has a character limit of 8000, so that is a physical limit beyond which things get highly frustrating.
I worked on a table that had 138 columns. It was horribly written and could have been normalised. That said, this database seems to have been the creation of someone wondering why there are conventions in database design and deciding to test them all at once.
Having very wide flattened tables is fairly common when you get into data warehousing and reporting servers. They are just a lot faster, and they mean that you don't have to store your database entirely in RAM for performance.
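A small sketch of what "flattening" for reporting can look like, again with Python's `sqlite3` module. The `customer`, `orders`, and `order_report` tables are made up for this example, and a real warehouse would refresh such a table on a schedule rather than build it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source tables (hypothetical schema).
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'Ann', 'North'), (2, 'Bob', 'South');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);

    -- Flattened reporting table: the join is paid once, at build time.
    CREATE TABLE order_report AS
        SELECT o.id AS order_id, o.amount, c.name AS customer_name, c.region
        FROM orders AS o
        JOIN customer AS c ON c.id = o.customer_id;
""")

# Reporting queries now read a single wide table with no joins.
totals = conn.execute("""
    SELECT region, SUM(amount)
    FROM order_report
    GROUP BY region
    ORDER BY region
""").fetchall()

print(totals)
```

The cost is duplication: `customer_name` and `region` are repeated on every order row, which is usually acceptable for read-mostly reporting data.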
Answered by Thea
In my experience it is better to have fewer joins, as they tend to happen very often, especially in a big database. As long as your database tables are designed to store a single entity (student, teacher and so on), this should be OK, since the entity will be represented as an object in your code later. If you split the entity across several tables, you will have to use several joins to fill your object. Also, if you use an ORM to generate your data access layer (such as LINQ in .NET), it will generate separate classes for each table (with a relationship between them, of course, but still), and this will be harder to use.
Another thing is that you can specify which columns to return in your query, which reduces the data passed to your application; but if you need even a single column from another table, you will have to do the join. And in most cases, if you have that many columns, the probability that a large amount of data is stored in the DB is high. So this join would hurt more than the NULLs.
Every project I have worked on has been different, so you should find the right balance for each one.
Answered by Albert
It also depends heavily on the use case for your table. If you want to optimize it for reading, then it might be a good idea to keep it all together in one table.
In the NoSQL world (Cassandra/HBase, for example) there are no constraints on the number of columns, and it's actually considered good practice to have many columns. This also comes from the way the data is stored (no gaps). Worth investigating.
Answered by awgtek
Having too many columns results in a lot of nulls (evil) and an unwieldy object that the table is mapped to. This hurts readability in the IDE and hinders maintenance (increasing development costs). If you need fast reads in some cases, use denormalized tables, e.g. ones used solely for reporting or queries (search for the "CQRS" pattern). Yes, "Person" has a million attributes, but you can break down these monolithic tables (design precedes normalization) to match smaller entities ("address," "phone," "hobby") instead of adding new columns for each new use case. Having smaller objects (and tables) brings many advantages; they enable things like unit testing, OOP, and SOLID practices.
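One way to picture breaking "Person" into smaller entities is a child table in place of columns such as `phone1`, `phone2`, `phone3`. The sketch below uses Python's `sqlite3` module; the `person` and `phone` tables and their columns are hypothetical and only show the general shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per phone number instead of phone1/phone2/phone3 columns.
    CREATE TABLE phone (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id),
        kind      TEXT NOT NULL,
        number    TEXT NOT NULL
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO phone (person_id, kind, number) VALUES
        (1, 'home', '555-0100'),
        (1, 'work', '555-0101');
""")

# Ann has two numbers; adding a third needs a new row, not a new column.
ann_phones = conn.execute(
    "SELECT kind, number FROM phone WHERE person_id = ? ORDER BY kind", (1,)
).fetchall()

# Bob has no phone: that is simply zero rows, with no NULL columns stored.
bob_phone_count = conn.execute(
    "SELECT COUNT(*) FROM phone WHERE person_id = ?", (2,)
).fetchone()[0]

print(ann_phones)
print(bob_phone_count)
```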
Also, as regards bunching numerous columns together to avoid joins, I think the performance gain from avoiding joins is lost through index maintenance, assuming a typical workload of both reads and writes. Adding indexes on fields for the sake of read performance could indicate a need to move those fields into their own table.
Answered by eugeneK
Which is more of a performance hit: lots of columns with lots of NULLs, or fewer columns with lots of JOINs?
It purely depends on the data you store, the indexes you create and so on. No one can assure you that one works better than the other without knowing what you are storing. Generally, normalization rules will "force" you to separate data into different tables and use foreign keys if you have a large table, but I disagree that this ALWAYS performs better than one big table. You can end up with 6-7 levels of joins in dozens of queries, which will sometimes cause errors, because there are many more chances to make a mistake in larger queries than in simple ones.
If you post some requirements of what you are doing maybe we can help you with designing the DB properly.
Answered by CubeSpark
What business need requires more than 60 columns in any data set, let alone a T-SQL table? If there is such a business need, then a pivot is in order, and the columns should be rows. For example, in the mining industry, there may be 600 different measurements taken in an assay. The name of each measurement could be a column name. But why create a table with 600 columns and rows of measurements? A geologist would measure the mine each day, perhaps, and fill in the log of 600 columns on one row. That sounds to me like the geologist will lose his mind, and he won't find a sheet of paper long enough. Perhaps a roll would work, but then he would have to unroll the roll and roll it back up again.
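The "columns should be rows" idea can be sketched as a long, narrow table with one row per measurement, shown here with Python's `sqlite3` module. The table name, the measurement names, and the values are all invented for illustration; they are not real assay data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per (date, measurement) instead of one column per measurement.
    CREATE TABLE assay_measurement (
        assay_date  TEXT NOT NULL,
        measurement TEXT NOT NULL,
        value       REAL NOT NULL,
        PRIMARY KEY (assay_date, measurement)
    );
""")

# Hypothetical readings for one day; a real assay might have hundreds.
readings = {"gold_ppm": 1.5, "silver_ppm": 12.0, "copper_pct": 0.5}
conn.executemany(
    "INSERT INTO assay_measurement VALUES (?, ?, ?)",
    [("2024-01-01", name, value) for name, value in readings.items()],
)

rows = conn.execute(
    "SELECT measurement, value FROM assay_measurement "
    "WHERE assay_date = ? ORDER BY measurement",
    ("2024-01-01",),
).fetchall()

# If a wide, one-row-per-day shape is needed, it can be rebuilt on demand.
wide = dict(rows)

print(rows)
print(wide)
```

Adding a new kind of measurement then means inserting rows with a new `measurement` name, with no schema change. The downside is that per-measurement types and constraints are harder to enforce than with dedicated columns.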
Answered by user3470929
It's better to use a single table, because you can then avoid joins when querying. It depends on whether the columns belong to the same entity or to different entities.
For example, assume you are designing a database for a workflow where some fields will be edited by junior workers and some by senior workers. In this case it is better to have all the columns in a single table.