MySQL: multiple tables or one table with many columns?
Original question: http://stackoverflow.com/questions/9774715/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Xavier_Ex
So this is more of a design question.
I have one primary key (say the user's ID), and I have tons of information associated with that user.
Should I have multiple tables broken down into categories according to the information, or should I have just one table with many columns?
The way I used to do it was to have multiple tables, so say, one table for application usage data, one table for profile info, one table for back end tokens etc. to keep things looking organized.
Recently someone told me that it's better not to do it that way and that having one table with lots of columns is fine. The thing is, all those columns have the same primary key.
I'm pretty new to database design so which approach is better and what are the pros and cons?
What's the conventional way of doing it?
Accepted answer by Brendan Long
Any time information is one-to-one (each user has one name and password), it's probably better to have it in one table, since that reduces the number of joins the database will need to do to retrieve results. I think some databases have a limit on the number of columns per table, but I wouldn't worry about it in normal cases, and you can always split it later if you need to.
If the data is one-to-many (each user has thousands of rows of usage info), then it should be split into separate tables to reduce duplicate data (duplicate data wastes storage space, cache space, and makes the database harder to maintain).
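As a rough sketch of that split, the example below keeps the one-to-one fields on a single users row and puts the one-to-many usage data in its own table. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and every table and column name in it is an assumption made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- One-to-one attributes live directly on the users row.
    CREATE TABLE users (
        user_id       INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        password_hash TEXT NOT NULL
    );
    -- One-to-many usage events get their own table, keyed back to users.
    CREATE TABLE usage_events (
        event_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        action   TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'not-a-real-hash')")
conn.executemany(
    "INSERT INTO usage_events (user_id, action) VALUES (?, ?)",
    [(1, "login"), (1, "upload"), (1, "logout")],
)

# The one-to-one fields come back from a single row, with no join.
profile = conn.execute(
    "SELECT name, password_hash FROM users WHERE user_id = 1"
).fetchone()

# The usage rows are stored once each, without repeating the user's name.
event_count = conn.execute(
    "SELECT COUNT(*) FROM usage_events WHERE user_id = 1"
).fetchone()[0]
```

Putting the usage rows into the users table instead would mean repeating the name and password hash on every event row, which is the duplication described above.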
You might find the Wikipedia article on database normalization interesting, since it discusses the reasons for this in depth:
Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Denormalization is also something to be aware of, because there are cases where repeating data is better (since it reduces the amount of work the database needs to do when reading data). I'd highly recommend making your data as normalized as possible to start out, and only denormalize if you're aware of performance problems in specific queries.
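To make that trade-off concrete, here is a minimal sketch of one possible denormalization: a per-user event counter stored next to the normalized event rows. It again uses Python's sqlite3 as a stand-in for MySQL, and the `event_count` column and `record_event` helper are hypothetical names invented for this example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id     INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- Denormalized copy of COUNT(*) over usage_events for this user.
        event_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE usage_events (
        event_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        action   TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'alice')")


def record_event(conn, user_id, action):
    """Insert an event and update the denormalized counter in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO usage_events (user_id, action) VALUES (?, ?)",
            (user_id, action),
        )
        conn.execute(
            "UPDATE users SET event_count = event_count + 1 WHERE user_id = ?",
            (user_id,),
        )


for action in ("login", "upload", "logout"):
    record_event(conn, 1, action)

# Normalized read: aggregate over the child table.
normalized_count = conn.execute(
    "SELECT COUNT(*) FROM usage_events WHERE user_id = 1"
).fetchone()[0]

# Denormalized read: a single-row lookup, paid for with extra work on each write.
denormalized_count = conn.execute(
    "SELECT event_count FROM users WHERE user_id = 1"
).fetchone()[0]
```

The read gets cheaper, but every write path now has to keep the copy in step, which is one reason to start normalized and denormalize only where a specific query is known to be slow.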
Answer by HLGEM
One big table is often a poor choice. Related tables are what relational databases were designed to work with. If you index properly and know how to write performant queries, they are going to perform fine.
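As a small illustration of indexing a related table, the sketch below creates an index on the child table's join column and asks the database for its query plan. It uses Python's sqlite3 as a stand-in, so the plan text is SQLite's and will not match MySQL's `EXPLAIN` output. The table and index names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE usage_events (
        event_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        action   TEXT NOT NULL
    );
    -- Index the column used to join and filter the child table.
    CREATE INDEX idx_usage_events_user ON usage_events (user_id);
""")

plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT u.name, e.action
    FROM users AS u
    JOIN usage_events AS e ON e.user_id = u.user_id
    WHERE u.user_id = ?
    """,
    (1,),
).fetchall()

# The human-readable description is the last column of each plan row.
plan_details = [row[-1] for row in plan]
uses_index = any("idx_usage_events_user" in detail for detail in plan_details)
```

Checking the plan like this is a quick way to confirm that a join is served by an index lookup and not by a full scan of the child table.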
When tables get too many columns, you can run into issues with the actual size of the page that the database stores the information on. The record can end up being too large for the page, in which case you may not be able to create or update a specific record, which makes users unhappy. Or you may (in SQL Server at least) be allowed some overflow for particular datatypes (with a set of rules you need to look up if you are doing this), but if many records overflow the page size you can create tremendous performance problems. How MySQL handles pages, and whether you have a problem when the potential page size gets too large, is something you would have to look up in the documentation for that database.
Answer by Vlad
I have a good example: an overly normalized database with the following set of relationships:
people -> rel_p2staff -> staff
and
people -> rel_p2prosp -> prospects
Here people holds names and personal details, staff holds just the staff record details, prospects holds just the prospect details, and the rel tables are relationship tables with foreign keys linking people to staff and prospects.
This sort of design carries on throughout the entire database.
Now, querying this set of relations means a multi-table join every time, sometimes a join of 8 or more tables. It worked fine until the middle of this year, when it started getting very slow once we passed 40000 people records.
Indexing and all the low-hanging fruit were used up last year, and all queries are optimized to perfection. This is the end of the road for this particular normalized design, and management has now approved a rebuild of the entire application that depends on it, as well as a restructuring of the database, over a term of 6 months. $$$$ Ouch.
The solution will be to have direct relations: people -> staff and people -> prospects.
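The two shapes can be sketched side by side as follows. This is only an illustration of the people/staff half, using Python's sqlite3 as a stand-in for MySQL. The column names, the `staff_direct` table name, and the assumption that each person has at most one staff record are mine and not taken from the real schema.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Original shape: people -> rel_p2staff -> staff
    CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, job_title TEXT NOT NULL);
    CREATE TABLE rel_p2staff (
        person_id INTEGER NOT NULL REFERENCES people(person_id),
        staff_id  INTEGER NOT NULL REFERENCES staff(staff_id)
    );

    -- Direct shape: people -> staff_direct (assumes one staff record per person)
    CREATE TABLE staff_direct (
        staff_id  INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL UNIQUE REFERENCES people(person_id),
        job_title TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO people VALUES (1, 'Ann')")
conn.execute("INSERT INTO staff VALUES (10, 'Engineer')")
conn.execute("INSERT INTO rel_p2staff VALUES (1, 10)")
conn.execute("INSERT INTO staff_direct VALUES (10, 1, 'Engineer')")

# Two joins are needed when the link goes through the relationship table.
via_rel_table = conn.execute("""
    SELECT p.name, s.job_title
    FROM people AS p
    JOIN rel_p2staff AS r ON r.person_id = p.person_id
    JOIN staff AS s ON s.staff_id = r.staff_id
""").fetchall()

# One join is enough when staff carries the foreign key itself.
direct = conn.execute("""
    SELECT p.name, s.job_title
    FROM people AS p
    JOIN staff_direct AS s ON s.person_id = p.person_id
""").fetchall()
```

Both queries return the same rows. The direct form removes one join per relationship, which adds up when a query chains many of them.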
Answer by moinhaque
Came across this, and as someone who used to use MySQL a lot, and then switched over to Postgres recently, one of the big advantages is that you can add JSON objects to a field in Postgres.
So if you are in this situation, you don't necessarily have to decide between one large table with many columns and splitting it up. You can merge columns into JSON objects to reduce the column count, e.g. instead of the address being 5 columns, it can be just one. You can also query on that object.
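A minimal sketch of the address idea is below. It stores the address as JSON text in one column, using Python's sqlite3 and json modules as a stand-in. The field names are hypothetical, and the filtering is done in Python here so the example does not depend on any database-side JSON functions. In Postgres, the column would typically be a `json` or `jsonb` type and the filter could run inside the SQL query.

```python
import json
import sqlite3

# In-memory SQLite stands in for Postgres/MySQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER PRIMARY KEY,"
    " name    TEXT NOT NULL,"
    " address TEXT NOT NULL"  # one JSON document in place of five address columns
    ")"
)

people = [
    (1, "alice", {"street": "1 Main St", "city": "Springfield", "state": "IL",
                  "zip": "62701", "country": "US"}),
    (2, "bob", {"street": "9 Elm Ave", "city": "Shelbyville", "state": "IL",
                "zip": "62565", "country": "US"}),
]
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(uid, name, json.dumps(addr)) for uid, name, addr in people],
)

rows = conn.execute("SELECT name, address FROM users ORDER BY user_id").fetchall()

# Decode each document and filter on one of its fields.
in_springfield = [
    name for name, raw in rows if json.loads(raw)["city"] == "Springfield"
]
```

The trade-off is that fields inside the JSON document are not checked by the table's column definitions, so the application has to keep their shape consistent.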
Answer by Brian
Ask yourself these questions. If you put everything in one table, will you have multiple rows for that user? If you have to update a user, do you want to keep an audit trail? Can the user have more than one instance of a data element (like a phone number, for instance)? Will you have a case where you might want to add an element or set of elements later? If you answer yes, then most likely you want to have child tables with foreign key relationships.
Pros of parent/child tables are data integrity, performance via indexes (yes, you can do that on a flat table also) and, IMO, easier maintenance if you need to add a field later, especially if it will be a required field.
Cons: design is harder, and queries become slightly more complex.
But, there are many cases where one big flat table will be appropriate so you have to look at your situation to decide.
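For the phone-number case mentioned above, a child table with a foreign key might look roughly like this. The sketch uses Python's sqlite3 as a stand-in for MySQL, with hypothetical table and column names. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` is set, which is specific to SQLite.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Child table: a user can have any number of phone numbers.
    CREATE TABLE phone_numbers (
        phone_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
        number   TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO phone_numbers (user_id, number) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)

# Data integrity: a phone number for a user that does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO phone_numbers (user_id, number) VALUES (99, '555-0199')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

numbers = [
    number
    for (number,) in conn.execute(
        "SELECT number FROM phone_numbers WHERE user_id = 1 ORDER BY phone_id"
    )
]
```

Adding a third phone number later is just another row, whereas a flat table would need a new column such as a hypothetical `phone3`.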
Answer by christopher
I have already done some database design. For me, it depends on how difficult the system is in terms of database management. Yes, it is right to keep each piece of data in one place only, but it is really hard to write queries against an overly normalized database with lots of records. Just combine the two schemas: use one huge table if you feel that you'll have a massive number of records that are hard to maintain, like Facebook, Gmail, etc., and use a different table for each set of records in a simple system. Well, this is just my opinion. I hope it helps. Just do it, you can do it. :)
Answer by Craig Trombly
The conventional way of doing this would be to use different tables, as in a star schema or snowflake schema. However, I would make this strategy twofold. I believe in the theory that data should only exist in one place, and therefore the schemas I mentioned would work well. However, I also believe that for reporting engines and BI suites, a columnar approach would be hugely beneficial because it is more supportive of the reporting needs. Columnar approaches like those from infobright.org have huge performance gains and compression, which makes using both approaches incredibly useful. A lot of companies are starting to realize that having just one database architecture in the organization does not support the full range of their needs. A lot of companies are implementing the concept of having more than one database architecture.
Answer by user8081853
I think having a single table is more effective, but you should make sure that the table is organised in a manner that shows the relationship, the trend, and the difference between variables in the same row. For example, if the table shows the ages and grades of students, you should arrange it so that the highest scorer is well differentiated from the lowest scorer and the difference in the students' ages is even.