Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1125004/

Which is more efficient: Multiple MySQL tables or one large table?

Tags: mysql, database-table

Asked by Peter Craig

I store various user details in my MySQL database. Originally the data was spread across several tables, linked by UserId, and read through sometimes complicated queries to display and manipulate it as required. Now that I am setting up a new system, it almost makes sense to combine all of these tables into one big table of related content.

  • Is this going to be a help or hindrance?
  • Speed considerations in calling, updating or searching/manipulating?

Here's an example of some of my table structure(s):

  • users - UserId, username, email, encrypted password, registration date, ip
  • user_details - cookie data, name, address, contact details, affiliation, demographic data
  • user_activity - contributions, last online, last viewing
  • user_settings - profile display settings
  • user_interests - advertising targetable variables
  • user_levels - access rights
  • user_stats - hits, tallies

Edit: I've upvoted all answers so far; they all have elements that essentially answer my question.

Most of the tables have a 1:1 relationship which was the main reason for denormalising them.

Are there going to be issues if the table spans across 100+ columns when a large portion of these cells are likely to remain empty?

Accepted answer by user115905

Multiple tables help in the following ways / cases:

(a) If different people are going to be developing applications involving different tables, it makes sense to split them.

(b) If you want to give different kinds of permissions to different people for different parts of the data, it may be more convenient to split them. (Of course, you can also look at defining views and granting authorization on them appropriately.)

(c) For moving data to different places, especially during development, it may make sense to use separate tables, since they result in smaller files.

(d) A smaller footprint may be more comfortable while you develop applications against the data of a single entity.

(e) What you thought of as single-valued data may turn out to be multi-valued in the future. For example, credit limit is a single-value field as of now. But tomorrow you may decide to store it as (date from, date to, credit value). Split tables come in handy then.

My vote would be for multiple tables - with data appropriately split.
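As a rough illustration of point (e), here is a minimal sketch of a credit limit stored as dated rows in its own table rather than as a single column. It uses Python's sqlite3 module only so that it runs anywhere; the table name user_credit_limits, its columns, and the helper credit_limit_on are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
-- Hypothetical table: one row per validity period instead of one column on users.
CREATE TABLE user_credit_limits (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    date_from TEXT NOT NULL,
    date_to TEXT NOT NULL,
    credit_value INTEGER NOT NULL
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO user_credit_limits VALUES (1, '2009-01-01', '2009-06-30', 500);
INSERT INTO user_credit_limits VALUES (1, '2009-07-01', '2009-12-31', 800);
""")

def credit_limit_on(user_id, day):
    """Return the credit limit valid on the given ISO date, or None if there is none."""
    row = conn.execute(
        "SELECT credit_value FROM user_credit_limits "
        "WHERE user_id = ? AND ? BETWEEN date_from AND date_to",
        (user_id, day),
    ).fetchone()
    return row[0] if row else None

limit_in_august = credit_limit_on(1, "2009-08-15")
```

With the limit already in its own table, the change from one value to many is an extra pair of columns and more rows, and the users table is left alone.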

Good luck.

Answer by Quassnoi

Combining the tables is called denormalizing.

It may (or may not) help some queries (those which make lots of JOINs) run faster, at the expense of creating a maintenance hell.

MySQL is capable of using only one JOIN method, namely NESTED LOOPS (hash joins were added later, in MySQL 8.0.18).

This means that for each record in the driving table, MySQL locates a matching record in the driven table in a loop.

Locating a record is quite a costly operation which may take dozens of times as long as a pure record scan.
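The nested-loops idea can be sketched in a few lines of Python. This is a toy model over lists of dicts, not MySQL's actual implementation; the function names and the sample rows are made up for illustration. The first function scans the driven table for every driving row, the second locates matches through an index (a dict here).

```python
def nested_loop_join(driving_rows, driven_rows, key):
    """Naive nested loops: for each driving row, scan the driven rows for matches."""
    result = []
    for outer in driving_rows:
        for inner in driven_rows:
            if outer[key] == inner[key]:
                result.append({**outer, **inner})
    return result

def indexed_nested_loop_join(driving_rows, driven_rows, key):
    """Same outer loop, but matches in the driven table are located via an index."""
    index = {}
    for inner in driven_rows:
        index.setdefault(inner[key], []).append(inner)
    result = []
    for outer in driving_rows:
        for inner in index.get(outer[key], []):
            result.append({**outer, **inner})
    return result

users = [{"user_id": 1, "username": "alice"}, {"user_id": 2, "username": "bob"}]
stats = [{"user_id": 2, "hits": 7}, {"user_id": 1, "hits": 42}]
joined = indexed_nested_loop_join(users, stats, "user_id")
```

Either way there is one lookup in the driven table per driving row, which is the per-record cost the answer refers to.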

Moving all your records into one table will help you to get rid of this operation, but the table itself grows larger, and the table scan takes longer.

If you have lots of records in the other tables, then the increased cost of the table scan can outweigh the benefit of the records being scanned sequentially.

Maintenance hell, on the other hand, is guaranteed.

Answer by dxh

Are all of them 1:1 relationships? I mean, if a user could belong to, say, different user levels, or if the user's interests are represented as several records in the user interests table, then merging those tables would be out of the question immediately.

Regarding previous answers about normalization, it must be said that the database normalization rules completely disregard performance and only look at what makes a neat database design. That is often what you want to achieve, but there are times when it makes sense to actively denormalize in pursuit of performance.

All in all, I'd say the question comes down to how many fields there are in the tables, and how often they are accessed. If user activity is often not very interesting, then it might just be a nuisance to always have it on the same record, for performance and maintenance reasons. If some data, like settings, say, is accessed very often but simply contains too many fields, it might also not be convenient to merge the tables. If you're only interested in the performance gain, you might consider other approaches, such as keeping the settings separate but saving them in a session variable of their own, so that you don't have to query the database for them very often.
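The session idea might look roughly like the sketch below. A plain dict stands in for whatever session object your web framework provides, and the user_settings columns (theme, show_email) and the helper load_settings are assumptions for this example. For brevity it assumes the user has a settings row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_settings (user_id INTEGER PRIMARY KEY, theme TEXT, show_email INTEGER);
INSERT INTO user_settings VALUES (1, 'dark', 0);
""")

counters = {"queries": 0}  # only here to show how often the database is hit

def load_settings(session, user_id):
    """Return the user's settings, querying the database only once per session."""
    if "settings" not in session:
        counters["queries"] += 1
        row = conn.execute(
            "SELECT theme, show_email FROM user_settings WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        session["settings"] = {"theme": row[0], "show_email": bool(row[1])}
    return session["settings"]

session = {}  # stand-in for a real session object
first = load_settings(session, 1)
second = load_settings(session, 1)  # served from the session, no query
```

The cached copy has to be refreshed or dropped whenever the user changes their settings, which this sketch leaves out.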

Answer by Eric Petroelje

Do all of those tables have a 1-to-1 relationship? For example, will each user row only have one corresponding row in user_stats or user_levels? If so, it might make sense to combine them into one table. If the relationship is not 1-to-1 though, it probably wouldn't make sense to combine (denormalize) them.

Having them in separate tables vs. one table is probably going to have little effect on performance, though, unless you have hundreds of thousands or millions of user records. The only real gain you'll get is from simplifying your queries by combining them.

Edited to add:

If your concern is about having too many columns, then think about what stuff you typically use together and combine those, leaving the rest in a separate table (or several separate tables if needed).

If you look at the way you use the data, my guess is that you'll find that something like 80% of your queries use 20% of that data with the remaining 80% of the data being used only occasionally. Combine that frequently used 20% into one table, and leave the 80% that you don't often use in separate tables and you'll probably have a good compromise.
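A minimal sketch of that compromise, using sqlite3 so it is runnable as-is. Which columns count as frequently used is an assumption here; in practice you would decide that from your own query patterns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Frequently used columns live together (the column choice is an assumption).
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    email TEXT NOT NULL,
    last_online TEXT
);
-- Rarely used columns stay in a separate 1:1 table.
CREATE TABLE user_details (
    user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
    address TEXT,
    affiliation TEXT
);
INSERT INTO users VALUES (1, 'alice', 'alice@example.com', '2009-07-14');
INSERT INTO user_details VALUES (1, '1 Main St', 'ACME');
""")

# The common case touches only the frequently used table.
hot = conn.execute(
    "SELECT username, last_online FROM users WHERE user_id = 1"
).fetchone()

# The occasional case pays for one join.
full = conn.execute("""
    SELECT u.username, d.address
    FROM users u LEFT JOIN user_details d ON d.user_id = u.user_id
    WHERE u.user_id = 1
""").fetchone()
```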

Answer by Eric Petroelje

Creating one massive table goes against relational database principles. I wouldn't combine them all into one table. You're going to get multiple instances of repeated data. If your user has three interests, for example, you will have 3 rows with the same user data in each, just to store the three different interests. Definitely go for the multiple 'normalized' table approach. See this Wiki page for database normalization.
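To make the repetition concrete, here is a small sketch comparing the two layouts for one user with three interests. The table users_flat and the sample data are made up for this example, and sqlite3 is used only to keep it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One big table: the user columns are repeated once per interest.
CREATE TABLE users_flat (user_id INTEGER, username TEXT, email TEXT, interest TEXT);
INSERT INTO users_flat VALUES (1, 'alice', 'alice@example.com', 'music');
INSERT INTO users_flat VALUES (1, 'alice', 'alice@example.com', 'films');
INSERT INTO users_flat VALUES (1, 'alice', 'alice@example.com', 'chess');

-- Normalized: user data is stored once, interests get their own table.
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, email TEXT);
CREATE TABLE user_interests (user_id INTEGER REFERENCES users(user_id), interest TEXT);
INSERT INTO users VALUES (1, 'alice', 'alice@example.com');
INSERT INTO user_interests VALUES (1, 'music');
INSERT INTO user_interests VALUES (1, 'films');
INSERT INTO user_interests VALUES (1, 'chess');
""")

def count(sql):
    """Run a COUNT(*) query and return the number."""
    return conn.execute(sql).fetchone()[0]

flat_copies = count("SELECT COUNT(*) FROM users_flat WHERE user_id = 1")
normalized_copies = count("SELECT COUNT(*) FROM users WHERE user_id = 1")
interest_count = count("SELECT COUNT(*) FROM user_interests WHERE user_id = 1")
```

In the flat layout a change of email address has to be applied to every copy of the user's row; in the normalized layout it is a single-row update.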

Edit: I have updated my answer, as you have updated your question... I agree with my initial answer even more now since...

a large portion of these cells are likely to remain empty

If, for example, a user didn't have any interests and you normalize, then you simply won't have a row in the interest table for that user. If you have everything in one massive table, then you will have columns (and apparently a lot of them) that contain just NULLs.
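A short sketch of that difference, again with sqlite3 and made-up sample data: the user without interests takes up no storage in user_interests, and a NULL only shows up in the result of a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE user_interests (
    user_id INTEGER REFERENCES users(user_id),
    interest TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO users VALUES (2, 'bob');
INSERT INTO user_interests VALUES (1, 'music');  -- bob has no interests, so no row
""")

stored_rows = conn.execute(
    "SELECT COUNT(*) FROM user_interests WHERE user_id = 2"
).fetchone()[0]

# A LEFT JOIN still lists every user; the NULL exists only in the result set.
rows = conn.execute("""
    SELECT u.username, i.interest
    FROM users u LEFT JOIN user_interests i ON i.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()
```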

I have worked for a telephony company where there were tons of tables, and getting data could require many joins. When the performance of reading from these tables was critical, procedures were created that generated a flat (i.e. denormalized) table that reports could point to, requiring no joins, calculations, etc. These were then used in conjunction with a SQL Server Agent job to run at certain intervals (e.g. a weekly view of some stats would run once a week, and so on).
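The pattern described here, keeping the normalized tables as the source of truth and regenerating a flat table for reports, could be sketched as follows. This is not the telephony company's code; the table report_user_hits, the function rebuild_report, and the sample data are assumptions, and sqlite3 stands in for the real database. The scheduling itself (SQL Server Agent, cron, or similar) is outside the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE user_stats (user_id INTEGER REFERENCES users(user_id), hits INTEGER NOT NULL);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO users VALUES (2, 'bob');
INSERT INTO user_stats VALUES (1, 10);
INSERT INTO user_stats VALUES (1, 5);
INSERT INTO user_stats VALUES (2, 7);
""")

def rebuild_report(conn):
    """Regenerate the flat reporting table; meant to be called by a scheduler."""
    conn.executescript("""
        DROP TABLE IF EXISTS report_user_hits;
        CREATE TABLE report_user_hits AS
            SELECT u.user_id, u.username, SUM(s.hits) AS total_hits
            FROM users u JOIN user_stats s ON s.user_id = u.user_id
            GROUP BY u.user_id, u.username;
    """)

rebuild_report(conn)

# Reports read the flat table directly: no joins or calculations at read time.
report = conn.execute(
    "SELECT username, total_hits FROM report_user_hits ORDER BY user_id"
).fetchall()
```

The trade-off is that the report is only as fresh as the last rebuild.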

Answer by Rudy Garcia

Why not use the same approach WordPress does: have a users table with the basic user information that everyone has, and then add a "user_meta" table that can hold basically any key/value pair associated with the user ID. If you need all the meta information for a user, you can just add that to your query, and you don't have to run the extra query when it is not needed, for things like logging in. This approach also leaves your schema open to adding new features for your users, such as storing their Twitter handle or each individual interest. You also won't have to deal with a maze of associated IDs, because one table rules all metadata and you limit it to one association instead of 50.

WordPress specifically does this to allow features to be added via plugins. It makes your project more scalable, since adding a new feature does not require a complete database overhaul.
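A minimal sketch of such a key/value table, written with sqlite3 for portability. The column names (meta_key, meta_value), the index, and the helpers set_meta and get_meta are assumptions for this example and are not meant to reproduce WordPress's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE user_meta (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    meta_key TEXT NOT NULL,
    meta_value TEXT
);
CREATE INDEX idx_user_meta ON user_meta (user_id, meta_key);
INSERT INTO users VALUES (1, 'alice');
""")

def set_meta(user_id, key, value):
    """Add a key/value pair; a new kind of attribute needs no schema change."""
    conn.execute("INSERT INTO user_meta VALUES (?, ?, ?)", (user_id, key, value))

def get_meta(user_id):
    """Collect all metadata for a user; repeated keys are gathered into lists."""
    meta = {}
    query = (
        "SELECT meta_key, meta_value FROM user_meta "
        "WHERE user_id = ? ORDER BY rowid"
    )
    for key, value in conn.execute(query, (user_id,)):
        meta.setdefault(key, []).append(value)
    return meta

set_meta(1, "twitter_handle", "@alice")
set_meta(1, "interest", "music")
set_meta(1, "interest", "chess")
meta = get_meta(1)
```

The flexibility has a cost: every value is stored as text and the database cannot enforce per-attribute types or constraints, so that validation moves into the application.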

Answer by Tundey

I think this is one of those "it depends" situations. Having multiple tables is cleaner and probably theoretically better. But when you have to join 6-7 tables to get information about a single user, you might start to rethink that approach.

Answer by Richard L

I would say it depends on what the other tables really mean. Does user_details contain more than one row per user, and so on? Which level of normalization is best suited for you depends on your demands.

If you have one table with a good index, that would probably be faster. But on the other hand, it would probably be more difficult to maintain.

To me it looks like you could skip User_Details, as it probably has a 1-to-1 relation with Users. But the rest probably have a lot of rows per user?