MySQL: What's better - many small tables or one big table?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/4089830/

What's better - many small tables or one big table?

Tags: mysql, database, database-design, database-schema

Asked by Mazatec

I've got a database which will store profiles about individuals. These individuals have about 50 possible fields.

Some are common things like, first name, last name, email, phone number.

Others are things like hobbies, skills, interests

Some are height, weight, skin color.

Each of these groups is used by the system at different times. In terms of being able to navigate the database, I would prefer to have 7 tables of about 8 fields each. What is the best practice?

EDIT: The data is going to be used in a search engine, for finding profile matches. Does this affect what I am doing?

Accepted answer by Jim W.

It is hard to say, and it depends on what the application requires. I would look into Database Normalization, as it will show you how to normalize the database, and in doing so it should shed light on what you would want to separate out into its own tables.

Answer by NealB

I'm with the Normalize camp.

Here are a few hints to get you started:

Start with a process to assign some arbitrary unique identifier to each "person". Call this the PersonId or something like that. This identifier is called a surrogate key. The sole purpose of a surrogate key is to guarantee a 1 to 1 relationship between it and a real person in the real world. Use the surrogate key when associating the value of some other attribute to a "person" in your database.

As you develop your database layout you may find surrogate keys necessary (or at least useful) for some other attributes as well.

Look at each attribute you want to manage. Ask the following question: Does any given person have only one value for this attribute?

For example, each person has exactly one "Birth Date". But how many "Hobbies" can they have? Probably zero to many. Single-valued attributes (e.g. birth date, height, weight etc.) are candidates to go into a common table with PersonId as the key. The number of attributes in each table should not be of concern at this point.
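
As a rough illustration of the single-valued case, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The column names (FirstName, HeightCm and so on) and the sample row are assumptions made up for the example, not part of the original answer.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Person (
        PersonId  INTEGER PRIMARY KEY,  -- surrogate key, assigned by the database
        FirstName TEXT NOT NULL,
        LastName  TEXT NOT NULL,
        BirthDate TEXT,                 -- single-valued: at most one per person
        HeightCm  REAL,
        WeightKg  REAL
    )
    """
)

# The key is not supplied; the database generates it.
cur = conn.execute(
    "INSERT INTO Person (FirstName, LastName, BirthDate, HeightCm, WeightKg) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Ada", "Example", "1990-05-17", 170.0, 62.5),
)
person_id = cur.lastrowid  # the generated surrogate key

row = conn.execute(
    "SELECT FirstName, BirthDate FROM Person WHERE PersonId = ?", (person_id,)
).fetchone()
print(person_id, row)
```

Every other table can then refer to the person through PersonId alone, regardless of how many columns this table ends up with.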

Multi valued attributes such as Hobby need a slightly different treatment. You might want to create separate tables for each multi-valued attribute. Using Hobbies as an example you might create the following table PersonHobby(PersonId, Hobby). A row in this table might look something like: (123, "Stamp Collecting"). This way you can record as many hobbies as required for each person, one per row. Do the same for "Interest", "Skill" etc.
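
The PersonHobby idea can be sketched as follows, again with Python's sqlite3 module standing in for MySQL. The second person (124) and the second hobby are invented sample data; the composite primary key is one possible choice, assumed here to stop the same hobby being recorded twice for one person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (
        PersonId  INTEGER PRIMARY KEY,
        FirstName TEXT NOT NULL
    );
    CREATE TABLE PersonHobby (
        PersonId INTEGER NOT NULL REFERENCES Person(PersonId),
        Hobby    TEXT NOT NULL,
        PRIMARY KEY (PersonId, Hobby)   -- one row per person per hobby
    );
    """
)
conn.execute("INSERT INTO Person (PersonId, FirstName) VALUES (123, 'Ada')")
conn.execute("INSERT INTO Person (PersonId, FirstName) VALUES (124, 'Bob')")
conn.executemany(
    "INSERT INTO PersonHobby (PersonId, Hobby) VALUES (?, ?)",
    [(123, "Stamp Collecting"), (123, "Chess")],
)

# LEFT JOIN keeps people with zero hobbies in the result.
hobby_counts = conn.execute(
    """
    SELECT p.PersonId, COUNT(h.Hobby)
    FROM Person AS p
    LEFT JOIN PersonHobby AS h ON h.PersonId = p.PersonId
    GROUP BY p.PersonId
    ORDER BY p.PersonId
    """
).fetchall()
print(hobby_counts)  # [(123, 2), (124, 0)]
```

A person with no hobbies simply has no rows in PersonHobby, which is how the "zero to many" case is represented.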

If there are quite a number of multi-valued attributes where the combination of PersonId + Hobby determines nothing else (i.e. you don't have anything interesting to record about this person doing this "Hobby" or "Interest" or "Skill"), you could lump them into an Attribute-Value table having a structure something like PersonAV(PersonId, AttributeName, Value). Here a row might look like: (123, "Hobby", "Stamp Collecting").

If you go this route, it is also a good idea to replace the AttributeName in the PersonAV table with a surrogate key and create another table to relate this key to its description. Something like: Attribute(AttributeId, AttributeName). A row in this table would look something like (1, "Hobby") and a corresponding PersonAV row could be (123, 1, "Stamp Collecting"). This is commonly done so that if you ever need to know which AttributeNames are valid in your database/application, you have a place to look them up. Think about how you might validate whether "Interest" is a valid value for AttributeName or not - if you haven't recorded some person having that AttributeName then there is no record of that AttributeName in your database - how do you know if it should exist or not? Well, look it up in the Attribute table!
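
One way the Attribute lookup could work is sketched below with Python's sqlite3 module in place of MySQL. The helper is_valid_attribute, the "ShoeSize" name and the attribute id 99 are hypothetical, introduced only to show the validation step; SQLite also needs foreign keys switched on explicitly, which is a detail of this stand-in rather than of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Attribute (
        AttributeId   INTEGER PRIMARY KEY,
        AttributeName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE PersonAV (
        PersonId    INTEGER NOT NULL,
        AttributeId INTEGER NOT NULL REFERENCES Attribute(AttributeId),
        Value       TEXT NOT NULL,
        PRIMARY KEY (PersonId, AttributeId, Value)
    );
    INSERT INTO Attribute (AttributeId, AttributeName) VALUES (1, 'Hobby'), (2, 'Interest');
    INSERT INTO PersonAV (PersonId, AttributeId, Value) VALUES (123, 1, 'Stamp Collecting');
    """
)


def is_valid_attribute(name):
    """Return True if the name is listed in the Attribute table (hypothetical helper)."""
    row = conn.execute(
        "SELECT 1 FROM Attribute WHERE AttributeName = ?", (name,)
    ).fetchone()
    return row is not None


interest_ok = is_valid_attribute("Interest")   # valid, though nobody has one recorded yet
shoe_size_ok = is_valid_attribute("ShoeSize")  # never defined, so not valid

# A row that points at an undefined AttributeId is rejected by the foreign key.
try:
    conn.execute("INSERT INTO PersonAV (PersonId, AttributeId, Value) VALUES (123, 99, 'x')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(interest_ok, shoe_size_ok, rejected)
```

"Interest" is recognized even though no person has one yet, which is exactly the case the lookup table is meant to cover.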

Some attributes may have multiple relationships and that too will influence how tables are normalized. I didn't see any of these dependencies in your example, so consider the following: Suppose we have a warehouse full of parts, where the PartId determines a part's WeightClass, StockCount and ShipCost. This suggests a table something like: Part(PartId, WeightClass, StockCount, ShipCost). However, if a relationship exists between non-key attributes then they should be factored out. For example, suppose WeightClass directly determines ShipCost. This implies that WeightClass alone is enough to determine ShipCost, and ShipCost should be factored out of the Part table.
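
The factoring-out step might look like this, sketched with Python's sqlite3 module rather than MySQL. The weight class names and all the numbers are invented sample data; the point is only that ShipCost lives in one place per weight class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE WeightClass (
        WeightClass TEXT PRIMARY KEY,
        ShipCost    REAL NOT NULL       -- stored once per weight class
    );
    CREATE TABLE Part (
        PartId      INTEGER PRIMARY KEY,
        WeightClass TEXT NOT NULL REFERENCES WeightClass(WeightClass),
        StockCount  INTEGER NOT NULL
    );
    INSERT INTO WeightClass VALUES ('Light', 4.5), ('Heavy', 20.0);
    INSERT INTO Part VALUES (1, 'Light', 10), (2, 'Heavy', 3), (3, 'Light', 7);
    """
)

# ShipCost for a part is looked up through its weight class.
query = """
    SELECT p.PartId, w.ShipCost
    FROM Part AS p
    JOIN WeightClass AS w ON w.WeightClass = p.WeightClass
    ORDER BY p.PartId
"""
before = conn.execute(query).fetchall()

# Changing the cost of a weight class is a single-row update.
conn.execute("UPDATE WeightClass SET ShipCost = 5.0 WHERE WeightClass = 'Light'")
after = conn.execute(query).fetchall()

print(before)  # [(1, 4.5), (2, 20.0), (3, 4.5)]
print(after)   # [(1, 5.0), (2, 20.0), (3, 5.0)]
```

Had ShipCost stayed in Part, the same price change would have to touch every light part, and any row that was missed would leave the data inconsistent.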

Normalization is a fairly subtle art. You need to identify the functional dependencies that exist between all of the attributes in your data model in order to do it properly. Just coming up with the functional dependencies takes a fair bit of thought and consideration - but it is critical to getting to the proper database design.

I encourage you to take the time to study normalization a bit more before building your database. A few days spent here will more than pay for itself down the road. Try doing some Google/Wikipedia searches for "Functional Dependency", "Normalization" and "Database Design". Read, study, learn, then build it right.

The suggestions I have made with respect to normalizing your database design are only a hint as to the direction you might need to take. Without having a strong grasp of all the data you are trying to manage in your application, any advice given here should be taken with a "grain of salt".

Answer by RKh

I would recommend only a few tables. Over-normalization is difficult to manage, and you would end up writing complex queries that perform slowly.

Normalize only when absolutely needed and think in logical terms. With the limited information you provided above, I would go for three tables:

Table 1: PersonalDetails
Table 2: Activities
Table 3: Miscellaneous

There are other techniques to improve performance, such as clustering, which you can use depending on your needs.

Answer by Raj More

IMO, it is more important to worry about the quality of data stored than the number of tables that you need.

For example, do you need to track changes? If John was 5'2" in January 2007 and is 5'11" in Oct 2010, do you want to know? If so, you will need to separate out the person from the height into two tables.
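
A minimal sketch of tracking height over time, using Python's sqlite3 module as a stand-in for MySQL. The table name PersonHeight, the choice of inches as the unit, and the exact days of the month are assumptions added for the example; 5'2" and 5'11" are stored as 62 and 71 inches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (
        PersonId  INTEGER PRIMARY KEY,
        FirstName TEXT NOT NULL
    );
    CREATE TABLE PersonHeight (
        PersonId   INTEGER NOT NULL REFERENCES Person(PersonId),
        MeasuredOn TEXT NOT NULL,       -- ISO date, so text order is date order
        HeightIn   INTEGER NOT NULL,    -- height in inches
        PRIMARY KEY (PersonId, MeasuredOn)
    );
    INSERT INTO Person VALUES (1, 'John');
    INSERT INTO PersonHeight VALUES (1, '2007-01-15', 62), (1, '2010-10-15', 71);
    """
)

# Full history, oldest first.
history = conn.execute(
    "SELECT MeasuredOn, HeightIn FROM PersonHeight "
    "WHERE PersonId = 1 ORDER BY MeasuredOn"
).fetchall()

# Current height is the most recent measurement.
latest = conn.execute(
    "SELECT HeightIn FROM PersonHeight "
    "WHERE PersonId = 1 ORDER BY MeasuredOn DESC LIMIT 1"
).fetchone()[0]

print(history)  # [('2007-01-15', 62), ('2010-10-15', 71)]
print(latest)   # 71
```

If only the current height mattered, a single column on the person table would be enough; the second table exists purely to keep the history.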

How about hobbies - are they allowed to only have 3 hobbies? Can they have more / less? Is this something you would want to query in the future? If so, you need a separate table.

You should read up on database design and normalization (there are several excellent threads on this site itself).

https://stackoverflow.com/questions/tagged/normalization

Answer by Kevin O'Donovan

From what you've described I'd certainly break that into multiple tables. I wouldn't split on an arbitrary number of columns though; instead, try to think of logical collections of columns that either make up an entity or match the access patterns you're going to be using to hit the data.

Answer by David Oneill

Unless every person has the same number of hobbies (i.e. everyone has 2 hobbies listed), it should be normalized.

Fields that are always 1 to 1 with the person should be in the same table. Age for example. No person will have two different ages.

Answer by Novikov

There is no database organization that's 100% correct; there's only one that's good enough for your purposes. If you don't foresee surpassing the capabilities of a single good database server in the future, then normalize the data and use plenty of constraints such as foreign keys, cascading deletes and the like, as that will make your database a joy to work with. On the other hand, if you look at the databases of a lot of applications that handle billions of requests, you'll find that they forgo a lot of these niceties in the name of performance and scalability.
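
To make the constraints mentioned above concrete, here is a small sketch of a foreign key with a cascading delete, using Python's sqlite3 module in place of MySQL. The tables and sample rows are illustrative assumptions, and the PRAGMA line is specific to SQLite, which leaves foreign key enforcement off unless asked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript(
    """
    CREATE TABLE Person (
        PersonId  INTEGER PRIMARY KEY,
        FirstName TEXT NOT NULL
    );
    CREATE TABLE PersonHobby (
        PersonId INTEGER NOT NULL
            REFERENCES Person(PersonId) ON DELETE CASCADE,
        Hobby    TEXT NOT NULL,
        PRIMARY KEY (PersonId, Hobby)
    );
    INSERT INTO Person VALUES (123, 'Ada');
    INSERT INTO PersonHobby VALUES (123, 'Stamp Collecting'), (123, 'Chess');
    """
)

# A hobby for a person who does not exist is refused.
try:
    conn.execute("INSERT INTO PersonHobby VALUES (999, 'Golf')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the person removes their hobby rows as well.
conn.execute("DELETE FROM Person WHERE PersonId = 123")
remaining = conn.execute("SELECT COUNT(*) FROM PersonHobby").fetchone()[0]

print(orphan_rejected, remaining)  # True 0
```

The database itself keeps the two tables consistent here, so application code does not have to remember to clean up child rows.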

Answer by Matthew J Morrison

There is no correct answer to this question because it largely depends on when and how you are going to be using your data, how frequently it will change, and what the volume of usage will be on the database.

What I would personally do would be to organize your data into logical entities and create tables based on those entities. This is at least where I would start.

Answer by Adeel

Many small tables, i.e. normalization, is best here. It provides flexibility, reduces redundancy, and gives a better database organization.
