database 标准化:“重复组”是什么意思?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/23194292/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Normalization: What does "repeating groups" mean?
提问by Solace
I have read different tutorials and seen different examples of normalization, specially the notion of "repeating groups" in the first normal form. From them I have have gathered that repeating groups are "kind-of" multi-valued attributes (e.g. hereand here).
我已经阅读了不同的教程并看到了不同的规范化示例,特别是第一范式中“重复组”的概念。我从他们那里收集到,重复组是“某种”多值属性(例如here和here)。
But we already make separate tables for each multi-valued attribute by including foreign keys from the parent table during the process of mapping an ERM (Entity relationship Model) to a RDM (Relational Data Model)? Reference: this
但是,在将 ERM(实体关系模型)映射到 RDM(关系数据模型)的过程中,我们已经通过包含来自父表的外键为每个多值属性制作了单独的表?参考:这个
Secondly, are those "repeating groups" essentially laid out horizontally in the same row, or can the same value occurring in the same column again and again, i.e. the same value of an attribute again and again, also a repeating group and should be eliminated?
其次,那些“重复组”本质上是水平排列在同一行中,还是相同的值可以一次又一次地出现在同一列中,即属性的相同值一次又一次,也是重复组,应该消除?
In this example, the value English is repeating again and again. Is this a repeating group?
If I eliminate it to make another table SUBJECT with Subject Name and Module_ID(Foreign key), this is what I get. Sure it gets rid of the repeating value, but I am not sure if this is the right thing. Is it right?

在此示例中,值英语一次又一次地重复。这是重复组吗?如果我消除它以使用主题名称和 Module_ID(外键)制作另一个表 SUBJECT,这就是我得到的。当然它摆脱了重复值,但我不确定这是否正确。这样对吗?

回答by nvogel
The term "repeating group" originally meant the concept in CODASYL and COBOL based languages where a single field could contain an array of repeating values. When E.F.Codd described his First Normal Form that was what he meant by a repeating group. The concept does not exist in any modern relational or SQL-based DBMS.
术语“重复组”最初是指基于 CODASYL 和 COBOL 的语言中的概念,其中单个字段可以包含一组重复值。当 EFCodd 描述他的第一范式时,这就是他所说的重复组。这个概念在任何现代关系型或基于 SQL 的 DBMS 中都不存在。
The term "repeating group" has also come to be used informallyand imprecisely by database designers to mean a repeating set of columns, meaning a collection of columns containing similar kinds of values in a table. This is different to its original meaning in relation to 1NF. For instance in the case of a table called Families with columns named Parent1, Parent2, Child1, Child2, Child3, ... etc the collection of Child Ncolumns is sometimes referred to as a repeating group and assumed to be in violation of 1NF even though it is nota repeating group in the sense that Codd intended.
术语“重复组”也被数据库设计人员非正式地和不精确地使用,表示一组重复的列,意思是表中包含相似类型值的列的集合。这与其相对于 1NF 的原始含义不同。例如,在名为 Families 的表中,列名为 Parent1、Parent2、Child1、Child2、Child3 等的情况下,Child N列的集合有时被称为重复组,并假设违反 1NF,即使尽管从Codd 的意图来看,它不是一个重复组。
This latter sense of a so-called repeating group is not technically a violation of 1NF if each attribute is only single-valued. The attributes themselves do not contain repeating values and therefore there is no violation of 1NF for that reason. Such a design is often considered an anti-pattern however because it constrains the table to a predetermined fixed number of values (maximum N children in a family) and because it forces queries and other business logic to be repeated for each of the columns. In other words it violates the "DRY" principle of design. Because it is generally considered poor design it suits database designers and sometimes even teachers to refer to repeated columns of this kind as a "repeating group" and a violation of the spirit of the First Normal Form.
如果每个属性只是单值,则后一种所谓重复组的含义在技术上并不违反 1NF。属性本身不包含重复值,因此不会因为这个原因违反 1NF。然而,这种设计通常被认为是一种反模式,因为它将表限制为预定的固定数量的值(家庭中最多 N 个孩子),并且因为它强制对每一列重复查询和其他业务逻辑。换句话说,它违反了“ DRY”设计原则。因为它通常被认为是糟糕的设计,所以它适合数据库设计者,有时甚至是教师将这种重复的列称为“重复组”,违反了第一范式的精神。
This informal usage of terminology is slightly unfortunate because it can be a little arbitrary and confusing (when does a set of columns actually constitute a repetition?) and also because it is a distraction from a more fundamental issue, namely the Null problem. All of the Normal Forms are concerned with relations that don't permit the possibility of nulls. If a table permits a null in any column then it doesn't meet the requirements of a relation schema satisfying 1NF. In the case of our Families table, if the Child columns permit nulls (to represent families who have fewer than N children) then the Families table doesn't satisfy 1NF. The possibility of nulls is often forgotten or ignored in normalization exercises but the avoidance of unnecessary nullable columns is one very good reason for avoiding repeating sets of columns, whether or not you call them "repeating groups".
术语的这种非正式用法有点令人遗憾,因为它可能有点武断和混乱(一组列实际上何时构成重复?),还因为它分散了对更基本问题的注意力,即 Null 问题。所有范式都关注不允许空值的关系。如果表允许任何列中的空值,则它不满足满足 1NF 的关系模式的要求。在我们的 Families 表的情况下,如果 Child 列允许空值(代表少于 N 个孩子的家庭),则 Families 表不满足 1NF。在规范化练习中经常忘记或忽略空值的可能性,但避免不必要的可空列是避免重复列集的一个很好的理由,
See also this article.
另请参阅本文。
回答by philipxy
the value English is repeating again and again. Is this a repeating group?
英语的价值一再重复。这是重复组吗?
No. The multiple appearances of English in SUBJECT_MODULE are not a repeating group or even either of the two things that people mistakenly mean by a repeating group. They are also not evidence of redundancy or lack of normalization. Such multiple appearances might be connected to redundancy or normalization, but they appear all the time when there is no redundancy and various levels of normalization.
不是。 SUBJECT_MODULE 中多次出现的英语不是重复组,甚至不是人们误认为重复组的两个意思中的任何一个。它们也不是冗余或缺乏规范化的证据。这种多次出现可能与冗余或归一化有关,但它们在没有冗余和各种级别的归一化时一直出现。
If SUBJECT_MODULE is rows where "[SUBJECT_NAME] has [MODULE_NAME] identified by [MODULE_ID]" and a subject might have more than one module then somewhere you musthave multiple mentions of that subject (perhaps via its name) with mentions of different modules (perhaps by name or id). That would not involve redundancy.
如果 SUBJECT_MODULE 是“[SUBJECT_NAME] 具有由 [MODULE_ID] 标识的 [MODULE_NAME]”的行,并且一个主题可能有多个模块,那么在某处您必须多次提及该主题(可能通过其名称)并提及不同的模块(也许按名称或 ID)。那不会涉及冗余。
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
The redundancy in this example from your question's second "this" link is not that Adam appears in two rows or that Adam appears with 15 in two rows. It is that if the table is rows where "[Student] is [Age] years old and takes [Subject]" then Student (eg Adam) can appear in multiple rows but always appears with the same Age(eg 15). But if the table were rows where "[Student] has a friend [Age] years old in [Subject]" then the table could be fully normalized already.
此示例中来自您问题的第二个“ this”链接的冗余不是 Adam 出现在两行中,也不是 Adam 出现在两行中 15。如果表中的行是“[Student] 是 [Age] 岁并且需要 [Subject]”,那么 Student(例如 Adam)可以出现在多行中,但总是以相同的 Age(例如 15)出现。但是,如果该表是“[Student] 在 [Subject] 中有一个 [Age] 岁的朋友”的行,那么该表已经完全规范化了。
Sure it gets rid of the repeating value, but I am not sure if this is the right thing.
当然它摆脱了重复值,但我不确定这是否正确。
It does for your example data, but it might not for other example data. You haven't told us enough. (Anyway as I said above the multiple appearances might not even need normalizing.)
它适用于您的示例数据,但可能不适用于其他示例数据。你告诉我们的还不够多。(无论如何,正如我上面所说,多次出现可能甚至不需要标准化。)
Whether there are any normalization-relevant redundancies in SUBJECT_MODULE or even whether there are any valid decompositions including the one you gave depends on the usual information necessary to normalize to above 1NF. Namely whether some of its columns are functions of others (functional dependencies) and whether its rows are also those where "..." AND "..." (join dependencies).
在 SUBJECT_MODULE 中是否有任何归一化相关的冗余,甚至是否有任何有效的分解,包括你给出的分解,取决于归一化到 1NF 以上所需的常用信息。即它的某些列是否是其他列的函数(函数依赖),以及它的行是否也是“...”和“...”(连接依赖)的那些列。
By giving a possible decomposition you have said that it is also rows where "...[Subject_Name]...[Module_ID]..." AND "...[Module_Name]...[Module_ID]..." And you have given some example decomposition data. But we only know that it couldbe so decomposed because you added the decomposition. And the decomposition plus data still isn't enough for us to know whether it shouldbe so decomposed.
通过给出可能的分解,您已经说过它也是“...[Subject_Name]...[Module_ID]...”和“...[Module_Name]...[Module_ID]...”的行您给出了一些示例分解数据。但是我们只知道它可以被分解,因为你添加了分解。而且分解加数据还不足以让我们知道是否应该这样分解。
I have read different tutorials and seen different examples of normalization, specially the notion of "repeating groups" in the first normal form.
我已经阅读了不同的教程并看到了不同的规范化示例,特别是第一范式中“重复组”的概念。
"Repeating groups" are something from pre-relational databases and cannot possibly appear in a relational table (relation). They are like a named set of values that is like a field of a record but is not quite. A relational table is always in 1NF. Each column of a row has a single value of the column's type. A non-relational database is "normalized" to tables ie 1NF (first sense of "normalized") which gets rid of repeating groups. Then those tables/relations are "normalized" to higher normal forms (second sense of "normalized").
“重复组”是来自关系前数据库的东西,不可能出现在关系表(关系)中。它们就像一组命名的值,就像记录的一个字段,但又不完全是。关系表总是在 1NF 中。一行的每一列都有一个列类型的值。非关系数据库被“规范化”为表,即 1NF(“规范化”的第一感觉),它摆脱了重复组。然后这些表/关系被“规范化”为更高的规范形式(“规范化”的第二种含义)。
A relational table having multiple similar columns or having a column type with multiple similar parts are each just reminiscentof having a repeating group in a non-relational database. And the multiple columns and parts should become multiple rows in a separate table, just like the multiple members of a repeating group. But these problems have to do with relational quality of design, not repeating groups or normalization (in either sense) or being relational (ie being in 1NF).
具有多个相似列或具有多个相似部分的列类型的关系表,每个都让人联想到在非关系数据库中具有重复组。并且多个列和部分应该成为单独表中的多行,就像重复组的多个成员一样。但是这些问题与设计的关系质量有关,而不是重复组或规范化(在任何一种意义上)或关系(即在 1NF 中)。
Note that a non-relational database might itself have similar problems with multiple similar fields and/or named sets or with multiple similar parts of values of fields. Normalization to tables does not get rid of these when it gets rid of repeating groups.
请注意,非关系数据库本身可能具有多个相似字段和/或命名集或字段值的多个相似部分的相似问题。当它摆脱重复组时,对表的规范化并没有摆脱这些。
Regardless of how they got into a relational design, removing them gives a "better" design. It is just because these design problems are reminiscent of repeating groups that people get confused and imagine that somehow a table could contain a repeating group. So the multiple similar columns and values with multiple similar parts (or the parts) get incorrectly called "repeating groups".
不管它们是如何进入关系设计的,去掉它们会得到一个“更好”的设计。正是因为这些设计问题让人联想到重复组,人们才会感到困惑,并想象以某种方式一张桌子可以包含一个重复组。因此,具有多个相似部分(或部分)的多个相似列和值被错误地称为“重复组”。
See this answer re "atomicity".
看到这个答案是“原子性”。

