SQL 何时对数据库设计进行非规范化

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4301089/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 08:25:01  来源:igfitidea点击:

When to Denormalize a Database Design

sqldatabase-designdatabase-normalizationdenormalization

提问by Mark Evans

I know that normalization has been extensively discussed on Stack Overflow. I've read many of the previous discussions. I've got some additional questions though.

我知道在 Stack Overflow 上已经广泛讨论了规范化。我已经阅读了许多以前的讨论。不过,我还有一些其他问题。

I'm working on a legacy system with at least 100 tables. The database is has some un-normalized structure, tables that contain a variety of disparate data, and other problems. I've been given the task of trying to improve it. I can't just start again but need to modify the existing schema.

我正在使用至少有 100 个表的旧系统。数据库具有一些未规范化的结构、包含各种不同数据的表以及其他问题。我被赋予了尝试改进它的任务。我不能只是重新开始,而是需要修改现有的架构。

In the past I have always tried to design normalized databases. Now the questions. A senior developer has suggested that in some cases we can't normalize:

过去我一直试图设计规范化的数据库。现在的问题。一位资深开发人员建议,在某些情况下我们无法正常化:

1) With temporal data. For example an invoice is created that links to a product. If a customer asks for a copy of this invoice a year later we must be able to produce an exact copy of the original. What if the product price, name or description have been updated? The senior guy suggested that the price and other product information should be copied to the invoice table. I'm thinking maybe we should have another table such as productPrice that has a date field so we can track changes in price over time. We would need the same thing for the product description and name I guess? Seems complicated. What do you think?

1) 时态数据。例如,创建链接到产品的发票。如果客户在一年后要求提供此发票的副本,我们必须能够提供原件的准确副本。如果产品价格、名称或描述已更新怎么办?大佬建议把价格等产品信息复制到发票表中。我想也许我们应该有另一个表,例如具有日期字段的 productPrice ,以便我们可以跟踪价格随时间的变化。我猜我们需要相同的产品描述和名称?看起来很复杂。你怎么认为?

2) The database is an accounting system. I'm not very familiar with accounting. At the moment some summary data is derived and stored in the database. For example total sales for the year. My senior associate has said that accountants like to check things are correct by comparing this value with data that is actually calculated from invoices etc to give them confidence that the application is working correctly.

2) 数据库是一个会计系统。我对会计不是很熟悉。目前,一些汇总数据被导出并存储在数据库中。例如当年的总销售额。我的高级助理曾说过,会计师喜欢通过将此值与从发票等实际计算的数据进行比较来检查事情是否正确,以使他们确信应用程序正常工作。

He said that at the moment for example we can tell if someone deleted an invoice from last year mistakenly because the totals will not be the same. He also pointed out that it could be quite slow to calculate these totals on the fly. Of course I said that data should not be duplicated and should always be calculated when needed. I suggested that we could use SQL Reporting Services or some other solution that will generate these reports overnight and cache them. Anyway he's not convinced. Any comments on this?

他说,目前我们可以判断是否有人错误地删除了去年的发票,因为总数不一样。他还指出,即时计算这些总数可能会很慢。当然我说过数据不应该被复制,应该总是在需要的时候计算。我建议我们可以使用 SQL Reporting Services 或其他一些可以在一夜之间生成这些报告并缓存它们的解决方案。反正他不信。对此有何评论?

采纳答案by PerformanceDBA

Your senior colleague is a developer, not a data modeller. You are better off starting from scratch, without them. Normalisation is complicated only to those who will not read books. It is fair enough that he makes you think, but some of the issues are absurd.

您的高级同事是开发人员,而不是数据建模师。你最好从头开始,没有它们。规范化只对那些不会读书的人来说是复杂的。他让你思考是很公平的,但有些问题是荒谬的。

Your numbers:

您的号码:

  1. You need to appreciate the differences between actual online data, and historic data; then the difference between merely historic and archival needs. All of them are right for the specific business requirement, and wrong for all others, there is no universal right and wrong.

    • why is there no paper-based copy of the invoice ? In most countries that would be a legal and tax requirement, what exactly is the difficulty of fishing out the old invoice ?
    • where the database has the requirementof storing the closed invoices, then sure, as soon as the invoice is closed, you need a method of capturing that information.
    • ProductPrice(actually, I would call it ProductDate) is a good idea, but may not be necessary. But you are right, you need to evaluate the currency of data, in the full context of the whole database.
    • I cannot see how copying the product price to the invoice table would help (are not there many line items ?)
    • in modern databases, where the copyof the invoice is required to be regurgitated, the closed Invoice is additionally stored in a different form, eg XML. One customer saves the PDFs as BLOBs. So there is no messing around with what the product price was five years ago. But the basic invoice data is online and current, even for closed invoices; you just cannot recalculate ancient invoice using current prices.
    • some people use an archive_invoice table, but that has problems because now every code segment or user report tool has to look in two places (note that these days some users understand databases better than most developers)
    • Anyway, that is all discussion, for your understanding.
      • The database serves current and archival purposes from the one set of tables (no "archive" tables
      • Once an Invoice is created, it is a legal document, and cannot be changed or deleted (it can be reversed or partially credited by a new Invoice, with negative values). They are marked IsIssued/IsPaid/Etc
      • Productscannot be deleted, they can be marked IsObsolete
      • There are separate tables for InvoiceHeader and InvoiceItem
      • InvoiceItemhas FKs to both InvoiceHeaderand Product
      • for many reasons (not only those you mention), the InvoiceItem row contains the NumUnits; ProductPrice; TaxAmount; ExtendedPrice. Sure, this looks like a "denormalisation" but it is not, because prices, taxation rates, etc, are subject to change. But more important, the legal requirement is that we can reproduce the old invoice on demand.
      • (where it can be reproduced from paper files, this is not required)
      • the InvoiceTotalAmountis a derived column, just SUM()of the InvoiceItems
  2. That is rubbish. Accounting systems, and accountants do not "work" like that.

    • If it is a true accounting system, then it will have JournalEntries, or "double entry"; that is what a qualified account is required to use (by law).

    • Double Entry does not mean duplicate entry; it means every financial transaction (one amount) shall have a source account and target account that it is applied to; so there is no "denormalisation" or duplication. In a banking database, because the financial transactions are against single accounts, that is commonly rendered as two separate financial transactions (rows) within one Db Transaction. Ordinary commercial database constraints are used to ensure that there are two "sides" to every financial transaction.

    • Ensuring that Invoices are not deleteable is a separate issue, to do with security, etc. if anyone is paranoid about things being deleted from their database, andtheir database was not secured by a qualified person, then they have more and different problems that have nothing to do with this question. Obtain a security audit, and do whatever they tell you.

    • Wikipedia is not a reliable source of information about normalisation.

    • A Normalised database is always much faster than Unnormalised database. So it is very important to understand what Normalisation and Denormalisaion is, and what it isn't. The process is greatly hindered when people have fluid and amateur "definitions", it just leads to confusion and time-wasting "discussions". When you have fixed definitions, you can avoid all that, and just get on with the job.

    • Summary tables are quite normal, to save the time and processing power, of recalculating info that does not change, eg: YTD totals for every year but this year; MTD totals for every month in this year but not this month. "Always recalculating" data is a bit silly when (a) the info is very large and (b) does not change. Calculate for the current month only

      • In banking systems (millions of Trades per day), at EndOfDay, we calculate and store Daily Total as well. These are overwritten for the last five days, because Auditors are making changes, and JournalEntries against financial transactions for the last 5 days are allowed.
      • non-banking systems generally do not need daily totals
    • Summary tables are not a "denormalisation" (except in the eyes of those who have just learned about "normalisation" from their magical, ever-changing fluid "source"; or as non-practitioners, who apply simple black-or-white rules to everything). Again, the definition is not being argued here; it simply does not applyto Summary tables.

    • Summary tables do not affect data integrity (assuming of course that the data that they were sourced from was integral).

    • Summary tables are an additionto the database, which are not required to have the same constraints as the database. There are essentially reporting tables or data warehouse tables, as opposed to database tables.

    • There are no Update Anomalies (which is a strict definition) related to Summary tables. You cannot change or delete an invoice from last year. Update Anomalies apply to true Denormalised or Unnormalised current data.

  1. 您需要了解实际在线数据和历史数据之间的差异;那么仅仅是历史需求和档案需求之间的区别。对于特定的业务需求,它们都是对的,对所有其他的都是错的,没有普遍的对与错。

    • 为什么发票没有纸质副本?在大多数国家,这将是法律和税收要求,究竟有什么困难才能找出旧发票?
    • 如果数据库要求存储已关闭的发票,那么当然,一旦发票关闭,您就需要一种捕获该信息的方法。
    • ProductPrice(实际上,我会称之为ProductDate)是个好主意,但可能没有必要。但是您说得对,您需要在整个数据库的完整上下文中评估数据的流通性。
    • 我看不出将产品价格复制到发票表会有什么帮助(不是有很多行项目吗?)
    • 在现代数据库中,发票的副本需要反刍,关闭的发票另外以不同的形式存储,例如 XML。一位客户将 PDF 保存为 BLOB。因此,五年前的产品价格没有任何问题。但是基本的发票数据是在线的和最新的,即使是已关闭的发票;您只是无法使用当前价格重新计算古老的发票。
    • 有些人使用 archive_invoice 表,但这有问题,因为现在每个代码段或用户报告工具都必须查看两个地方(请注意,现在有些用户比大多数开发人员更了解数据库)
    • 无论如何,这都是讨论,供您理解。
      • 数据库从一组表中提供当前和归档目的(没有“归档”表
      • 发票一经创建,即为合法单据,不能更改或删除(可以冲销或部分记入新发票,具有负值)。他们被标记IsIssued/IsPaid/Etc
      • Products不能删除,可以标记 IsObsolete
      • InvoiceHeader 和 InvoiceItem 有单独的表
      • InvoiceItem对两者都有 FKInvoiceHeaderProduct
      • 出于多种原因(不仅是您提到的原因),InvoiceItem 行包含NumUnits; ProductPrice; TaxAmount; ExtendedPrice. 当然,这看起来像是“非规范化”,但实际上并非如此,因为价格、税率等可能会发生变化。但更重要的是,法律要求我们可以按需复制旧发票。
      • (如果可以从纸质文件中复制,则不需要)
      • InvoiceTotalAmount是一个派生列,只是SUM()在InvoiceItems的
  2. 那是垃圾。会计系统和会计师不会那样“工作”。

    • 如果是真正的会计系统,那么就会有JournalEntries,或者说“复式”;这是(法律规定的)合格帐户必须使用的内容。

    • 双重入境并不意味着重复入境;这意味着每笔金融交易(一笔金额)都应该有一个源账户和一个目标账户;所以没有“非规范化”或重复。在银行数据库中,由于金融交易针对单个账户,因此通常在一个 Db 交易中呈现为两个独立的金融交易(行)。普通的商业数据库约束用于确保每笔金融交易都有两个“面”。

    • 确保发票不可删除是一个单独的问题,与安全性等有关。如果有人对从他们的数据库中删除的东西感到偏执,并且他们的数据库没有由合格的人保护,那么他们就会遇到更多不同的问题与这个问题无关。获得安全审计,然后按照他们的指示去做。

    • 维基百科不是关于规范化的可靠信息来源。

    • 规范化数据库总是比非规范化数据库快得多。因此,了解规范化和非规范化是什么以及不是什么非常重要。当人们有流动的和业余的“定义”时,这个过程会受到很大的阻碍,只会导致混乱和浪费时间的“讨论”。当你有固定的定义时,你可以避免所有这些,然后继续工作。

    • 汇总表是很正常的,为了节省时间和处理能力,重新计算不会改变的信息,例如:每年但今年的 YTD 总数;今年每个月的 MTD 总计,但不是本月。当(a)信息非常大并且(b)没有改变时,“总是重新计算”数据有点愚蠢。仅计算当月

      • 在银行系统中(每天数百万笔交易),在 EndOfDay 时,我们也计算和存储每日总计。这些在过去 5 天被覆盖,因为审计员正在进行更改,并且允许针对过去 5 天的金融交易的 JournalEntries。
      • 非银行系统通常不需要每日总计
    • 汇总表不是“非规范化”(在那些刚刚从他们神奇的、不断变化的流动“来源”中了解“规范化”的人的眼中;或者作为非实践者,他们应用简单的非黑即白规则到一切)。同样,这里不讨论定义;它根本不适用于汇总表。

    • 汇总表不影响数据完整性(当然假设它们的来源数据是完整的)。

    • 汇总表是对数据库的补充,不需要与数据库具有相同的约束。与数据库表相反,本质上有报告表或数据仓库表。

    • 没有与汇总表相关的更新异常(这是一个严格的定义)。您不能更改或删除去年的发票。更新异常适用于真正的非规范化或非规范化当前数据。

回答by Jeff Ferland

1) This is an archive. Everything that is in it should never be updated. I'd go with the senior guy's suggestion and have that invoice table be self-contained. Perhaps use a blob for the invoice itself that contains markup language?

1)这是一个档案。其中的所有内容都不应更新。我会接受这位资深人士的建议,并让发票表自成一体。也许对包含标记语言的发票本身使用 blob?

2) Reporting services, a warehouse table that is trigger-updated, something you build by script whenever... these would all be fine, I think. It is indeed ideal to be normalized, but it isn't always fast. I have a good sized healthcare database I manage which is fully normalized... and then has a series of de-normalized tables with rolled-up equations and commonly pulled fields. Almost everything runs from that de-normalized set -- it's just faster to append to these with a trigger when files are loaded than to keep having to pull from various tables everytime I want to look at a 100,000 record report.

2) 报告服务、触发器更新的仓库表、您随时通过脚本构建的东西……我认为这些都很好。归一化确实很理想,但并不总是很快。我有一个大小合适的医疗保健数据库,我管理它是完全规范化的……然后有一系列非规范化的表格,其中包含汇总方程和常用提取字段。几乎所有的东西都从那个非规范化的集合运行——在加载文件时用触发器附加到这些文件比每次我想查看 100,000 条记录报告时都必须从各种表中提取要快。

回答by Unreason

You raise valid points, however you are not completely clear on normalization and what it means, for example in

您提出了有效的观点,但是您对规范化及其含义并不完全清楚,例如在

1) The claim that keeping the invoices as they were denormalizes the data is completely and totally wrong. Let's take price for example - if you have a business requirement that states that you have to keep history of prices then keeping only current price is wrong and it breaks the requirements. And it has nothing to do with normalization, it's simply not designed well. Denormalization is about introducing possibilities for ambiguity into your model (and other artifacts) - and in this case you are simply not modelling your problem space properly.
There is nothing wrong in modelling your database to support temporal data (or versioning and/or separating the areas of the database into archive/temporal and the working set).

1) 声称保持发票原样会使数据非规范化的说法是完全错误的。让我们以价格为例 - 如果您有一个业务要求,说明您必须保留价格历史记录,那么仅保留当前价格是错误的,并且违反了要求。它与规范化无关,只是设计得不好。非规范化是关于在您的模型(和其他工件)中引入歧义的可能性 - 在这种情况下,您只是没有正确地对问题空间进行建模。
为您的数据库建模以支持时态数据(或版本控制和/或将数据库区域分为存档/时态和工作集)并没有错。

Looking at normalization without looking at semantics (in terms of requirements) is not possible.

在不查看语义(就需求而言)的情况下查看规范化是不可能的。

Also, if your senior developer can't see the difference then I guess he didn't get his seniority in RDBMS development ;)

此外,如果您的高级开发人员看不到差异,那么我猜他没有在 RDBMS 开发方面获得资历;)

2) Second part is indeed denormalization. However, if you ever run across senior DB analyst who seriously preaches normalization, you will hear him/her say that it is perfectly acceptable to denormalize as long as you do it consciously and ensure that benefits overweight deficiencies and that anomalies will not bite you. They will also tell you to normalize the logical model and that in the physical model you are allowed to deviate from the ideal for various purposes (performance, maintenance, etc...). In my book the main purpose of normalisation is so that you don't have hidden anomalies (see this article on 5NFfor example)

2)第二部分确实是非规范化。但是,如果您遇到认真鼓吹规范化的高级数据库分析师,您会听到他/她说,只要您有意识地进行非规范化并确保收益超重不足并且异常不会咬您,那么反规范化是完全可以接受的。他们还会告诉您规范化逻辑模型,并且在物理模型中,您可以出于各种目的(性能、维护等)偏离理想状态。在我的书中,规范化的主要目的是让您没有隐藏的异常(例如,参见这篇关于5NF 的文章)

The caching of intermediate results is allowed even on normalized databases and even by biggest evangelists of normalization - you can do it at application layer (as some sort of cache) or you can do it at the database level or you can have a data warehouse for such purposes. These are all valid choices and have nothing to do with normalizing the logical model.

即使在规范化的数据库上,甚至规范化的最大传道者也允许缓存中间结果 - 您可以在应用程序层(作为某种缓存)进行缓存,也可以在数据库级别进行缓存,或者您可以拥有一个数据仓库这样的目的。这些都是有效的选择,与规范化逻辑模型无关。

Also, as for your accountant - you should be able to convince him that what he is claiming is nota good test and develop a set of tests (maybe together with him) that will automate the testing of the system without users intervention and give you higher confidence that your system is bug free.

此外,对于您的会计师 - 您应该能够说服他他声称的不是一个好的测试并开发一组测试(可能与他一起),这些测试将在没有用户干预的情况下自动测试系统并为您提供对您的系统没有错误的信心更高。

On the other hand I know of systems that require users to enter duplicate information, such as to enter the number of lines on the invoice before or after entering actual lines, to insure that the entry is complete. This data is 'duplicated' and you don't have to store it if you have a procedure that will validate the input. If that procedure comes later it is allowed to store the 'denormalized' data - again, the semantics justify it and you can look at the model as normalized. (it is beneficial to wrap your head around this concept)

另一方面,我知道要求用户输入重复信息的系统,例如在输入实际行之前或之后输入发票上的行数,以确保输入完整。此数据是“重复的”,如果您有验证输入的过程,则不必存储它。如果该过程稍后出现,则允许存储“非规范化”数据 - 同样,语义证明它是合理的,您可以将模型视为规范化。(围绕这个概念进行思考是有益的)

EDIT:The term "denormalized" in (2) is not correct if you look at the formal definition of normal forms and if you consider a design denormalized if it breaks any of the normal forms (to some people this is obvious and there is no other way about it).

编辑:(2) 中的术语“非规范化”是不正确的,如果您查看规范形式的正式定义,并且如果您认为设计在破坏任何规范形式时非规范化(对某些人来说这是显而易见的,并且没有其他方式)。

Still, you might want to get used to the idea that a lot of people and not necessary useless texts will use the term normalization for any effort that tries to reduce redundancy in the database (just as an example, you will find scientific papers, by which I don't say that they must be right, just as a warning that it is common, that call derived attributes a form of denormalization, see here).

尽管如此,您可能想要习惯这样的想法,即很多人而不是不必要的无用文本将使用术语规范化来表示任何试图减少数据库中冗余的努力(仅作为示例,您会发现科学论文,通过我并不是说它们一定是对的,只是作为一个警告,它是常见的,将派生属性称为非规范化的一种形式,请参见此处)。

If you want to refer to some more coherent and recognized authorities (again, not recognized by all), maybe the words of C.J.Date can make a clear distinction:

如果要参考一些比较连贯和认可的权威(再说一遍,不是所有人都认可的),也许CJDate的话可以做一个明确的区分:

Much of design theory has to do with reducing redundancy; normalization reduces redundancy within relvars, orthogonality reduces it across relvars.

许多设计理论与减少冗余有关。归一化减少了相关变量内的冗余,正交性减少了相关变量之间的冗余。

qouted from Database in depth: relational theory for practitioners

深度引用数据库:从业者的关系理论

and on the next page

在下一页

just as a failure to normalize all the way implies redundancy and can lead to certain anomalies, so too can a failure to adhere to orthogonality.

正如未能完全规范化意味着冗余并可能导致某些异常一样,未能遵守正交性也可能如此。

So, the proper term for redundancy across relvars is orthogonality (basically all normal forms talk about single relvar so if you look strictly at normalization it would never suggest any improvements due to dependencies between two different relvars).

因此,跨 relvars 冗余的正确术语是正交性(基本上所有范式都谈论单个 relvar,因此如果您严格查看规范化,它永远不会建议由于两个不同 relvar 之间的依赖关系而有任何改进)。

Anyway, one of the other important concepts when you consider database design is also a difference between logical and physical database models. A lot of things that are useful on physical level, such as tables with subtotals or indexes have no place in the logical model - where you try to establish and investigate relationships between the concepts you are trying to model. And that's why you can say they are permissible and they don't ruin the design.

无论如何,当您考虑数据库设计时,其他重要概念之一也是逻辑和物理数据库模型之间的差异。许多在物理层面上有用的东西,例如带有小计或索引的表,在逻辑模型中没有位置 - 您试图在逻辑模型中建立和调查要建模的概念之间的关系。这就是为什么你可以说它们是允许的,它们不会破坏设计。

Lines sometimes can be a bit blurry on what is logical model and what is physical model. Especially good example is a table with subtotals. To consider it part of physical implementation and ignore it on the logical level you have to:

有时在什么是逻辑模型和什么是物理模型上的界限可能有点模糊。特别好的例子是带有小计的表格。要将其视为物理实现的一部分并在逻辑级别上忽略它,您必须:

  • ensure that users (and applications) can not update the subtotal table directly in a manner that is not consistent with their predicate (in another words have a bug in the subtotalling procedure)
  • ensure that users (and applications) can not update the table on which these are dependent without updating the subtotal (in another words that some application will not delete a row from the detail table without updating the total)
  • 确保用户(和应用程序)不能以与其谓词不一致的方式直接更新小计表(换句话说,小计过程中有错误)
  • 确保用户(和应用程序)不能在不更新小计的情况下更新它们所依赖的表(换句话说,某些应用程序不会在不更新总计的情况下从明细表中删除一行)

If you break any of the above rules you will end up with inconsistent databasewhich will provide inconsistent facts. (In such case if you want to formally design a procedure for fixing or examining the problems caused, you would not consider it just an additional table, it would exist at the logical level; where it should not be).

如果您违反上述任何规则,您最终会得到不一致的数据库,这将提供不一致的事实。(在这种情况下,如果您想正式设计一个程序来修复或检查所引起的问题,您不会认为它只是一个附加表,它会存在于逻辑级别;它不应该存在的地方)。

Also, the normalisation always depends on the semantics and the business rules you are trying to model. For example DBAPerformance gives an example in which storing the TaxAmountin the transaction table is not denormalized design, but he fails to mention that it depends what kind of system you are trying to model (is that obvious?); for example if the transaction has another attribute called TaxRateit will usually be denormalized because there is functional dependency on a set of non-key attributes (TaxAmount = Amount * TaxRate => FD: Amount,TaxRate -> TaxAmount), and one of these should be removed or guaranteed to be consistent.

此外,规范化始终取决于您尝试建模的语义和业务规则。例如 DBAPerformance 给出了一个例子,其中存储TaxAmount在事务表中不是非规范化设计,但他没有提到这取决于您尝试建模的系统类型(这很明显吗?);例如,如果交易有另一个称为TaxRate它的属性通常会被非规范化,因为对一组非关键属性(TaxAmount = Amount * TaxRate => FD: Amount,TaxRate -> TaxAmount)存在功能依赖,其中一个应该被删除或保证是一致的。

Obviously, you might say, but, if the system you are building is for an audit company, then you might not have functional dependency - they might be auditing someone who is using manual calculations or has faulty software or must have ability to record incomplete data and the calculation might be wrong originally and as audit company you must record the fact as it happened.

显然,您可能会说,但是,如果您正在构建的系统是为审计公司构建的,那么您可能没有功能依赖——他们可能正在审计使用手动计算或有错误软件或必须能够记录不完整数据的人并且最初的计算可能是错误的,作为审计公司,您必须记录发生的事实。

So, semantics (predicates) which are determined by requirements will influence if any of the normal forms are broken - by influencing functional dependencies (in another words correctly establishing functional dependencies is quite important part of modelling when you strive for normalized database).

因此,由需求确定的语义(谓词)将影响任何规范形式是否被破坏 - 通过影响函数依赖(换句话说,当您争取规范化数据库时,正确建立函数依赖是建模的非常重要的部分)。

回答by nvogel

1) Does not require denormalization. You just need to determine what level of detail of each change you need and persist that with an appropriate key.

1) 不需要非规范化。您只需要确定您需要的每个更改的详细程度,并使用适当的键保持它。

2) Has nothing to do with denormalization. Storing summary data does not make the database denormalized. Storing results derived from non key attributes in the same tablewould be an example of denormalization but that doesn't seem to be what you are talking about here.

2)与非规范化无关。存储摘要数据不会使数据库非规范化。将非关键属性派生的结果存储在同一个表中将是非规范化的一个例子,但这似乎不是你在这里谈论的内容。

回答by Jonathan Rand

Your senior developer makes extremely valid points. I've learned these the hard way myself by servicing systems that don't de-normalize the historical data.

您的高级开发人员提出了非常有效的观点。我自己通过服务不会对历史数据进行非规范化的系统来艰难地学到了这些。

In a sense it's not really adding any overhead to the database. You are creating invoice tables from existing data in the database. An invoice is a snapshot in time. De-normalizing the information you need to produce that invoice can make your reporting SO much easier. When you are required to produce a new report and expected to do it quickly you will appreciate the de-normalization.

从某种意义上说,它并没有真正给数据库增加任何开销。您正在从数据库中的现有数据创建发票表。发票是时间的快照。对生成发票所需的信息进行非规范化可以使您的报告变得更加容易。当您需要生成新报告并希望快速完成时,您会喜欢去规范化。

In terms of having total in the database. This has saved my ass before when I've made a change to an application that caused numbers not to add up the same way (not as hard as you may think). On a live application the totals gave me a definite place to return to in order correct the discrepancies. I've written about this before, you can read it here: http://jlrand.com/?p=95

就数据库中的总数而言。之前当我对一个应用程序进行了更改,导致数字加起来的方式不同(不像你想象的那么难)时,这已经救了我的屁股。在实时应用程序中,总数为我提供了一个明确的返回位置,以纠正差异。我以前写过这个,你可以在这里阅读:http: //jlrand.com/?p=95

回答by user207421

I agree with your senior about (1). A transaction table row must capture the entire state at the moment of the transaction. Period. What you're suggesting doesn't record the actual data, so it's inadmissible. I also agree about (2). Whatever the business wants by way of crosschecking, you must implement. Accounting is based on cross checking, double entry, rolling up ledgers, etc. You must do it. This so fundamental that you shouldn't even look on it as denormalization, just as implementing the business requirement.

我同意你的前辈关于(1)的看法。事务表行必须捕获事务时刻的整个状态。时期。你的建议没有记录实际数据,所以它是不可接受的。我也同意(2)。无论企业通过交叉检查的方式想要什么,您都必须实施。会计基于交叉检查、复式记账、汇总分类账等。您必须这样做。这是非常基本的,您甚至不应将其视为非规范化,就像实现业务需求一样。

回答by RC_Cleland

For #1

对于#1

The invoice should be calculated from the sales and payments. If you do not have detailed sales data including price/product/discount/shipping/etc start there.

发票应根据销售和付款计算。如果您没有详细的销售数据,包括价格/产品/折扣/运费/等,请从那里开始。

For #2

对于#2

Writing an accounting system into the db from scratch is a big project. Make sure you have the accountants give you business rules so you can measure your systems accuracy. The last thing you want is the CFO step into the DBA meeting and announce the DB is overcharging the customer, even worse you are undercharging and driving the company out of business.

从头开始将会计系统写入数据库是一项大工程。确保您让会计师为您提供业务规则,以便您可以衡量系统的准确性。您最不想要的就是 CFO 进入 DBA 会议并宣布 DB 向客户收费过高,更糟糕的是,您收费不足并导致公司破产。

If you have SQL Server give the Adventure Works db a look. If you hate MS then look at Adventure Works and don't do it that way.

如果您有 SQL Server,请查看 Adventure Works 数据库。如果你讨厌 MS,那么看看 Adventure Works,不要那样做。

回答by Lesly Revenge

It seems as if you are considering rather or not you should create a data warehouse. You should never denormalize your database for historical reporting purposes. Creating an archive and storing your information into your data warehouse will do both: denormalize most of the information and maintain your data history.

似乎您正在考虑是否应该创建一个数据仓库。您永远不应该出于历史报告目的对数据库进行非规范化。创建存档并将您的信息存储到数据仓库中将同时实现:对大部分信息进行非规范化处理并维护您的数据历史记录。

回答by Alexandr

Database normalization removes duplicates and makes sql queries for data update more efficient (and gives some other improvements).

数据库规范化删除重复项并使数据更新的 sql 查询更有效(并提供一些其他改进)。

But if most of your queries are used for data selecting and select queries connect to several tables at the time, you may consider denormalization of these tables. It will increase the amount of disk space needed for data, time execution of sql update queries but will improve select queries.

但是如果你的大部分查询都用于数据选择并且选择查询同时连接到几个表,你可能会考虑对这些表进行非规范化。它会增加数据所需的磁盘空间量,时间执行 sql 更新查询,但会改进选择查询。