SQL 数据库记录的物理与逻辑/软删除?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/378331/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Physical vs. logical / soft delete of database record?
提问by user21826
What is the advantage of doing a logical/soft delete of a record (i.e. setting a flag stating that the record is deleted) as opposed to actually or physically deleting the record?
与实际或物理删除记录相比,对记录进行逻辑/软删除(即设置表明记录已被删除的标志)有什么优势?
Is this common practice?
这是常见的做法吗?
Is this secure?
这安全吗?
回答by Chris Shaffer
Advantages are that you keep the history (good for auditing) and you don't have to worry about cascading a delete through various other tables in the database that reference the row you are deleting. Disadvantage is that you have to code any reporting/display methods to take the flag into account.
优点是您可以保留历史记录(有利于审计)并且您不必担心通过引用您要删除的行的数据库中的各种其他表级联删除。缺点是您必须编写任何报告/显示方法以将标志考虑在内。
As far as if it is a common practice - I would say yes, but as with anything whether you use it depends on your business needs.
就如果这是一种常见做法而言 - 我会说是的,但是对于任何事情,您是否使用它取决于您的业务需求。
EDIT: Thought of another disadvantange - If you have unique indexes on the table, deleted records will still take up the "one" record, so you have to code around that possibility too (for example, a User table that has a unique index on username; A deleted record would still block the deleted users username for new records. Working around this you could tack on a GUID to the deleted username column, but it's a very hacky workaround that I wouldn't recommend. Probably in that circumstance it would be better to just have a rule that once a username is used, it can never be replaced.)
编辑:想到另一个缺点 - 如果表上有唯一索引,删除的记录仍将占用“一个”记录,因此您也必须围绕这种可能性进行编码(例如,具有唯一索引的用户表)用户名;已删除的记录仍会阻止已删除用户的用户名以获取新记录。解决此问题,您可以将 GUID 附加到已删除的用户名列,但这是一种非常hacky 的解决方法,我不推荐。可能在这种情况下它会最好有一个规则,一旦用户名被使用,它就永远不能被替换。)
回答by JoshBerke
Are logical deletes common practice? Yes I have seen this in many places. Are they secure? That really depends are they any less secure then the data was before you deleted it?
逻辑删除是常见的做法吗?是的,我在很多地方都看到了这一点。他们安全吗?这真的取决于它们是否比删除数据之前的数据安全性低?
When I was a Tech Lead, I demanded that our team keep every piece of data, I knew at the time that we would be using all that data to build various BI applications, although at the time we didn't know what the requirements would be. While this was good from the standpoint of auditing, troubleshooting, and reporting (This was an e-commerce / tools site for B2B transactions, and if someone used a tool, we wanted to record it even if their account was later turned off), it did have several downsides.
当我担任 Tech Lead 时,我要求我们的团队保留每条数据,当时我知道我们将使用所有这些数据来构建各种 BI 应用程序,尽管当时我们不知道需要什么是。虽然从审计、故障排除和报告的角度来看这是很好的(这是一个用于 B2B 交易的电子商务/工具站点,如果有人使用了工具,我们希望即使他们的帐户后来被关闭,我们也想记录它),它确实有几个缺点。
The downsides include (not including others already mentioned):
缺点包括(不包括其他已经提到的):
- Performance Implications of keeping all that data, We to develop various archiving strategies. For example one area of the application was getting close to generating around 1Gb of data a week.
- Cost of keeping the data does grow over time, while disk space is cheap, the ammount of infrastructure to keep and manage terrabytes of data both online and off line is a lot. It takes a lot of disk for redundancy, and people's time to ensure backups are moving swiftly etc.
- 保留所有数据的性能影响,我们制定各种归档策略。例如,应用程序的一个领域接近于每周生成大约 1Gb 的数据。
- 保存数据的成本确实会随着时间的推移而增加,虽然磁盘空间很便宜,但在线和离线保存和管理 TB 级数据的基础设施数量很多。冗余需要大量磁盘,并且人们需要时间来确保备份快速移动等。
When deciding to use logical, physical deletes, or archiving I would ask myself these questions:
在决定使用逻辑、物理删除或归档时,我会问自己以下问题:
- Is this data that might need to be re-inserted into the table. For example User Accounts fit this category as you might activate or deactivate a user account. If this is the case a logical delete makes the most sense.
- Is there any intrinsic value in storing the data? If so how much data will be generated. Depending on this I would either go with a logical delete, or implement an archiving strategy. Keep in mind you can always archive logically deleted records.
- 这些数据是否可能需要重新插入表中。例如,用户帐户适合此类别,因为您可以激活或停用用户帐户。如果是这种情况,逻辑删除最有意义。
- 存储数据是否有任何内在价值?如果是这样,将生成多少数据。根据这一点,我要么采用逻辑删除,要么实施归档策略。请记住,您始终可以归档逻辑删除的记录。
回答by Tohid
It might be a little late but I suggest everyone to check Pinal Dave's blog postabout logical/soft delete:
可能有点晚了,但我建议大家查看Pinal Dave关于逻辑/软删除的博客文章:
I just do not like this kind of design [soft delete] at all. I am firm believer of the architecture where only necessary data should be in single table and the useless data should be moved to an archived table. Instead of following the isDeleted column, I suggest the usage of two different tables: one with orders and another with deleted orders. In that case, you will have to maintain both the table, but in reality, it is very easy to maintain. When you write UPDATE statement to the isDeleted column, write INSERT INTO another table and DELETE it from original table. If the situation is of rollback, write another INSERT INTO and DELETE in reverse order. If you are worried about a failed transaction, wrap this code in TRANSACTION.
What are the advantages of the smaller table verses larger table in above described situations?
- A smaller table is easy to maintain
- Index Rebuild operations are much faster
- Moving the archive data to another filegroup will reduce the load of primary filegroup (considering that all filegroups are on different system) – this will also speed up the backup as well.
- Statistics will be frequently updated due to smaller size and this will be less resource intensive.
- Size of the index will be smaller
- Performance of the table will improve with a smaller table size.
我就是不喜欢这种设计【软删除】。我坚信这样的架构,即只有必要的数据应该放在单个表中,而无用的数据应该移动到存档表中。我建议使用两个不同的表,而不是遵循 isDeleted 列:一个包含订单,另一个包含已删除的订单。在这种情况下,您将不得不维护两个表,但实际上,维护起来非常容易。当您将 UPDATE 语句写入 isDeleted 列时,将 INSERT INTO 写入另一个表并从原始表中删除它。如果是回滚的情况,按相反的顺序再写一个INSERT INTO和DELETE。如果您担心交易失败,请将此代码包装在 TRANSACTION 中。
在上述情况下,较小的表与较大的表相比有哪些优势?
- 较小的桌子易于维护
- 索引重建操作要快得多
- 将存档数据移动到另一个文件组将减少主文件组的负载(考虑到所有文件组都在不同的系统上)——这也将加快备份速度。
- 由于规模较小,统计数据将经常更新,这将减少资源密集度。
- 索引的大小会更小
- 表的性能会随着表大小的减小而提高。
回答by Mario S
I'm a NoSQL developer, and on my last job, I worked with data that was always critical for someone, and if it was deleted by accident in the same day that was created, I were not able to find it in the last backup from yesterday! In that situation, soft deletion always saved the day.
我是一名 NoSQL 开发人员,在我的上一份工作中,我处理的数据对某人来说总是很重要,如果在创建的同一天意外删除了它,我无法在上次备份中找到它从昨天!在那种情况下,软删除总是能挽救局面。
I did soft-deletion using timestamps, registering the date the document was deleted:
我使用时间戳进行软删除,记录文档被删除的日期:
IsDeleted = 20150310 //yyyyMMdd
Every Sunday, a process walked on the database and checked the IsDeleted
field. If the difference between the current date and the timestamp was greater than N days, the document was hard deleted. Considering the document still be available on some backup, it was safe to do it.
每个星期天,一个进程会在数据库上行走并检查IsDeleted
字段。如果当前日期和时间戳之间的差异大于 N 天,则该文档被硬删除。考虑到文档在某些备份上仍然可用,这样做是安全的。
EDIT:This NoSQL use case is about big documents created in the database, tens or hundreds of them every day, but not thousands or millions. By general, they were documents with the status, data and attachments of workflow processes. That was the reason why there was the possibility of a user deletes an important document. This user could be someone with Admin privileges, or maybe the document's owner, just to name a few.
编辑:这个 NoSQL 用例是关于在数据库中创建的大文档,每天有数十或数百个,但不是数千或数百万。一般来说,它们是包含工作流程状态、数据和附件的文档。这就是用户有可能删除重要文档的原因。此用户可能是具有管理员权限的人,也可能是文档的所有者,仅举几例。
TL;DR My use case was not Big Data. In that case, you will need a different approach.
TL;DR 我的用例不是大数据。在这种情况下,您将需要不同的方法。
回答by Code Warrior
One pattern I have used is to create a mirror table and attach a trigger on the primary table, so all deletes (and updates if desired) are recorded in the mirror table.
我使用的一种模式是创建一个镜像表并在主表上附加一个触发器,因此所有删除(和更新,如果需要)都记录在镜像表中。
This allows you to "reconstruct" deleted/changed records, and you can still hard delete in the primary table and keep it "clean" - it also allows the creation of an "undo" function, and you can also record the date, time, and user who did the action in the mirror table (invaluable in witch hunt situations).
这允许您“重建”删除/更改的记录,您仍然可以在主表中硬删除并保持“干净” - 它还允许创建“撤消”功能,您还可以记录日期,时间,以及在镜像表中执行操作的用户(在猎巫情况下非常宝贵)。
The other advantage is there is no chance of accidentally including deleted records when querying off the primary unless you deliberately go to the trouble of including records from the mirror table (you may want to show live and deleted records).
另一个优点是在查询主数据库时不会意外包含已删除的记录,除非您故意麻烦地包含来自镜像表的记录(您可能想要显示活动记录和已删除记录)。
Another advantage is that the mirror table can be independently purged, as it should not have any actual foreign key references, making this a relatively simple operation in comparison to purging from a primary table that uses soft deletes but still has referential connections to other tables.
另一个优点是镜像表可以独立清除,因为它不应该有任何实际的外键引用,与从使用软删除但仍然具有与其他表的引用连接的主表中清除相比,这是一个相对简单的操作。
What other advantages?- great if you have a bunch of coders working on the project, doing reads on the database with mixed skill and attention to detail levels, you don't have to stay up nights hoping that one of them didn't forget to not include deleted records (lol, Not Include Deleted Records = True), which results in things like overstating say the clients available cash position which they then go buy some shares with (i.e., as in a trading system), when you work with trading systems, you will find out very quickly the value of robust solutions, even though they may have a little bit more initial "overhead".
还有什么优势?- 如果你有一群编码员在做这个项目,用混合技能和对细节级别的关注来读取数据库,那么你不必熬夜希望他们中的一个不会忘记不包括删除记录(大声笑,不包括已删除的记录 = True),这会导致诸如夸大客户可用现金头寸之类的事情,然后他们去购买一些股票(即在交易系统中),当您使用交易系统时,您将很快发现稳健解决方案的价值,即使它们可能有更多的初始“开销”。
Exceptions:
- as a guide, use soft deletes for "reference" data such as user, category, etc, and hard deletes to a mirror table for "fact" type data, i.e., transaction history.
例外情况:
- 作为指导,对“参考”数据(如用户、类别等)使用软删除,对“事实”类型数据(即交易历史)使用硬删除到镜像表。
回答by Galwegian
I commonly use logical deletions - I find they work well when you also intermittently archive off the 'deleted' data to an archived table (which can be searched if needed) thus having no chance of affecting the performance of the application.
我通常使用逻辑删除 - 我发现它们在您间歇性地将“已删除”数据存档到存档表(如果需要可以搜索)时运行良好,因此不会影响应用程序的性能。
It works well because you still have the data if you're ever audited. If you delete it physically, it's gone!
它运作良好,因为如果您接受过审计,您仍然拥有数据。如果你物理删除它,它就消失了!
回答by Jon Dewees
I'm a big fan of the logical delete, especially for a Line of Business application, or in the context of user accounts. My reasons are simple: often times I don't want a user to be able to use the system anymore (so the account get's marked as deleted), but if we deleted the user, we'd lose all their work and such.
我非常喜欢逻辑删除,尤其是对于业务线应用程序或在用户帐户的上下文中。我的理由很简单:很多时候我不希望用户能够再使用系统(所以帐户被标记为已删除),但是如果我们删除用户,我们将丢失他们所有的工作等等。
Another common scenario is that the users might get re-created a while after having been delete. It's a much nicer experience for the user to have all their data present as it was before they were deleted, rather than have to re-create it.
另一种常见的情况是,用户可能会在被删除一段时间后重新创建。对于用户来说,让他们的所有数据保持在删除之前的状态是一种更好的体验,而不是必须重新创建它。
I usually think of deleting users more as "suspending" them indefinitely. You never know when they'll legitimately need to be back.
我通常认为删除用户更多的是无限期地“暂停”他们。你永远不知道他们什么时候需要回来。
回答by Gianluca Ghettini
I almost always soft delete and here's why:
我几乎总是软删除,原因如下:
- you can restore deleted data if a customer asks you to do so. More happy customers with soft deletes. Restoring specific data from backups is complex
- checking for
isdeleted
everywhere is not an issue, you have to check foruserid
anyway (if the database contains data from multiple users). You can enforce the check by code, by placing those two checks on a separate function (or use views) - graceful delete. Users or processes dealing with deleted content will continue to "see" it until they hit the next refresh. This is a very desirable feature if a process is processing some data which is suddenly deleted
- synchronization: if you need to design a synchronization mechanism between a database and mobile apps, you'll find soft deletes much easier to implement
- 如果客户要求您这样做,您可以恢复已删除的数据。软删除让客户更满意。从备份中恢复特定数据很复杂
- 检查
isdeleted
无处不在不是问题,userid
无论如何您都必须检查(如果数据库包含来自多个用户的数据)。您可以通过将这两个检查放在单独的函数上(或使用视图)来强制执行代码检查 - 优雅的删除。处理已删除内容的用户或进程将继续“看到”它,直到他们点击下一次刷新。如果进程正在处理一些突然删除的数据,这是一个非常理想的功能
- 同步:如果您需要设计数据库和移动应用程序之间的同步机制,您会发现软删除更容易实现
回答by David
I used to do soft-delete, just to keep old records. I realized that users don't bother to view old records as often as I thought. If users want to view old records, they can just view from archive or audit table, right? So, what's the advantage of soft-delete? It only leads to more complex query statement, etc.
我曾经做软删除,只是为了保留旧记录。我意识到用户并不像我想象的那样经常查看旧记录。如果用户想查看旧记录,他们可以从存档或审计表中查看,对吗?那么,软删除有什么好处呢?它只会导致更复杂的查询语句等。
Following are the things i've implemented, before I decided to not-soft-delete anymore:
在我决定不再软删除之前,以下是我已经实施的内容:
implement audit, to record all activities (add,edit,delete). Ensure that there's no foreign key linked to audit, and ensure this table is secured and nobody can delete except administrators.
identify which tables are considered "transactional table", which very likely that it will be kept for long time, and very likely user may want to view the past records or reports. For example; purchase transaction. This table should not just keep the id of master table (such as dept-id), but also keep the additional info such as the name as reference (such as dept-name), or any other necessary fields for reporting.
Implement "active/inactive" or "enable/disable" or "hide/show" record of master table. So, instead of deleting record, the user can disable/inactive the master record. It is much safer this way.
实施审计,记录所有活动(添加、编辑、删除)。确保没有外键链接到审计,并确保该表是安全的,除了管理员之外没有人可以删除。
确定哪些表被认为是“事务表”,哪些表很可能会被保留很长时间,并且很可能用户可能想要查看过去的记录或报告。例如; 购买交易。该表不应该只保留主表的 id(如 dept-id),还应保留其他信息,如名称作为参考(如 dept-name),或任何其他必要的报告字段。
实现主表的“活动/非活动”或“启用/禁用”或“隐藏/显示”记录。因此,用户可以禁用/停用主记录,而不是删除记录。这种方式安全得多。
Just my two cents opinion.
只是我的两分钱的意见。
回答by Ian Varley
Re: "Is this secure?" - that depends on what you mean.
回复:“这样安全吗?” - 这取决于你的意思。
If you mean that by doing physical delete, you'll prevent anyone from ever finding the deleted data, then yes, that's more or less true; you're safer in physically deleting the sensitive data that needs to be erased, because that means it's permanently gone from the database. (However, realize that there may be other copies of the data in question, such as in a backup, or the transaction log, or a recorded version from in transit, e.g. a packet sniffer - just because you delete from your database doesn't guarantee it wasn't saved somewhere else.)
如果您的意思是通过物理删除,您将阻止任何人找到已删除的数据,那么是的,这或多或少是正确的;物理删除需要删除的敏感数据会更安全,因为这意味着它已从数据库中永久消失。(但是,要意识到可能存在相关数据的其他副本,例如备份、事务日志或传输中的记录版本,例如数据包嗅探器 - 仅仅因为您从数据库中删除不会保证它没有保存在其他地方。)
If you mean that by doing logical delete, your data is more secure because you'll never lose any data, that's also true. This is good for audit scenarios; I tend to design this way because it admits the basic fact that once data is generated, it'll never reallygo away (especially if it ever had the capability of being, say, cached by an internet search engine). Of course, a real audit scenario requires that not only are deletes logical, but that updates are also logged, along with the time of the change and the actor who made the change.
如果您的意思是通过进行逻辑删除,您的数据将更加安全,因为您永远不会丢失任何数据,这也是正确的。这对审计场景很有用;我倾向于这样设计,因为它承认这样一个基本事实,即一旦数据生成,它就永远不会真正消失(特别是如果它曾经有能力被互联网搜索引擎缓存)。当然,真实的审计场景不仅要求删除是合乎逻辑的,而且还需要记录更新、更改时间和进行更改的参与者。
If you mean that the data won't fall into the hands of anyone who isn't supposed to see it, then that's totally up to your application and its security structure. In that respect, logical delete is no more or less secure than anything else in your database.
如果您的意思是数据不会落入不应该看到它的任何人的手中,那么这完全取决于您的应用程序及其安全结构。在这方面,逻辑删除并不比数据库中的其他任何东西安全。