SQL 实现软删除的最佳方法是什么?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/68323/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 23:24:04  来源:igfitidea点击:

What is the best way to implement soft deletion?

sqldatabasedatabase-designbackup

提问by Josh Smeaton

Working on a project at the moment and we have to implement soft deletion for the majority of users (user roles). We decided to add an is_deleted='0'field on each table in the database and set it to '1'if particular user roles hit a delete button on a specific record.

目前正在做一个项目,我们必须对大多数用户(用户角色)实施软删除。我们决定is_deleted='0'在数据库中的每个表上添加一个字段,并将其设置为'1'是否特定用户角色点击了特定记录上的删除按钮。

For future maintenance now, each SELECTquery will need to ensure they do not include records where is_deleted='1'.

现在为了将来的维护,每个SELECT查询都需要确保它们不包含记录where is_deleted='1'

Is there a better solution for implementing soft deletion?

是否有更好的解决方案来实现软删除?

Update: I should also note that we have an Audit database that tracks changes (field, old value, new value, time, user, ip) to all tables/fields within the Application database.

更新:我还应该注意到,我们有一个审计数据库,用于跟踪应用程序数据库中所有表/字段的更改(字段、旧值、新值、时间、用户、IP)。

回答by ctcherry

I would lean towards a deleted_atcolumn that contains the datetime of when the deletion took place. Then you get a little bit of free metadata about the deletion. For your SELECTjust get rows WHERE deleted_at IS NULL

我倾向于deleted_at包含删除发生日期时间的列。然后你会得到一些关于删除的免费元数据。因为你SELECT只是得到行WHERE deleted_at IS NULL

回答by David J. Sokol

You could perform all of your queries against a view that contains the WHERE IS_DELETED='0'clause.

您可以针对包含该WHERE IS_DELETED='0'子句的视图执行所有查询。

回答by Sergey Stadnik

Having is_deletedcolumn is a reasonably good approach. If it is in Oracle, to further increase performance I'd recommend partitioning the table by creating a list partition on is_deletedcolumn. Then deleted and non-deleted rows will physically be in different partitions, though for you it'll be transparent.

is_deleted列是一个相当不错的办法。如果它在 Oracle 中,为了进一步提高性能,我建议通过在is_deleted列上创建列表分区来对表进行分区。然后已删除和未删除的行将在物理上位于不同的分区中,但对您来说它是透明的。

As a result, if you type a query like

因此,如果您输入类似的查询

SELECT * FROM table_name WHERE is_deleted = 1

then Oracle will perform the 'partition pruning' and only look into the appropriate partition. Internally a partition is a different table, but it is transparent for you as a user: you'll be able to select across the entire table no matter if it is partitioned or not. But Oracle will be able to query ONLY the partition it needs. For example, let's assume you have 1000 rows with is_deleted = 0and 100000 rows with is_deleted = 1, and you partition the table on is_deleted. Now if you include condition

然后 Oracle 将执行“分区修剪”并且只查看适当的分区。分区在内部是一个不同的表,但它对用户来说是透明的:无论是否分区,您都可以在整个表中进行选择。但是 Oracle 将只能查询它需要的分区。例如,假设您有 1000 行is_deleted = 0和 100000 行is_deleted = 1,并且您在 上对表进行分区is_deleted。现在如果你包括条件

WHERE ... AND IS_DELETED=0

then Oracle will ONLY scan the partition with 1000 rows. If the table weren't partitioned, it would have to scan 101000 rows (both partitions).

那么 Oracle 将只扫描 1000 行的分区。如果表未分区,则必须扫描 101000 行(两个分区)。

回答by Jeremiah Peschka

The best response, sadly, depends on what you're trying to accomplish with your soft deletions and the database you are implementing this within.

遗憾的是,最佳响应取决于您尝试通过软删除完成什么以及在其中实施此操作的数据库。

In SQL Server, the best solution would be to use a deleted_on/deleted_at column with a type of SMALLDATETIME or DATETIME (depending on the necessary granularity) and to make that column nullable. In SQL Server, the row header data contains a NULL bitmask for each of the columns in the table so it's marginally faster to perform an IS NULL or IS NOT NULL than it is to check the value stored in a column.

在 SQL Server 中,最好的解决方案是使用类型为 SMALLDATETIME 或 DATETIME(取决于必要的粒度)的deleted_on/deleted_at 列并使该列可以为空。在 SQL Server 中,行标题数据包含表中每一列的 NULL 位掩码,因此执行 IS NULL 或 IS NOT NULL 比检查存储在列中的值要快一些。

If you have a large volume of data, you will want to look into partitioning your data, either through the database itself or through two separate tables (e.g. Products and ProductHistory) or through an indexed view.

如果您有大量数据,您将需要考虑通过数据库本身或通过两个单独的表(例如 Products 和 ProductHistory)或通过索引视图对数据进行分区。

I typically avoid flag fields like is_deleted, is_archive, etc because they only carry one piece of meaning. A nullable deleted_at, archived_at field provides an additional level of meaning to yourself and to whoever inherits your application. And I avoid bitmask fields like the plague since they require an understanding of how the bitmask was built in order to grasp any meaning.

我通常避免像 is_deleted、is_archive 等标志字段,因为它们只带有一个含义。可为空的 deleted_at, archived_at 字段为您自己和继承您的应用程序的任何人提供了额外的含义。我避免像瘟疫这样的位掩码字段,因为它们需要了解位掩码是如何构建的才能掌握任何含义。

回答by Jiaaro

if the table is large and performance is an issue, you can always move 'deleted' records to another table, which has additional info like time of deletion, who deleted the record, etc

如果表很大并且性能有问题,您可以随时将“已删除”的记录移动到另一个表,该表具有其他信息,例如删除时间、谁删除了记录等

that way you don't have to add another column to your primary table

这样你就不必在主表中添加另一列

回答by Daniel Fortunov

That depends on what information you need and what workflows you want to support.

这取决于您需要什么信息以及您想要支持什么工作流程。

Do you want to be able to:

您是否希望能够:

  • know what information was there (before it was deleted)?
  • know when it was deleted?
  • know who deleted it?
  • know in what capacity they were acting when they deleted it?
  • be able to un-delete the record?
  • be able to tell when it was un-deleted?
  • etc.
  • 知道有哪些信息(在删除之前)?
  • 知道什么时候删除的吗?
  • 知道谁删了吗?
  • 知道他们在删除它时以什么身份行事吗?
  • 可以取消删除记录吗?
  • 能够知道它何时被取消删除?
  • 等等。

If the record was deleted and un-deleted four times, is it sufficient for you to know that it is currently in an un-deleted state, or do you want to be able to tell what happened in the interim (including any edits between successive deletions!)?

如果该记录被删除和未删除四次,是否足以让您知道它当前处于未删除状态,或者您是否希望能够说出在此期间发生的事情(包括连续两次之间的任何编辑?删除!)?

回答by Andy Rappaport

Careful of soft-deleted records causing uniqueness constraint violations. If your DB has columns with unique constraints then be careful that the prior soft-deleted records don't prevent you from recreating the record.

小心软删除记录导致违反唯一性约束。如果您的数据库具有具有唯一约束的列,那么请注意先前软删除的记录不会阻止您重新创建记录。

Think of the cycle:

想想循环:

  1. create user (login=JOE)
  2. soft-delete (set deleted column to non-null.)
  3. (re) create user (login=JOE). ERROR. LOGIN=JOE is already taken
  1. 创建用户(登录= JOE)
  2. 软删除(将删除的列设置为非空。)
  3. (重新)创建用户(登录= JOE)。错误。LOGIN=JOE 已经被占用

Second create results in a constraint violation because login=JOE is already in the soft-deleted row.

由于 login=JOE 已在软删除行中,因此第二次创建会导致违反约束。

Some techniques: 1. Move the deleted record to a new table. 2. Make your uniqueness constraint across the login and deleted_at timestamp column

一些技巧: 1. 将删除的记录移动到新表中。2. 在 login 和 deleted_at 时间戳列中设置唯一性约束

My own opinion is +1 for moving to new table. Its take lots of discipline to maintain the *AND delete_at = NULL* across all your queries (for all of your developers)

我自己的意见是 +1 移动到新表。在您的所有查询中维护 *AND delete_at = NULL* 需要大量的纪律(对于您的所有开发人员)

回答by Brent

You will definitely have better performance if you move your deleted data to another table like Jim said, as well as having record of when it was deleted, why, and by whom.

如果您像 Jim 所说的那样将删除的数据移动到另一个表,并且记录删除的时间、原因和人,那么您肯定会获得更好的性能。

Adding where deleted=0to all your queries will slow them down significantly, and hinder the usage of any of indexes you may have on the table. Avoid having "flags" in your tables whenever possible.

添加到所有查询中会显着减慢它们的速度,并妨碍使用表中可能存在的任何索引。尽可能避免在您的表格中使用“标志”。where deleted=0

回答by Andy Irving

you don't mention what product, but SQL Server 2008 and postgresql (and others i'm sure) allow you to create filtered indexes, so you could create a covering index where is_deleted=0, mitigating some of the negatives of this particular approach.

您没有提到什么产品,但 SQL Server 2008 和 postgresql(以及我确定的其他产品)允许您创建过滤索引,因此您可以在 is_deleted=0 处创建覆盖索引,从而减轻这种特定方法的一些负面影响.

回答by Andy Irving

Something that I use on projects is a statusInd tinyint not null default 0 column using statusInd as a bitmask allows me to perform data management (delete, archive, replicate, restore, etc.). Using this in views I can then do the data distribution, publishing, etc for the consuming applications. If performance is a concern regarding views, use small fact tables to support this information, dropping the fact, drops the relation and allows for scalled deletes.

我在项目中使用的东西是 statusInd tinyint not null default 0 column 使用 statusInd 作为位掩码允许我执行数据管理(删除、存档、复制、恢复等)。在视图中使用它,然后我可以为消费应用程序进行数据分发、发布等。如果性能是与视图有关的问题,请使用小型事实表来支持此信息,删除事实,删除关系并允许调用删除。

Scales well and is data centric keeping the data footprint pretty small - key for 350gb+ dbs with realtime concerns. Using alternatives, tables, triggers has some overhead that depending on the need may or may not work for you.

扩展性好,以数据为中心,保持数据占用空间很小 - 350gb+ dbs 的关键,具有实时性。使用替代品、表、触发器有一些开销,根据需要可能适合也可能不适合您。

SOX related Audits may require more than a field to help in your case, but this may help. Enjoy

SOX 相关审计可能需要的不仅仅是一个字段来帮助您的案例,但这可能会有所帮助。享受