MySQL: "active" flag or not?

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors (not me) and cite the original StackOverflow page: http://stackoverflow.com/questions/102278/

Time: 2020-08-31 12:03:57  Source: igfitidea

`active' flag or not?

Tags: sql, mysql, postgresql

Asked by Philip Reynolds

OK, so practically every database-based application has to deal with "non-active" records: either soft deletions or marking something as "to be ignored". I'm curious whether there are any radical alternatives to an "active" column (or a status column).

For example, if I had a list of people

CREATE TABLE people (
  id       INTEGER PRIMARY KEY,
  name     VARCHAR(100),
  active   BOOLEAN,
  ...
);

That means to get a list of active people, you need to use

SELECT * FROM people WHERE active=True;

Would anyone suggest moving non-active records off to a separate table and, where appropriate, using a UNION to combine the two?

Curiosity striking...

EDIT: I should make clear that I'm coming at this from a purist perspective. I can see how data archiving might be necessary for large amounts of data, but that is not where I'm coming from. If you do a SELECT * FROM people, it would make sense to me that those entries are, in a sense, "active".

Thanks

Answered by Mike McAllister

You partition the table on the active flag, so that active records are in one partition, and inactive records are in the other partition. Then you create an active view for each table which automatically has the active filter on it. The database query engine automatically restricts the query to the partition that has the active records in it, which is much faster than even using an index on that flag.

Here is an example of how to create a partitioned table in Oracle. Oracle doesn't have boolean column types, so I've modified your table structure for Oracle purposes.

CREATE TABLE people
(
   id       NUMBER(10),
   name     VARCHAR2(100),
   active   NUMBER(1)
)
PARTITION BY LIST(active)
(
   PARTITION active_records VALUES (1),    -- active = 1 means the row is active
   PARTITION inactive_records VALUES (0)   -- active = 0 means the row is inactive
);

If you wanted to, you could put each partition in a different tablespace. You can also partition your indexes.

Incidentally, this seems to be a repeat of this question. As a newbie I need to ask: what's the procedure for dealing with unintended duplicates?

Edit: As requested in the comments, I have provided an example of creating a partitioned table in Oracle.

Answered by Kibbee

Well, to ensure that you only retrieve active records in most situations, you could create views that contain only the active records. That way it's much harder to forget the active filter.
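
A minimal sketch of such a view, using Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL. The view name active_people and the sample rows are assumptions made for illustration; they are not from the answer.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        id     INTEGER PRIMARY KEY,
        name   VARCHAR(100),
        active BOOLEAN
    );
    INSERT INTO people (name, active) VALUES
        ('Alice', 1), ('Bob', 0), ('Carol', 1);

    -- Hypothetical view name: it hides the active filter from callers.
    CREATE VIEW active_people AS
        SELECT id, name FROM people WHERE active = 1;
""")

# Callers query the view and never have to remember the WHERE clause.
active_names = [row[0] for row in
                conn.execute("SELECT name FROM active_people ORDER BY name")]
print(active_names)
```

Queries that really need the inactive rows can still go to the underlying people table.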

Answered by Greg

We use an enum('ACTIVE','INACTIVE','DELETED') in most tables so we actually have a 3-way flag. I find it works well for us in different situations. Your mileage may vary.
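
A rough sketch of the three-way flag, run through Python's sqlite3 module. SQLite has no ENUM type, so a CHECK constraint stands in for MySQL's enum here; that substitution, the column name status, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        id     INTEGER PRIMARY KEY,
        name   VARCHAR(100),
        -- SQLite has no ENUM type, so a CHECK constraint plays that role here.
        status TEXT NOT NULL DEFAULT 'ACTIVE'
               CHECK (status IN ('ACTIVE', 'INACTIVE', 'DELETED'))
    );
    INSERT INTO people (name, status) VALUES
        ('Alice', 'ACTIVE'), ('Bob', 'INACTIVE'), ('Carol', 'DELETED');
""")

# Anything outside the three allowed values is rejected by the constraint.
try:
    conn.execute("INSERT INTO people (name, status) VALUES ('Dave', 'ARCHIVED')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Soft-deleted rows are filtered out; inactive ones are still visible.
not_deleted = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE status <> 'DELETED' ORDER BY name")]
print(rejected, not_deleted)
```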

Answered by Matthias Winkelmann

Moving inactive stuff is usually a stupid idea. It adds a lot of overhead and plenty of potential for bugs, and everything becomes more complicated, such as unarchiving the stuff. What do you do with related data? If you move all that too, you have to modify every single query. If you don't move it, what advantage were you hoping to get?

That leads to the next point: WHY would you move it? A properly indexed table requires one additional lookup when the size doubles. Any performance improvement is bound to be negligible. And why would you even think about it until the distant future time when you actually have performance problems?
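
The "one additional lookup" remark can be checked with a little arithmetic. The sketch below models the index as a plain binary search, which is a simplifying assumption: real B-tree indexes have a much higher fan-out, so they grow even more slowly. The function name is hypothetical.

```python
def halvings_needed(row_count: int) -> int:
    """Number of halvings needed to narrow row_count sorted keys down to one,
    i.e. ceil(log2(row_count)) for row_count >= 1."""
    return (row_count - 1).bit_length()

one_million = halvings_needed(1_000_000)
two_million = halvings_needed(2_000_000)

# Doubling the table adds exactly one step in this model.
print(one_million, two_million)
```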

Answered by Arthur Thomas

I think that, looking at it strictly as a piece of data, the way shown in the original post is proper. The active flag is directly dependent on the primary key and should be in the table.

That table holds data on people, irrespective of the current status of their data.

Answered by Dave Cheney

Binary flags like this in your schema are a BAD idea. Consider the query

SELECT count(*) FROM users WHERE active=1

Looks simple enough. But what happens when you have so many users that this table needs an index? Again, it looks straightforward:

ALTER TABLE users ADD INDEX index_users_on_active (active)

EXCEPT!! This index is useless because the cardinality of this column is exactly two! Any database query optimiser will ignore this index because of its low cardinality and do a table scan.
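
To make the cardinality point concrete, here is a small sketch using Python's sqlite3 module. The data is made up (1000 users, every tenth one inactive), and SQLite is only a stand-in; how a particular optimiser actually treats such an index depends on the engine and its statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER)")

# Hypothetical data: 1000 users, every tenth one inactive.
conn.executemany("INSERT INTO users (active) VALUES (?)",
                 [(0 if i % 10 == 0 else 1,) for i in range(1000)])
conn.execute("CREATE INDEX index_users_on_active ON users (active)")

# The indexed column has only two distinct values, so looking up active = 1
# still has to visit most of the table.
distinct_values, total_rows, active_rows = conn.execute(
    "SELECT COUNT(DISTINCT active), COUNT(*), SUM(active = 1) FROM users"
).fetchone()
print(distinct_values, total_rows, active_rows)
```

With this data the index can only narrow the search to 900 of the 1000 rows, which is hardly better than reading the whole table.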

Before filling up your schema with helpful flags, consider how you are going to access that data.

https://stackoverflow.com/questions/108503/mysql-advisable-number-of-rows

Answered by JosephStyons

The active flag is sort of ugly, but it is simple and works well.

You could move them to another table as you suggested. I'd suggest looking at the percentage of active versus inactive records. If more than 20 or 30% of your records are inactive, then you might consider moving them elsewhere. Otherwise, it's not a big deal.
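
One way to measure that percentage, sketched with Python's sqlite3 module. The four sample rows are invented for illustration, and the table follows the people example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
conn.executemany("INSERT INTO people (name, active) VALUES (?, ?)",
                 [("Alice", 1), ("Bob", 0), ("Carol", 1), ("Dave", 1)])

# Share of inactive rows as a percentage of the whole table.
(inactive_pct,) = conn.execute(
    "SELECT 100.0 * SUM(active = 0) / COUNT(*) FROM people"
).fetchone()
print(inactive_pct)
```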

Answered by gbjbaanb

Yes, we would. We currently have the "active='T/F'" column in many of our tables, mainly to show the 'latest' row. When a new row is inserted, the previous T row is marked F to keep it for audit purposes.

Now we're moving to a two-table approach: when a new row is inserted, the previous row is moved to a history table. This gives us better performance for the majority of cases, which look at the current data.

The cost is slightly more than the old method. Previously you had to update and insert; now you have to insert and update (i.e. instead of inserting a new T row, you modify the existing row with all the new data), so the extra cost is just that of passing in a whole row of data instead of just the changes. That's hardly going to have any effect.

The performance benefit is that your main table's index is significantly smaller, and you can optimise your tablespaces better (they won't grow quite so much!)
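
A rough sketch of the two-table approach, using Python's sqlite3 module. The answer describes the insert and update as two separate steps; doing the copy in a trigger is an assumption made here to keep the example short, and the names people_history, replaced_at, and people_keep_history are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(100)
    );
    -- Hypothetical history table: same columns plus the time of the change.
    CREATE TABLE people_history (
        id          INTEGER,
        name        VARCHAR(100),
        replaced_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Copy the old version of a row into the history table before it changes.
    CREATE TRIGGER people_keep_history
    BEFORE UPDATE ON people
    FOR EACH ROW
    BEGIN
        INSERT INTO people_history (id, name) VALUES (OLD.id, OLD.name);
    END;

    INSERT INTO people (id, name) VALUES (1, 'Alice Smith');
    UPDATE people SET name = 'Alice Jones' WHERE id = 1;
""")

# The main table holds only the current row; the old one lives in history.
current = conn.execute("SELECT name FROM people WHERE id = 1").fetchone()[0]
history = [row[0] for row in
           conn.execute("SELECT name FROM people_history WHERE id = 1")]
print(current, history)
```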

Answered by EndangeredMassa

We use active flags quite often. If your database is going to be very large, I could see the value in migrating inactive values to a separate table, though.

You would then only require a union of the tables when someone wants to see all records, active or inactive.
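
A minimal sketch of that union, using Python's sqlite3 module. The table name people_inactive and the sample rows are assumptions; UNION ALL is used because the two tables should not share rows, so no duplicate elimination is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people          (id INTEGER PRIMARY KEY, name VARCHAR(100));
    -- Hypothetical table name for the rows that were moved out.
    CREATE TABLE people_inactive (id INTEGER PRIMARY KEY, name VARCHAR(100));

    INSERT INTO people          VALUES (1, 'Alice'), (3, 'Carol');
    INSERT INTO people_inactive VALUES (2, 'Bob');
""")

# Only the "show me everything" query needs both tables; the literal
# active column records which table each row came from.
all_people = conn.execute("""
    SELECT id, name, 1 AS active FROM people
    UNION ALL
    SELECT id, name, 0 AS active FROM people_inactive
    ORDER BY id
""").fetchall()
print(all_people)
```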

Answered by GEOCHET

In most cases a binary field indicating deletion is sufficient. Often there is a cleanup mechanism that removes those deleted records after a certain amount of time, so you may wish to start the schema with a deleted timestamp.
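
A sketch of the deleted-timestamp idea with a cleanup step, using Python's sqlite3 module. The column name deleted_at, the 30-day retention period, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        id         INTEGER PRIMARY KEY,
        name       VARCHAR(100),
        deleted_at TEXT          -- NULL means the row is still live
    );
    INSERT INTO people (name, deleted_at) VALUES
        ('Alice', NULL),
        ('Bob',   datetime('now', '-90 days')),
        ('Carol', datetime('now', '-1 day'));
""")

# Cleanup job: purge rows soft-deleted more than 30 days ago
# (assumed retention period). Rows with a NULL deleted_at are untouched.
conn.execute("DELETE FROM people WHERE deleted_at < datetime('now', '-30 days')")

remaining = [row[0] for row in
             conn.execute("SELECT name FROM people ORDER BY name")]
live = [row[0] for row in
        conn.execute("SELECT name FROM people WHERE deleted_at IS NULL")]
print(remaining, live)
```

The timestamp doubles as the flag: a row is live while deleted_at is NULL, and the cleanup job knows how long ago each row was deleted.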