What is the best way to delete old rows from MySQL on a rolling basis?

Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/9472167/

mysql

Asked by Brad Dwyer

I find myself wanting to delete rows older than (x)-days on a rolling basis in a lot of applications. What is the best way to do this most efficiently on a high-traffic table?

For instance, if I have a table that stores notifications and I only want to keep these for 7 days. Or high scores that I only want to keep for 31 days.

Right now I keep a row storing the epoch time posted and run a cron job that runs once per hour and deletes them in increments like this:

DELETE FROM my_table WHERE time_stored < 1234567890 LIMIT 100

I do that until mysql_affected_rows returns 0.

I used to do it all at once but that caused everything in the application to hang for 30 seconds or so while INSERTS piled up. Adding the LIMIT worked to alleviate this but I'm wondering if there is a better way to do this.

Accepted answer by The Nail

Check out MySQL Partitioning:

Data that loses its usefulness can often be easily removed from a partitioned table by dropping the partition (or partitions) containing only that data. Conversely, the process of adding new data can in some cases be greatly facilitated by adding one or more new partitions for storing specifically that data.

See e.g. this section to get some ideas on how to apply it:

MySQL Partition Pruning

And this one:

Partitioning by dates: the quick how-to

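As a rough sketch of how this could look for the notifications example from the question (the table layout, day encoding, and partition names below are illustrative assumptions, not part of the original answer):

CREATE TABLE notifications (
    id          BIGINT NOT NULL AUTO_INCREMENT,
    created_day INT NOT NULL,        -- creation day encoded as YYYYMMDD
    message     VARCHAR(255),
    PRIMARY KEY (id, created_day)    -- the partitioning column must be part of every unique key
)
PARTITION BY RANGE (created_day) (
    PARTITION p20120220 VALUES LESS THAN (20120221),
    PARTITION p20120221 VALUES LESS THAN (20120222)
);

-- Each day, add tomorrow's partition and drop the expired one.
-- DROP PARTITION discards the old rows as a near-instant metadata
-- change instead of a row-by-row DELETE:
ALTER TABLE notifications
    ADD PARTITION (PARTITION p20120222 VALUES LESS THAN (20120223));
ALTER TABLE notifications DROP PARTITION p20120220;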

Answer by Akshay

Try creating an event that runs on the database automatically at whatever interval you want.

Here is an example: suppose you want to delete entries that are more than 30 days old from some table 'tableName' that has a 'datetime' column. Then the following event runs every day and performs the required clean-up:

CREATE EVENT AutoDeleteOldNotifications
ON SCHEDULE EVERY 1 DAY
    STARTS CURRENT_TIMESTAMP + INTERVAL 1 DAY
ON COMPLETION PRESERVE
DO
    DELETE LOW_PRIORITY FROM databaseName.tableName
    WHERE `datetime` < DATE_SUB(NOW(), INTERVAL 30 DAY);

We need to add ON COMPLETION PRESERVE to keep the event after each run. You can find more info here: http://www.mysqltutorial.org/mysql-triggers/working-mysql-scheduled-event/

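One caveat worth adding here (not in the original answer): before MySQL 8.0 the event scheduler is disabled by default, so the event above will never fire until the scheduler is turned on. Enabling it is a global setting, so it needs a privileged account:

SET GLOBAL event_scheduler = ON;   -- enable the scheduler
SHOW EVENTS FROM databaseName;     -- verify the event exists and is ENABLED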

Answer by RolandoMySQLDBA

Instead of executing the delete against the table alone, try gathering the matching keys first and then doing a DELETE JOIN.

Given your sample query above:

DELETE FROM my_table WHERE time_stored < 1234567890 LIMIT 100 ;

You can leave the LIMIT out of it.

Let's say you want to delete data that is over 31 days old.

Let's compute 31 days in seconds (86400 × 31 = 2678400):

  • Start with key gathering
  • Next, index the keys
  • Then, perform DELETE JOIN
  • Finally, drop the gathered keys

Here is the algorithm:

-- 1) Create an empty keys table with the same id column as my_table
CREATE TABLE delete_keys SELECT id FROM my_table WHERE 1=2;
-- 2) Gather the ids of up to 100 rows older than 31 days
INSERT INTO delete_keys
SELECT id FROM
(
    SELECT id FROM my_table
    WHERE time_stored < (UNIX_TIMESTAMP() - 2678400)
    ORDER BY time_stored
) A LIMIT 100;
-- 3) Index the gathered keys, then delete the matching rows via a join
ALTER TABLE delete_keys ADD PRIMARY KEY (id);
DELETE B.* FROM delete_keys
INNER JOIN my_table B USING (id);
-- 4) Drop the gathered keys
DROP TABLE delete_keys;

If the key gathering takes less than 5 minutes, then run this sequence every 5 minutes.

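If you want MySQL itself to own that 5-minute cadence, one option is to wrap the steps above in a stored procedure and drive it with a scheduled event. A sketch only (the procedure and event names are invented here, and the event scheduler must be ON as noted in the earlier answer):

DELIMITER //
CREATE PROCEDURE purge_old_my_table_rows()
BEGIN
    -- Same steps as above; DROP ... IF EXISTS makes a failed previous run harmless
    DROP TABLE IF EXISTS delete_keys;
    CREATE TABLE delete_keys SELECT id FROM my_table WHERE 1=2;
    INSERT INTO delete_keys
    SELECT id FROM
    (
        SELECT id FROM my_table
        WHERE time_stored < (UNIX_TIMESTAMP() - 2678400)
        ORDER BY time_stored
    ) A LIMIT 100;
    ALTER TABLE delete_keys ADD PRIMARY KEY (id);
    DELETE B.* FROM delete_keys
    INNER JOIN my_table B USING (id);
    DROP TABLE delete_keys;
END //
DELIMITER ;

CREATE EVENT purge_my_table_every_5m
ON SCHEDULE EVERY 5 MINUTE
DO CALL purge_old_my_table_rows();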

Give it a Try !!!

UPDATE 2012-02-27 16:55 EDT

Here is something that should speed up key gathering a little more. Add the following index:

ALTER TABLE my_table ADD INDEX time_stored_id_ndx (time_stored,id);

This will better support the subquery that populates the delete_keys table, because it provides a covering index so that the fields are retrieved from the index only.

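A quick way to verify the covering index is doing its job (my addition, not part of the original answer) is to EXPLAIN the key-gathering subquery and look for "Using index" in the Extra column:

EXPLAIN SELECT id FROM my_table
WHERE time_stored < (UNIX_TIMESTAMP() - 2678400)
ORDER BY time_stored;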

UPDATE 2012-02-27 16:59 EDT

Since you have to delete often, you may want to try this every two months:

OPTIMIZE TABLE my_table;

This will defragment the table after all those annoying little deletes running every 5 minutes for two months.

Answer by Mike Purcell

You may want to consider introducing a master/slave (replication) solution into your design. If you shift all the read traffic to the slave, you free up the master to handle the 'on-the-fly' CRUD activities, which then replicate down to the slave (your read server).

And because you are deleting so many records, you may want to consider running an OPTIMIZE on the table(s) from which the rows are being deleted.

Answer by Jericon

At my company, we have a similar situation. We have a table that contains keys that have an expiration. We have a cron that runs to clean that out:

DELETE FROM t1 WHERE expiration < UNIX_TIMESTAMP(NOW());

This ran once an hour, but we were having issues similar to what you are experiencing. We increased it to once per minute, then to 6 times per minute: we set up a cron with a bash script that basically does the query, sleeps for a few seconds, and repeats until the minute is up.

The increased frequency significantly decreased the number of rows we were deleting per run, which relieved the contention. This is the route I would go.

However, if you find that you still have too many rows to delete, use the LIMIT and sleep between batches. For example, if you have 50k rows to delete, do 10k chunks with a 2-second sleep between them. This will keep the queries from stacking up and allow the server to perform some normal operations between these bulk deletes.

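That chunk-and-sleep loop can also live inside MySQL instead of a bash script. A minimal sketch assuming the t1/expiration schema above (the procedure name and batch size are illustrative):

DELIMITER //
CREATE PROCEDURE delete_expired_in_chunks()
BEGIN
    DECLARE rows_deleted INT DEFAULT 1;
    REPEAT
        -- Delete one chunk of expired rows
        DELETE FROM t1 WHERE expiration < UNIX_TIMESTAMP() LIMIT 10000;
        -- Capture the count before SLEEP overwrites ROW_COUNT()
        SET rows_deleted = ROW_COUNT();
        -- Pause so normal traffic can run between chunks
        DO SLEEP(2);
    UNTIL rows_deleted = 0 END REPEAT;
END //
DELIMITER ;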

Answer by Robert Sinclair

Ended up using this to leave only the last 100 rows in place, so there is significant lag when executed frequently (every minute):

DELETE a FROM tbl a LEFT JOIN (
    SELECT ID
    FROM tbl
    ORDER BY ID DESC LIMIT 100
) b ON a.ID = b.ID
WHERE b.ID IS NULL;