Question
I recently found and fixed a bug in a site I was working on that resulted in millions of duplicate rows of data in a table that will be quite large even without them (still in the millions). I can easily find these duplicate rows and can run a single delete query to kill them all. The problem is that trying to delete this many rows in one shot locks up the table for a long time, which I would like to avoid if possible. The only ways I can see to get rid of these rows, without taking down the site (by locking up the table) are:
- Write a script that executes thousands of smaller delete queries in a loop. In theory this solves the table-locking problem, since other queries can get into the queue and run between the deletes. But it will still put a significant load on the database and will take a long time to run.
- Rename the table and recreate the existing one (it will now be empty). Do the cleanup on the renamed table. Then rename the new table, rename the old one back, and merge the new rows into the renamed table. This approach takes more steps, but should get the job done with minimal interruption. The only tricky part is that the table in question is a reporting table, so once it is renamed and the empty one put in its place, all historical reports disappear until I put it back. Also, the merge process could be a bit of a pain because of the type of data being stored. Overall, these are my likely options at the moment.
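The rename-and-swap idea above can be sketched end to end. This is a minimal, self-contained demo using `sqlite3` in place of MySQL, and the `reports` table, its columns, and the de-duplication key `(metric, day)` are all hypothetical; in MySQL you would instead use `RENAME TABLE` (which can swap two tables atomically) and `CREATE TABLE ... LIKE` to clone the schema:

```python
import sqlite3

# Hypothetical schema: a reporting table with duplicate rows keyed by (metric, day).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reports (id INTEGER PRIMARY KEY, metric TEXT, day TEXT);
INSERT INTO reports (metric, day) VALUES
  ('visits', '2023-01-01'),
  ('visits', '2023-01-01'),  -- duplicate row
  ('sales',  '2023-01-01');
""")

# Step 1: swap the full table out and put an empty one in its place.
conn.execute("ALTER TABLE reports RENAME TO reports_old")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, metric TEXT, day TEXT)")

# Step 2: de-duplicate offline, keeping the lowest id per (metric, day).
conn.execute("""
DELETE FROM reports_old WHERE id NOT IN (
  SELECT MIN(id) FROM reports_old GROUP BY metric, day)
""")

# Step 3: merge any rows written to the new table during cleanup back in,
# then swap the cleaned table back into place.
conn.execute("INSERT INTO reports_old (metric, day) SELECT metric, day FROM reports")
conn.execute("DROP TABLE reports")
conn.execute("ALTER TABLE reports_old RENAME TO reports")
conn.commit()
```

The window where reports "disappear" lasts only from step 1 until step 3; the slow de-duplication in step 2 runs against the renamed copy, so the live table stays unlocked.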
I was just wondering if anyone else has had this problem before and, if so, how you dealt with it without taking down the site and, hopefully, with minimal if any interruption to the users? If I go with number 2, or a different, similar, approach, I can schedule the stuff to run late at night and do the merge early the next morning and just let the users know ahead of time, so that's not a huge deal. I'm just looking to see if anyone has any ideas for a better, or easier, way to do the cleanup.
Answer
```sql
DELETE FROM `table`
WHERE (whatever criteria)
ORDER BY `id`
LIMIT 1000
```
Wash, rinse, repeat until zero rows affected. Maybe in a script that sleeps for a second or three between iterations.
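The loop can be sketched as follows. This is a hedged illustration, not a drop-in script: `sqlite3` stands in for MySQL, and the table name `t` and the duplicate criterion (keep the lowest `id` per `(metric, day)`) are assumptions. Default SQLite builds also lack `DELETE ... LIMIT`, so the batch is picked through an `id` subquery; against MySQL the `DELETE ... ORDER BY id LIMIT 1000` statement shown above works as-is:

```python
import sqlite3
import time

# Hypothetical setup: a table containing thousands of duplicate rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, metric TEXT, day TEXT)")
conn.executemany("INSERT INTO t (metric, day) VALUES (?, ?)",
                 [("visits", "2023-01-01")] * 5000 + [("sales", "2023-01-01")])
conn.commit()

BATCH = 1000   # rows per DELETE; small enough to keep each lock short
PAUSE = 0.0    # seconds to sleep between batches (1-3 on a live server)

while True:
    # Delete one batch of duplicates, keeping the lowest id per (metric, day).
    cur = conn.execute(
        "DELETE FROM t WHERE id IN ("
        "  SELECT id FROM t WHERE id NOT IN ("
        "    SELECT MIN(id) FROM t GROUP BY metric, day) LIMIT ?)",
        (BATCH,))
    conn.commit()
    if cur.rowcount == 0:  # zero rows affected: cleanup is done
        break
    time.sleep(PAUSE)      # let queued queries run between batches
```

Committing after every batch is what releases the locks and lets other queries interleave; the sleep just spreads the extra load out further.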