Original URL: http://stackoverflow.com/questions/5464931/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
MySQL Long Query Progress Monitoring
Asked by KOGI
Just to preface my question, I understand that there is no direct support for something like this. What I am looking for is any sort of work-around, or convoluted derivation that would get me a half-respectable result.
I am working with a rather large MySQL cluster (tables > 400 million rows) using the cluster engine.
Is anyone aware of a way to either directly retrieve or otherwise derive a somewhat (or better) accurate indication of progress through a long query in MySQL? I have some queries that can take up to 45 minutes, and I need to determine if we're 10% or 90% through the processing.
EDIT:
As requested in the comments, here is a distilled and generified version of one of the queries that is leading to my original question...
SELECT `userId`
FROM `openEndedResponses` AS `oe`
WHERE
`oe`.`questionId` = 3 -- zip code
AND (REPLACE( REPLACE( `oe`.`value`, ' ', '' ), '-', '' ) IN ( '30071', '30106', '30122', '30134', '30135', '30168', '30180', '30185', '30187', '30317', '30004' ));
This query is run against a single table with ~95 million rows. It takes 8 seconds to run the query and another 13 to transfer the data (21 seconds total). Considering the size of the table, and the fact that string manipulation functions are being used, I'd say it's running pretty damn fast. However, to the user, it still appears stuck or idle for 21 seconds. Some indication of progress would be ideal.
Accepted answer by KOGI
For now -- for my very specific situation -- there seems to be no real solution for this. Since I can't split my query into several smaller ones, and it's proving counterproductive to select count(*) first and then run the "real" query (that doubles the execution time of an already painfully slow query), none of the workarounds seem viable either. Maybe soon, MySQL will support something like this.
Answered by lightsurge
I know this is an old question, but I was looking for a similar answer, when trying to figure out how much longer my update would take on a query of 250m rows.
If you run:
SHOW ENGINE INNODB STATUS \G
Then under TRANSACTIONS find the transaction in question, examine this section:
---TRANSACTION 34282360, ACTIVE 71195 sec starting index read
mysql tables in use 2, locked 2
1985355 lock struct(s), heap size 203333840, 255691088 row lock(s), undo log entries 21355084
The important bit is "undo log entries". For each updated row, in my case it seemed to add an undo log entry (try running it again after a few seconds and see how many have been added).
If you skip to the end of the status report, you'll see this:
Number of rows inserted 606188224, updated 251615579, deleted 1667, read 54873415652
0.00 inserts/s, 1595.44 updates/s, 0.00 deletes/s, 3190.88 reads/s
Here we can see that the speed at which updates are being applied is 1595.44 rows per second (although if you're running other update queries in tandem, this rate might be split between your queries).
So from this, I know 21m rows have been updated, with (250m - 21m) 229m rows left to go.
229,000,000 / 1600 = 143,125 seconds to go
(143,125 / 60) / 60 = 39.76 hours to go
So it would appear I can twiddle my thumbs for another couple of days. Unless this answer is wrong, in which case I'll update it sometime before then!
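The arithmetic above can be wrapped in a small script. This is a rough sketch, not MySQL tooling: it assumes you paste in the raw SHOW ENGINE INNODB STATUS text and already know the total row count and the updates/s rate from that same output.

```python
import re

def eta_from_undo_log(status_text, total_rows, updates_per_sec):
    """Estimate seconds remaining for a big UPDATE.

    status_text:     raw output of SHOW ENGINE INNODB STATUS
    total_rows:      rows the statement is expected to modify
    updates_per_sec: the "updates/s" figure near the end of the status
    """
    match = re.search(r"undo log entries (\d+)", status_text)
    if not match:
        raise ValueError("transaction not found or no undo log entries yet")
    rows_done = int(match.group(1))
    return max(total_rows - rows_done, 0) / updates_per_sec

# The numbers from the answer above: 21m rows done out of 250m at ~1600/s.
status = "... undo log entries 21000000"
seconds = eta_from_undo_log(status, total_rows=250_000_000, updates_per_sec=1600)
print(f"{seconds / 3600:.2f} hours to go")  # 39.76 hours to go
```

Re-running it every few minutes also tells you whether the update rate is holding steady, which the single-sample math above cannot.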
Answered by Dour High Arch
I was able to estimate something like this by querying the number of rows to process then breaking the processing into a loop, working on only a subset of the total rows at a time.
The full loop was rather involved, but the basic logic went like:
SELECT @minID = Min(keyColumn) FROM table WHERE condition
SELECT @maxID = Max(keyColumn) FROM table WHERE condition
SELECT @potentialRows = (@maxID - @minID) / @iterations
WHILE @minID < @maxID
BEGIN
SET @breakID = @minID + @potentialRows
SELECT columns FROM table WITH (NOLOCK, ...)
WHERE condition AND keyColumn BETWEEN @minID AND @breakID
SET @minID = @breakID + 1
END
Note this works best if IDs are evenly distributed.
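The same chunking idea can be sketched in Python. This is a toy illustration against an in-memory SQLite table (the table and column names are made up), but the structure -- find the key range, slice it into fixed-size chunks, report progress after each slice -- carries over to MySQL with any DB-API driver.

```python
import sqlite3

# A small throwaway table so the sketch is self-contained; in practice
# the connection, table, and column names would be your MySQL ones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO t (id, value) VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 1001)])

# The loop above: find the key range, slice it into fixed-size chunks,
# and report progress after each slice.
(min_id, max_id), = conn.execute("SELECT MIN(id), MAX(id) FROM t")
iterations = 10
step = (max_id - min_id) // iterations or 1

processed = 0
lo = min_id
while lo <= max_id:
    hi = lo + step
    rows = conn.execute(
        "SELECT id, value FROM t WHERE id BETWEEN ? AND ?", (lo, hi)
    ).fetchall()
    processed += len(rows)
    pct = 100.0 * (min(hi, max_id) - min_id + 1) / (max_id - min_id + 1)
    print(f"processed {processed} rows, ~{pct:.0f}% of the key space")
    lo = hi + 1
```

As the answer notes, the percentage tracks the key space, not the row count, so it is only accurate when IDs are evenly distributed.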
Answered by NickSoft
I'm sure MySQL doesn't support any indication about the progress of running queries. The only solution is to optimize/split queries. A SELECT could be split by id as Dour High Arch suggested. Here is a query from a 33 million row table:
mysql> SELECT SQL_NO_CACHE min(id), max(id) FROM `urls`;
+---------+----------+
| min(id) | max(id) |
+---------+----------+
| 5000 | 35469678 |
+---------+----------+
1 row in set (0.00 sec)
You'd better use an integer or at least a date field for splitting. It should be a primary or unique index and should not allow NULL values.
Answered by penguin359
If it's a complex query you are attempting, the EXPLAIN SQL command or MySQL Query Analyzer might help to understand what is going on. If it's simply a large query, you might try creating a temporary table with SELECT INTO and/or using LIMIT/OFFSET clauses in SELECT queries. If you use LIMIT/OFFSET on the original tables, you might need to set the transaction level to serializable, IIRC, so that you get consistent reads while iterating over the data. If you create a temporary table first, that table should stay consistent regardless.
Answered by Diomidis Spinellis
If your query involves a linear scan through a large table, you can often obtain an excellent estimate by running pmonitor on the file containing that table. Include the --update option, because MySQL opens table files in update mode.
Example:
$ sudo pmonitor --update --file=/home/mysql/ghtorrent/commits.MYD --interval=5
/home/mysql/ghtorrent/commits.MYD 31.66%
/home/mysql/ghtorrent/commits.MYD 33.16% ETA 0:03:42
/home/mysql/ghtorrent/commits.MYD 34.85% ETA 0:03:24
/home/mysql/ghtorrent/commits.MYD 36.43% ETA 0:03:32
/home/mysql/ghtorrent/commits.MYD 38.36% ETA 0:03:12
/home/mysql/ghtorrent/commits.MYD 40.21% ETA 0:03:01
/home/mysql/ghtorrent/commits.MYD 41.95% ETA 0:02:54
[...]
/home/mysql/ghtorrent/commits.MYD 92.01% ETA 0:00:24
/home/mysql/ghtorrent/commits.MYD 93.85% ETA 0:00:18
/home/mysql/ghtorrent/commits.MYD 95.76% ETA 0:00:12
/home/mysql/ghtorrent/commits.MYD 97.60% ETA 0:00:07
/home/mysql/ghtorrent/commits.MYD 98.83% ETA 0:00:03
/home/mysql/ghtorrent/commits.MYD 100% ETA 0:00:00
If you don't know which file to monitor, run pmonitor with the --diff option. This will show you the file(s) where progress is made.
Example:
$ sudo pmonitor --update --diff --command=mysqld -i 60
[...]
/home/mysql/ghtorrent/projects.MYD 22.41% ETA 2:01:41
/home/mysql/ghtorrent/projects.MYD 23.13% ETA 1:53:23
/home/mysql/ghtorrent/projects.MYD 23.84% ETA 1:50:27
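For the curious, the core of what pmonitor measures can be sketched in a few lines of Python: compare the kernel-reported offset of the scanning process's file descriptor with the file's size. This is a Linux-only illustration (it reads /proc), finding the right pid/fd for mysqld is left to `ls -l /proc/<pid>/fd`, and it is no substitute for the real tool.

```python
import os
import re

def scan_progress(pid, fd):
    """Percent progress of a sequential scan: current offset vs. file size.

    Roughly what pmonitor computes. Linux-only; pid/fd identify the
    process and the open table file being scanned.
    """
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        pos = int(re.search(r"pos:\s*(\d+)", f.read()).group(1))
    # /proc/<pid>/fd/<fd> is a symlink to the real file; stat follows it.
    size = os.stat(f"/proc/{pid}/fd/{fd}").st_size
    return 100.0 * pos / size if size else 100.0

# Demo against this process itself: write 1000 bytes, seek halfway.
import tempfile
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 1000)
    tmp.flush()
    tmp.seek(500)
    print(f"{scan_progress(os.getpid(), tmp.fileno()):.1f}%")  # 50.0% on Linux
```

Sampling this twice, a few seconds apart, gives the bytes/second rate from which pmonitor derives its ETA column.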
Answered by Nico Andrade
Here's what you'll need to do to improve the following query:
SELECT `userId`
FROM `openEndedResponses` AS `oe`
WHERE
`oe`.`questionId` = 3 -- zip code
AND (REPLACE( REPLACE( `oe`.`value`, ' ', '' ), '-', '' ) IN ( '30071', '30106', '30122', '30134', '30135', '30168', '30180', '30185', '30187', '30317', '30004' ));
You'll need to ensure oe.questionId is indexed. You'll also need to ensure oe.value does not contain any spaces across the entire table when oe.questionId is 3 (assuming questionId 4 or 5 could be, say, city names, where you still want to allow spaces).
By doing this, you'll be able to remove all the REPLACEs, which will let MySQL use an index on oe.value.
MySQL will then merge both indices and give you the result much faster, in terms of processing.
In case you have many repeated userIds, you'll want to group them in such a way that entries from the index are immediately discarded. You still need to scan the whole merged index, but the smaller resultset will take less time to transfer; much less than 13 seconds!
Give it a shot and keep us posted about the result!
Best!
Answered by pgr
There is a promising answer to this old question which I found here, written by Baron Schwartz. It's not a precise and complete solution, but it does provide some objective material for estimates, if you're only running that query and nothing else on your server.
You run this command while the query is already running:
mysqladmin extended -r -i 10 | grep Handler
- that 10 is the number of seconds after which the command repeats itself, so wait for the refreshes
- add something like -u root -p if you need to authenticate
- if you know exactly which Handler you're looking for, you can make the grep more focused; for example Handler_read_rnd_next seems to be good for SELECTs
- ignore the first output, use the second and following ones
- use Ctrl-C to exit
Now get that number and do your math. Determine rows handled per second, and with your knowledge of table sizes you might be able to get a fairly precise estimate of total time.
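That math can be made concrete. A minimal sketch, assuming you have read one interval's Handler delta off the mysqladmin output by hand and know roughly how many rows the query must touch (the numbers below are hypothetical):

```python
def eta_from_handler_rate(handler_delta, interval_secs, total_rows, rows_done=0):
    """Seconds remaining, from one `mysqladmin extended -r -i <interval>` sample.

    handler_delta: e.g. the Handler_read_rnd_next delta for one interval
    total_rows:    rows the query is expected to read (table size for a full scan)
    """
    rate = handler_delta / interval_secs  # rows handled per second
    return (total_rows - rows_done) / rate

# Hypothetical numbers: a 1,000,000 handler-op delta per 10 s interval,
# against a 95-million-row table scanned from the start.
print(eta_from_handler_rate(1_000_000, 10, 95_000_000))  # 950.0 seconds
```

Which Handler counter maps to which query type is the fuzzy part; the estimate is only as good as that choice.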
Free extra tip: the command doesn't seem to go into Bash history (maybe because of exiting with Ctrl-C); you can add it there by hand with history -s mysqladmin extended -r -i 10 -u root -p | grep Handler
Answered by Christian
How about looking into partitioning your MySQL table so you can spread the read/write load? Try to limit each partition to 50 million rows (obviously dependent on your hardware).