MySQL - SELECT WHERE field IN(子查询) - 为什么非常慢?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6135376/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 20:02:25  来源:igfitidea点击:

MySQL - SELECT WHERE field IN (subquery) - Extremely slow why?

mysqlsubquerywhere-in

提问by quano

I've got a couple of duplicates in a database that I want to inspect, so what I did to see which are duplicates, I did this:

我在要检查的数据库中有几个重复项,所以我做了什么来查看哪些是重复项,我这样做了:

SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1

This way, I will get all rows with relevant_field occuring more than once. This query takes milliseconds to execute.

这样,我将获得相关字段出现不止一次的所有行。此查询需要几毫秒来执行。

Now, I wanted to inspect each of the duplicates, so I thought I could SELECT each row in some_table with a relevant_field in the above query, so I did like this:

现在,我想检查每个重复项,所以我想我可以在上面的查询中使用相关字段选择 some_table 中的每一行,所以我这样做了:

SELECT *
FROM some_table 
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
)

This turns out to be extreeeemely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.

由于某种原因,这结果非常缓慢(需要几分钟)。这里到底发生了什么让它变得这么慢?related_field 已编入索引。

Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then making my second query like this instead:

最终,我尝试从第一个查询创建一个视图“temp_view” (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1),然后像这样创建我的第二个查询:

SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM temp_view
)

And that works just fine. MySQL does this in some milliseconds.

这工作得很好。MySQL 会在几毫秒内完成此操作。

Any SQL experts here who can explain what's going on?

这里有任何 SQL 专家可以解释发生了什么吗?

采纳答案by quano

The subquery is being run for each row because it is a correlated query. One can make a correlated query into a non-correlated query by selecting everything from the subquery, like so:

正在为每一行运行子查询,因为它是一个相关查询。通过从子查询中选择所有内容,可以将相关查询转换为非相关查询,如下所示:

SELECT * FROM
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
) AS subquery

The final query would look like this:

最终查询将如下所示:

SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT * FROM
    (
        SELECT relevant_field
        FROM some_table
        GROUP BY relevant_field
        HAVING COUNT(*) > 1
    ) AS subquery
)

回答by Johan

Rewrite the query into this

将查询改写成这样

SELECT st1.*, st2.relevant_field FROM sometable st1
INNER JOIN sometable st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id  /* list a unique sometable field here*/
HAVING COUNT(*) > 1

I think st2.relevant_fieldmust be in the select, because otherwise the havingclause will give an error, but I'm not 100% sure

我认为st2.relevant_field必须在选择中,否则having子句会出错,但我不是 100% 确定

Never use INwith a subquery; this is notoriously slow.
Only ever use INwith a fixed list of values.

切勿IN与子查询一起使用;这是出了名的慢。
只使用IN固定的值列表。

More tips

更多提示

  1. If you want to make queries faster, don't do a SELECT *only select the fields that you really need.
  2. Make sure you have an index on relevant_fieldto speed up the equi-join.
  3. Make sure to group byon the primary key.
  4. If you are on InnoDB andyou only select indexed fields (and things are not too complex)than MySQL will resolve your query using only the indexes, speeding things way up.
  1. 如果您想更快地进行查询,请不要SELECT *只选择您真正需要的字段。
  2. 确保你有一个索引relevant_field来加速等值连接。
  3. 确保group by在主键上。
  4. 如果您在 InnoDB 上并且您只选择索引字段(并且事情不是太复杂),那么 MySQL 将仅使用索引来解析您的查询,从而加快速度。

General solution for 90% of your IN (selectqueries

90%IN (select查询的通用解决方案

Use this code

使用此代码

SELECT * FROM sometable a WHERE EXISTS (
  SELECT 1 FROM sometable b
  WHERE a.relevant_field = b.relevant_field
  GROUP BY b.relevant_field
  HAVING count(*) > 1) 

回答by ceteras

SELECT st1.*
FROM some_table st1
inner join 
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;

I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.

我已经在我的一个数据库上尝试过您的查询,还尝试将其重写为子查询的连接。

This worked a lot faster, try it!

这工作得更快,试试吧!

回答by user2244323

Try this

尝试这个

SELECT t1.*
FROM 
 some_table t1,
  (SELECT relevant_field
  FROM some_table
  GROUP BY relevant_field
  HAVING COUNT (*) > 1) t2
WHERE
 t1.relevant_field = t2.relevant_field;

回答by plang

I have reformatted your slow sql query with www.prettysql.net

我已经用 www.prettysql.net 重新格式化了你的慢 sql 查询

SELECT *
FROM some_table
WHERE
 relevant_field in
 (
  SELECT relevant_field
  FROM some_table
  GROUP BY relevant_field
  HAVING COUNT ( * ) > 1
 );

When using a table in both the query and the subquery, you should always alias both, like this:

在查询和子查询中同时使用表时,您应该始终为两者设置别名,如下所示:

SELECT *
FROM some_table as t1
WHERE
 t1.relevant_field in
 (
  SELECT t2.relevant_field
  FROM some_table as t2
  GROUP BY t2.relevant_field
  HAVING COUNT ( t2.relevant_field ) > 1
 );

Does that help?

这有帮助吗?

回答by harun ugur

Firstly you can find duplicate rows and find count of rows is used how many times and order it by number like this;

首先,您可以找到重复的行,并找到使用了多少次的行数并按这样的数字对其进行排序;

SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
  CASE q.NID
  WHEN @curCode THEN
   @curRow := @curRow + 1
  ELSE
   @curRow := 1
  AND @curCode := q.NID
  END
 ) AS No
FROM UserInfo q,
(
  SELECT
   @curRow := 1,
   @curCode := ''
 ) rt
WHERE q.NID IN
(
    SELECT NID
    FROM UserInfo
    GROUP BY NID
    HAVING COUNT(*) > 1
) 

after that create a table and insert result to it.

之后创建一个表并将结果插入其中。

create table CopyTable 
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
  CASE q.NID
  WHEN @curCode THEN
   @curRow := @curRow + 1
  ELSE
   @curRow := 1
  AND @curCode := q.NID
  END
 ) AS No
FROM UserInfo q,
(
  SELECT
   @curRow := 1,
   @curCode := ''
 ) rt
WHERE q.NID IN
(
    SELECT NID
    FROM UserInfo
    GROUP BY NID
    HAVING COUNT(*) > 1
) 

Finally, delete dublicate rows.No is start 0. Except fist number of each group delete all dublicate rows.

最后,删除重复行。No 是从 0 开始。除每组的第一个编号外,删除所有重复行。

delete from  CopyTable where No!= 0;

回答by Andrey Posudevsky

sometimes when data grow bigger mysql WHERE IN's could be pretty slow because of query optimization. Try using STRAIGHT_JOIN to tell mysql to execute query as is, e.g.

有时,当数据变得更大时,mysql WHERE IN 可能会因为查询优化而变得非常慢。尝试使用 STRAIGHT_JOIN 告诉 mysql 按原样执行查询,例如

SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)

but beware: in most cases mysql optimizer works pretty well, so I would recommend to use it only when you have this kind of problem

但要注意:在大多数情况下 mysql 优化器工作得很好,所以我建议只有当你遇到这种问题时才使用它

回答by Hilarius L. Doren

This is similar to my case, where I have a table named tabel_buku_besar. What I need are

这类似于我的情况,我有一个名为tabel_buku_besar. 我需要的是

  1. Looking for record that have account_code='101.100'in tabel_buku_besarwhich have companyarea='20000'and also have IDRas currency

  2. I need to get all record from tabel_buku_besarwhich have account_code same as step 1 but have transaction_numberin step 1 result

  1. 寻找记录有account_code='101.100'tabel_buku_besarcompanyarea='20000',也有IDR作为currency

  2. 我需要获取所有记录,tabel_buku_besar其中 account_code 与步骤 1 相同,但transaction_number在步骤 1 结果中

while using select ... from...where....transaction_number in (select transaction_number from ....), my query running extremely slow and sometimes causing request time out or make my application not responding...

使用时select ... from...where....transaction_number in (select transaction_number from ....),我的查询运行速度非常慢,有时会导致请求超时或使我的应用程序无响应...

I try this combination and the result...not bad...

我尝试了这种组合,结果……不错……

`select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
      L.TRANSACTION_NUMBER AS VOUCHER,
      L.ACCOUNT_CODE,
      C.DESCRIPTION,
      L.DEBET,
      L.KREDIT 
 from (select * from tabel_buku_besar A
                where A.COMPANYAREA='$COMPANYAREA'
                      AND A.CURRENCY='$Currency'
                      AND A.ACCOUNT_CODE!='$ACCOUNT'
                      AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L 
INNER JOIN (select * from tabel_buku_besar A
                     where A.COMPANYAREA='$COMPANYAREA'
                           AND A.CURRENCY='$Currency'
                           AND A.ACCOUNT_CODE='$ACCOUNT'
                           AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA 
LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA 
ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER`

回答by Matt

I find this to be the most efficient for finding if a value exists, logic can easily be inverted to find if a value doesn't exist (ie IS NULL);

我发现这是查找值是否存在的最有效方法,可以轻松反转逻辑以查找值是否不存在(即 IS NULL);

SELECT * FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL

*Replace relevant_field with the name of the value that you want to check exists in your table

*用您要检查的值的名称替换相关字段是否存在于您的表中

*Replace primaryKey with the name of the primary key column on the comparison table.

*将primaryKey替换为比较表上主键列的名称。