Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/9031201/

Time: 2020-09-01 14:10:12  Source: igfitidea

Slow query on "UNION ALL" view

sql performance postgresql indexing union-all

Asked by Mladen Jablanović

I have a DB view which basically consists of two SELECT queries with UNION ALL, like this:

CREATE VIEW v AS
SELECT time, etc. FROM t1 // #1...
UNION ALL
SELECT time, etc. FROM t2 // #2...

The problem is that selects of the form

SELECT ... FROM v WHERE time >= ... AND time < ...

perform really slowly on it.

Both SELECT #1 and #2 are decently fast, properly indexed and so on: when I create views v1 and v2 like:

CREATE VIEW v1 AS
SELECT time, etc. FROM t1 // #1...

CREATE VIEW v2 AS
SELECT time, etc. FROM t2 // #2...

then the same SELECT, with the same WHERE condition as above, works fine on each of them individually.

Any ideas about where the problem might be and how to solve it?

(Just to mention, this is one of the recent Postgres versions.)

Edit: Adding anonymized query plans (thanks to @filiprem for the link to an awesome tool):

v1:

Aggregate  (cost=9825.510..9825.520 rows=1 width=53) (actual time=59.995..59.995 rows=1 loops=1)
  ->  Index Scan using delta on echo alpha  (cost=0.000..9815.880 rows=3850 width=53) (actual time=0.039..53.418 rows=33122 loops=1)
          Index Cond: (("juliet" >= 'seven'::uniform bravo_victor oscar whiskey) AND ("juliet" <= 'november'::uniform bravo_victor oscar whiskey))
          Filter: ((NOT victor) AND ((bravo_sierra five NULL) OR ((bravo_sierra)::golf <> 'india'::golf)))

v2:

Aggregate  (cost=15.470..15.480 rows=1 width=33) (actual time=0.231..0.231 rows=1 loops=1)
  ->  Index Scan using yankee on six charlie  (cost=0.000..15.220 rows=99 width=33) (actual time=0.035..0.186 rows=140 loops=1)
          Index Cond: (("juliet" >= 'seven'::uniform bravo oscar whiskey) AND ("juliet" <= 'november'::uniform bravo oscar whiskey))
          Filter: (NOT victor)

v:

Aggregate  (cost=47181.850..47181.860 rows=1 width=0) (actual time=37317.291..37317.291 rows=1 loops=1)
  ->  Append  (cost=42.170..47132.480 rows=3949 width=97) (actual time=1.277..37304.453 rows=33262 loops=1)
        ->  Nested Loop Left Join  (cost=42.170..47052.250 rows=3850 width=99) (actual time=1.275..37288.465 rows=33122 loops=1)
              ->  Hash Left Join  (cost=42.170..9910.990 rows=3850 width=115) (actual time=1.123..117.797 rows=33122 loops=1)
                      Hash Cond: ((alpha_seven.two)::golf = (quebec_three.two)::golf)
                    ->  Index Scan using delta on echo alpha_seven  (cost=0.000..9815.880 rows=3850 width=132) (actual time=0.038..77.866 rows=33122 loops=1)
                            Index Cond: (("juliet" >= 'seven'::uniform bravo_victor oscar whiskey_two) AND ("juliet" <= 'november'::uniform bravo_victor oscar whiskey_two))
                            Filter: ((NOT victor) AND ((bravo_sierra five NULL) OR ((bravo_sierra)::golf <> 'india'::golf)))
                    ->  Hash  (cost=30.410..30.410 rows=941 width=49) (actual time=1.068..1.068 rows=941 loops=1)
                            Buckets: 1024  Batches: 1  Memory Usage: 75kB
                          ->  Seq Scan on alpha_india quebec_three  (cost=0.000..30.410 rows=941 width=49) (actual time=0.010..0.486 rows=941 loops=1)
              ->  Index Scan using mike on hotel quebec_sierra  (cost=0.000..9.630 rows=1 width=24) (actual time=1.112..1.119 rows=1 loops=33122)
                      Index Cond: ((alpha_seven.zulu)::golf = (quebec_sierra.zulu)::golf)
        ->  Subquery Scan on "*SELECT* 2"  (cost=34.080..41.730 rows=99 width=38) (actual time=1.081..1.951 rows=140 loops=1)
              ->  Merge Right Join  (cost=34.080..40.740 rows=99 width=38) (actual time=1.080..1.872 rows=140 loops=1)
                      Merge Cond: ((quebec_three.two)::golf = (charlie.two)::golf)
                    ->  Index Scan using whiskey_golf on alpha_india quebec_three  (cost=0.000..174.220 rows=941 width=49) (actual time=0.017..0.122 rows=105 loops=1)
                    ->  Sort  (cost=18.500..18.750 rows=99 width=55) (actual time=0.915..0.952 rows=140 loops=1)
                            Sort Key: charlie.two
                            Sort Method:  quicksort  Memory: 44kB
                          ->  Index Scan using yankee on six charlie  (cost=0.000..15.220 rows=99 width=55) (actual time=0.022..0.175 rows=140 loops=1)
                                  Index Cond: (("juliet" >= 'seven'::uniform bravo_victor oscar whiskey_two) AND ("juliet" <= 'november'::uniform bravo_victor oscar whiskey_two))
                                  Filter: (NOT victor)

juliet is time.

Answered by maniek

This seems to be a case of pilot error. The "v" query plan selects from at least 5 different tables.

Now, are you sure you are connected to the right database? Maybe there are some funky search_path settings? Maybe t1 and t2 are actually views (possibly in a different schema)? Maybe you are somehow selecting from the wrong view?

Edited after clarification:

You are using a fairly new feature called "join removal": http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.0#Join_Removal

http://rhaas.blogspot.com/2010/06/why-join-removal-is-cool.html

It appears that the feature does not kick in when UNION ALL is involved. You probably have to rewrite the view using only the two tables that are actually required.
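
For readers unfamiliar with join removal, here is a minimal sketch of the situation it targets. The tables event and detail and the view event_v are hypothetical and are not the asker's schema; the exact plans depend on the PostgreSQL version and on table statistics, so treat the comments as what one would expect to see rather than guaranteed output.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE detail (
    id   integer PRIMARY KEY,   -- unique join key: a precondition for join removal
    note text
);

CREATE TABLE event (
    id        integer PRIMARY KEY,
    time      timestamp NOT NULL,
    detail_id integer
);

CREATE VIEW event_v AS
SELECT e.id, e.time, d.note
FROM event e
LEFT JOIN detail d ON d.id = e.detail_id;

-- No column of "detail" is needed and detail.id is unique, so the planner
-- may drop the LEFT JOIN; the plan is expected to touch only "event".
EXPLAIN SELECT count(*) FROM event_v;

-- Here d.note is needed, so the join has to stay in the plan.
EXPLAIN SELECT count(note) FROM event_v;
```

Comparing the two plans shows whether the join was removed. Running the same comparison against a UNION ALL view built from such joins is one way to test the observation above.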

Another edit: You appear to be using an aggregate (like "select count(*) from v" vs. "select * from v"), which could get vastly different plans in the face of join removal. I guess we won't get very far without you posting the actual queries, the view and table definitions, and the plans used...

Answered by Stephen Quan

I believe your query is being executed similarly to:

(
   ( SELECT time, etc. FROM t1 // #1... )
   UNION ALL
   ( SELECT time, etc. FROM t2 // #2... )
)
WHERE time >= ... AND time < ...

which the optimizer is having difficulty optimizing, i.e. it is doing the UNION ALL first, before applying the WHERE clause, but you want it to apply the WHERE clause before the UNION ALL.

Couldn't you put your WHERE clause in the CREATE VIEW?

CREATE VIEW v AS
( SELECT time, etc. FROM t1  WHERE time >= ... AND time < ... )
UNION ALL
( SELECT time, etc. FROM t2  WHERE time >= ... AND time < ... )

Alternatively, if the view cannot have the WHERE clause, then perhaps you can keep the two views and do the UNION ALL with the WHERE clause when you need them:

CREATE VIEW v1 AS
SELECT time, etc. FROM t1 // #1...

CREATE VIEW v2 AS
SELECT time, etc. FROM t2 // #2...

( SELECT * FROM v1 WHERE time >= ... AND time < ... )
UNION ALL
( SELECT * FROM v2 WHERE time >= ... AND time < ... )
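
Whether the WHERE clause really is applied after the UNION ALL can be checked in the plan itself. The sketch below uses made-up stand-ins for t1, t2 and v (the payload column, the index names and the date literals are assumptions, not taken from the question).

```sql
-- Hypothetical stand-ins for the asker's t1, t2 and v.
CREATE TABLE t1 (time timestamp NOT NULL, payload text);
CREATE TABLE t2 (time timestamp NOT NULL, payload text);
CREATE INDEX t1_time_idx ON t1 (time);
CREATE INDEX t2_time_idx ON t2 (time);

CREATE VIEW v AS
SELECT time, payload FROM t1
UNION ALL
SELECT time, payload FROM t2;

-- If the condition is pushed down, it appears as an Index Cond or Filter
-- inside each branch under the Append node. If it is not pushed down,
-- it appears as a single Filter above the Append.
EXPLAIN
SELECT count(*) FROM v
WHERE time >= '2012-01-01' AND time < '2012-02-01';
```

In the plan for v posted in the question, the Index Cond on juliet already sits inside both branches of the Append, so it is worth reading the plan before rewriting the view.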

Answered by Dimitri

I do not know Postgres, but some RDBMSs handle comparison operators worse than BETWEEN when indexes are involved. I would try BETWEEN.

SELECT ... FROM v WHERE time BETWEEN ... AND ...
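
One caveat when trying this: BETWEEN includes both endpoints, while the original condition (time >= ... AND time < ...) excludes the upper bound, so the two are not interchangeable. A small sketch with made-up dates:

```sql
-- Compare the half-open range from the question with BETWEEN.
-- The dates are arbitrary example values.
SELECT day,
       (day >= DATE '2012-01-01' AND day < DATE '2012-01-03')  AS half_open,
       (day BETWEEN DATE '2012-01-01' AND DATE '2012-01-03')   AS between_inclusive
FROM (VALUES (DATE '2012-01-01'),
             (DATE '2012-01-02'),
             (DATE '2012-01-03')) AS d(day);

--     day     | half_open | between_inclusive
-- ------------+-----------+-------------------
--  2012-01-01 | t         | t
--  2012-01-02 | t         | t
--  2012-01-03 | f         | t
```

If the upper bound must stay exclusive, the BETWEEN version needs an adjusted upper limit.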

Answered by ErikE

Combine the two tables. Add a column to indicate the original table. If necessary, replace the original table names with views that select just the relevant part. Problem solved!

Looking into the superclass/subclass database design pattern could be of use to you.
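
A rough sketch of what this could look like. The name t_all, the source and payload columns and the index name are all hypothetical, and the sketch assumes the data of the original t1 and t2 has already been migrated and the old tables dropped, so that their names are free to be reused for views.

```sql
-- One combined table; "source" records which original table a row came from.
CREATE TABLE t_all (
    source  smallint  NOT NULL CHECK (source IN (1, 2)),  -- 1 = former t1, 2 = former t2
    time    timestamp NOT NULL,
    payload text
);
CREATE INDEX t_all_time_idx ON t_all (time);

-- Views that stand in for the original tables.
CREATE VIEW t1 AS SELECT time, payload FROM t_all WHERE source = 1;
CREATE VIEW t2 AS SELECT time, payload FROM t_all WHERE source = 2;

-- The former "UNION ALL" query becomes a plain range query on one table.
SELECT source, time, payload
FROM t_all
WHERE time >= '2012-01-01' AND time < '2012-02-01';
```

With this layout there is no UNION ALL left to plan, at the cost of a schema migration.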

Answered by Olivier Jacot-Descombes

One possibility would be to issue new SQL dynamically at each call instead of creating a view, and to integrate the WHERE clause into each SELECT of the union query:

SELECT time, etc. FROM t1
    WHERE time >= ... AND time < ...
UNION ALL
SELECT time, etc. FROM t2
    WHERE time >= ... AND time < ...


EDIT:

Can you use a parameterized function?

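CREATE OR REPLACE FUNCTION CallMyView(start_time date, end_time date)
RETURNS TABLE(d date, etc.)
AS $$
    BEGIN
        -- Parameters are named start_time/end_time so that they cannot be
        -- confused with the tables t1 and t2.
        RETURN QUERY
            SELECT time, etc. FROM t1
                WHERE time >= start_time AND time < end_time
            UNION ALL
            SELECT time, etc. FROM t2
                WHERE time >= start_time AND time < end_time;
    END;
$$ LANGUAGE plpgsql;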

Call it with:

SELECT * FROM CallMyView(..., ...);

Answered by Glenn

Encountered the same scenario on Oracle 11g:

Scenario 1:

CREATE VIEW v AS
  SELECT time, etc. FROM t1 // #1...

The following query runs fast, and the plan looks okay:

SELECT ... FROM v WHERE time >= ... AND time < ...

Scenario 2:

CREATE VIEW v AS
  SELECT time, etc. FROM t2 // #2...

The following query runs fast, and the plan looks okay:

SELECT ... FROM v WHERE time >= ... AND time < ...

Scenario 3, with UNION ALL:

CREATE VIEW v AS
  SELECT time, etc. FROM t1 // #1...
  UNION ALL
  SELECT time, etc. FROM t2 // #2...

The following runs slowly. The plan breaks apart t1 and t2 (which were also views) and assembles them as one big series of unions. The time filters are being applied properly on the individual components, but it is still very slow:

SELECT ... FROM v WHERE time >= ... AND time < ...

I would have been happy to just get a time in the ballpark of t1 plus t2, but it was more than double. Adding the parallel hint did the trick for me in this case. It rearranged everything into a better plan:

SELECT /*+ parallel */ ... FROM v WHERE time >= ... AND time < ...

Answered by Walter Mitty

Try creating your view using UNION DISTINCT instead of UNION ALL. See if it gives wrong results. See if it gives faster performance.

If it gives wrong results, try to map your SQL operations on tables back to relational operations on relations. The elements of relations are always distinct. There may be something fundamentally wrong with your model.

I am deeply suspicious of the LEFT JOINs in the query plan you showed. It shouldn't be necessary to perform LEFT JOINs in order to get the results you appear to be selecting.
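
For reference, the difference between the two operators is easy to see on a tiny example: UNION ALL keeps duplicate rows, while UNION DISTINCT (the same as plain UNION) removes them. The values below are arbitrary.

```sql
-- UNION ALL keeps the duplicate 2: four rows in total.
SELECT count(*) AS rows_union_all
FROM (
    SELECT x FROM (VALUES (1), (2)) AS a(x)
    UNION ALL
    SELECT x FROM (VALUES (2), (3)) AS b(x)
) AS u;
-- rows_union_all = 4

-- UNION DISTINCT removes the duplicate: three rows in total.
SELECT count(*) AS rows_union_distinct
FROM (
    SELECT x FROM (VALUES (1), (2)) AS a(x)
    UNION DISTINCT
    SELECT x FROM (VALUES (2), (3)) AS b(x)
) AS u;
-- rows_union_distinct = 3
```

Removing duplicates is extra work for the database, so when trying this suggestion it is worth comparing both the results and the timings.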

Answered by bjan

I don't think I have enough points to post this as a comment, so I am posting it as an answer.

I don't know how PostgreSQL works behind the scenes. I think you may get a clue from how Oracle would handle this, so here is how Oracle would work:

Your UNION ALL view is slower because, behind the scenes, the records from both SELECT #1 and #2 are first combined in a temporary table, which is created on the fly, and then your SELECT ... FROM v WHERE time >= ... AND time < ... is executed on this temporary table. Since both #1 and #2 are indexed, they run fast individually, as expected, but this temporary table is not indexed (of course), and the final records are selected from it, resulting in a slower response.

For now, at least, I don't see any way to have all three at once: faster + view + non-materialized.

One way to make it faster, other than running SELECT #1 and #2 and UNIONing them explicitly, would be to use a stored procedure or a function in your application programming language (if that applies). In this procedure you make separate calls to each indexed table and then combine the results, which is not as simple as SELECT ... FROM v WHERE time >= ... AND time < ... :(