使用嵌套循环提高 SQL 查询的性能 - PostgreSQL

Question

提问by Dabeliuteef

I am using PostgreSQLand I have a weird problem with my SQL query. Depending on wich date paramter I'm using. My request doesn't do the same operation.

我正在使用PostgreSQL，但我的 SQL 查询有一个奇怪的问题。取决于我使用的日期参数。我的请求没有执行相同的操作。

This is my working query :

这是我的工作查询：

SELECT DISTINCT app.id_application 
FROM stat sj
LEFT OUTER JOIN groupe gp ON gp.id_groupe = sj.id_groupe 
LEFT OUTER JOIN application app ON app.id_application = gp.id_application 
WHERE date_stat >= '2016/3/01' 
AND date_stat <= '2016/3/31' 
AND ( date_stat = date_gen-1 or (date_gen = '2016/04/01' AND date_stat = '2016/3/31')) 
AND app.id_application IS NOT NULL

This query takes around 2 secondes (which is OKAY for me because I have a lots of rows). When I run EXPLAIN ANALYSE for this query I have this:

这个查询需要大约 2 秒（这对我来说没问题，因为我有很多行）。当我为这个查询运行 EXPLAIN ANALYZE 时，我有这个：

HashAggregate  (cost=375486.95..375493.62 rows=667 width=4) (actual time=2320.541..2320.656 rows=442 loops=1)
    ->  Hash Join  (cost=254.02..375478.99 rows=3186 width=4) (actual time=6.144..2271.984 rows=263274 loops=1)
    Hash Cond: (gp.id_application = app.id_application)
    ->  Hash Join  (cost=234.01..375415.17 rows=3186 width=4) (actual time=5.926..2200.671 rows=263274 loops=1)
          Hash Cond: (sj.id_groupe = gp.id_groupe)
          ->  Seq Scan on stat sj  (cost=0.00..375109.47 rows=3186 width=8) (actual time=3.196..2068.357 rows=263274 loops=1)
                Filter: ((date_stat >= '2016-03-01'::date) AND (date_stat <= '2016-03-31'::date) AND ((date_stat = (date_gen - 1)) OR ((date_gen = '2016-04-01'::date) AND (date_stat = '2016-03-31'::date))))
                Rows Removed by Filter: 7199514
          ->  Hash  (cost=133.45..133.45 rows=8045 width=12) (actual time=2.677..2.677 rows=8019 loops=1)
                Buckets: 1024  Batches: 1  Memory Usage: 345kB
                ->  Seq Scan on groupe gp  (cost=0.00..133.45 rows=8045 width=12) (actual time=0.007..1.284 rows=8019 loops=1)
    ->  Hash  (cost=11.67..11.67 rows=667 width=4) (actual time=0.206..0.206 rows=692 loops=1)
          Buckets: 1024  Batches: 1  Memory Usage: 25kB
          ->  Seq Scan on application app  (cost=0.00..11.67 rows=667 width=4) (actual time=0.007..0.101 rows=692 loops=1)
                Filter: (id_application IS NOT NULL)
    Total runtime: 2320.855 ms

Now, When I'm trying the same query for the current month (we are the 6th of April, so I'm trying to get all the application_id of April) with the same query

现在，当我为当月尝试相同的查询时（我们是 4 月 6 日，所以我试图获取所有四月的 application_id）使用相同的查询

SELECT DISTINCT app.id_application 
FROM stat sj
LEFT OUTER JOIN groupe gp ON gp.id_groupe = sj.id_groupe 
LEFT OUTER JOIN application app ON app.id_application = gp.id_application 
WHERE date_stat >= '2016/04/01' 
AND date_stat <= '2016/04/30' 
AND ( date_stat = date_gen-1 or ( date_gen = '2016/05/01' AND date_job = '2016/04/30')) 
AND app.id_application IS NOT NULL

This query takes now 120 seconds. So I also ran EXPLAIN ANALYZE on this query and now it doesn't have the same operations:

此查询现在需要 120 秒。所以我也在这个查询上运行了 EXPLAIN ANALYZE ，现在它没有相同的操作：

HashAggregate  (cost=375363.50..375363.51 rows=1 width=4) (actual time=186716.468..186716.532 rows=490 loops=1)
->  Nested Loop  (cost=0.00..375363.49 rows=1 width=4) (actual time=1.945..186619.404 rows=118990 loops=1)
    Join Filter: (gp.id_application = app.id_application)
    Rows Removed by Join Filter: 82222090
    ->  Nested Loop  (cost=0.00..375343.49 rows=1 width=4) (actual time=1.821..171458.237 rows=118990 loops=1)
          Join Filter: (sj.id_groupe = gp.id_groupe)
          Rows Removed by Join Filter: 954061820
          ->  Seq Scan on stat sj  (cost=0.00..375109.47 rows=1 width=8) (actual time=0.235..1964.423 rows=118990 loops=1)
                Filter: ((date_stat >= '2016-04-01'::date) AND (date_stat <= '2016-04-30'::date) AND ((date_stat = (date_gen - 1)) OR ((date_gen = '2016-05-01'::date) AND (date_stat = '2016-04-30'::date))))
                Rows Removed by Filter: 7343798
          ->  Seq Scan on groupe gp  (cost=0.00..133.45 rows=8045 width=12) (actual time=0.002..0.736 rows=8019 loops=118990)
    ->  Seq Scan on application app  (cost=0.00..11.67 rows=667 width=4) (actual time=0.003..0.073 rows=692 loops=118990)
          Filter: (id_application IS NOT NULL)
  Total runtime: 186716.635 ms

So I decided to search where the problem came from by reducing the number of conditions from my query until the performances is acceptable again.

所以我决定通过减少查询中的条件数量来搜索问题的来源，直到性能再次可以接受。

So with only this parameter

所以只有这个参数

WHERE date_stat >= '2016/04/01'

It takes only 1.9secondes (like the first working query) and it's also working with 2 parameters :

它只需要 1.9 秒（就像第一个工作查询一样）并且它也使用 2 个参数：

WHERE date_stat >= '2016/04/01' 
AND app.id_application IS NOT NULL

BUT when I try to add one of those line I have the Nested loop in the Explain

但是当我尝试添加其中一行时，我在解释中有嵌套循环

AND date_stat <= '2016/04/30' 
AND ( date_stat = date_gen-1 or ( date_gen = '2016/05/01' AND date_stat = '2016/04/30'))

Does someone have any idea where it could come from?

有人知道它可能来自哪里吗？

Answer 1

回答by Gabriel's Messanger

Ok, it looks like there's problem with optimizer estimations. He thiks that for april there will be only 1 rowso he choose NESTED LOOPwhich is very inefficient for big number of rows (118,990in that case).

好的，优化器估计似乎存在问题。他认为 4 月只有1 row这样，他才会选择NESTED LOOP对于大量行（118,990在这种情况下）效率非常低的方法。

Perform VACUUM ANALYZEfor every table. This will clean up dead tuples and refresh statistics.
consider adding index based on dateslike CREATE INDEX date_stat_idx ON <table with date_stat> USING btree (date_stat);

VACUUM ANALYZE为每个表执行。这将清理死元组并刷新统计信息。
考虑添加基于dates喜欢的索引CREATE INDEX date_stat_idx ON <table with date_stat> USING btree (date_stat);

Rerun the query,

重新运行查询，

使用嵌套循环提高 SQL 查询的性能 - PostgreSQL

提问by Dabeliuteef

回答by Gabriel's Messanger

相关推荐

最近更新

标签

使用嵌套循环提高 SQL 查询的性能 - PostgreSQL

提问by Dabeliuteef

回答by Gabriel's Messanger

相关推荐

postgresql 如何在plpgsql中使用变量作为表名

postgresql 中的编码和解码

postgresql Postgres：从时间戳更新日期并保留时间

postgresql 将主键设置为现有表

相关推荐

最近更新

标签