postgresql COUNT(DISTINCT ...) 很慢
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11250253/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
postgresql COUNT(DISTINCT ...) very slow
提问by ferson2020
I have a very simple SQL query:
我有一个非常简单的 SQL 查询:
SELECT COUNT(DISTINCT x) FROM table;
My table has about 1.5 million rows. This query is running pretty slowly; it takes about 7.5s, compared to
我的表有大约 150 万行。这个查询运行得很慢;大约需要 7.5 秒,相比之下
SELECT COUNT(x) FROM table;
which takes about 435ms. Is there any way to change my query to improve performance? I've tried grouping and doing a regular count, as well as putting an index on x; both have the same 7.5s execution time.
大约需要 435 毫秒。有什么方法可以更改我的查询以提高性能吗?我尝试过分组并进行常规计数,并在 x 上放置索引;两者都有相同的 7.5 秒执行时间。
回答by Ankur
You can use this:
你可以使用这个:
SELECT COUNT(*) FROM (SELECT DISTINCT column_name FROM table_name) AS temp;
This is much faster than:
这比以下快得多:
COUNT(DISTINCT column_name)
回答by wildplasser
-- My default settings (this is basically a single-session machine, so work_mem is pretty high)
SET effective_cache_size='2048MB';
SET work_mem='16MB';
\echo original
EXPLAIN ANALYZE
SELECT
COUNT (distinct val) as aantal
FROM one
;
\echo group by+count(*)
EXPLAIN ANALYZE
SELECT
distinct val
-- , COUNT(*)
FROM one
GROUP BY val;
\echo with CTE
EXPLAIN ANALYZE
WITH agg AS (
SELECT distinct val
FROM one
GROUP BY val
)
SELECT COUNT (*) as aantal
FROM agg
;
Results:
结果:
original QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=36448.06..36448.07 rows=1 width=4) (actual time=1766.472..1766.472 rows=1 loops=1)
-> Seq Scan on one (cost=0.00..32698.45 rows=1499845 width=4) (actual time=31.371..185.914 rows=1499845 loops=1)
Total runtime: 1766.642 ms
(3 rows)
group by+count(*)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=36464.31..36477.31 rows=1300 width=4) (actual time=412.470..412.598 rows=1300 loops=1)
-> HashAggregate (cost=36448.06..36461.06 rows=1300 width=4) (actual time=412.066..412.203 rows=1300 loops=1)
-> Seq Scan on one (cost=0.00..32698.45 rows=1499845 width=4) (actual time=26.134..166.846 rows=1499845 loops=1)
Total runtime: 412.686 ms
(4 rows)
with CTE
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=36506.56..36506.57 rows=1 width=0) (actual time=408.239..408.239 rows=1 loops=1)
CTE agg
-> HashAggregate (cost=36464.31..36477.31 rows=1300 width=4) (actual time=407.704..407.847 rows=1300 loops=1)
-> HashAggregate (cost=36448.06..36461.06 rows=1300 width=4) (actual time=407.320..407.467 rows=1300 loops=1)
-> Seq Scan on one (cost=0.00..32698.45 rows=1499845 width=4) (actual time=24.321..165.256 rows=1499845 loops=1)
-> CTE Scan on agg (cost=0.00..26.00 rows=1300 width=0) (actual time=407.707..408.154 rows=1300 loops=1)
Total runtime: 408.300 ms
(7 rows)
The same plan as for the CTE could probably also be produced by other methods (window functions)
与 CTE 相同的计划可能也可以通过其他方法(窗口函数)生成
回答by Tometzky
If your count(distinct(x))
is significantly slower than count(x)
then you can speed up this query by maintaining x value counts in different table, for example table_name_x_counts (x integer not null, x_count int not null)
, using triggers. But your write performance will suffer and if you update multiple x
values in single transaction then you'd need to do this in some explicit order to avoid possible deadlock.
如果您count(distinct(x))
的速度明显慢于count(x)
那么您可以通过维护不同表中的 x 值计数来加速此查询,例如table_name_x_counts (x integer not null, x_count int not null)
,使用触发器。但是您的写入性能会受到影响,如果您x
在单个事务中更新多个值,那么您需要以某种明确的顺序执行此操作以避免可能的死锁。
回答by Rana Pratap Singh
I was also searching same answer, because at some point of time I needed total_count with distinct values along with limit/offset.
我也在寻找相同的答案,因为在某些时候我需要total_count 与不同的值以及 limit/offset。
Because it's little tricky to do- To get total count with distinct values along with limit/offset. Usually it's hard to get total count with limit/offset. Finally I got the way to do -
因为这有点棘手 - 获得具有不同值以及限制/偏移量的总数。通常很难获得限制/偏移的总数。终于我找到了办法——
SELECT DISTINCT COUNT(*) OVER() as total_count, * FROM table_name limit 2 offset 0;
SELECT DISTINCT COUNT(*) OVER() as total_count, * FROM table_name limit 2 offset 0;
Query performance is also high.
查询性能也很高。