Solution for speeding up a slow SELECT DISTINCT query in Postgres

Original URL: http://stackoverflow.com/questions/6598778/

Warning: this is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Asked by orokusaki
The query is basically:
SELECT DISTINCT "my_table"."foo" from "my_table" WHERE...
Pretending that I'm 100% certain the DISTINCT portion of the query is the reason it runs slowly, I've omitted the rest of the query to avoid confusion, since it is the DISTINCT portion's slowness that I'm primarily concerned with (DISTINCT is always a source of slowness).
The table in question has 2.5 million rows of data. The DISTINCT is needed for purposes not listed here (because I don't want back a modified query, but rather just general information about making distinct queries run faster at the DBMS level, if possible).
How can I make DISTINCT run quicker (using Postgres 9, specifically) without altering the SQL (i.e., I can't alter this SQL coming in, but have access to optimize something at the DB level)?
Accepted answer by antlersoft
Your DISTINCT is causing it to sort the output rows in order to find duplicates. If you put an index on the column(s) selected by the query, the database may be able to read them out in index order and save the sort step. A lot will depend on the details of the query and the tables involved-- your saying you "know the problem is with the DISTINCT" really limits the scope of available answers.
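For example, a minimal sketch of that suggestion, assuming the column is my_table.foo as in the question (the index name here is just illustrative):

-- Index the column(s) the DISTINCT runs over, so Postgres may be able to
-- read the values in index order instead of sorting 2.5 million rows.
CREATE INDEX my_table_foo_idx ON my_table (foo);

-- Refresh planner statistics and check whether the new index is actually used.
ANALYZE my_table;
EXPLAIN SELECT DISTINCT "my_table"."foo" FROM "my_table";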
Answered by
Oftentimes, you can make such queries run faster by working around the distinct by using a group by instead:
select my_table.foo
from my_table
where [whatever where conditions you want]
group by foo;
Answered by maniek
You can try increasing the work_mem setting, depending on the size of your dataset. It can cause the query plan to switch to hash aggregates, which are usually faster.
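As a rough sketch of trying that out for a single session (the 64MB value is just an illustrative guess, not a recommendation):

-- Raise work_mem for the current session only, leaving the global setting alone.
SET work_mem = '64MB';

-- With enough work_mem, the sort/unique step may be replaced by a HashAggregate node.
EXPLAIN ANALYZE SELECT DISTINCT "my_table"."foo" FROM "my_table";

-- Return to the server default when done experimenting.
RESET work_mem;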
But before setting it too high globally, first read up on it. You can easily blow up your server, because the max_connections setting acts as a multiplier to this number.
This means that if you were to set work_mem = 128MB and max_connections = 100 (the default), you should have more than 12.8GB of RAM. You're essentially telling the server that it can use that much for performing queries (not even considering any other memory use by Postgres or otherwise).
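A quick sanity check of that multiplier, sketched as SQL comments (the numbers mirror the example above):

-- Inspect the two settings the answer multiplies together:
SHOW work_mem;          -- e.g. 128MB
SHOW max_connections;   -- e.g. 100
-- Worst case from this setting alone: 128MB * 100 connections = 12.8GB of RAM,
-- before counting any other memory Postgres needs.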