Solution for speeding up a slow SELECT DISTINCT query in Postgres

Original URL: http://stackoverflow.com/questions/6598778/

Warning: this is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Asked by orokusaki
The query is basically:
SELECT DISTINCT "my_table"."foo" from "my_table" WHERE...
Pretending that I'm 100% certain the DISTINCT portion of the query is the reason it runs slowly, I've omitted the rest of the query to avoid confusion, since it is the DISTINCT portion's slowness that I'm primarily concerned with (DISTINCT is always a source of slowness).
The table in question has 2.5 million rows of data. The DISTINCT is needed for purposes not listed here (because I don't want back a modified query, but rather just general information about making distinct queries run faster at the DBMS level, if possible).
How can I make DISTINCT run quicker (using Postgres 9, specifically) without altering the SQL (i.e., I can't alter this SQL coming in, but have access to optimize something at the DB level)?
Accepted answer by antlersoft
Your DISTINCT is causing it to sort the output rows in order to find duplicates. If you put an index on the column(s) selected by the query, the database may be able to read them out in index order and save the sort step. A lot will depend on the details of the query and the tables involved-- your saying you "know the problem is with the DISTINCT" really limits the scope of available answers.
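For example, a minimal sketch of that suggestion, assuming the column is my_table.foo as in the question (the index name here is just illustrative):

-- Index the column(s) the DISTINCT runs over, so Postgres may be able to
-- read the values in index order instead of sorting 2.5 million rows.
CREATE INDEX my_table_foo_idx ON my_table (foo);

-- Refresh planner statistics and check whether the new index is actually used.
ANALYZE my_table;
EXPLAIN SELECT DISTINCT "my_table"."foo" FROM "my_table";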
Answered by
Oftentimes, you can make such queries run faster by working around the distinct by using a group by instead:
select my_table.foo
from my_table
where [whatever where conditions you want]
group by foo;
Answered by maniek
You can try increasing the work_mem setting, depending on the size of your dataset. It can cause the query plan to switch to hash aggregates, which are usually faster.
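As a rough sketch of trying that out for a single session (the 64MB value is just an illustrative guess, not a recommendation):

-- Raise work_mem for the current session only, leaving the global setting alone.
SET work_mem = '64MB';

-- With enough work_mem, the sort/unique step may be replaced by a HashAggregate node.
EXPLAIN ANALYZE SELECT DISTINCT "my_table"."foo" FROM "my_table";

-- Return to the server default when done experimenting.
RESET work_mem;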
But before setting it too high globally, first read up on it. You can easily blow up your server, because the max_connections setting acts as a multiplier to this number.
This means that if you were to set work_mem = 128MB and max_connections = 100 (the default), you should have more than 12.8GB of RAM. You're essentially telling the server that it can use that much for performing queries (not even considering any other memory use by Postgres or otherwise).
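A quick sanity check of that multiplier, sketched as SQL comments (the numbers mirror the example above):

-- Inspect the two settings the answer multiplies together:
SHOW work_mem;          -- e.g. 128MB
SHOW max_connections;   -- e.g. 100
-- Worst case from this setting alone: 128MB * 100 connections = 12.8GB of RAM,
-- before counting any other memory Postgres needs.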