PostgreSQL query taking too long

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/13407555/

Tags: performance, postgresql, postgresql-performance

Asked by Davita

I have a database with a few hundred million rows. I'm running the following query:

SELECT * FROM "Payments" AS p
INNER JOIN "PaymentOrders" AS po
        ON po."Id" = p."PaymentOrderId"
INNER JOIN "Users" AS u
        ON u."Id" = po."UserId"
INNER JOIN "Roles" AS r
        ON u."RoleId" = r."Id"
WHERE  r."Name" = 'Moses'
LIMIT  1000

When the WHERE clause finds a match in the database, I get the result in several milliseconds, but if I modify the query and specify a non-existent r."Name" in the WHERE clause, it takes too much time to complete. I guess that PostgreSQL is doing a sequential scan on the "Payments" table (which contains the most rows), comparing each row one by one.

Isn't PostgreSQL smart enough to check first whether the "Roles" table contains any row with "Name" = 'Moses'?

The "Roles" table contains only 15 rows, while "Payments" contains ~350 million.

I'm running PostgreSQL 9.2.1.

BTW, this same query on the same schema/data takes 0.024ms to complete on MS SQL Server.

I'll update the question and post the EXPLAIN ANALYZE output in a few hours.

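(For reference, a plan like the one linked below is produced by prefixing the query with EXPLAIN (ANALYZE, BUFFERS); a sketch, assuming the same query as above:)

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   "Payments" AS p
JOIN   "PaymentOrders" AS po ON po."Id" = p."PaymentOrderId"
JOIN   "Users"         AS u  ON u."Id" = po."UserId"
JOIN   "Roles"         AS r  ON u."RoleId" = r."Id"
WHERE  r."Name" = 'Moses'
LIMIT  1000;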


Here are the EXPLAIN ANALYZE results: http://explain.depesz.com/s/7e7



And here's the server configuration:

version PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
client_encoding UNICODE
effective_cache_size    4500MB
fsync   on
lc_collate  English_United States.1252
lc_ctype    English_United States.1252
listen_addresses    *
log_destination stderr
log_line_prefix %t 
logging_collector   on
max_connections 100
max_stack_depth 2MB
port    5432
search_path dbo, "$user", public
server_encoding UTF8
shared_buffers  1500MB
TimeZone    Asia/Tbilisi
wal_buffers 16MB
work_mem    10MB

I'm running PostgreSQL on an i5 CPU (4 cores, 3.3 GHz), 8 GB of RAM and a 128 GB Crucial m4 SSD.



UPDATE: This looks like a bug in the query planner. On the recommendation of Erwin Brandstetter I reported it to the PostgreSQL bugs mailing list.

Accepted answer by Erwin Brandstetter

Finally successful attempt

My other idea - as per comment:
What happens if you remove the LIMIT clause for the case where no role is found? I have a suspicion that it will result in the fast plan - making LIMIT the culprit here.

You may be able to solve your problem by pushing down your query into a subquery and applying the LIMIT only to the outer query (untested):

SELECT *
FROM  (
   SELECT *
   FROM   "Roles"         AS r  
   JOIN   "Users"         AS u  ON u."RoleId" = r."Id"
   JOIN   "PaymentOrders" AS po ON po."UserId" = u."Id"
   JOIN   "Payments"      AS p  ON p."PaymentOrderId" = po."Id"
   WHERE  r."Name" = 'Moses'
  ) x
LIMIT  1000;

As per comment: @Davita tested and ruled out this workaround. @Kevin's answer later clarified why the workaround failed: use a CTE instead of the subquery.
Or check for the existence of the role before you employ the big query, to eliminate the bad case.

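A minimal sketch of that existence check, assuming the same table and column names as in the question - run the cheap query first and only issue the expensive join when a matching role exists:

SELECT EXISTS (
    SELECT 1
    FROM   "Roles"
    WHERE  "Name" = 'Moses'
) AS role_exists;

If role_exists comes back false, the application can skip the big join entirely, which avoids the bad LIMIT plan for the no-match case.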

This leaves questions for PostgreSQL concerning the optimization of queries with LIMIT.

There have been a number of recent bug reports concerning query plans with LIMIT. I quote Simon Riggs commenting on one of these reports here:

Very bad plans with LIMIT are frequent. This is bad for us because adding LIMIT usually/is supposed to make queries faster, not slower.

We need to do something.

First attempt with no success

I missed that @Craig already mentioned join_collapse_limit in the comments. So that was of limited use:

Does reordering the JOIN clauses have any effect?

SELECT *
FROM   "Roles"         AS r  
JOIN   "Users"         AS u  ON u."RoleId" = r."Id"
JOIN   "PaymentOrders" AS po ON po."UserId" = u."Id"
JOIN   "Payments"      AS p  ON p."PaymentOrderId" = po."Id"
WHERE  r."Name" = 'Moses'
LIMIT  1000

Related: you did not by chance mess with the settings of join_collapse_limit or geqo_threshold? A very low setting might prevent the planner from reordering your JOIN clauses, which might explain your problem.

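A quick way to check (the built-in defaults are 8 for join_collapse_limit and 12 for geqo_threshold; with only four tables in this query, anything at 4 or above leaves the planner free to reorder):

SHOW join_collapse_limit;
SHOW geqo_threshold;

-- restore the defaults for the current session if they were lowered
SET join_collapse_limit = 8;
SET geqo_threshold = 12;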

If that does not solve the case, I would try to create an index on "Roles"(Name). Not that this makes any sense with only 15 rows, but I would try to eliminate the suspicion that invalid statistics or cost parameters (or even a bug) make the planner believe the sequential scan on "Roles" to be more expensive than it is.

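A sketch of that experiment (the index name is arbitrary), followed by a fresh ANALYZE so the planner works from current statistics:

CREATE INDEX roles_name_idx ON "Roles" ("Name");
ANALYZE "Roles";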

Answer by kgrittn

As suggested a couple times on the thread on the PostgreSQL community performance list, you can work around this issue by forcing an optimization barrier using a CTE, like this:

WITH x AS
(
SELECT *
  FROM "Payments" AS p
  JOIN "PaymentOrders" AS po ON po."Id" = p."PaymentOrderId"
  JOIN "Users" as u ON u."Id" = po."UserId"
  JOIN "Roles" as r ON u."RoleId" = r."Id"
  WHERE r."Name" = 'Moses'
)
SELECT * FROM x
  LIMIT 1000;

You may also get a good plan for your original query if you set a higher statistics target for "Roles"."Name" and then ANALYZE. For example:

ALTER TABLE "Roles"
  ALTER COLUMN "Name" SET STATISTICS 1000;
ANALYZE "Roles";

If it expects fewer matching rows to exist in the table, as it is likely to do with more fine-grained statistics, it will assume that it needs to read a higher percentage of the table to find them on a sequential scan. This may cause it to prefer using the index instead of sequentially scanning the table.

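One way to see what the planner now believes about the column is to query the pg_stats view after running ANALYZE (a sketch; add a schemaname filter if "Roles" exists in more than one schema):

SELECT n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'Roles'
  AND  attname   = 'Name';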

You might also get a better plan for the original query by adjusting some of the planner's costing constants and caching assumptions. Things you could try in a single session, with the SET command (see the example after this list):

您还可以通过调整计划器的一些成本计算常量和缓存假设来为原始查询获得更好的计划。您可以在单个会话中尝试使用以下SET命令:

  • Reduce random_page_cost. This is largely based on how heavily cached your data is. Given a table with hundreds of millions of rows you probably don't want to go below 2; although if the active data set in your database is heavily cached you can reduce it all the way down to the setting for seq_page_cost, and you may want to reduce both of them by an order of magnitude.

  • Make sure that effective_cache_size is set to the sum of shared_buffers and whatever your OS is caching. This doesn't allocate any memory; it just tells the optimizer how likely index pages are to remain in cache during heavy access. A higher setting makes indexes look better when compared to sequential scans.

  • Increase cpu_tuple_cost to somewhere in the range of 0.03 to 0.05. I have found the default of 0.01 to be too low. I often get better plans by increasing it, and have never seen a value in the range I suggested cause worse plans to be chosen.

  • Make sure that your work_mem setting is reasonable. In most environments that I've run PostgreSQL, that is in the 16MB to 64MB range. This will allow better use of hash tables, bitmap index scans, sorts, etc., and can completely change your plans; almost always for the better. Beware setting this to a level that yields good plans if you have a large number of connections -- you should allow for the fact that each connection can allocate this much memory per node of the query it is running. The "rule of thumb" is to figure you will hit peaks around this setting times max_connections. This is one of the reasons that it is wise to limit your actual number of database connections using a connection pool.

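Putting those suggestions together, a session-level experiment might look like this (the values are illustrative starting points under the assumptions above, not tuned recommendations for this workload):

SET random_page_cost = 2.0;        -- default 4.0; lower it if the active data set is well cached
SET effective_cache_size = '6GB';  -- roughly shared_buffers plus the OS cache
SET cpu_tuple_cost = 0.03;         -- default 0.01
SET work_mem = '32MB';             -- per sort/hash node, per connection

-- then re-run the problem query with EXPLAIN ANALYZE and compare plans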

If you find a good combination of settings for these, you might want to make those changes to your postgresql.conf file. If you do that, monitor closely for performance regressions, and be prepared to tweak the settings for the best performance of your overall load.

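If the experiments pan out, the corresponding postgresql.conf entries might look like this (again illustrative values; a reload, e.g. SELECT pg_reload_conf(), is enough to apply them):

# planner cost and memory settings (illustrative values, adjust to your workload)
random_page_cost = 2.0
cpu_tuple_cost = 0.03
effective_cache_size = 6GB
work_mem = 32MB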

I agree that we need to do something to nudge the optimizer away from "risky" plans, even if they look like they will run faster on average; but I will be a little surprised if tuning your configuration so that the optimizer better models the actual costs of each alternative doesn't cause it to use an efficient plan.
