postgresql Postgres 性能问题

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/11940515/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-21 00:11:14  来源:igfitidea点击:

Postgres performance issues

performancepostgresql

提问by zermy

We are running Postgres 9.1.3 and we have recently started to run into major performance problems on one of our servers.

我们正在运行 Postgres 9.1.3,我们最近开始在我们的一台服务器上遇到严重的性能问题。

Our queries ran fine for a while, but as of August 1st, they have slowed down dramatically. It would appear that most of the problematic queries are Select queries (queries with count(*) are especially bad), but in general, the database is just running really slow.

我们的查询运行良好有一段时间,但截至 8 月 1 日,它们已显着放缓。看起来大多数有问题的查询都是 Select 查询(带有 count(*) 的查询特别糟糕),但总的来说,数据库运行非常缓慢。

We ran thisquery on the server and these were the changes that we have made to the default config file (Note: The server ran fine with these changes before, so, they likely don't matter much) :

我们在服务器上运行了这个查询,这些是我们对默认配置文件所做的更改(注意:服务器在这些更改之前运行良好,因此,它们可能无关紧要):

       name            |                                                current_setting
---------------------------+---------------------------------------------------------------------------------------------------------------
version                   | PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by  gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit
autovacuum                | off
bgwriter_delay            | 20ms
checkpoint_segments       | 6
checkpoint_warning        | 0
client_encoding           | UTF8
default_statistics_target | 1000
effective_cache_size      | 4778MB
effective_io_concurrency  | 2
fsync                     | off
full_page_writes          | off
lc_collate                | en_US.UTF-8
lc_ctype                  | en_US.UTF-8
listen_addresses          | *
maintenance_work_mem      | 1GB
max_connections           | 100
max_stack_depth           | 2MB
port                      | 5432
random_page_cost          | 2
server_encoding           | UTF8
shared_buffers            | 1792MB
synchronous_commit        | off
temp_buffers              | 16MB
TimeZone                  | US/Eastern
wal_buffers               | 16MB
wal_level                 | minimal
wal_writer_delay          | 10ms
work_mem                  | 16MB
(28 rows)

Time: 210.231 ms

Normally, when problems like this arise, the first thing people recommend is vacuuming and we have tried that. We vacuum analyzed most of the database, but it didn't help.

通常,当出现这样的问题时,人们推荐的第一件事就是吸尘,我们已经尝试过了。我们对大部分数据库进行了真空分析,但没有帮助。

We used Explainon some of our queries and noticed that Postgres was resorting to sequential scans even though the tables had indexes.

我们Explain在一些查询中使用过,并注意到即使表有索引,Postgres 也在诉诸顺序扫描。

We turned sequential scan off to force the query planner into using indexes, but that did not help either.

我们关闭了顺序扫描以强制查询规划器使用索引,但这也无济于事。

We then tried out thisquery to see if we had a lot of unused diskspace that Postgres was going through in order to find what it is looking for. Unfortunately, while some of our tables did have a bit of bulk, it did not seem significant enough to slow down overall system performance.

然后我们尝试了这个查询,看看我们是否有很多未使用的磁盘空间,Postgres 正在通过这些磁盘空间来找到它正在寻找的东西。不幸的是,虽然我们的一些表确实有一些体积,但它似乎不足以降低整体系统性能。

We think the slowdown might be I/O related, but we can't figure out the specifics. Is Postgres just being silly and if so, what part of it? Is there something wrong with the VM, or perhaps something wrong with the physical hardware itself?

我们认为放缓可能与 I/O 相关,但我们无法弄清楚具体情况。Postgres 只是愚蠢吗?如果是,那是它的哪一部分?VM 是否有问题,或者物理硬件本身是否有问题?

Do you guys have any other suggestions for things that we can try or check out?

你们对我们可以尝试或检查的事情有任何其他建议吗?

EDIT:

编辑:

I am so sorry for not updating this sooner. I got caught up in other things.

我很抱歉没有早点更新这个。我被其他事情缠住了。

On this particular machine, our performance greatly improved by making one small modification to the Virtual Machine's settings.

在这台特定的机器上,通过对虚拟机的设置进行一个小的修改,我们的性能大大提高。

There is a setting that deals with IO caching. It was originally set to to ON. We figured that constantly caching things was slowing things down and we were right. We turned it OFF, and things improved drastically.

有一个处理 IO 缓存的设置。它最初设置为 ON。我们认为不断缓存东西会减慢速度,我们是对的。我们把它关掉了,事情有了很大的改善。

Interestingly enough most of our other servers already had this setting turned off.

有趣的是,我们的大多数其他服务器已经关闭了此设置。

There are other issues, and I am sure we will take a lot of your suggestions, so, thanks a lot for helping.

还有其他问题,我相信我们会采纳您的很多建议,因此,非常感谢您的帮助。

采纳答案by Richard Huxton

It's difficult to be sure, but I think you are right to be suspicious of I/O issues. What can happen is that as tables get larger or connections are increased then cache hits start to fall. That increases I/o demands and slows everything down. Meanwhile, more queries arrive, making the problem worse. The situation is complicated for you because virtual disks don't necessarily behave the same as physical ones.

很难确定,但我认为您对 I/O 问题的怀疑是正确的。可能发生的情况是,随着表变大或连接增加,缓存命中率开始下降。这会增加 I/O 需求并减慢一切。同时,更多的查询到达,使问题变得更糟。情况对您来说很复杂,因为虚拟磁盘的行为不一定与物理磁盘相同。

Firstly you will need to measure actual activity on the VM (through vmstat or iostat perhaps). Secondly, do the same on the real hardware. Finally, run some standard disk bandwidth tools on both (in particular random read/write mixes). Now you'll be able to say how much of your available I/o is being used.

首先,您需要测量 VM 上的实际活动(可能通过 vmstat 或 iostat)。其次,在真实的硬件上做同样的事情。最后,在两者上运行一些标准的磁盘带宽工具(特别是随机读/写混合)。现在,您将能够说出有多少可用的 I/O 正在被使用。

As for query plans, without the schema details and explain analyse output no-one can say.

至于查询计划,没有架构细节和解释分析输出谁也说不准。

You will find the postgresql.org mailing list useful even if just for the archives. Also, the book linked below is excellent.

您会发现 postgresql.org 邮件列表很有用,即使只是用于存档。此外,下面链接的书非常好。

http://www.packtpub.com/postgresql-90-high-performance/book

http://www.packtpub.com/postgresql-90-high-performance/book

回答by kgrittn

Your biggest problem is this line:

你最大的问题是这一行:

autovacuum                | off

Turning it on won't immediately cure the problem, but it should keep things from eroding further. There are almost no cases where it is a good idea to turn this off. The main exception is a big bulk load followed by an explicit VACUUM FREEZE ANALYZE, after which autovacuum should be turned back on. With autovacuum off, you will see performance degrade, just as you have. Once the database has gotten into such bad shape, it requires more aggressive maintenance than autovacuum can provide to recover.

打开它不会立即解决问题,但它应该可以防止事情进一步恶化。几乎没有情况下关闭它是一个好主意。主要的例外是大批量加载,然后是明确的 VACUUM FREEZE ANALYZE,之后应重新打开 autovacuum。关闭 autovacuum 后,您会看到性能下降,就像您遇到的一样。一旦数据库变得如此糟糕,它需要比 autovacuum 提供的更积极的维护来恢复。

checkpoint_segments       | 6

Increasing this will help data modifications, but won't do much to improve the speed of SELECTstatements.

增加这将有助于数据修改,但不会对提高SELECT语句的速度有多大作用。

fsync                     | off
full_page_writes          | off

These settings tell PostgreSQL to speed up writes at the expense of persistence. If your hardware or OS (or VM) crashes or is abruptly killed, your database will be corrupted and your best bet will be to restore from your last known good backup. (Of course, since hardware can fail at any time, if you care about losing the data, you have a good backup strategy in place.)

这些设置告诉 PostgreSQL 以牺牲持久性为代价来加速写入。如果您的硬件或操作系统(或 VM)崩溃或突然终止,您的数据库将被损坏,您最好的选择是从上次已知的良好备份中恢复。(当然,由于硬件随时可能出现故障,如果您担心丢失数据,您有一个很好的备份策略。)

maintenance_work_mem      | 1GB

This is too high for an 8GB VM. You can always boost it on a single connection before running some heavy maintenance on that connection.

这对于 8GB 虚拟机来说太高了。在对该连接进行大量维护之前,您始终可以在单个连接上提升它。

wal_writer_delay          | 10ms

Even seasoned experts have trouble adjusting this to something that gets better performance than the default. It is almost always best left alone.

即使是经验丰富的专家也很难将其调整为比默认设置更好的性能。它几乎总是最好的。

Your best bet at this point is to use pg_dumpall to dump your database cluster to some other medium, start with a fresh initdb, and restore. As a database superuser, run VACUUM FREEZE ANALYZE(the FREEZEis not generally recommended except after a bulk load like that), and run with autovacuum turned on.

此时您最好的选择是使用 pg_dumpall 将您的数据库集群转储到其他介质,从一个新的 initdb 开始,然后恢复。作为数据库超级用户,运行VACUUM FREEZE ANALYZEFREEZE通常不建议这样做,除非在这样的批量加载之后),并在打开 autovacuum 的情况下运行。

I highly recommend that you get a copy of Greg Smith's "PostgreSQL 9.0 High Performance" book, and read it carefully. (Full disclosure, I was one of the technical reviewers for the book, but get no money from sales.) One of the first things he recommends is getting benchmark numbers on the speed of your RAM and disk before you even install PostgreSQL -- that way you know what you're dealing with.

我强烈建议您获取一份 Greg Smith 的“PostgreSQL 9.0 高性能”一书,并仔细阅读。(完全公开,我是这本书的技术审阅者之一,但没有从销售中获得任何收入。)他建议的第一件事就是在安装 PostgreSQL 之前获得有关 RAM 和磁盘速度的基准数据——即你知道你在处理什么的方式。

回答by Falmarri

(queries with count(*) are especially bad),

(使用 count(*) 的查询特别糟糕),

You should look into window functions

你应该研究窗口函数

Otherwise, we have no idea without seeing your relevant schema and your queries.

否则,如果没有看到您的相关架构和查询,我们将一无所知。

回答by Phil Arena

I would turn on auto vacuum as well. There are a few variables you can set that control how much the vacuum will interfere. With the amount of RAM you have you should have your shared buffers set between 2048MB - 3276MB. If you have a lot of extra RAM that your system doesn't seem to be using that you don't need elsewhere you should probably set it closer to the higher end. Also you may want to look at your max segment size with sysctl. Your maintenance_work_mem is really high, but if you are doing mostly maintenance then I suppose it isn't as bad as I first thought.

我也会打开自动吸尘器。您可以设置一些变量来控制真空的干扰程度。根据您拥有的 RAM 量,您应该将共享缓冲区设置在 2048MB - 3276MB 之间。如果您的系统似乎没有使用很多额外的 RAM,而您在其他地方也不需要,那么您可能应该将其设置为更接近高端。此外,您可能想使用 sysctl 查看最大段大小。您的 maintenance_work_mem 确实很高,但如果您主要进行维护,那么我想它并不像我最初想象的那么糟糕。