Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/8106181/

Date: 2020-10-20 23:24:35  Source: igfitidea

Configuration parameter work_mem in PostgreSQL on Linux

Tags: postgresql, postgresql-performance, server-configuration

Asked by Grzes

I have to optimize queries by tuning basic PostgreSQL server configuration parameters. In the documentation I came across the work_mem parameter. I then checked how changing this parameter would influence the performance of my query (which uses a sort). I measured query execution time with various work_mem settings and was very disappointed.

The table on which I perform my query contains 10,000,000 rows and there are 430 MB of data to sort. (Sort Method: external merge Disk: 430112kB).

With work_mem = 1MB, the EXPLAIN output is:

Total runtime: 29950.571 ms (sort takes about 19300 ms).
Sort  (cost=4032588.78..4082588.66 rows=19999954 width=8)
      (actual time=22577.149..26424.951 rows=20000000 loops=1)
      Sort Key: "*SELECT* 1".n
      Sort Method:  external merge  Disk: 430104kB

With work_mem = 5MB:

Total runtime: 36282.729 ms (sort: 25400 ms).
Sort  (cost=3485713.78..3535713.66 rows=19999954 width=8) 
      (actual time=25062.383..33246.561 rows=20000000 loops=1)
      Sort Key: "*SELECT* 1".n
      Sort Method:  external merge  Disk: 430104kB

With work_mem = 64MB:

Total runtime: 42566.538 ms (sort: 31000 ms).
Sort  (cost=3212276.28..3262276.16 rows=19999954 width=8)
      (actual time=28599.611..39454.279 rows=20000000 loops=1)
      Sort Key: "*SELECT* 1".n
      Sort Method:  external merge  Disk: 430104kB

Can anyone explain why performance gets worse? Or suggest any other way to make query execution faster by changing server parameters?

My query (I know it's not optimal, but I have to benchmark this kind of query):

SELECT n
FROM   (
    SELECT n + 1 AS n FROM table_name
    EXCEPT
    SELECT n FROM table_name) AS q1
ORDER BY n DESC;

Full execution plan:

Sort  (cost=5805421.81..5830421.75 rows=9999977 width=8) (actual time=30405.682..30405.682 rows=1 loops=1)
  Sort Key: q1.n
  Sort Method:  quicksort  Memory: 25kB
  ->  Subquery Scan q1  (cost=4032588.78..4232588.32 rows=9999977 width=8) (actual time=30405.636..30405.637 rows=1 loops=1)
        ->  SetOp Except  (cost=4032588.78..4132588.55 rows=9999977 width=8) (actual time=30405.634..30405.634 rows=1 loops=1)
              ->  Sort  (cost=4032588.78..4082588.66 rows=19999954 width=8) (actual time=23046.478..27733.020 rows=20000000 loops=1)
                    Sort Key: "*SELECT* 1".n
                    Sort Method:  external merge  Disk: 430104kB
                    ->  Append  (cost=0.00..513495.02 rows=19999954 width=8) (actual time=0.040..8191.185 rows=20000000 loops=1)
                          ->  Subquery Scan "*SELECT* 1"  (cost=0.00..269247.48 rows=9999977 width=8) (actual time=0.039..3651.506 rows=10000000 loops=1)
                                ->  Seq Scan on table_name  (cost=0.00..169247.71 rows=9999977 width=8) (actual time=0.038..2258.323 rows=10000000 loops=1)
                          ->  Subquery Scan "*SELECT* 2"  (cost=0.00..244247.54 rows=9999977 width=8) (actual time=0.008..2697.546 rows=10000000 loops=1)
                                ->  Seq Scan on table_name  (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.006..1079.561 rows=10000000 loops=1)
Total runtime: 30496.100 ms

Accepted answer by Erwin Brandstetter

I posted your query plan on explain.depesz.com, have a look.

The query planner's estimates are terribly wrong in some places. Have you run ANALYZE recently?

Read the chapters in the manual on Statistics Used by the Planner and Planner Cost Constants. Pay special attention to the sections on random_page_cost and default_statistics_target.
You might try:

ALTER TABLE diplomas ALTER COLUMN number SET STATISTICS 1000;
ANALYZE diplomas;

Or go even higher for a table with 10M rows. It depends on data distribution and actual queries. Experiment. The default is 100, the maximum is 10000.
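
To see what is currently in effect before experimenting, something like the following may help. It is only a sketch: it reuses the table and column names from the example above (diplomas, number), so adjust them to your own schema.

```sql
-- Server-wide default used for columns without an explicit setting.
SHOW default_statistics_target;

-- Per-column override set via ALTER TABLE ... SET STATISTICS.
-- -1 (or NULL on recent versions) means "use the default".
SELECT attname, attstattarget
FROM   pg_attribute
WHERE  attrelid = 'diplomas'::regclass
AND    attname  = 'number';
```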

For a database of that size, only 1 or 5 MB of work_mem is generally not enough. Read the Postgres Wiki page on Tuning Postgres that @aleroot linked to.

As your query needs 430104kB on disk according to the EXPLAIN output, you have to set work_mem to something like 500MB or more to allow in-memory sorting. The in-memory representation of data needs somewhat more space than the on-disk representation. You may be interested in what Tom Lane posted on that matter recently.

Increasing work_mem by just a little, like you tried, won't help much and can even slow things down. Setting it too high globally can even hurt, especially with concurrent access. Multiple sessions might starve one another for resources. Allocating more for one purpose takes memory away from another if the resource is limited. The best setup depends on the complete situation.
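
To get a feel for how a global setting multiplies under concurrency, a rough figure can be computed from the server's own settings. This is only a sketch under assumptions: it needs PostgreSQL 9.6 or later (for pg_size_bytes), and it counts a single work_mem allocation per allowed connection, although one query can use several (one per sort or hash node), so real usage can be higher.

```sql
-- Rough estimate only: one work_mem allocation for every allowed connection.
SELECT current_setting('work_mem')        AS work_mem,
       current_setting('max_connections') AS max_connections,
       pg_size_pretty(
           current_setting('max_connections')::bigint
           * pg_size_bytes(current_setting('work_mem'))
       )                                  AS one_sort_per_connection;
```

If that figure is already close to the machine's RAM, a high global value is likely risky, and a per-session setting is the safer choice.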

To avoid side effects, only set it high enough locally in your session, and temporarily for the query:

SET work_mem = '500MB';

Reset it to your default afterwards:

RESET work_mem;

Or use SET LOCAL to set it just for the current transaction to begin with.
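
A minimal sketch of the SET LOCAL variant, using the query from the question. The 500MB value is the estimate given above, not a measured requirement; check the plan to confirm the effect.

```sql
BEGIN;

-- Applies only inside this transaction; reverts at COMMIT or ROLLBACK.
SET LOCAL work_mem = '500MB';

-- If the sort now fits in memory, "Sort Method" should no longer
-- report "external merge  Disk: ...".
EXPLAIN ANALYZE
SELECT n
FROM   (
    SELECT n + 1 AS n FROM table_name
    EXCEPT
    SELECT n FROM table_name) AS q1
ORDER BY n DESC;

COMMIT;
```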

Answer by wildplasser

SET search_path='tmp';
-- Generate some data ...
-- DROP table tmp.table_name ;
-- CREATE table tmp.table_name ( n INTEGER NOT NULL PRIMARY KEY);
-- INSERT INTO tmp.table_name(n) SELECT generate_series(1,1000);
-- DELETE FROM tmp.table_name WHERE random() < 0.05 ;

The EXCEPT query is equivalent to the following NOT EXISTS form, which generates a different query plan (but the same results) here (9.0.1beta something):

-- EXPLAIN ANALYZE
WITH q1 AS (
    SELECT 1+tn.n  AS n
    FROM table_name tn
    WHERE NOT EXISTS (
        SELECT * FROM table_name nx
        WHERE nx.n = tn.n+1
        )   
    )
SELECT q1.n
FROM q1
ORDER BY q1.n DESC;

(a version with a recursive CTE might also be possible :-)

EDIT: the query plans, all for 100K records with 0.2 % deleted.

Original query:

    ------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=36461.76..36711.20 rows=99778 width=4) (actual time=2682.600..2682.917 rows=222 loops=1)
   Sort Key: q1.n
   Sort Method:  quicksort  Memory: 22kB
   ->  Subquery Scan q1  (cost=24984.41..26979.97 rows=99778 width=4) (actual time=2003.047..2682.036 rows=222 loops=1)
         ->  SetOp Except  (cost=24984.41..25982.19 rows=99778 width=4) (actual time=2003.042..2681.389 rows=222 loops=1)
               ->  Sort  (cost=24984.41..25483.30 rows=199556 width=4) (actual time=2002.584..2368.963 rows=199556 loops=1)
                     Sort Key: "*SELECT* 1".n
                     Sort Method:  external merge  Disk: 3512kB
                     ->  Append  (cost=0.00..5026.57 rows=199556 width=4) (actual time=0.071..1452.838 rows=199556 loops=1)
                           ->  Subquery Scan "*SELECT* 1"  (cost=0.00..2638.01 rows=99778 width=4) (actual time=0.067..470.652 rows=99778 loops=1)
                                 ->  Seq Scan on table_name  (cost=0.00..1640.22 rows=99778 width=4) (actual time=0.063..178.365 rows=99778 loops=1)
                           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..2388.56 rows=99778 width=4) (actual time=0.014..429.224 rows=99778 loops=1)
                                 ->  Seq Scan on table_name  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.011..143.320 rows=99778 loops=1)
 Total runtime: 2684.840 ms
(14 rows)

NOT EXISTS version with CTE:

----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=6394.60..6394.60 rows=1 width=4) (actual time=699.190..699.498 rows=222 loops=1)
   Sort Key: q1.n
   Sort Method:  quicksort  Memory: 22kB
   CTE q1
     ->  Hash Anti Join  (cost=2980.01..6394.57 rows=1 width=4) (actual time=312.262..697.985 rows=222 loops=1)
           Hash Cond: ((tn.n + 1) = nx.n)
           ->  Seq Scan on table_name tn  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.013..143.210 rows=99778 loops=1)
           ->  Hash  (cost=1390.78..1390.78 rows=99778 width=4) (actual time=309.923..309.923 rows=99778 loops=1)
                 ->  Seq Scan on table_name nx  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..144.102 rows=99778 loops=1)
   ->  CTE Scan on q1  (cost=0.00..0.02 rows=1 width=4) (actual time=312.270..698.742 rows=222 loops=1)
 Total runtime: 700.040 ms
(11 rows)

NOT EXISTS version without CTE:

--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=6394.58..6394.58 rows=1 width=4) (actual time=692.313..692.625 rows=222 loops=1)
   Sort Key: ((1 + tn.n))
   Sort Method:  quicksort  Memory: 22kB
   ->  Hash Anti Join  (cost=2980.01..6394.57 rows=1 width=4) (actual time=308.046..691.849 rows=222 loops=1)
         Hash Cond: ((tn.n + 1) = nx.n)
         ->  Seq Scan on table_name tn  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.014..142.781 rows=99778 loops=1)
         ->  Hash  (cost=1390.78..1390.78 rows=99778 width=4) (actual time=305.732..305.732 rows=99778 loops=1)
               ->  Seq Scan on table_name nx  (cost=0.00..1390.78 rows=99778 width=4) (actual time=0.007..143.783 rows=99778 loops=1)
 Total runtime: 693.139 ms
(9 rows)
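
The query behind this last plan is not shown. A plausible reconstruction, inferred from the sort key ((1 + tn.n)) and the hash condition in the plan (so treat it as an assumption, not the exact statement that was run), would be:

```sql
-- EXPLAIN ANALYZE
SELECT 1 + tn.n AS n
FROM   table_name tn
WHERE  NOT EXISTS (
    SELECT * FROM table_name nx
    WHERE nx.n = tn.n + 1
    )
ORDER BY n DESC;
```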

My conclusion is that the "NOT EXISTS" versions cause postgres to produce better plans.