Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/486154/

Date: 2020-09-10 22:11:58  Source: igfitidea

PostgreSQL temporary tables

Tags: performance, postgresql, optimization, temp-tables

Asked by Nicholas Leonard

I need to perform a query 2.5 million times. This query generates some rows, over which I need to compute AVG(column) and then use this AVG to filter out all values below average. I then need to INSERT these filtered results into a table.

The only way to do such a thing with reasonable efficiency seems to be by creating a TEMPORARY TABLE for each query-postmaster Python thread. I am just hoping these TEMPORARY TABLEs will not be persisted to hard drive (at all) and will remain in memory (RAM), unless they run out of working memory, of course.

I would like to know if a TEMPORARY TABLE will incur disk writes (which would interfere with the INSERTs, i.e. slow the whole process down).
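
The workflow described in the question can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the table names `tmp_rows` and `results`, the columns `id` and `val`, and both function names are hypothetical, and `cur` is assumed to be a DB-API cursor on a PostgreSQL connection (for example one from psycopg2, whose parameter placeholder is `%s`).

```python
def prepare_session(cur):
    """Create the per-session staging table once (hypothetical schema)."""
    cur.execute(
        "CREATE TEMPORARY TABLE tmp_rows (id integer, val double precision)"
    )


def insert_above_average(cur, source_query, params=None):
    """Stage the rows of one query, then keep only those not below average.

    source_query is assumed to be a SELECT returning (id, val) pairs.
    """
    # Empty the staging table left over from the previous iteration.
    cur.execute("TRUNCATE tmp_rows")
    # Stage the rows generated by this iteration's query.
    cur.execute("INSERT INTO tmp_rows (id, val) " + source_query, params)
    # Copy rows whose value is at or above the average of the staged rows.
    cur.execute(
        "INSERT INTO results (id, val) "
        "SELECT id, val FROM tmp_rows "
        "WHERE val >= (SELECT AVG(val) FROM tmp_rows)"
    )
```

Each thread would call `prepare_session` once on its own connection and then `insert_above_average` for every query it runs.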

Answered by vladr

Please note that, in Postgres, the default behaviour for temporary tables is that they are not automatically dropped, and data is persisted on commit. See ON COMMIT.

Temporary tables are, however, dropped at the end of a database session:

Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction.

There are multiple considerations you have to take into account:

  • If you do want to explicitly DROP a temporary table at the end of a transaction, create it with the CREATE TEMPORARY TABLE ... ON COMMIT DROP syntax.
  • In the presence of connection pooling, a database session may span multiple client sessions; to avoid clashes in CREATE, you should drop your temporary tables, either prior to returning a connection to the pool (e.g. by doing everything inside a transaction and using the ON COMMIT DROP creation syntax), or on an as-needed basis (by preceding any CREATE TEMPORARY TABLE statement with a corresponding DROP TABLE IF EXISTS, which has the advantage of also working outside transactions, e.g. if the connection is used in auto-commit mode).
  • While the temporary table is in use, how much of it will fit in memory before overflowing on to disk? See the temp_buffers option in postgresql.conf.
  • Anything else I should worry about when working often with temp tables? A vacuum is recommended after you have DROPped temporary tables, to clean up any dead tuples from the catalog. With the default settings, the autovacuum daemon (the autovacuum setting) does this for you, checking for tables that need vacuuming about once a minute (autovacuum_naptime).
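
The first three considerations above could look like this in practice. It is a minimal sketch under stated assumptions: the table name `scratch`, its columns, the function names and the `64MB` value are all hypothetical, and `cur` is assumed to be a DB-API cursor on a PostgreSQL connection (psycopg2-style `%s` placeholders).

```python
SCRATCH_COLUMNS = "(id integer, val double precision)"  # hypothetical schema


def create_scratch_on_commit_drop(cur):
    """Transactional use: the table is dropped when the transaction ends."""
    cur.execute(
        "CREATE TEMPORARY TABLE scratch " + SCRATCH_COLUMNS + " ON COMMIT DROP"
    )


def recreate_scratch(cur):
    """Pooled or auto-commit use: clear any leftover table before creating."""
    cur.execute("DROP TABLE IF EXISTS scratch")
    cur.execute("CREATE TEMPORARY TABLE scratch " + SCRATCH_COLUMNS)


def set_temp_buffers(cur, size="64MB"):
    """Raise temp_buffers for this session only.

    PostgreSQL only honours a change made before the session first
    touches a temporary table, so call this right after connecting.
    """
    cur.execute("SELECT set_config('temp_buffers', %s, false)", (size,))
```

`set_config` is used instead of a plain `SET` so that the value can be passed as an ordinary query parameter; editing `temp_buffers` in postgresql.conf, as the answer suggests, changes the default for all sessions instead.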

Also, unrelated to your question (but possibly related to your project): keep in mind that, if you have to run queries against a temp table after you have populated it, then it is a good idea to create appropriate indices and issue an ANALYZE on the temp table in question after you're done inserting into it. By default, the cost-based optimizer will assume that a newly created temp table has ~1000 rows, and this may result in poor performance should the temp table actually contain millions of rows.
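
As a rough illustration of that advice, the step after bulk loading might look like the sketch below. The table `scratch`, the column `val` and the function name are hypothetical, and `cur` is assumed to be a DB-API cursor on a PostgreSQL connection.

```python
def finish_loading(cur):
    """Run once after all rows have been inserted into the temp table."""
    # Build the index after loading rather than before, so the inserts
    # do not have to maintain it row by row.
    cur.execute("CREATE INDEX ON scratch (val)")
    # Collect statistics so the planner sees the real row count.
    # Autovacuum cannot access temporary tables, so this must be manual.
    cur.execute("ANALYZE scratch")
```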

Answered by Adam Hawes

Temporary tables provide only one guarantee - they are dropped at the end of the session. For a small table you'll probably have most of your data in the backing store. For a large table I guarantee that data will be flushed to disk periodically as the database engine needs more working space for other requests.

EDIT: If you're absolutely in need of RAM-only temporary tables you can create a tablespace for your database on a RAM disk (/dev/shm works). This reduces the amount of disk IO, but beware that it is currently not possible to do this without any physical disk write; the DB engine will flush the table list to stable storage when you create the temporary table.
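
A sketch of the RAM-disk approach, under several assumptions: the tablespace name `ram_tmp`, the directory `/dev/shm/pg_ram_tmp` and the function names are hypothetical; the directory must already exist, be empty and be owned by the PostgreSQL operating-system user; and `cur` is assumed to be a DB-API cursor on a PostgreSQL connection. Note that the PostgreSQL documentation warns that a tablespace on an ephemeral file system such as a RAM disk puts the reliability of the whole cluster at risk, so treat this as an experiment rather than a recommendation.

```python
def create_ram_tablespace(cur):
    """One-time setup, run as a superuser.

    CREATE TABLESPACE cannot run inside a transaction block, so the
    connection is assumed to be in auto-commit mode.
    """
    cur.execute("CREATE TABLESPACE ram_tmp LOCATION '/dev/shm/pg_ram_tmp'")


def use_ram_tablespace(cur):
    """Per session: place new temporary tables and their indexes there."""
    cur.execute("SET temp_tablespaces = 'ram_tmp'")
```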