PostgreSQL 的隐藏特性
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/761327/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Hidden Features of PostgreSQL
提问by ramanujan
I'm surprised this hasn't been posted yet. Any interesting tricks that you know about in Postgres? Obscure config options and scaling/perf tricks are particularly welcome.
我很惊讶这还没有发布。你知道 Postgres 有什么有趣的技巧吗?晦涩的配置选项和缩放/性能技巧特别受欢迎。
I'm sure we can beat the 9 comments on the corresponding MySQL thread:)
我相信我们可以击败相应MySQL 线程上的 9 条评论:)
采纳答案by tommym
Since postgres is a lot more sane than MySQL, there are not that many "tricks" to report on ;-)
由于 postgres 比 MySQL 理智得多,所以没有那么多“技巧”可以报告;-)
The manualhas some nice performancetips.
A few other performance related things to keep in mind:
要记住的其他一些与性能相关的事情:
- Make sure autovacuum is turned on
- Make sure you've gone through your postgres.conf (effective cache size, shared buffers, work mem ... lots of options there to tune).
- Use pgpool or pgbouncerto keep your "real" database connections to a minimum
- Learn how EXPLAINand EXPLAIN ANALYZE works. Learn to read the output.
- CLUSTERsorts data on disk according to an index. Can dramatically improve performance of large (mostly) read-only tables. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered.
- 确保自动吸尘器已打开
- 确保你已经浏览了你的 postgres.conf(有效缓存大小、共享缓冲区、工作内存......有很多选项可以调整)。
- 使用 pgpool 或pgbouncer将您的“真实”数据库连接保持在最低限度
- 了解EXPLAIN和 EXPLAIN ANALYZE 的工作原理。学习阅读输出。
- CLUSTER根据索引对磁盘上的数据进行排序。可以显着提高大型(主要是)只读表的性能。聚类是一次性操作:随后更新表时,不会对更改进行聚类。
Here's a few things I've found useful that aren't config or performance related per se.
这里有一些我发现有用的东西,它们本身与配置或性能无关。
To see what's currently happening:
要查看当前发生的情况:
select * from pg_stat_activity;
Search misc functions:
搜索杂项功能:
select * from pg_proc WHERE proname ~* '^pg_.*'
Find size of database:
查找数据库大小:
select pg_database_size('postgres');
select pg_size_pretty(pg_database_size('postgres'));
Find size of all databases:
查找所有数据库的大小:
select datname, pg_size_pretty(pg_database_size(datname)) as size
from pg_database;
Find size of tables and indexes:
查找表和索引的大小:
select pg_size_pretty(pg_relation_size('public.customer'));
Or, to list all tables and indexes (probably easier to make a view of this):
或者,列出所有表和索引(可能更容易查看):
select schemaname, relname,
pg_size_pretty(pg_relation_size(schemaname || '.' || relname)) as size
from (select schemaname, relname, 'table' as type
from pg_stat_user_tables
union all
select schemaname, relname, 'index' as type
from pg_stat_user_indexes) x;
Oh, and you can nest transactions, rollback partial transactions++
哦,你可以嵌套事务,回滚部分事务++
test=# begin;
BEGIN
test=# select count(*) from customer where name='test';
count
-------
0
(1 row)
test=# insert into customer (name) values ('test');
INSERT 0 1
test=# savepoint foo;
SAVEPOINT
test=# update customer set name='john';
UPDATE 3
test=# rollback to savepoint foo;
ROLLBACK
test=# commit;
COMMIT
test=# select count(*) from customer where name='test';
count
-------
1
(1 row)
回答by ChristopheD
The easiesttrick to let postgresql perform a lot better (apart from setting and using proper indexes of course) is just to give it more RAM to work with (if you have not done so already). On most default installations the value for shared_buffersis way too low (in my opinion). You can set
在最简单的窍门,让PostgreSQL的(除了设置和使用过程中的适当指标)只是给它更多的内存来工作,(如果你还没有这样做的话)执行好了很多。在大多数默认安装中,shared_buffers的值太低了(在我看来)。你可以设置
shared_buffers
共享缓冲区
in postgresql.conf. Divide this number by 128 to get an approximation of the amount of memory (in MB) postgres can claim. If you up it enough this will make postgresql fly. Don't forget to restart postgresql.
在 postgresql.conf 中。将此数字除以 128 可得到 postgres 可以声明的近似内存量(以 MB 为单位)。如果你足够了,这将使 postgresql 飞起来。不要忘记重新启动 postgresql。
On Linux systems, when postgresql won't start again you will probably have hit the kernel.shmmax limit. Set it higher with
在 Linux 系统上,当 postgresql 无法再次启动时,您可能已经达到了 kernel.shmmax 限制。将其设置得更高
sysctl -w kernel.shmmax=xxxx
To make this persist between boots, add a kernel.shmmax entry to /etc/sysctl.conf.
要在引导之间保持这种状态,请将 kernel.shmmax 条目添加到 /etc/sysctl.conf。
A whole bunch of Postgresql tricks can be found here:
可以在这里找到一大堆 Postgresql 技巧:
回答by Yann Ramin
Postgres has a very powerful datetime handling facility thanks to its INTERVAL support.
由于其 INTERVAL 支持,Postgres 具有非常强大的日期时间处理工具。
For example:
例如:
select NOW(), NOW() + '1 hour';
now | ?column?
-------------------------------+-------------------------------
2009-04-18 01:37:49.116614+00 | 2009-04-18 02:37:49.116614+00
(1 row)
select current_date ,(current_date + interval '1 year')::date;
date | date
---------------------+----------------
2014-10-17 | 2015-10-17
(1 row)
You can cast many strings to an INTERVAL type.
您可以将许多字符串转换为 INTERVAL 类型。
回答by ramanujan
COPY
复制
I'll start. Whenever I switch to Postgres from SQLite, I usually have some really big datasets. The key is to load your tables with COPY FROM rather than doing INSERTS. See documentation:
我会开始。每当我从 SQLite 切换到 Postgres 时,我通常都会有一些非常大的数据集。关键是用 COPY FROM 而不是 INSERTS 加载你的表。请参阅文档:
http://www.postgresql.org/docs/8.1/static/sql-copy.html
http://www.postgresql.org/docs/8.1/static/sql-copy.html
The following example copies a table to the client using the vertical bar (|) as the field delimiter:
以下示例使用竖线 (|) 作为字段分隔符将表复制到客户端:
COPY country TO STDOUT WITH DELIMITER '|';
To copy data from a file into the country table:
要将文件中的数据复制到国家/地区表中:
COPY country FROM '/usr1/proj/bray/sql/country_data';
See also here: Faster bulk inserts in sqlite3?
另请参见此处: sqlite3 中更快的批量插入?
回答by Quassnoi
- My by far favorite is
generate_series
: at last a clean way to generate dummy rowsets. Ability to use a correlated value in a
LIMIT
clause of a subquery:SELECT ( SELECT exp_word FROM mytable OFFSET id LIMIT 1 ) FROM othertable
- Abitlity to use multiple parameters in custom aggregates (not covered by the documentation): see the article in my blogfor an example.
- 我迄今为止最喜欢的是
generate_series
:最后一种生成虚拟行集的干净方法。 能够在
LIMIT
子查询的子句中使用相关值:SELECT ( SELECT exp_word FROM mytable OFFSET id LIMIT 1 ) FROM othertable
- 在自定义聚合中使用多个参数的能力(文档未涵盖):请参阅我博客中的文章以获取示例。
回答by Chad Birch
One of the things I really like about Postgres is some of the data types supported in columns. For example, there are column types made for storing Network Addressesand Arrays. The corresponding functions (Network Addresses/ Arrays) for these column types let you do a lot of complex operations inside queries that you'd have to do by processing results through code in MySQL or other database engines.
我真正喜欢 Postgres 的一件事是列中支持的一些数据类型。例如,有用于存储网络地址和数组的列类型。这些列类型的相应函数(网络地址/数组)使您可以在查询中执行许多复杂的操作,而这些操作必须通过 MySQL 或其他数据库引擎中的代码处理结果来完成。
回答by Nicholas Leonard
Arrays are really cool once you get to know 'em. Lets say you would like to store some hyper links between pages. You might start by thinking about creating a Table kinda like this:
一旦你了解它们,数组真的很酷。假设您想在页面之间存储一些超链接。您可能会首先考虑创建一个像这样的表:
CREATE TABLE hyper.links (
tail INT4,
head INT4
);
If you needed to index the tailcolumn, and you had, say 200,000,000 links-rows (like wikipedia would give you), you would find yourself with a huge Table and a huge Index.
如果您需要索引尾列,并且您有 200,000,000 个链接行(就像维基百科会给您的),您会发现自己有一个巨大的表和一个巨大的索引。
However, with PostgreSQL, you could use this Table format instead:
但是,对于 PostgreSQL,您可以改用此表格式:
CREATE TABLE hyper.links (
tail INT4,
head INT4[],
PRIMARY KEY(tail)
);
To get all heads for a link you could send a command like this (unnest() is standard since 8.4):
要获取链接的所有头部,您可以发送这样的命令(unnest() 是 8.4 以来的标准):
SELECT unnest(head) FROM hyper.links WHERE tail = ;
This query is surprisingly fast when it is compared with the first option (unnest() is fast and the Index is way way smaller). Furthermore, your Table and Index will take up much less RAM-memory and HD-space, especially when your Arrays are so long that they are compressed to a Toast Table. Arrays are really powerful.
与第一个选项(unnest() 速度快而索引小得多)相比,此查询的速度惊人地快。此外,您的表和索引将占用更少的 RAM 内存和 HD 空间,尤其是当您的数组太长以至于它们被压缩为 Toast 表时。数组真的很强大。
Note: while unnest() will generate rows out of an Array, array_agg() will aggregate rows into an Array.
注意:unnest() 将从数组中生成行,而 array_agg() 会将行聚合到数组中。
回答by Nakh
Inheritance..infact Multiple Inheritance (as in parent-child "inheritance" not 1-to-1 relation inheritance which many web frameworks implement when working with postgres).
PostGIS (spatial extension), a wonderful add-on that offers comprehensive set of geometry functions and coordinates storage out of the box. Widely used in many open-source geo libs (e.g. OpenLayers,MapServer,Mapnik etc) and definitely way better than MySQL's spatial extensions.
Writing procedures in different languages e.g. C, Python,Perl etc (makes your life easir to code if you're a developer and not a db-admin).
Also all procedures can be stored externally (as modules) and can be called or imported at runtime by specified arguments. That way you can source control the code and debug the code easily.
A huge and comprehensive catalogue on all objects implemented in your database (i.e. tables,constraints,indexes,etc).
I always find it immensely helpful to run few queries and get all meta info e.g. ,constraint names and fields on which they have been implemented on, index names etc.
For me it all becomes extremely handy when I have to load new data or do massive updates in big tables (I would automatically disable triggers and drop indexes) and then recreate them again easily after processing has finished. Someone did an excellent job of writing handful of these queries.
Multiple schemas under one database, you can use it if your database has large number of tables, you can think of schemas as categories. All tables (regardless of it's schema) have access to all other tables and functions present in parent db.
继承......事实上多重继承(如父子“继承”而不是许多 web 框架在使用 postgres 时实现的一对一关系继承)。
PostGIS(空间扩展),一个很棒的附加组件,提供了一套全面的几何函数和开箱即用的坐标存储。广泛用于许多开源地理库(例如 OpenLayers、MapServer、Mapnik 等),并且绝对比 MySQL 的空间扩展更好。
用不同的语言编写程序,例如 C、Python、Perl 等(如果您是开发人员而不是数据库管理员,则可以让您更轻松地编写代码)。
此外,所有过程都可以存储在外部(作为模块),并且可以在运行时通过指定的参数调用或导入。这样您就可以轻松地对代码进行源代码控制和调试。
数据库中实现的所有对象(即表、约束、索引等)的庞大而全面的目录。
我总是发现运行少量查询并获取所有元信息非常有帮助,例如,约束名称和已实施它们的字段,索引名称等。
对我来说,当我必须加载新数据或在大表中进行大量更新时(我会自动禁用触发器和删除索引),然后在处理完成后轻松地重新创建它们时,这一切都变得非常方便。有人在编写少数这些查询方面做得非常出色。
一个数据库下有多个schema,如果你的数据库有大量的表,你可以使用它,你可以把schema看作类别。所有表(无论其架构如何)都可以访问父数据库中存在的所有其他表和函数。
回答by Nakh
Materialized Views are pretty easy to setup:
物化视图很容易设置:
CREATE VIEW my_view AS SELECT id, AVG(my_col) FROM my_table GROUP BY id;
CREATE TABLE my_matview AS SELECT * FROM my_view;
That creates a new table, my_matview, with the columns and values of my_view. Triggers or a cron script can then be setup to keep the data up to date, or if you're lazy:
这将创建一个新表 my_matview,其中包含 my_view 的列和值。然后可以设置触发器或 cron 脚本以保持数据最新,或者如果您很懒惰:
TRUNCATE my_matview;
INSERT INTO my_matview SELECT * FROM my_view;
回答by AAS
You don't need to learn how to decipher "explain analyze" output, there is a tool: http://explain.depesz.com
你不需要学习如何破译“解释分析”输出,有一个工具:http: //explain.depesz.com