Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/5690103/

Date: 2020-10-20 22:58:15. Source: igfitidea

MySQL vs PostgreSQL performance with a complex query matching patterns

Tags: mysql, sql, postgresql, query-optimization, database-performance

Asked by David Magalhães

I have a complex database with around 30 tables. One table has more than 500,000 rows and another has more than 15,000. Until today I kept them in separate databases; today I decided to move everything into a single database.

Before today, the table with 500,000 rows was in a MySQL database and the 15,000-row table was in PostgreSQL. On one heavily used page, a PHP benchmark gave this result:

getSimilarAvaiable - 0.0287 s
getUnavaiable - 0.27 s
ProcessDataOfUnavaiable - 1.4701 s
Process - 1.8622 s
TotalPageTime - 3.631 s

After I migrated everything to PostgreSQL and used the same SQL code without any changes, the result for the same page was this:

getSimilarAvaiable - 2.7465 s
getUnavaiableCars - 9.0763 s
ProcesseDataOfUnavaiableCars - 1.4167 s
ProcessCars - 1.7207 s
TotalPageTime - 14.9602 s

I set everything up the same way as in MySQL, with the same indexes, but I can't understand why there is such a huge difference. What should I do to optimize this?

EDIT: Now better explained.

The 500,000-row table has the following structure:

id - bigint (primary key)
plate - varchar(10) (unique key)
manufacturer - varchar(30)
vin - varchar(30)

The major query is something like this:

SELECT plate, vin, 1 as n, substr(plate,1,2) as l 
FROM imtt_vin WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 3 as n, substr(plate,4,2) as l 
FROM imtt_vin WHERE substr(plate,4,1) >= 'A' and substr(plate,4,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 2 as n, substr(plate,7,2) as l 
FROM imtt_vin WHERE substr(plate,7,1) >= 'A' and substr(plate,7,1) <= 'Z' AND 
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
ORDER BY n, l, plate;

EDIT2: I tested with a single complex query and reduced the time from 15 seconds to 8-9 seconds. Even so, that is too slow for me.

Answered by peufeu

You need to post the output of EXPLAIN yourquery (for MySQL) and EXPLAIN ANALYZE yourquery (for PostgreSQL); without that it's impossible to say anything relevant.

Also SELECT pg_relation_size('imtt_vin')
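
As a sketch of what could be posted, the statements below can be run in psql against the table from the question. The two filter values ('%ford%' and 'WF0%') are hypothetical placeholders, not values taken from the question.

```sql
-- PostgreSQL: execute the query and report the actual plan, row counts and timings.
EXPLAIN ANALYZE
SELECT plate, vin, 1 AS n, substr(plate, 1, 2) AS l
FROM   imtt_vin
WHERE  substr(plate, 1, 1) BETWEEN 'A' AND 'Z'
AND    manufacturer ILIKE '%ford%'   -- hypothetical manufacturer filter
AND    vin LIKE 'WF0%';              -- hypothetical VIN pattern

-- Size of the table's main data on disk, in bytes and in human-readable form.
SELECT pg_relation_size('imtt_vin')                 AS bytes,
       pg_size_pretty(pg_relation_size('imtt_vin')) AS pretty;
```

On the MySQL side, plain EXPLAIN in front of the MySQL version of the query gives the plan to compare against.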

For instance, what is the value of "?" in this query?

SELECT plate, vin, 1 as n, substr(plate,1,2) as l 
FROM imtt_vin WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?

I don't know what license plates look like where you work, but this part:

WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z'

probably selects all rows in the table, so its only purpose is to burn CPU cycles. You could at least rewrite it (and all the others) like this to avoid one of the two calls to substr():

WHERE substr(plate,1,1) BETWEEN 'A' AND 'Z'

And of course remove the condition when it is not useful.

Then we have:

manufacturer ILIKE '%".self::$Manufacturer."%'

Bad database design: are there 500,000 car manufacturers in the world? Probably not. You should put the manufacturers in another table and use a foreign key. That would turn this unindexable condition into an indexable one.
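
A minimal sketch of that normalization in PostgreSQL follows. The lookup table manufacturer, the column manufacturer_id, the index name and the search term are all hypothetical names chosen for illustration; only imtt_vin and its columns come from the question.

```sql
-- Hypothetical lookup table: one row per distinct manufacturer name.
CREATE TABLE manufacturer (
    manufacturer_id serial PRIMARY KEY,
    name            varchar(30) NOT NULL UNIQUE
);

INSERT INTO manufacturer (name)
SELECT DISTINCT manufacturer
FROM   imtt_vin
WHERE  manufacturer IS NOT NULL;

-- Reference the lookup table from the big table.
ALTER TABLE imtt_vin
    ADD COLUMN manufacturer_id integer REFERENCES manufacturer;

UPDATE imtt_vin v
SET    manufacturer_id = m.manufacturer_id
FROM   manufacturer m
WHERE  m.name = v.manufacturer;

-- A plain B-tree index on the foreign key column.
CREATE INDEX imtt_vin_manufacturer_id_idx ON imtt_vin (manufacturer_id);

-- The pattern match now runs against the small lookup table only.
SELECT v.plate, v.vin
FROM   imtt_vin v
JOIN   manufacturer m USING (manufacturer_id)
WHERE  m.name ILIKE '%ford%';   -- hypothetical search term
```

Whether the planner actually uses the new index depends on how many rows each manufacturer has, so it is worth checking the result with EXPLAIN ANALYZE.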

For the rest, post EXPLAIN / EXPLAIN ANALYZE.

Answered by mhitza

If you were using MyISAM in MySQL, that could explain the performance difference, at least in theory (in theory, because you have not revealed much about your database design and the queries performed). For a performance comparison between the two RDBMSs, I'd recommend you take a look at this comparison page (anchored to the MyISAM section).
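
To check that hypothesis, one option is to ask MySQL which storage engine the table uses. This is a small sketch that assumes the table name from the question and that the query is run while connected to the database holding it.

```sql
-- MySQL: show the storage engine (MyISAM, InnoDB, ...) of the table.
SELECT table_name, engine
FROM   information_schema.tables
WHERE  table_schema = DATABASE()
AND    table_name   = 'imtt_vin';
```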

Answered by user440297

MySQL uses more memory by default. I think a default install is configured to use more than 256MB, though I am not sure of the exact number. PostgreSQL by default is set to use something like 32MB. Try bumping each one up to 1GB of RAM in its config file, then run the benchmarks again and get back to us.
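
On the PostgreSQL side, the current values can be inspected from any session before changing them. The sketch below assumes that shared_buffers is the setting the answer has in mind; the defaults differ between PostgreSQL versions, so the numbers quoted above may not match a given installation.

```sql
-- PostgreSQL: inspect the current memory-related settings.
SHOW shared_buffers;
SHOW work_mem;
SHOW effective_cache_size;

-- Raise the shared buffer cache to the 1GB suggested above.
-- ALTER SYSTEM needs PostgreSQL 9.4 or later; on older versions edit
-- shared_buffers in postgresql.conf instead. A server restart is required.
ALTER SYSTEM SET shared_buffers = '1GB';
```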

Answered by Will Hartung

Seems to me that you likely have not updated the statistics on the Postgres database. With improper statistics, the database will not perform very well.
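
A short sketch of how the statistics could be refreshed and checked in PostgreSQL, assuming the table name from the question:

```sql
-- Collect fresh planner statistics for the migrated table.
ANALYZE imtt_vin;

-- Check when statistics were last gathered, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'imtt_vin';
```

Running ANALYZE right after a bulk load is cheap compared with the load itself, so it is a reasonable first step before comparing timings.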

Answered by Erwin Brandstetter

Query

(
SELECT 1 AS n, left(plate, 2) AS l, plate, vin
FROM   imtt_vin
WHERE  left(plate, 1) BETWEEN 'A' AND 'Z'
AND    manufacturer ILIKE '%".self::$Manufacturer."%'
AND    vin LIKE ?   -- You probably mean: vin = ?
ORDER  BY l, plate
)

UNION ALL
(
SELECT 3 AS n, substr(plate, 4, 2) AS l, plate, vin
FROM   imtt_vin
WHERE  substr(plate, 4, 1) BETWEEN 'A' AND 'Z'
AND    manufacturer ILIKE '%".self::$Manufacturer."%'
AND    vin LIKE ?
ORDER  BY l, plate
)

UNION  ALL ...

  • Use UNION ALL. UNION would be used to fold duplicates, which is obviously not needed here, and it would be more expensive.
  • Since your leading ORDER BY item is n, it's probably more efficient to order the individual legs of the query. The extra set of parentheses is needed for that.
  • left(plate, 2) is a bit faster than substr(plate, 1, 2). It works only for leading substrings (your first SELECT).
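
The difference between the two operators can be seen on literal values. This is only an illustration and does not touch the table from the question.

```sql
-- UNION removes duplicate rows, which needs an extra sort or hash step.
SELECT 'AB' AS l UNION     SELECT 'AB';   -- returns 1 row

-- UNION ALL simply appends the second result to the first.
SELECT 'AB' AS l UNION ALL SELECT 'AB';   -- returns 2 rows
```

In the query from the question, each leg carries a different constant n and plate is a unique key, so no two rows can be identical and the de-duplication step of UNION does work that can never change the result.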

Index

A default B-tree index only works for left-anchored LIKE expressions. But a trigram GiST or GIN index can be used for non-left-anchored patterns. You need the additional module pg_trgm. Install it once per database with CREATE EXTENSION in PostgreSQL 9.1 or later. Consult the manual for older versions.

CREATE EXTENSION pg_trgm;

I don't have much information to go on, but basic partial GIN indexes should work wonders:

-- One partial trigram index per query leg; index names must be unique.
CREATE INDEX imtt_vin_plate1_trgm_idx ON imtt_vin
USING  gin (manufacturer gin_trgm_ops)
WHERE  left(plate, 1) BETWEEN 'A' AND 'Z';

CREATE INDEX imtt_vin_plate4_trgm_idx ON imtt_vin
USING  gin (manufacturer gin_trgm_ops)
WHERE  substr(plate, 4, 1) BETWEEN 'A' AND 'Z';

-- more ...

  • I didn't include vin in the index, since you probably want the equality operator = there.
  • Predicates on a partial index have to be repeated (more or less) in queries so the query planner understands the index is applicable.
  • A trigram index works for case-insensitive matches.
  • Test with EXPLAIN ANALYZE whether the index is actually used. If it is, query time should be a matter of milliseconds, not seconds.
  • The speed comes at a (small) cost for write operations, for index maintenance, and the index is typically several times the size of the table on disk.
  • You can't do any of this with MySQL.
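
A sketch of such a test for the first leg of the query is shown below. The search term '%ford%' is a hypothetical placeholder, and the WHERE clause repeats the predicate of the first partial index so the planner can match it.

```sql
EXPLAIN ANALYZE
SELECT 1 AS n, left(plate, 2) AS l, plate, vin
FROM   imtt_vin
WHERE  left(plate, 1) BETWEEN 'A' AND 'Z'   -- same predicate as the partial index
AND    manufacturer ILIKE '%ford%';         -- hypothetical search term
```

If the index is used, the plan should contain a Bitmap Index Scan on the trigram index rather than only a Seq Scan on imtt_vin.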

Answered by intgr

You still haven't provided enough information -- what indexes do you have, EXPLAIN ANALYZE output for slow queries, etc.

Some thoughts on optimizing your example query:

1: UTF-8 string functions are generally not very fast. If you want to speed up string functions, use the bytea type instead of varchar for this column (or change your whole database encoding to SQL_ASCII, but this is inadvisable).

2: Given your queries, the database probably has to go through all rows in the table and compute these string functions for each.

I don't know how many rows match these conditions, so an index might not be useful, but functional (expression) indexes might help you out:

 CREATE INDEX imtt_vin_plate_1 ON imtt_vin (substr(plate,1,1));
 CREATE INDEX imtt_vin_plate_4 ON imtt_vin (substr(plate,4,1));
 CREATE INDEX imtt_vin_plate_7 ON imtt_vin (substr(plate,7,1));
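
For the planner to consider one of these indexes, the query has to use the same expression as the index definition. A minimal sketch for the second index, assuming it has been created as above:

```sql
-- Refresh statistics so the planner knows the distribution of the indexed expressions.
ANALYZE imtt_vin;

-- The WHERE clause uses the indexed expression substr(plate,4,1) unchanged.
EXPLAIN ANALYZE
SELECT plate, vin
FROM   imtt_vin
WHERE  substr(plate, 4, 1) BETWEEN 'A' AND 'Z';
```

If most rows satisfy the condition, the planner will likely still choose a sequential scan, which is the case the caveat about the number of matches refers to.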

3: If you can tolerate duplicate outputs, use UNION ALL instead of UNION in your queries. This will save you some processing with larger result sets.

4: Avoid LIKE/ILIKE whenever you can.