postgresql 如何在正则表达式上连接表
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/8679079/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to join tables on regex
提问by z4y4ts
Say I have two tables msg for messages and mnc for mobile network codes. They share no relations. But I want to join them
假设我有两个表 msg 用于消息,而 mnc 用于移动网络代码。他们没有任何关系。但我想加入他们
SELECT msg.message,
msg.src_addr,
msg.dst_addr,
mnc.name,
FROM "msg"
JOIN "mnc"
ON array_to_string(regexp_matches(msg.src_addr || '+' || msg.dst_addr, '38(...)'), '') = mnc.code
But query fails with error:
但查询失败并出现错误:
psql:marketing.sql:28: ERROR: argument of JOIN/ON must not return a set
LINE 12: ON array_to_string(regexp_matches(msg.src_addr || '+' || msg...
Is there a way to do such join? Or am I moving wrong way?
有没有办法做这样的加入?还是我走错路了?
回答by Erwin Brandstetter
As @Milen already mentioned regexp_matches()
is probably the wrong function for your purpose. You want a simple regular expression match (~
). Actually, the LIKE operator (~~
)will be faster:
正如@Milen 已经提到的那样,regexp_matches()
对于您而言可能是错误的功能。您需要一个简单的正则表达式 match ( ~
)。实际上,LIKE 运算符 ( ~~
)会更快:
Presumably fastest with LIKE
大概用 LIKE 最快
SELECT msg.message
,msg.src_addr
,msg.dst_addr
,mnc.name
FROM mnc
JOIN msg ON msg.src_addr ~~ ('%38' || mnc.code || '%')
OR msg.dst_addr ~~ ('%38' || mnc.code || '%')
WHERE length(mnc.code) = 3
In addition, you only want mnc.code
of exactly 3 characters.
此外,您只需要mnc.code
3 个字符。
With regexp
使用正则表达式
You couldwrite the same with regular expressions but it will most definitely be slower. Here is a working example close to your original:
您可以使用正则表达式编写相同的内容,但它肯定会更慢。这是一个接近您原件的工作示例:
SELECT msg.message
,msg.src_addr
,msg.dst_addr
,mnc.name
FROM mnc
JOIN msg ON (msg.src_addr || '+' || msg.dst_addr) ~ (38 || mnc.code)
AND length(mnc.code) = 3
This also requires msg.src_addr
and msg.dst_addr
to be NOT NULL
.
这也需要msg.src_addr
并且msg.dst_addr
是NOT NULL
。
The second query demonstrates how the additional check length(mnc.code) = 3
can go into the JOIN
condition or a WHERE
clause. Same effect here.
第二个查询演示了附加检查如何length(mnc.code) = 3
进入JOIN
条件或WHERE
子句。这里效果一样。
With regexp_matches()
使用 regexp_matches()
You couldmake this work with regexp_matches()
:
您可以使用以下方法完成这项工作regexp_matches()
:
SELECT msg.message
,msg.src_addr
,msg.dst_addr
,mnc.name
FROM mnc
JOIN msg ON EXISTS (
SELECT *
FROM regexp_matches(msg.src_addr ||'+'|| msg.dst_addr, '38(...)', 'g') x(y)
WHERE y[1] = mnc.code
)
But it will be slow in comparison - or so I assume.
但相比之下它会很慢 - 或者我假设。
Explanation:
Your regexp_matches() expression just returns an array of all captured substrings of the firstmatch. As you only capture one substring (one pair of brackets in your pattern), you will exclusively get arrays with one element.
说明:
您的 regexp_matches() 表达式仅返回第一个匹配项的所有捕获子字符串的数组。由于您只捕获一个子字符串(您的模式中的一对括号),您将只获得一个元素的数组。
You get all matcheswith the additional "globally" switch 'g'
- but in multiple rows. So you need a sub-select to test them all (or aggregate). Put that in an EXISTS
- semi-join and you arrive at what you wanted.
您可以通过附加的“全局”开关获得所有匹配项'g'
- 但在多行中。所以你需要一个子选择来测试它们(或聚合)。把它放在一个EXISTS
- 半连接中,你就会得到你想要的。
Maybe you can report back with a performance testof all three? Use EXPLAIN ANALYZEfor that.
也许您可以通过对所有三个的性能测试来报告?为此使用EXPLAIN ANALYZE。
回答by Milen A. Radev
Your immediate problem is that regexp_matches
could return one or more rows.
您当前的问题是regexp_matches
可能返回一行或多行。
回答by small_duck
Try using "substring" instead, which extracts a substring given a regex pattern.
尝试改用“子字符串”,它会提取给定正则表达式模式的子字符串。
SELECT msg.message,
msg.src_addr,
msg.dst_addr,
mnc.name
FROM "msg"
JOIN "mnc"
ON substring(msg.src_addr || '+' || msg.dst_addr from '38(...)') = mnc.code