PostgreSQL count number of times substring occurs in text

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/25757194/

Time: 2020-10-21 01:36:38  Source: igfitidea

sql postgresql

Asked by Tony Duan

I'm writing a PostgreSQL function to count the number of times a particular text substring occurs in another piece of text. For example, calling count('foobarbaz', 'ba') should return 2.

I understand that to test whether the substring occurs, I can use a condition like the following:

    WHERE 'foobarbaz' like '%ba%'

However, I need it to return 2 for the number of times 'ba' occurs. How can I proceed?

Thanks in advance for your help.

Answered by Mike T

How about using a regular expression:

SELECT count(*)
FROM regexp_matches('foobarbaz', 'ba', 'g');

The 'g' flag makes regexp_matches() return every match in the string, not just the first.
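
To make the effect of the flag concrete, here is a minimal sketch. It also illustrates a caveat: the second argument is a regular expression rather than a literal string, so metacharacters such as the dot need escaping. The sample string 'a.b.c' is made up for illustration, and the escaping assumes the default standard_conforming_strings = on.

```sql
-- Without 'g', at most one row comes back, so the count is 1.
SELECT count(*) AS without_g
FROM regexp_matches('foobarbaz', 'ba');

-- With 'g', each match becomes its own row, so the count is 2.
SELECT count(*) AS with_g
FROM regexp_matches('foobarbaz', 'ba', 'g');

-- An unescaped dot matches any character: all 5 characters are counted.
SELECT count(*) AS any_char
FROM regexp_matches('a.b.c', '.', 'g');

-- An escaped dot matches only the literal '.' characters: 2 here.
SELECT count(*) AS literal_dots
FROM regexp_matches('a.b.c', '\.', 'g');
```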

Answered by Evan Carroll

I would highly suggest checking out the answer I posted to "How do you count the occurrences of an anchored string using PostgreSQL?". The chosen answer was shown to be massively slower than an adapted version of regexp_replace(). The overhead of creating the rows and then running the aggregate is simply too high.

The fastest way to do this is as follows...

SELECT
  (length(str) - length(replace(str, replacestr, '')) )::int
  / length(replacestr)
FROM ( VALUES
  ('foobarbaz', 'ba')
) AS t(str, replacestr);

Here we

  1. Take the length of the string, L1.
  2. Subtract from L1 the length of the string with all of the replacements removed, L2, to get L3, the difference in string length.
  3. Divide L3 by the length of the replacement string to get the number of occurrences.
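
The three steps can be traced with the example values. This is only an illustrative sketch that exposes each intermediate length as its own column; the column aliases l1, l2, l3 and occurrences are made up to mirror the names used in the list.

```sql
SELECT
  length(str)                                        AS l1,  -- 9
  length(replace(str, replacestr, ''))               AS l2,  -- 5, 'foorz' remains
  length(str) - length(replace(str, replacestr, '')) AS l3,  -- 9 - 5 = 4
  (length(str) - length(replace(str, replacestr, '')))
    / length(replacestr)                             AS occurrences  -- 4 / 2 = 2
FROM ( VALUES
  ('foobarbaz', 'ba')
) AS t(str, replacestr);
```

Note that replace() treats its argument as a literal string, so unlike the regular-expression approach no escaping is needed.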

For comparison, that is about five times faster than the regexp_matches() method, which looks like this:

SELECT count(*)
FROM ( VALUES
  ('foobarbaz', 'ba')
) AS t(str, replacestr)
CROSS JOIN LATERAL regexp_matches(str, replacestr, 'g');
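
To check the speed difference on your own server, one option is to run both variants under EXPLAIN ANALYZE over many generated rows. The following is only a rough sketch: the row count of 100000 and the use of generate_series() are assumptions made for illustration, and the timings will depend on the PostgreSQL version, hardware, and data.

```sql
-- Replace-based count, evaluated once per generated row.
EXPLAIN ANALYZE
SELECT (length(str) - length(replace(str, replacestr, ''))) / length(replacestr)
FROM generate_series(1, 100000) AS g(i)
CROSS JOIN ( VALUES ('foobarbaz', 'ba') ) AS t(str, replacestr);

-- regexp_matches-based count over the same rows, one aggregate per row.
EXPLAIN ANALYZE
SELECT (SELECT count(*) FROM regexp_matches(str, replacestr, 'g'))
FROM generate_series(1, 100000) AS g(i)
CROSS JOIN ( VALUES ('foobarbaz', 'ba') ) AS t(str, replacestr);
```

Comparing the reported execution times of the two plans gives a rough idea of the relative cost.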

Answered by Andreas Dietrich

There is a str_count( src, occurrence ) function based on

SELECT (length( str ) - length(replace( str, occurrence, '' ))) / length( occurrence )

and a str_countm( src, regexp ) function based on the query mentioned by @MikeT

SELECT count(*) FROM regexp_matches( str, regexp, 'g')

available here: postgres-utils
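
For readers who would rather define such a helper themselves, here is a minimal sketch of a replace-based function. It is not the postgres-utils source: the name count_substr and its parameter names are hypothetical, and the NULLIF guard is an added assumption about how an empty substring should be handled.

```sql
-- Hypothetical helper, not the postgres-utils implementation.
CREATE OR REPLACE FUNCTION count_substr(str text, sub text)
RETURNS integer
LANGUAGE sql
IMMUTABLE
AS $$
  -- NULLIF turns a zero length into NULL, so an empty substring yields
  -- NULL instead of a division-by-zero error.
  SELECT (length(str) - length(replace(str, sub, ''))) / NULLIF(length(sub), 0);
$$;

SELECT count_substr('foobarbaz', 'ba');  -- 2
```

The function is deliberately not called count, since that is the name of a built-in aggregate. It counts non-overlapping occurrences, so count_substr('aaaa', 'aa') gives 2.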

Answered by atiruz

Try with:

SELECT array_length (string_to_array ('1524215121518546516323203210856879', '1'), 1) - 1

--RESULT: 7
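
This works because splitting a string on a delimiter that occurs n times yields n + 1 pieces, hence the "- 1". One edge case is the empty string, for which string_to_array() returns an empty array and array_length() returns NULL. A minimal sketch that covers it with coalesce() follows; the extra sample rows are made up for illustration.

```sql
SELECT
  str,
  -- coalesce maps the NULL length of an empty array to 1,
  -- so the empty string yields 0 occurrences.
  coalesce(array_length(string_to_array(str, '1'), 1), 1) - 1 AS occurrences
FROM ( VALUES
  ('1524215121518546516323203210856879'),  -- 7
  ('2345'),                                -- 0
  ('')                                     -- 0
) AS t(str);
```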