SQL: Counting the number of occurrences of a substring within a string in PostgreSQL

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36376410/

Date: 2020-09-01 04:31:52  Source: igfitidea

Counting the number of occurrences of a substring within a string in PostgreSQL

sql, string, postgresql

Asked by Franck Dernoncourt

How can I count the number of occurrences of a substring within a string in PostgreSQL?

Example:

I have a table

CREATE TABLE test."user"
(
  uid integer NOT NULL,
  name text,
  result integer,
  CONSTRAINT pkey PRIMARY KEY (uid)
);

I want to write a query so that the result column contains the number of occurrences of the substring o in the name column. For instance, if name is hello world in one row, result should contain 2, since there are two o characters in the string hello world.
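To try the queries on this page, a few sample rows are enough. The rows below are made up for illustration, and the sketch assumes that the test schema and the table above already exist.

```sql
-- Hypothetical sample rows; result stays NULL until it is computed.
INSERT INTO test."user" (uid, name)
VALUES (1, 'hello world'),
       (2, 'foo'),
       (3, 'bar');

-- Check the current state of the table.
SELECT uid, name, result
FROM test."user"
ORDER BY uid;
```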

In other words, I'm trying to write a query that would take as input:

[image: input table]

and update the result column:

[image: table with the result column filled in]



I am aware of the function regexp_matches and its g option, which indicates that the full string needs to be scanned for all occurrences of the substring (g = global).

Example:

SELECT * FROM regexp_matches('hello world', 'o', 'g');

returns

{o}
{o}

and

SELECT COUNT(*)  FROM regexp_matches('hello world', 'o', 'g');

returns

2

But I don't see how to write an UPDATE query that would set the result column to the number of occurrences of the substring o in the name column.

Answered by dnoeth

A common solution is based on this logic: replace the search string with an empty string, then divide the difference between the old and new lengths by the length of the search string.

(CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'substring', ''))) 
/ CHAR_LENGTH('substring')

Hence:

UPDATE test."user"
SET result = 
    (CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'o', ''))) 
    / CHAR_LENGTH('o');
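As a quick sanity check of the length-difference logic, the expression can be run on literals. This is a minimal sketch; the search strings 'o' and 'lo' are only illustrative. Integer division is exact here because REPLACE removes whole occurrences, so the length difference is always a multiple of the search string's length.

```sql
-- 'hello world' has 11 characters; removing both 'o' leaves 9.
SELECT (CHAR_LENGTH('hello world')
        - CHAR_LENGTH(REPLACE('hello world', 'o', '')))
       / CHAR_LENGTH('o') AS occurrences;   -- 2

-- Two-character search string: removing 'lo' leaves 'hel world' (9 characters).
SELECT (CHAR_LENGTH('hello world')
        - CHAR_LENGTH(REPLACE('hello world', 'lo', '')))
       / CHAR_LENGTH('lo') AS occurrences;  -- 1
```

An empty search string would make the divisor zero, so that case is worth guarding against if the search string comes from user input.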

Answered by Gordon Linoff

A Postgres-flavored way of doing this is to convert the string to an array, take the length of the array, and subtract 1:

select array_length(string_to_array(name, 'o'), 1) - 1

Note that this works with longer substrings as well.

Hence:

update test."user"
    set result = array_length(string_to_array(name, 'o'), 1) - 1;
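One edge case is worth checking with this approach: string_to_array returns an empty array for an empty string, and array_length returns NULL for an empty array, so such rows end up with NULL rather than 0. The sketch below shows the behavior on literals and one possible workaround with COALESCE. Whether 0 is the desired value for empty or NULL names is an assumption.

```sql
-- 'hello world' splits into {hell," w",rld}: 3 pieces, so 2 separators.
SELECT array_length(string_to_array('hello world', 'o'), 1) - 1 AS occurrences;  -- 2

-- An empty string yields an empty array, whose array_length is NULL.
SELECT array_length(string_to_array('', 'o'), 1) - 1 AS occurrences;  -- NULL

-- Variant that stores 0 instead of NULL (also for rows where name is NULL).
UPDATE test."user"
SET result = COALESCE(array_length(string_to_array(name, 'o'), 1) - 1, 0);
```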

Answered by bnson

Another way:

UPDATE test."user" SET result = length(regexp_replace(name, '[^o]', '', 'g'));
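The character class [^o] matches every character other than o, so this form only counts single characters. For longer substrings, one option is to count the rows returned by regexp_matches (the function mentioned in the question) in a correlated subquery. This is a sketch, with 'lo' as an illustrative pattern.

```sql
-- Everything except 'o' is removed, so the remaining length is the count.
SELECT length(regexp_replace('hello world', '[^o]', '', 'g')) AS occurrences;  -- 2

-- Sketch for longer substrings: count the rows returned by regexp_matches
-- in a correlated subquery. The pattern is a regular expression, so
-- metacharacters such as '.' would need escaping.
UPDATE test."user"
SET result = (SELECT count(*) FROM regexp_matches(name, 'lo', 'g'))::integer;
```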

Answered by Robert Bondy

Occurrence_Count = LENGTH(REPLACE(string_to_search, string_to_find, '~')) - LENGTH(REPLACE(string_to_search, string_to_find, ''))

This solution is a bit cleaner than many that I have seen, especially since it needs no divisor. You can turn it into a function or use it within a SELECT.
No variables are required. I use a tilde as the replacement character, but any character that is not in the dataset will work.
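Written out as PostgreSQL, the formula might look like the sketch below, with literals chosen only for illustration. As far as I can tell, the result does not actually depend on whether the replacement character appears in the data: both strings contain the same leftover text, and they differ only by one replacement character per occurrence.

```sql
-- Each 'lo' becomes one '~' in the first string and nothing in the second,
-- so the two lengths differ by exactly the number of occurrences.
SELECT LENGTH(REPLACE('hello world', 'lo', '~'))
     - LENGTH(REPLACE('hello world', 'lo', '')) AS occurrences;  -- 1

-- Applied to the table from the question.
UPDATE test."user"
SET result = LENGTH(REPLACE(name, 'o', '~')) - LENGTH(REPLACE(name, 'o', ''));
```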

Answered by Guilherme Passos

Return the count of a character:

SELECT (LENGTH('1.1.1.1') - LENGTH(REPLACE('1.1.1.1', '.', ''))) AS count;
-- Returns the number of '.' characters: 3