Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1755408/
How to count words in MySQL / regular expression replacer?
Question by PierrOz
How can I, in a MySQL query, have the same behaviour as the Regex.Replace function (for instance in .NET/C#)?
I need that because, like many people, I would like to count the number of words in a field. However, I'm not satisfied with the following answer (given several times on this site):
SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1 FROM tbl
Because it overcounts when there is more than one space between two words.
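To see exactly where the formula goes wrong, here is a small Python model of it (an illustration, not MySQL itself). It assumes ASCII text: MySQL's `LENGTH()` counts bytes while Python's `len()` counts characters, so the two agree only for single-byte characters. The function name `naive_word_count` is hypothetical.

```python
def naive_word_count(name: str) -> int:
    """Python model of LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1.

    It counts every space character and adds one, so it equals the real
    word count only when words are separated by exactly one space and the
    text has no leading or trailing spaces.
    """
    return len(name) - len(name.replace(" ", "")) + 1


if __name__ == "__main__":
    for sample in ["one two three", "one  two three", " one two "]:
        # Print the formula's result next to the true count from str.split()
        print(repr(sample), naive_word_count(sample), len(sample.split()))
```

The double space in the second sample adds a phantom word, and the leading and trailing spaces in the third sample add one each.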
By the way, I think a Regex.Replace equivalent may be interesting in its own right, so all good ideas are welcome!
Answer by laalto
There are REGEXP_REPLACE implementations available as MySQL user-defined functions.
Word counting: if you can control the data going into the database, you can collapse repeated whitespace before inserting it. Also, if you need the word count often, you can compute it once in your application code and store the count in the database.
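A minimal Python sketch of that advice, assuming the cleaning happens in application code before the INSERT. The helper names and the idea of a separate word-count column are illustrative assumptions, not part of the original answer. Python's `\s` also covers Unicode whitespace, which may not match what MySQL treats as whitespace.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse each run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def word_count(text: str) -> int:
    """Count words; str.split() with no argument splits on runs of whitespace."""
    return len(text.split())


if __name__ == "__main__":
    raw = "  one   two\tthree  "
    clean = normalize_whitespace(raw)
    # Values that would go into the (hypothetical) text and word-count columns
    print(repr(clean), word_count(clean))
    # After normalization the single-space formula from the question is
    # accurate for non-empty text: number of spaces + 1 == word count
    print(len(clean) - len(clean.replace(" ", "")) + 1)
```

Storing the count once avoids running the string arithmetic on every query; the trade-off is keeping the column in sync whenever the text changes.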
Answer by Steve Chambers
UPDATE: I have now added a separate answer for MySQL 8.0+, which should be preferred. (This answer is retained in case you are constrained to an earlier version.)
Almost a duplicate of this question, but this answer addresses the use case of counting words, based on the advanced version of the custom regular expression replacer from this blog post.
Demo
For the sample text, this gives a count of 61, the same as every online word counter I've tried (e.g. https://wordcounter.net/).
SQL (the code of the reg_replace function itself is omitted for brevity):
SELECT txt,
       -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(reg_replace(txt,
                               '[[:space:]]+', -- Look for a chunk of whitespace
                               '^.',           -- Replace the first character from the chunk
                               '',             -- Replace with nothing (i.e. remove the character)
                               TRUE,           -- Greedy matching
                               1,              -- Minimum match length
                               0,              -- No maximum match length
                               1,              -- Minimum sub-match length
                               0               -- No maximum sub-match length
                               ))
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
       AS `word count`
FROM tbl;
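The arithmetic behind this query can be checked outside MySQL. Below is a Python model (not the `reg_replace` function itself, whose code lives in the blog post): within each whitespace run only the first character is removed, so the length difference equals the number of runs, i.e. the gaps between words. The POSIX class `[[:space:]]` is approximated for ASCII by `[ \t\n\v\f\r]`, and the function names are hypothetical.

```python
import re

# ASCII equivalent of the POSIX class [[:space:]]
SPACE = r"[ \t\n\v\f\r]"


def drop_first_char_of_each_run(txt: str) -> str:
    """Mimic reg_replace(txt, '[[:space:]]+', '^.', '', ...): within every
    whitespace run, remove only the first character."""
    return re.sub(SPACE + "+", lambda m: m.group(0)[1:], txt)


def legacy_word_count(txt: str) -> int:
    gaps = len(txt) - len(drop_first_char_of_each_run(txt))  # one per whitespace run
    leading = 1 if re.match(SPACE, txt) else 0               # txt REGEXP '^[[:space:]]'
    trailing = 1 if re.search(SPACE + r"\Z", txt) else 0     # txt REGEXP '[[:space:]]$'
    return gaps + 1 - leading - trailing


if __name__ == "__main__":
    for sample in ["one two three", "  one   two\tthree  ", "word", "   "]:
        print(repr(sample), legacy_word_count(sample))
```

One consequence of the formula: an empty string has no gaps and no leading or trailing whitespace, so it yields 1 rather than 0. The SQL should behave the same way, so empty values may need special handling (NULL input simply gives NULL).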
Answer by Steve Chambers
MySQL 8.0 now provides a decent REGEXP_REPLACE function, which makes this much simpler:
SQL
SELECT -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(REGEXP_REPLACE(
           txt,
           '[[:space:]]([[:space:]]*)', -- A chunk of one or more whitespace characters
           '$1'))                       -- Discard the first whitespace character and retain the rest
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
       AS `Word count`
FROM tbl;
Demo
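As a cross-check of the MySQL 8.0 query, here is a Python model of the same expression, assuming the replacement string is `'$1'` (keep the captured rest of each whitespace run). MySQL writes that group reference as `$1`; Python's `re.sub` writes it as `\1`. For comparison, `keep_rest=False` models an empty replacement string, which deletes whole runs so the length difference counts whitespace characters instead of gaps. The names are hypothetical and `[[:space:]]` is approximated for ASCII.

```python
import re

# ASCII equivalent of the POSIX class [[:space:]]
SPACE = r"[ \t\n\v\f\r]"


def mysql8_word_count(txt: str, keep_rest: bool = True) -> int:
    """Model of the MySQL 8.0 word-count query.

    keep_rest=True mirrors REGEXP_REPLACE(txt, '[[:space:]]([[:space:]]*)', '$1');
    keep_rest=False mirrors an empty replacement string, for comparison.
    """
    replacement = r"\1" if keep_rest else ""
    shortened = re.sub(rf"{SPACE}({SPACE}*)", replacement, txt)
    gaps = len(txt) - len(shortened)                        # runs, or chars if keep_rest=False
    leading = 1 if re.match(SPACE, txt) else 0              # txt REGEXP '^[[:space:]]'
    trailing = 1 if re.search(rf"{SPACE}\Z", txt) else 0    # txt REGEXP '[[:space:]]$'
    return gaps + 1 - leading - trailing


if __name__ == "__main__":
    for sample in ["one two three", "one  two three", "  one   two\tthree  "]:
        print(repr(sample), mysql8_word_count(sample), mysql8_word_count(sample, keep_rest=False))
```

With `'$1'` all three samples count 3 words; with an empty replacement the counts drift upward as soon as any run is longer than one character.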