How to extract the nth word from a MySQL string and count word occurrences?

Question

Asked by Noam

I would like to have a MySQL query like this:

select <second word in text> word, count(*) from table group by word;

All the regex examples for MySQL test whether the text matches the expression; none of them extract text out of a match. Is there such a syntax?

Answer 1

Answered by Brendan Bullen

The following is a proposed solution for the OP's specific problem (extracting the 2nd word of a string), but note that, as mc0e's answer states, extracting regex matches is not supported out of the box in MySQL. If you really need this, your choices are basically to 1) do it in post-processing on the client, or 2) install a MySQL extension to support it.

BenWells has it almost correct. Working from his code, here's a slightly adjusted version:

SUBSTRING(
  sentence,
  LOCATE(' ', sentence) + CHAR_LENGTH(' '),
  LOCATE(' ', sentence, LOCATE(' ', sentence) + 1)
    - ( LOCATE(' ', sentence) + CHAR_LENGTH(' ') )
)

As a working example, I used:

SELECT SUBSTRING(
  sentence,
  LOCATE(' ', sentence) + CHAR_LENGTH(' '),
  LOCATE(' ', sentence, LOCATE(' ', sentence) + 1)
    - ( LOCATE(' ', sentence) + CHAR_LENGTH(' ') )
) as string
FROM (SELECT 'THIS IS A TEST' AS sentence) temp

This successfully extracts the word IS.

Answer 2

Answered by Damien Goor

Shorter option to extract the second word in a sentence:

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('THIS IS A TEST', ' ',  2), ' ', -1) as FoundText

MySQL docs for SUBSTRING_INDEX
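
To see why the nested call picks out the second word, here is a rough Python emulation of what SUBSTRING_INDEX does. It is only a sketch: the helper names substring_index and nth_word are made up for illustration, and the handling of a zero count or empty delimiter is an assumption rather than something taken from the answer.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Approximate MySQL's SUBSTRING_INDEX(text, delim, count)."""
    # Assumption: a zero count or an empty delimiter yields an empty string.
    # (Python's str.split would raise on an empty delimiter anyway.)
    if count == 0 or delim == "":
        return ""
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Negative count: everything to the right of the count-th delimiter
    # when counting from the end.
    return delim.join(parts[count:])


def nth_word(text: str, n: int, delim: str = " ") -> str:
    """Mirror SUBSTRING_INDEX(SUBSTRING_INDEX(text, delim, n), delim, -1)."""
    # Inner call keeps the first n words, outer call keeps the last of those.
    return substring_index(substring_index(text, delim, n), delim, -1)


second = nth_word("THIS IS A TEST", 2)
print(second)  # IS
```

One consequence of this approach, in the sketch and in the SQL alike: when the string has fewer than n words, the last word is returned instead of an empty result, so nth_word("ONE", 2) gives "ONE".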

Answer 3

Answered by BenWells

According to http://dev.mysql.com/ the SUBSTRING function takes the start position and then the length, so surely the expression for the second word would be:

SUBSTRING(sentence,LOCATE(' ',sentence),(LOCATE(' ',LOCATE(' ',sentence))-LOCATE(' ',sentence)))

Answer 4

Answered by Mark Byers

No, there isn't a syntax for extracting text using regular expressions. You have to use the ordinary string manipulation functions.

Alternatively select the entire value from the database (or the first n characters if you are worried about too much data transfer) and then use a regular expression on the client.
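
As a rough illustration of the client-side route, the following Python sketch extracts the second word of each value with a regular expression and counts how often each one occurs. The rows list is made-up sample data standing in for values fetched from the database, and the word pattern and function names are assumptions chosen for the example.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def nth_regex_word(text, n):
    """Return the n-th word (1-based) of text, or None if there are too few."""
    words = WORD_RE.findall(text)
    return words[n - 1] if 0 < n <= len(words) else None


def count_nth_words(rows, n=2):
    """Count how often each n-th word occurs across all rows."""
    counts = Counter()
    for text in rows:
        word = nth_regex_word(text, n)
        if word is not None:  # skip rows that have fewer than n words
            counts[word] += 1
    return counts


# Hypothetical column values, as if returned by a plain SELECT.
rows = ["THIS IS A TEST", "THAT IS ANOTHER", "ONE", "HERE WAS ONE"]
second_word_counts = count_nth_words(rows)
print(second_word_counts)
```

This plays the role of the GROUP BY in the question, at the cost of transferring every row to the client.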

Answer 5

Answered by mc0e

As others have said, MySQL does not provide regex tools for extracting substrings. That's not to say you can't have them, though, if you're prepared to extend MySQL with user-defined functions:

https://github.com/mysqludf/lib_mysqludf_preg

That may not be much help if you want to distribute your software, since requiring the extension is an impediment to installing it, but for an in-house solution it may be appropriate.

Answer 6

Answered by Hypolite Petovan

I used Brendan Bullen's answer as a starting point for a similar issue I had, which was to retrieve the value of a specific field in a JSON string. However, as I commented on his answer, it is not entirely accurate. If your left boundary isn't just a space like in the original question, then the discrepancy increases.

Corrected solution:

SUBSTRING(
    sentence,
    LOCATE(' ', sentence) + 1,
    LOCATE(' ', sentence, (LOCATE(' ', sentence) + 1)) - LOCATE(' ', sentence) - 1
)

The two differences are the +1 in the SUBSTRING index parameter and the -1 in the length parameter.

For a more general solution to "find the first occurrence of a string between two provided boundaries":

SUBSTRING(
    haystack,
    LOCATE('<leftBoundary>', haystack) + CHAR_LENGTH('<leftBoundary>'),
    LOCATE(
        '<rightBoundary>',
        haystack,
        LOCATE('<leftBoundary>', haystack) + CHAR_LENGTH('<leftBoundary>')
    )
    - (LOCATE('<leftBoundary>', haystack) + CHAR_LENGTH('<leftBoundary>'))
)
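
The same boundary arithmetic can be sketched in Python, which may make the start and length calculation easier to follow. This is only an illustration: the function name between and the sample JSON string are made up, and returning None when a boundary is missing is a choice of this sketch, not something the SQL above does.

```python
from typing import Optional


def between(haystack: str, left: str, right: str) -> Optional[str]:
    """Return the first substring found between left and right boundaries."""
    start = haystack.find(left)
    if start == -1:
        return None  # left boundary not present
    # Skip past the left boundary, like LOCATE(...) + CHAR_LENGTH(...).
    start += len(left)
    # Search for the right boundary only after the left one.
    end = haystack.find(right, start)
    if end == -1:
        return None  # right boundary not present
    return haystack[start:end]


# Hypothetical JSON-like value, similar to the use case described above.
doc = '{"name": "widget", "color": "blue"}'
color = between(doc, '"color": "', '"')
print(color)  # blue
```

With a single space as both boundaries, between("THIS IS A TEST", " ", " ") gives "IS", which matches the second-word case from the original question.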

Answer 7

Answered by user483085

I don't think such a thing is possible. You can use the SUBSTRING function to extract the part you want.

Answer 8

Answered by Steve Chambers

My home-grown regular expression replace function can be used for this.

Demo

See this DB-Fiddle demo, which returns the second word ("I") from a famous sonnet and the number of occurrences of it (1).

SQL

Assuming MySQL 8 or later is being used (to allow use of a Common Table Expression), the following will return the second word and the number of occurrences of it:

WITH cte AS (
     SELECT digits.idx,
            SUBSTRING_INDEX(SUBSTRING_INDEX(words, '~', digits.idx + 1), '~', -1) word
     FROM
     (SELECT reg_replace(UPPER(txt),
                         '[^''a-zA-Z-]+', /* '' is an escaped apostrophe inside the string literal */
                         '~',
                         TRUE,
                         1,
                         0) AS words
      FROM tbl) delimited
     INNER JOIN
     (SELECT @row := @row + 1 as idx FROM 
      (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t1,
      (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t2,
      (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t3,
      (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t4,
      (SELECT @row := -1) t5) digits
     ON LENGTH(REPLACE(words, '~' , '')) <= LENGTH(words) - digits.idx)
SELECT c.word,
       subq.occurrences
FROM cte c
LEFT JOIN (
  SELECT word,
         COUNT(*) AS occurrences
  FROM cte
  GROUP BY word
) subq
ON c.word = subq.word
WHERE idx = 1; /* idx is zero-based so 1 here gets the second word */

Explanation

A few tricks are used in the SQL above, and some attribution is needed. First, the regular expression replacer replaces every continuous block of non-word characters with a single tilde (~) character. Note: a different character could be chosen instead if there is any possibility of a tilde appearing in the text.

The technique from this answer is then used to transform a string of delimited values into separate rows. It is combined with the clever technique from this answer for generating a table consisting of a sequence of incrementing numbers: 0 to 9,999 in this case (10,000 rows from the four cross-joined digit tables).

Answer 9

Answered by Antonio Rivera

The field's value is:

 "- DE-HEB 20% - DTopTen 1.2%"

SELECT ....
SUBSTRING_INDEX(SUBSTRING_INDEX(DesctosAplicados, 'DE-HEB ',  -1), '-', 1) `DE-HEB`,
SUBSTRING_INDEX(SUBSTRING_INDEX(DesctosAplicados, 'DTopTen ',  -1), '-', 1) DTopTen
FROM TABLA

Result is:

  DE-HEB       DTopTen
  20%          1.2%
