MySQL: remove HTML tags from records

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2627940/

Time: 2020-08-31 15:46:26  Source: igfitidea

Remove HTML tags from record

mysql

Question by Subha

I need help writing a MySQL query against one column of a table that has the content below:

Row1 : this is first <a href='mytext.txt'>row</a> from the table

Row 2 : THis is the second row <img src ='mytext.jpg'> my image is there

Row 3 : <p>This is the Third row my mytext is there </p>

Row 4 : <p class='te_mytext'>This is the Third row my text is there </p>

These are the table rows. I am trying to search for the keyword 'mytext'.

My query is:

SELECT * FROM `table` WHERE column_name LIKE '%mytext%';

This returns all 4 rows, but that is not what I want. The correct output is only Row 3, because it is the only row where 'mytext' appears in the content itself; in all the other rows 'mytext' appears only inside the HTML tags.

How can I write the MySQL query?

Answer by KB.

Try this solution. I haven't tried it myself, but apparently it works.

source: http://forums.mysql.com/read.php?52,177343,177985#msg-177985

SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4000) )
RETURNS varchar(4000)
DETERMINISTIC
BEGIN
    DECLARE iStart, iEnd, iLength int;
    WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
        BEGIN
            SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
            SET iLength = ( iEnd - iStart) + 1;
            IF iLength > 0 THEN
                BEGIN
                    SET Dirty = Insert( Dirty, iStart, iLength, '');
                END;
            END IF;
        END;
    END WHILE;
    RETURN Dirty;
END;
|
DELIMITER ;
SELECT fnStripTags('this <html>is <b>a test</b>, nothing more</html>');
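To apply this to the original question, a minimal sketch is to strip the tags first and only then run the LIKE, so that 'mytext' is matched in the visible text only. This assumes fnStripTags() above is installed; `table` and column_name are placeholders for the question's real table and column names.

```sql
-- Placeholder names: replace `table` and column_name with your own.
-- fnStripTags() removes every <...> sequence, so 'mytext' inside
-- href/src/class attributes no longer matches; only visible text does.
SELECT *
FROM `table`
WHERE fnStripTags(column_name) LIKE '%mytext%';
```

On the four sample rows this should return only Row 3. Because the function runs on every row, the query always scans the whole table, and values longer than 4000 characters will not be handled correctly by the varchar(4000) parameter (truncated or rejected, depending on sql_mode).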

Answer by Boann

Here's my implementation of a strip_tags function:

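DROP FUNCTION IF EXISTS `strip_tags`;
DELIMITER //
CREATE FUNCTION `strip_tags`($str text) RETURNS text
DETERMINISTIC
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    -- NULL input would otherwise loop forever: every LOCATE/INSERT returns NULL
    IF $str IS NULL THEN RETURN NULL; END IF;
    LOOP
        SET $start = LOCATE('<', $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE('>', $str, $start);
        -- unmatched '<': remove just that character
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, '');
    END LOOP;
END//
DELIMITER ;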

I made sure it removes mismatched opening brackets because they're dangerous, though it ignores any unpaired closing brackets because they're harmless.

mysql> select strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.');
+----------------------------------------------------------------------+
| strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.') |
+----------------------------------------------------------------------+
| hello world again.                                                   |
+----------------------------------------------------------------------+
1 row in set

Enjoy.

Answer by Ryan Ou

If your content always starts with a tag (<body>, etc.), try this:

SELECT * FROM `table` WHERE column_name REGEXP '>[^<]*mytext';
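If the content does not always start with a tag, one possible variation (a sketch, not part of this answer) also lets the match start at the beginning of the value via an (^|>) alternation. Like the query above, it assumes '<' and '>' only occur as tag delimiters; `table` and column_name are placeholders.

```sql
-- 'mytext' must be preceded, with no '<' in between, either by the start
-- of the value (text before the first tag) or by the '>' closing a tag.
-- Text inside a tag (href, src, class, ...) always has the tag's '<'
-- between it and any earlier '>', so it is not matched.
SELECT *
FROM `table`
WHERE column_name REGEXP '(^|>)[^<]*mytext';
```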

Answer by ZAky

Add these lines to the fnStripTags function, after SET Dirty = Insert( Dirty, iStart, iLength, '');

set Dirty = Replace(Dirty,'&nbsp;',''); # remove non-breaking space entities
set Dirty = Replace(Dirty,'\r','');
set Dirty = Replace(Dirty,'\n','');
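These three replacements don't depend on the loop, so one alternative (a sketch, not ZAky's code) is to leave fnStripTags unchanged and apply them once in a small wrapper. The name fnStripTagsClean is hypothetical, and it assumes fnStripTags() from KB's answer is already installed. Removing &nbsp;, \r and \n never adds or removes '<' or '>', so tag stripping is unaffected; unlike the in-loop version, the wrapper also cleans values that contain no tags, where the loop body never runs.

```sql
DROP FUNCTION IF EXISTS fnStripTagsClean;
DELIMITER |
-- Hypothetical wrapper: strip tags with fnStripTags(), then remove the
-- &nbsp; entity and CR/LF characters once, after all tags are gone.
CREATE FUNCTION fnStripTagsClean( Dirty varchar(4000) )
RETURNS varchar(4000)
DETERMINISTIC
BEGIN
    RETURN Replace(Replace(Replace(fnStripTags(Dirty), '&nbsp;', ''), '\r', ''), '\n', '');
END;
|
DELIMITER ;
```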

Answer by cucu8

I used the strip_tags() function above (thanks, Boann) with a slight modification to get rid of the HTML entities for accented characters, etc., like so:

...BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = REPLACE($str, "&nbsp;", " ");
    SET $str = REPLACE($str, "&euro;", "");
    SET $str = REPLACE($str, "&aacute;", "á");
    SET $str = REPLACE($str, "&eacute;", "é");
    SET $str = REPLACE($str, "&iacute;", "í");
    SET $str = REPLACE($str, "&oacute;", "ó");
    SET $str = REPLACE($str, "&uacute;", "ú");
LOOP...

Answer by ajmedway

A MySQL strip_tags implementation that lets you target a specific tag, so you can strip out one tag type at a time with each function call. Just pass the tag parameter, e.g. 'a' to strip out all opening/closing anchor tags.

# MySQL function to programmatically replace out specified html tags from text/html fields

# run this to drop/update the stored function
DROP FUNCTION IF EXISTS `strip_tags`;

DELIMITER |

# function to nuke all opening and closing tags of type specified in argument 2
CREATE FUNCTION `strip_tags`($str text, $tag text) RETURNS text
DETERMINISTIC
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = COALESCE($str, '');
    LOOP
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE('>', $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, '');
        SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
    END LOOP;
END;

|
DELIMITER ;

# test select: strip all opening and closing <a> tags
SELECT 
    STRIP_TAGS(description, 'a') AS stripped
FROM
    tmpcat;

# run update query to replace out all <a> tags
UPDATE tmpcat
SET 
    description = STRIP_TAGS(description, 'a');
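One caveat: LOCATE(CONCAT('<', $tag), ...) is a prefix match, so calling the function with 'a' also strips the opening tags of <abbr>, <article>, <aside> and so on (their closing tags stay, because the REPLACE only matches '</a>' exactly). A possible fix, sketched here under the hypothetical name strip_tag_exact, is to treat a match as a tag only when the character after the tag name ends the name (whitespace, '>', '/' or end of string). This is an illustrative variant, not ajmedway's code.

```sql
DROP FUNCTION IF EXISTS `strip_tag_exact`;
DELIMITER |
# Hypothetical variant of strip_tags($str, $tag): "<a" only counts as an
# <a> tag when the next character ends the tag name, so <abbr> is kept.
CREATE FUNCTION `strip_tag_exact`($str text, $tag text) RETURNS text
DETERMINISTIC
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    DECLARE $next VARCHAR(1);
    SET $str = COALESCE($str, '');
    # closing tags are matched exactly, so remove them all in one pass
    SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
    LOOP
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        # character right after "<tag" ('' at the end of the string)
        SET $next = SUBSTRING($str, $start + CHAR_LENGTH($tag) + 1, 1);
        IF $next IN (' ', '>', '/', '\t', '\n', '\r', '') THEN
            SET $end = LOCATE('>', $str, $start);
            IF (!$end) THEN SET $end = $start; END IF;
            SET $str = INSERT($str, $start, $end - $start + 1, '');
        ELSE
            # e.g. "<abbr" when $tag = 'a': skip it and keep searching
            SET $start = $start + 1;
        END IF;
    END LOOP;
END;
|
DELIMITER ;
```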

Answer by Rembert Oldenboom

This expands the strip_tags function provided by Boann. It can now either keep or remove the text between the tags. Note the bug with empty elements in my part of the code, i.e. for $keep_phrase = false.

DROP FUNCTION IF EXISTS strip_tags;
DELIMITER //
CREATE FUNCTION strip_tags($str text, $tag text, $keep_phrase bool) RETURNS text
DETERMINISTIC
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = COALESCE($str, '');
    LOOP
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        IF ($keep_phrase) THEN
            -- remove only the tags, keep the text between them
            SET $end = LOCATE('>', $str, $start);
            IF (!$end) THEN SET $end = $start; END IF;
            SET $str = INSERT($str, $start, $end - $start + 1, '');
            SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
        ELSE
            -- remove the whole element, including the text between the tags
            SET $end = LOCATE(CONCAT('</', $tag, '>'), $str, $start);
            IF (!$end) THEN
                SET $end = LOCATE('/>', $str, $start);
                SET $str = INSERT($str, $start, $end - $start + 2, '');
            ELSE
                SET $str = INSERT($str, $start, $end - $start
                    + LENGTH(CONCAT('</', $tag, '>')), '');
            END IF;
        END IF;
    END LOOP;
END //
DELIMITER ;

To prove the functionality:

    SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><p>so<span id="y"> very</span> cool</p>','span',true);
    <p>so very cool</p><p>so very cool</p>

    SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><p>so<span id="y"> very</span> cool</p>','span',false);
    <p>so cool</p><p>so cool</p>

Empty elements with $keep_phrase = false are not supported, see:

    SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><span/><p>so<span id="y"> very</span> cool</p>','span',false);
    <p>so cool</p> cool</p>

Answer by Rembert Oldenboom

Regular expression matching with a negative look-ahead assertion is needed here: "mytext" must not be followed by a '>' before the next '<', i.e. it must not sit inside a tag. MySQL before 8.0 doesn't support look-around assertions; MariaDB does, and so does MySQL 8.0, which switched to the ICU regular expression library. Query for the question:

SELECT * FROM `table` WHERE column_name REGEXP 'mytext(?![^<>]*>)';

Another solution is to remove some or all tags before matching. It's less efficient than a single REGEXP, but it also works. MySQL (starting with 8.0) and MariaDB (starting with 10.0.5) have a built-in REGEXP_REPLACE function; 'strip_html' is even the first example on the corresponding MariaDB documentation page. MySQL / MariaDB query for this approach:

SELECT * FROM `table` WHERE REGEXP_REPLACE(column_name, '<.+?>', '') LIKE '%mytext%';

Apart from that, the strings in the question mix data and presentation. They shouldn't be searched like this on a regular basis, as it wastes system resources.

Answer by Egor Pavlikhin

You cannot parse HTML inside SQL queries; that doesn't make any sense. Perhaps you could maintain a special search version of the table with all the HTML stripped, but you would have to use some external processing to do that.
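The stripped search copy can also be maintained inside the database, using one of the strip functions from the other answers. A minimal sketch, assuming the one-argument strip_tags() from Boann's answer is installed (the later answers redefine strip_tags with extra parameters); the plain_text column and the trigger names are hypothetical, and `table`/column_name are the question's placeholders:

```sql
-- Hypothetical extra column holding the text with all HTML stripped
ALTER TABLE `table` ADD COLUMN plain_text TEXT;

-- Fill it once for the existing rows
UPDATE `table` SET plain_text = strip_tags(column_name);

-- Keep it in sync on every insert and update
CREATE TRIGGER table_plain_text_bi BEFORE INSERT ON `table`
FOR EACH ROW SET NEW.plain_text = strip_tags(NEW.column_name);

CREATE TRIGGER table_plain_text_bu BEFORE UPDATE ON `table`
FOR EACH ROW SET NEW.plain_text = strip_tags(NEW.column_name);

-- Search only the visible text
SELECT * FROM `table` WHERE plain_text LIKE '%mytext%';
```

A leading-wildcard LIKE still scans the table, but the tags are stripped once per write instead of once per row on every search.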
