MySQL: Remove HTML tags from record
Original source: http://stackoverflow.com/questions/2627940/
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Asked by Subha
I need help forming a MySQL query against a table in which one column has the content below:
Row1 : this is first <a href='mytext.txt'>row</a> from the table
Row 2 : THis is the second row <img src ='mytext.jpg'> my image is there
Row 3 : <p>This is the Third row my mytext is there </p>
Row 4 : <p class='te_mytext'>This is the Third row my text is there </p>
These are the table rows. I am trying to search for the keyword 'mytext'.
My query is:
SELECT * FROM table WHERE `colmn_name` LIKE '%mytext%';
I get all 4 rows as the result, but that result is wrong. The correct output is only Row 3, because that is the only row that has mytext inside the content itself; in all the other rows mytext appears only inside the tags, not in the content.
How can I write the MySQL query?
Answered by KB.
Try this solution. I have not tried it myself, but apparently it works.
Source: http://forums.mysql.com/read.php?52,177343,177985#msg-177985
# Lets the function be created when binary logging is enabled (needs sufficient privileges)
SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4000) )
RETURNS varchar(4000)
DETERMINISTIC
BEGIN
  DECLARE iStart, iEnd, iLength int;
  # Loop while a '<' followed somewhere by a '>' is still present
  WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
    BEGIN
      SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
      SET iLength = ( iEnd - iStart) + 1;
      IF iLength > 0 THEN
        BEGIN
          # Cut the tag, brackets included, out of the string
          SET Dirty = Insert( Dirty, iStart, iLength, '');
        END;
      END IF;
    END;
  END WHILE;
  RETURN Dirty;
END;
|
DELIMITER ;
SELECT fnStripTags('this <html>is <b>a test</b>, nothing more</html>');
Answered by Boann
Here's my implementation of a strip_tags function:
CREATE FUNCTION `strip_tags`($str text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    LOOP
        # Find the next "<" at or after the current position; return when none is left
        SET $start = LOCATE("<", $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        # Find the ">" that closes it; if there is none, remove only the lone "<"
        SET $end = LOCATE(">", $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, "");
    END LOOP;
END;
I made sure it removes mismatched opening brackets because they're dangerous, though it ignores any unpaired closing brackets because they're harmless.
mysql> select strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.');
+----------------------------------------------------------------------+
| strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.') |
+----------------------------------------------------------------------+
| hello world again.                                                   |
+----------------------------------------------------------------------+
1 row in set
Enjoy.
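To trace why a lone opening bracket is removed while a lone closing bracket survives, the same loop can be followed outside MySQL. The sketch below is a Python port of the LOCATE/INSERT logic written for illustration only; it is not part of the answer, and the function name is simply reused from it.

```python
def strip_tags(text: str) -> str:
    """Illustrative Python port of the LOCATE/INSERT loop shown above."""
    start = 0
    while True:
        # Next "<" at or after the current position (LOCATE with an offset).
        start = text.find("<", start)
        if start == -1:
            return text  # no opening bracket left, so we are done
        # First ">" after that "<"; if there is none, drop only the lone "<".
        end = text.find(">", start)
        if end == -1:
            end = start
        # Remove text[start..end] inclusive (INSERT(..., "") in the SQL).
        text = text[:start] + text[end + 1:]


print(strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.'))
```

Because the search only ever starts from a "<", a ">" with no opening bracket before it is never touched.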
Answered by Ryan Ou
If your content always starts with a tag (<body> etc.), try this:
SELECT * from table WHERE colmn_name REGEXP '>[^<]*mytext';
Answered by ZAky
Add these lines to the fnStripTags function, after SET Dirty = Insert( Dirty, iStart, iLength, '');
set Dirty = Replace(Dirty,'&nbsp;',''); #No space between & and nbsp;
set Dirty = Replace(Dirty,'\r','');
set Dirty = Replace(Dirty,'\n','');
Answered by cucu8
I used strip_tags() above (thanks, Boann) with a slight modification to get rid of the HTML codes of accented characters etc. Like so:
...BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = REPLACE($str, "&nbsp;", " ");
    SET $str = REPLACE($str, "&euro;", "");
    SET $str = REPLACE($str, "&aacute;", "á");
    SET $str = REPLACE($str, "&eacute;", "é");
    SET $str = REPLACE($str, "&iacute;", "í");
    SET $str = REPLACE($str, "&oacute;", "ó");
    SET $str = REPLACE($str, "&uacute;", "ú");
LOOP...
Answered by ajmedway
A MySQL strip-tags implementation that lets you target a specific tag, so that tags can be replaced out one by one with each function call. You just need to pass the tag parameter, e.g. 'a', to replace out all opening/closing anchor tags.
# MySQL function to programmatically replace out specified html tags from text/html fields
# run this to drop/update the stored function
DROP FUNCTION IF EXISTS `strip_tags`;
DELIMITER |
# function to nuke all opening and closing tags of type specified in argument 2
# note: the search for '<' + tag also matches longer tag names with the same prefix (e.g. 'a' matches <abbr>)
CREATE FUNCTION `strip_tags`($str text, $tag text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = COALESCE($str, '');
    LOOP
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE('>', $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, '');
        SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
    END LOOP;
END;
|
DELIMITER ;
# test select to nuke all <a> tags
SELECT
    STRIP_TAGS(description, 'a') AS stripped
FROM
    tmpcat;
# run update query to replace out all <a> tags
UPDATE tmpcat
SET
    description = STRIP_TAGS(description, 'a');
Answered by Rembert Oldenboom
I expanded the strip_tags function provided by Boann. It can now be used to either keep or remove the phrase between the tags. Note the bug with empty tags in my part of the code, i.e. for $keep_phrase = false.
DELIMITER //
CREATE FUNCTION strip_tags($str text, $tag text, $keep_phrase bool) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = COALESCE($str, '');
    LOOP
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        IF ($keep_phrase) THEN
            # remove only the opening and closing tags, keep the text between them
            SET $end = LOCATE('>', $str, $start);
            IF (!$end) THEN SET $end = $start; END IF;
            SET $str = INSERT($str, $start, $end - $start + 1, '');
            SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
        ELSE
            # remove everything from the opening tag through the closing tag
            SET $end = LOCATE(CONCAT('</', $tag, '>'), $str, $start);
            IF (!$end) THEN
                SET $end = LOCATE('/>', $str, $start);
                SET $str = INSERT($str, $start, $end - $start + 2, '');
            ELSE
                SET $str = INSERT($str, $start, $end - $start + LENGTH(CONCAT('</', $tag, '>')), '');
            END IF;
        END IF;
    END LOOP;
END //
DELIMITER ;
To demonstrate the functionality:
SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><p>so<span id="y"> very</span> cool</p>','span',true);
<p>so very cool</p><p>so very cool</p>
SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><p>so<span id="y"> very</span> cool</p>','span',false);
<p>so cool</p><p>so cool</p>
Empty elements are not supported with $keep_phrase = false, see:
SELECT strip_tags('<p>so<span id="x"> very</span> cool</p><span/><p>so<span id="y"> very</span> cool</p>','span',false);
<p>so cool</p> cool</p>
Answered by Rembert Oldenboom
Regular expression matching is needed here, with a negative look-ahead assertion: "mytext" not followed by a tag's closing bracket. MySQL before 8.0.4 doesn't support look-around assertions (later versions use the ICU regex library, which does), but MariaDB does. MariaDB query for the question:
SELECT * FROM table WHERE column_name REGEXP 'mytext(?![^<>]*>)';
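To see what the look-ahead does to the four rows from the question, the same pattern can be tried in any engine that supports look-ahead assertions. Below is a minimal sketch using Python's re module as a stand-in (an assumption made for illustration; the answer itself targets MariaDB), with the row contents copied from the question.

```python
import re

# Column contents of the four rows from the question.
rows = [
    "this is first <a href='mytext.txt'>row</a> from the table",
    "THis is the second row <img src ='mytext.jpg'> my image is there",
    "<p>This is the Third row my mytext is there </p>",
    "<p class='te_mytext'>This is the Third row my text is there </p>",
]

# "mytext" that is NOT followed by a run of non-bracket characters and a ">",
# i.e. an occurrence that does not sit inside an open tag.
pattern = re.compile(r"mytext(?![^<>]*>)")

matching_rows = [i for i, row in enumerate(rows, start=1) if pattern.search(row)]
print(matching_rows)
```

One caveat of this pattern: a literal ">" in the text after the keyword, with no "<" in between, makes the look-ahead treat the keyword as if it were inside a tag.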
Another solution for the question is to remove some or all tags before matching. It's inefficient compared with REGEXP, but it also works. MySQL starting with 8.0 and MariaDB starting with 10.0.5 have a built-in REGEXP_REPLACE function. 'strip_html' is even the first example on the corresponding MariaDB documentation page. MySQL / MariaDB query for this approach:
SELECT * FROM table WHERE REGEXP_REPLACE (column_name, '<.+?>', '') LIKE '%mytext%';
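The strip-then-match idea can be checked against the question's rows in the same way. The following is a rough Python sketch under the same assumption as before (Python's re standing in for the database regex engine); the helper name content_contains is made up for this example.

```python
import re

# Column contents of the four rows from the question.
rows = [
    "this is first <a href='mytext.txt'>row</a> from the table",
    "THis is the second row <img src ='mytext.jpg'> my image is there",
    "<p>This is the Third row my mytext is there </p>",
    "<p class='te_mytext'>This is the Third row my text is there </p>",
]

TAG = re.compile(r"<.+?>")  # same non-greedy tag pattern as in the query


def content_contains(row: str, keyword: str) -> bool:
    """Remove everything that looks like a tag, then do a substring test."""
    stripped = TAG.sub("", row)
    return keyword in stripped  # case-sensitive, unlike LIKE under a _ci collation


hits = [content_contains(row, "mytext") for row in rows]
print(hits)
```

Only the third row still contains the keyword once the tags are gone, which is the result the question asks for.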
Apart from that, strings in the question mix data and representation. They shouldn't be regularly searched as it is a waste of system resources.
Answered by Egor Pavlikhin
You cannot parse HTML inside SQL queries; that doesn't make any sense. Perhaps you could just maintain a special search version of the table with all the HTML stripped, but you would have to use some external processing to do that.
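As one possible shape for that external processing, here is a minimal sketch using the HTML parser from Python's standard library. The class and function names are hypothetical, and writing the result back to a separate search column or table is left out, since the answer does not specify a schema.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True also decodes entities such as &eacute; in text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_search_text(html_fragment: str) -> str:
    """Return the fragment with all tags dropped and entities decoded."""
    parser = TextExtractor()
    parser.feed(html_fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(to_search_text("<p>This is the Third row my mytext is there </p>"))
```

The stripped text could then be stored next to the original HTML and searched with a plain LIKE, so the stripping cost is paid once per write instead of on every search.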