Original source: http://stackoverflow.com/questions/30756921/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Remove simple HTML-Tags from String in Oracle via RegExp, Explanation needed
Asked by Basti
I do not understand why my columns reg1 and reg2 remove "bbb" from my string, while only reg3 works as expected.
WITH t AS (SELECT 'aaa <b>bbb</b> ccc' AS teststring FROM dual)
SELECT
  teststring,
  regexp_replace(teststring, '<.+>')  AS reg1,
  regexp_replace(teststring, '<.*>')  AS reg2,
  regexp_replace(teststring, '<.*?>') AS reg3
FROM t;
TESTSTRING            REG1        REG2        REG3
aaa <b>bbb</b> ccc    aaa  ccc    aaa  ccc    aaa bbb ccc
Thanks a lot!
Answered by Olivier Jacot-Descombes
Because regex quantifiers are greedy by default, i.e. the expressions .* and .+ try to take as many characters as possible. Therefore <.+> will span from the first < to the last >. Make it lazy by appending the lazy operator ? to the quantifier:
regexp_replace(teststring, '<.+?>')
or
regexp_replace(teststring, '<.*?>')
Now, the search for > will stop at the first > encountered.
Note that . matches > as well; therefore the greedy variant (without ?) swallows every > except the last one.
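To see what each variant actually matches, rather than what is left after removal, the patterns can be passed to regexp_substr. This is only a small sketch reusing the test string from the question; the column aliases greedy_match and lazy_match are made up for illustration.

```sql
-- Show the first substring matched by the greedy and the lazy pattern.
WITH t AS (SELECT 'aaa <b>bbb</b> ccc' AS teststring FROM dual)
SELECT
  regexp_substr(teststring, '<.+>')  AS greedy_match, -- <b>bbb</b>
  regexp_substr(teststring, '<.+?>') AS lazy_match    -- <b>
FROM t;
```

The greedy pattern runs from the first < to the last >, so the text between the tags disappears along with them. The lazy pattern stops at the first >, so each tag is matched on its own.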
Answered by DevilPinky
Because the first and the second pattern both find this match: <b>bbb</b>. In this case, b>bbb</b matches both .* and .+.
The third one also won't do what you need. You are looking for something like this: <[^>]*>. But you also need to replace all matches with "".
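A minimal sketch of the negated-character-class pattern, applied to the test string from the question. The empty string is passed explicitly as the replacement here. In Oracle, omitting the replacement argument also removes the matches, and regexp_replace replaces every occurrence by default. The alias stripped is made up for illustration.

```sql
-- [^>]* can never run past a closing >, so no lazy quantifier is needed.
WITH t AS (SELECT 'aaa <b>bbb</b> ccc' AS teststring FROM dual)
SELECT
  regexp_replace(teststring, '<[^>]*>', '') AS stripped -- aaa bbb ccc
FROM t;
```

Because the character class excludes >, each match ends at the nearest closing bracket regardless of greediness.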
Answered by A_Developer_in_Austin_TX
If you are merely trying to display the string without all the HTML tags, you can use the function: utl_i18n.unescape_reference(column_name)
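As far as the Oracle documentation describes it, utl_i18n.unescape_reference converts character references such as &amp; or &lt; back into the characters they stand for; it does not remove tags by itself. A hedged sketch of combining it with the tag-stripping pattern discussed above follows. The sample string and the alias plain_text are assumptions for illustration, not part of the original question.

```sql
-- Strip the tags first, then decode the remaining character references.
-- Doing it the other way round could turn an escaped &lt;b&gt; into a real tag
-- that the regular expression would then remove.
WITH t AS (SELECT 'aaa <b>bbb &amp; ccc</b>' AS teststring FROM dual)
SELECT
  utl_i18n.unescape_reference(
    regexp_replace(teststring, '<[^>]*>')
  ) AS plain_text
FROM t;
```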