Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/30756921/

Remove simple HTML-Tags from String in Oracle via RegExp, Explanation needed

Tags: regex, oracle, plsql

Asked by Basti

I do not understand why my columns reg1 and reg2 remove "bbb" from my string, while only reg3 works as expected.

WITH t AS (SELECT 'aaa <b>bbb</b> ccc' AS teststring FROM dual)
SELECT
  teststring,
  regexp_replace(teststring, '<.+>')  AS reg1,  -- greedy, one or more characters
  regexp_replace(teststring, '<.*>')  AS reg2,  -- greedy, zero or more characters
  regexp_replace(teststring, '<.*?>') AS reg3   -- non-greedy (lazy)
FROM t;


TESTSTRING           REG1       REG2       REG3
aaa <b>bbb</b> ccc   aaa  ccc   aaa  ccc   aaa bbb ccc

Thanks a lot!

Answered by Olivier Jacot-Descombes

Because regex is greedy by default, i.e. the expressions .* and .+ try to take as many characters as possible. Therefore <.+> will span from the first < to the last >. Make it lazy by using the lazy operator ?:

regexp_replace(teststring, '<.+?>')

or

regexp_replace(teststring, '<.*?>')

Now, the search for > will stop at the first > encountered.

Note that . matches > as well, therefore the greedy variant (without ?) swallows all the > characters but the last.
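
To experiment with this outside the database, the greedy and lazy behavior can be reproduced in a few lines of Python. This is only a sketch under an assumption: it uses Python's re module, not Oracle's regex engine, although these particular patterns behave the same way in both. The variable names are made up for illustration.

```python
import re

s = "aaa <b>bbb</b> ccc"

# Greedy: .+ and .* take as many characters as possible, so the match
# runs from the first '<' to the last '>'.
greedy_plus = re.search(r"<.+>", s).group()
greedy_star = re.search(r"<.*>", s).group()

# Lazy: .*? takes as few characters as possible, so each tag is a separate match.
lazy_tags = re.findall(r"<.*?>", s)

# re.sub replaces every match, which mirrors regexp_replace when its
# occurrence argument is left at the default of 0 (all occurrences).
reg1 = re.sub(r"<.+>", "", s)
reg3 = re.sub(r"<.*?>", "", s)

print(greedy_plus)  # <b>bbb</b>
print(lazy_tags)    # ['<b>', '</b>']
print(repr(reg1))   # 'aaa  ccc'
print(repr(reg3))   # 'aaa bbb ccc'
```

The greedy result keeps both spaces that surrounded the removed span, which is easy to miss when the output is shown in a grid.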

Answered by DevilPinky

Because the first one and the second one are finding this match: <b>bbb</b>. In this case b>bbb</b matches both .* and .+

The third one also won't do what you need. You are looking for something like this: <[^>]*>. But you also need to replace all matches with an empty string.
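
One case where the negated character class and the lazy pattern differ is a tag that spans a line break. The sketch below again uses Python's re module as a stand-in for Oracle (an assumption); in both engines the dot does not match a newline unless a flag asks for it, while [^>] does. The sample strings are invented for illustration.

```python
import re

tag = re.compile(r"<[^>]*>")

one_line = "aaa <b>bbb</b> ccc"
multi_line = 'x <a\nhref="#">y</a> z'  # the opening tag contains a newline

# On a single line both approaches give the same result.
stripped = tag.sub("", one_line)

# [^>] matches any character except '>', including the newline,
# so the whole opening tag is removed.
stripped_multi = tag.sub("", multi_line)

# '.' stops at the newline, so the lazy pattern cannot match the
# opening tag and leaves it in place.
lazy_multi = re.sub(r"<.*?>", "", multi_line)

print(repr(stripped))        # 'aaa bbb ccc'
print(repr(stripped_multi))  # 'x y z'
print(repr(lazy_multi))      # 'x <a\nhref="#">y z'
```

Neither pattern is an HTML parser; both are only suitable for simple, well-formed tags like the ones in the question.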

Answered by A_Developer_in_Austin_TX

If you are merely trying to display the string without all the HTML tags, you can use the function: utl_i18n.unescape_reference(column_name)
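
It may be worth checking what this function does to your data before relying on it: as far as Oracle documents it, utl_i18n.unescape_reference decodes character references such as &amp;amp; and does not itself remove tags. The rough Python analogy below treats html.unescape as a stand-in for that decoding step (an assumption, it is not the Oracle function) to show that decoding references and stripping tags are separate operations.

```python
import html
import re

raw = "aaa <b>bbb &amp; ccc</b>"

# Decoding character references leaves the tags untouched.
decoded_only = html.unescape(raw)

# Removing tags leaves the character references encoded.
tags_removed = re.sub(r"<[^>]*>", "", raw)

# Doing both: strip the tags first, then decode what is left.
both = html.unescape(tags_removed)

print(decoded_only)  # aaa <b>bbb & ccc</b>
print(tags_removed)  # aaa bbb &amp; ccc
print(both)          # aaa bbb & ccc
```

Stripping before decoding avoids turning an escaped &lt; into a literal < that a later tag-removal pass could mistake for markup.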
