Regex optimisation - escaping ampersands in Java

Question

Asked by Duveit

I need to replace every & in a String that isn't part of an HTML entity, so that the String "This & entites &gt; &amp; &lt;" will return "This &amp; entites &gt; &amp; &lt;".

I've come up with this regex pattern: "&[a-zA-Z0-9]{2,7};", which works fine. But I'm not very skilled with regex, and when I time it over 100k iterations, it takes twice as long as the method I used previously, which didn't use regex (but didn't work 100% either).

Test code:

long time = System.currentTimeMillis();
String reg = "&(?!&#?[a-zA-Z0-9]{2,7};)";
String s = "a regex test 1 & 2  1&2 and &_gt; - &_lt;";
String test = null;
for (int i = 0; i < 100000; i++) {
    // String.replaceAll recompiles the pattern on every call
    test = s.replaceAll(reg, "&amp;");
}
System.out.println("Finished in:" + (System.currentTimeMillis() - time) + " milliseconds");

So the question is: are there any obvious ways to optimize this regular expression to make it more efficient?

Answer 1

Answered by Chris Thornhill

s.replaceAll(reg, "&amp;") compiles the regular expression on every call. Compiling the pattern once will provide some increase in performance (~30% in this case).

long time = System.currentTimeMillis();
String reg = "&(?!&#?[a-zA-Z0-9]{2,7};)";
Pattern p = Pattern.compile(reg);
String s="a regex test 1 & 2  1&2 and &_gt; - &_lt;";
for (int i = 0; i < 100000; i++) {
    String test = p.matcher(s).replaceAll("&amp;");
}
System.out.println("Finished in:" + 
             (System.currentTimeMillis() - time) + " milliseconds");
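
As a rough sketch of how the precompiled pattern might be packaged for reuse, the class below keeps the Pattern in a static final field so that it is compiled once when the class loads. The class name AmpersandEscaper and the method name escape are hypothetical, chosen only for illustration; the pattern string is copied unchanged from the test code above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
final class AmpersandEscaper {
    // Pattern string copied from the test code above; compiled once at class load.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!&#?[a-zA-Z0-9]{2,7};)");

    private AmpersandEscaper() {}

    // Replaces every match with "&amp;", reusing the compiled pattern on each call.
    static String escape(String input) {
        return BARE_AMPERSAND.matcher(input).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String s = "a regex test 1 & 2  1&2 and &_gt; - &_lt;";
        System.out.println(escape(s));
    }
}
```

A compiled Pattern is immutable and can be shared between threads; only the Matcher created for each call is stateful, which is why the sketch creates a new one inside escape.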

Answer 2

Answered by Gumbo

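You have to exclude the & from your look-ahead assertion, because the look-ahead is evaluated at the position right after the & that has already been matched. So try this regular expression: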

&(?!#?[a-zA-Z0-9]{2,7};)
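
To illustrate why the extra & matters, here is a small sketch that runs both patterns over the same input. The class name LookaheadComparison, the helper escapeWith and the sample string are assumptions made for this example; only the two pattern strings come from the thread.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class; names and sample input are illustrative only.
final class LookaheadComparison {
    // Pattern from the question: the look-ahead itself begins with '&'.
    static final Pattern WITH_AMP = Pattern.compile("&(?!&#?[a-zA-Z0-9]{2,7};)");
    // Pattern from this answer: the look-ahead starts right after the matched '&'.
    static final Pattern WITHOUT_AMP = Pattern.compile("&(?!#?[a-zA-Z0-9]{2,7};)");

    private LookaheadComparison() {}

    static String escapeWith(Pattern p, String input) {
        return p.matcher(input).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String input = "1 & 2 &gt; 0";
        // The question's pattern also rewrites the '&' of the existing entity.
        System.out.println(escapeWith(WITH_AMP, input));    // 1 &amp; 2 &amp;gt; 0
        // The corrected pattern leaves "&gt;" untouched.
        System.out.println(escapeWith(WITHOUT_AMP, input)); // 1 &amp; 2 &gt; 0
    }
}
```

With the question's pattern, an ampersand is only skipped when it is followed by a second ampersand and an entity-like name, so ordinary entities such as &gt; end up double-escaped.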

Or to be more precise:

&(?!(?:#(?:[xX][0-9a-fA-F]+|[0-9]+)|[a-zA-Z]+);)
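
The following sketch shows how a stricter look-ahead of this kind might behave on decimal references, hexadecimal references and named entities. It uses a pattern in the spirit of the more precise one above, with a + after the hexadecimal class so that multi-digit references such as &#x26; are recognised. The class name PreciseEscaper, the method escape and the sample input are assumptions for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the name, method and sample input are illustrative only.
final class PreciseEscaper {
    // '&' not followed by a hex reference (#x...;), a decimal reference (#...;)
    // or a purely alphabetic name (...;).
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:#(?:[xX][0-9a-fA-F]+|[0-9]+)|[a-zA-Z]+);)");

    private PreciseEscaper() {}

    static String escape(String input) {
        return BARE_AMPERSAND.matcher(input).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // "&12;" is neither a valid reference nor a name, so its '&' is escaped.
        System.out.println(escape("AT&T &#38; &#x26; &lt; &12;"));
        // prints: AT&amp;T &#38; &#x26; &lt; &amp;12;
    }
}
```

One limitation of this sketch: because the name part accepts letters only, named entities that contain digits, such as &frac12;, are treated as bare ampersands and get escaped.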

Answer 3

Answered by Valentin Rocher

Another way of doing this without breaking your head over regexp would be to use StringEscapeUtils from Commons Lang.

Answer 4

Answered by John Weldon

I'm not very familiar with the Java regex classes, but in general you may want to investigate a zero-width lookahead for a ; after the ampersand.

Here is a link describing positive and negative lookaheads.
