C#: When not to use RegexOptions.Compiled

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/9969158/

Time: 2020-08-09 11:26:00  Source: igfitidea

When not to use RegexOptions.Compiled

c# regex

Asked by inutan

I understand the advantage of using RegexOptions.Compiled - it improves the execution time of the app by having the regular expression in compiled form instead of interpreting it at run-time. However, using it is not recommended for applications that are already slow at start-up.

But if my application can bear any slight increase in start-up time -
what are the other scenarios in which I should NOT use RegexOptions.Compiled?

Just as a note I am calling this method several times -

private static string GetName(string objString)
{
    return Regex.Replace(objString, "[^a-zA-Z&-]+", "");
}

So, this method is called with different values for 'objString' (although values for objString may repeat as well).

Do you think it's good/not good to use RegexOptions.Compiled here? Any web link would be really helpful.
Thank you!

EDIT

I tested my web app with both

  • RegexOptions.Compiled, and
  • Instantiating Regex as a class variable

But couldn't find any big difference in time taken by my web application - Only thing I noticed in both scenarios is that first time when the application loads it's taking double of the time taken compared to that in successive page loads and that is irrespective of whether I use RegexOptions.Compiled or not.

Any comments for --
why does my web application take longer to process the Regex the first time, with the time taken dropping to about half or less on subsequent loads? Is there any built-in caching, or is some other .NET feature helping here? P.S. This is the same whether or not I use RegexOptions.Compiled.

Accepted answer by ruakh

For any specific performance question like this, the best way to find out which way is faster is to test both and see.

In general, compiling a regex is unlikely to have much benefit unless you're using the regex a lot, or on very large strings. (Or both.) I think it's more of an optimization to try after you've determined that you have a performance problem and you think this might help, than one to try randomly.

For some general discussion on the drawbacks of RegexOptions.Compiled, see this blog post by Jeff Atwood; it's very old, but from what I understand, none of the major relevant facts have changed since it was written.
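
As a rough illustration of the "test both and see" advice, here is a minimal sketch that times the one-off cost (construction plus first use) separately from repeated use. The class and method names (CompileCostSketch, Measure) are made up for this example, the pattern is the one from the question, and the numbers will vary by machine and runtime, so treat it as a starting point rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CompileCostSketch
{
    // Hypothetical helper: returns { ticks for construction + first use,
    // ticks for `runs` further uses } for the given options.
    public static long[] Measure(RegexOptions options, string input, int runs)
    {
        var timer = Stopwatch.StartNew();
        var regex = new Regex("[^a-zA-Z&-]+", options);
        regex.Replace(input, "");              // first use includes any lazy setup
        long firstUse = timer.ElapsedTicks;

        timer.Restart();
        for (int i = 0; i < runs; i++)
            regex.Replace(input, "");          // steady-state cost
        long steadyState = timer.ElapsedTicks;

        return new[] { firstUse, steadyState };
    }

    public static void Main()
    {
        const string input = "John O'Neil & Sons - 2012";
        long[] interpreted = Measure(RegexOptions.None, input, 100000);
        long[] compiled = Measure(RegexOptions.Compiled, input, 100000);

        Console.WriteLine("Interpreted: first use " + interpreted[0] + " ticks, 100000 runs " + interpreted[1] + " ticks");
        Console.WriteLine("Compiled:    first use " + compiled[0] + " ticks, 100000 runs " + compiled[1] + " ticks");
    }
}
```

Comparing the two "first use" figures against the two "runs" figures gives a feel for how many evaluations it takes before compilation pays for itself in a given application.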

Answer by Chris Shain

Compilation generally only improves performance if you are saving the Regex object that you create. Since you are not, in your example, saving the Regex, you should not compile it.
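
One caveat that may be worth keeping in mind next to this point: the static Regex methods (such as Regex.Replace) keep a small internal cache of recently used patterns, and its size is exposed through the static Regex.CacheSize property, so a repeated static call does not necessarily re-parse the pattern every time. The sketch below only shows where that setting lives; the class name RegexCacheSketch and the value 50 are arbitrary choices for illustration, not a recommendation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCacheSketch
{
    // Same static call as in the question; the pattern is looked up in
    // the static Regex cache on repeated calls.
    public static string GetName(string objString)
    {
        return Regex.Replace(objString, "[^a-zA-Z&-]+", "");
    }

    public static void Main()
    {
        // Current number of patterns the static cache will hold.
        Console.WriteLine("Cache size: " + Regex.CacheSize);

        // Arbitrary illustrative value: enlarge the cache if the app
        // cycles through many distinct patterns via static methods.
        Regex.CacheSize = 50;

        Console.WriteLine(GetName("R&D-2012 team")); // prints "R&D-team"
    }
}
```

Holding the Regex in a static field, as suggested here, avoids depending on that cache at all, which is one reason it tends to be the more predictable choice.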

You might want to restructure the code this way (note I re-wrote the regex to what I think you want. Having the start-of-line caret in a repeating group doesn't make a whole lot of sense, and I assume a name prefix ends with a dash):

    private static readonly Regex CompiledRegex = new Regex("^[a-zA-Z]+-", RegexOptions.Compiled);
    private static string GetNameCompiled(string objString)
    {
        return CompiledRegex.Replace(objString, "");
    }

I wrote some test code for this also:

    public static void TestSpeed()
    {
        var testData = "fooooo-bar";
        var timer = new Stopwatch();

        timer.Start();
        for (var i = 0; i < 10000; i++)
            Assert.AreEqual("bar", GetNameCompiled(testData));
        timer.Stop();
        Console.WriteLine("Compiled took " + timer.ElapsedMilliseconds + "ms");
        timer.Reset();

        timer.Start();
        for (var i = 0; i < 10000; i++)
            Assert.AreEqual("bar", GetName(testData));
        timer.Stop();
        Console.WriteLine("Uncompiled took " + timer.ElapsedMilliseconds + "ms");
        timer.Reset();

    }

    private static readonly Regex CompiledRegex = new Regex("^[a-zA-Z]+-", RegexOptions.Compiled);
    private static string GetNameCompiled(string objString)
    {
        return CompiledRegex.Replace(objString, "");
    }

    private static string GetName(string objString)
    {
        return Regex.Replace(objString, "^[a-zA-Z]+-", "");
    }

On my machine, I get:

Compiled took 21ms

Uncompiled took 37ms

Answer by Steve Wortham

Two things to think about are that RegexOptions.Compiled takes up CPU time and memory.

With that in mind, there's basically just one instance when you should not use RegexOptions.Compiled:

  • Your regular expression only runs a handful of times and the net speedup at runtime doesn't justify the cost of compilation.

There are too many variables to predict and draw a line in the sand, so to speak. It'd really require testing to determine the optimal approach. Or, if you don't feel like testing, then don't use Compiled until you do.

Now, if you do choose RegexOptions.Compiled, it's important that you're not wasteful with it.

Often the best way to go about it is to define your object as a static variable that can be reused over and over. For example...

public static Regex NameRegex = new Regex(@"[^a-zA-Z&-]+", RegexOptions.Compiled);

The one problem with this approach is that if you're declaring this globally, then it may be a waste if your application doesn't always use it, or doesn't use it upon startup. So a slightly different approach would be to use lazy loading as I describe in the article I wrote yesterday.

So in this case it'd be something like this...

public static Lazy<Regex> NameRegex = 
    new Lazy<Regex>(() => new Regex("[^a-zA-Z&-]+", RegexOptions.Compiled));

Then you simply reference NameRegex.Value whenever you want to use this regular expression and it's only instantiated when it's first accessed.
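
To make the lazy approach concrete, here is a small self-contained sketch that wires the Lazy<Regex> field into the GetName method from the question. The wrapper class LazyRegexSketch and the sample input are assumptions added for illustration; Lazy<T>.IsValueCreated is used only to show when the regex actually gets built.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyRegexSketch
{
    // The regex is not constructed (or compiled) until .Value is first read.
    public static readonly Lazy<Regex> NameRegex =
        new Lazy<Regex>(() => new Regex("[^a-zA-Z&-]+", RegexOptions.Compiled));

    public static string GetName(string objString)
    {
        return NameRegex.Value.Replace(objString, "");
    }

    public static void Main()
    {
        Console.WriteLine(NameRegex.IsValueCreated);     // False: nothing built yet
        Console.WriteLine(GetName("Tom & Jerry 2012"));  // Tom&Jerry
        Console.WriteLine(NameRegex.IsValueCreated);     // True: built on first use
    }
}
```

With this constructor overload Lazy<T> is thread-safe by default, so concurrent first calls should still end up sharing a single compiled instance.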

RegexOptions.Compiled in the Real World

On a couple of my sites, I'm using Regex routes for ASP.NET MVC. And this scenario is a perfect use for RegexOptions.Compiled. The routes are defined when the web application starts up, and are then reused for all subsequent requests as long as the application keeps running. So these regular expressions are instantiated and compiled once and reused millions of times.

Answer by drf

From a BCL blog post, compiling increases the startup time by an order of magnitude, but decreases subsequent runtimes by about 30%. Using these numbers, compilation should be considered for a pattern that you expect to be evaluated more than about 30 times. (Of course, like any performance optimization, both alternatives should be measured for acceptability.)
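
The "about 30 times" figure can be reproduced with a little arithmetic, under one extra assumption that is not stated in the answer: that the interpreted regex's startup cost is roughly equal to the cost of one interpreted evaluation. If compiling multiplies startup by 10 and saves 30% per evaluation, the extra startup (9 units) is repaid at 0.3 units per run, i.e. after about 9 / 0.3 = 30 runs. The helper below (BreakEvenRuns, a name made up for this sketch) just encodes that model.

```csharp
using System;

public static class BreakEvenSketch
{
    // startupRatio:  compiled startup cost / interpreted startup cost (10 for "an order of magnitude")
    // runtimeSaving: fraction of per-evaluation time saved by compiling (0.30 for "about 30%")
    // startupInRuns: interpreted startup cost expressed in interpreted evaluations
    //                (1.0 is an ASSUMPTION, not a figure from the blog post)
    public static double BreakEvenRuns(double startupRatio, double runtimeSaving, double startupInRuns)
    {
        // Extra startup paid up front, divided by the saving earned on each run.
        return (startupRatio - 1.0) * startupInRuns / runtimeSaving;
    }

    public static void Main()
    {
        // Roughly 30 evaluations with the numbers quoted above.
        Console.WriteLine(BreakEvenRuns(10.0, 0.30, 1.0));
    }
}
```

If startup is actually several times more expensive than a single evaluation, the break-even point moves up proportionally, which is another reason to measure rather than rely on the rule of thumb.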

If performance is critical for a simple expression called repeatedly, you may want to avoid using regular expressions altogether. I tried running some variants about 5 million times each:

Note: edited from previous version to correct regular expression.

    static string GetName1(string objString)
    {
        return Regex.Replace(objString, "[^a-zA-Z&-]+", "");
    }

    static string GetName2(string objString)
    {
        return Regex.Replace(objString, "[^a-zA-Z&-]+", "", RegexOptions.Compiled);
    }

    static string GetName3(string objString)
    {
        var sb = new StringBuilder(objString.Length);
        foreach (char c in objString)
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '-' || c == '&')
                sb.Append(c);
        return sb.ToString();
    }


    static string GetName4(string objString)
    {
        char[] c = objString.ToCharArray();
        int pos = 0;
        int writ = 0;
        while (pos < c.Length)
        {
            char curr = c[pos];
            if ((curr >= 'A' && curr <= 'Z') || (curr >= 'a' && curr <= 'z') || curr == '-' || curr == '&')
            {
                c[writ++] = c[pos];
            }
            pos++;
        }
        return new string(c, 0, writ);
    }


    unsafe static string GetName5(string objString)
    {
        char* buf = stackalloc char[objString.Length];
        int writ = 0;
        fixed (char* sp = objString)
        {
            char* pos = sp;
            while (*pos != '\0')
            {
                char curr = *pos;
                if ((curr >= 'A' && curr <= 'Z') || (curr >= 'a' && curr <= 'z') || curr == '-' || curr == '&')
                    buf[writ++] = curr;
                pos++;
            }
        }
        return new string(buf, 0, writ);
    }

Executing independently for 5 million random ASCII strings, 30 characters each, consistently gave these numbers:

   Method 1: 32.3  seconds (interpreted regex)
   Method 2: 24.4  seconds (compiled regex)
   Method 3:  1.82 seconds (StringBuilder concatenation)
   Method 4:  1.64 seconds (char[] manipulation)
   Method 5:  1.54 seconds (unsafe char* manipulation)

That is, compilation provided about a 25% performance benefit for a very large number of evaluations of this pattern, with the first execution being about 3 times slower. Methods that operated on the underlying character arrays were 12 times faster than the compiled regular expressions.

While method 4 or method 5 may provide some performance benefit over regular expressions, the other methods may provide other benefits (maintainability, readability, flexibility, etc.). This simple test does suggest that, in this case, compiling the regex has a modest performance benefit over interpreting it for a large number of evaluations.
