C#: using static Regex.IsMatch vs. creating a Regex instance

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/414328/

Date: 2020-08-04 02:29:58  Source: igfitidea

using static Regex.IsMatch vs creating an instance of Regex

Tags: c#, regex, optimization

Asked by Ben McNiel

In C# should you have code like:

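public static string importantRegex = "magic!";

public void F1(string input){
  //code
  // The static Regex.IsMatch takes the input first, then the pattern.
  if(Regex.IsMatch(input, importantRegex)){
    //codez in here.
  }
  //more code
}
public void main(){
  F1("some input");
/*
  some stuff happens......
*/
  F1("some other input");
}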

or should you persist an instance of a Regex containing the important pattern? What is the cost of using Regex.IsMatch? I imagine there is an NFA created in each Regex instance. From what I understand, this NFA creation is nontrivial.

Accepted answer by P Daddy

In a rare departure from my typical egotism, I'm kind of reversing myself on this answer.

My original answer, preserved below, was based on an examination of version 1.1 of the .NET framework. This is pretty shameful, since .NET 2.0 had been out for over three years at the time of my answer, and it contained changes to the Regex class that significantly affect the difference between the static and instance methods.

In .NET 2.0 (and 4.0), the static IsMatch function is defined as follows:

public static bool IsMatch(string input, string pattern){
    return new Regex(pattern, RegexOptions.None, true).IsMatch(input);
}

The significant difference here is that little true as the third argument. That corresponds to a parameter named "useCache". When that is true, the parsed tree is retrieved from the cache on the second and subsequent uses.
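
As a rough sketch of the two options being compared (the class and method names here are hypothetical, and the pattern is borrowed from the question), the code below holds one Regex instance in a static field and, alongside it, calls the static helper that relies on the framework's cache:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class MagicMatcher
{
    // Pattern borrowed from the question.
    private const string Pattern = "magic!";

    // Option 1: parse the pattern once and reuse the same Regex object.
    private static readonly Regex Instance = new Regex(Pattern);

    public static bool ViaInstance(string input)
    {
        return Instance.IsMatch(input);
    }

    // Option 2: the static helper. On .NET 2.0 and later, the parsed
    // pattern is looked up in the framework's cache on each call.
    public static bool ViaStatic(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

Both methods return the same result for the same input; the only difference is who keeps the parsed pattern alive, your field or the framework's cache.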

This caching eats up most, but not all, of the performance difference between the static and instance methods. In my tests, the static IsMatch method was still about 20% slower than the instance method, but that only amounted to about a half second increase when run 100 times over a set of 10,000 input strings (for a total of 1 million operations).

This 20% slowdown can still be significant in some scenarios. If you find yourself regexing hundreds of millions of strings, you'll probably want to take every step you can to make it more efficient. But I'd bet that 99% of the time, you're using a particular Regex no more than a handful of times, and the extra millisecond you lose to the static method won't be even close to noticeable.

Props to devgeezer, who pointed this out almost a year ago, although no one seemed to notice.

My old answer follows:

The static IsMatch function is defined as follows:

public static bool IsMatch(string input, string pattern){
    return new Regex(pattern).IsMatch(input);
}

And, yes, initialization of a Regex object is not trivial. You should use the static IsMatch (or any of the other static Regex functions) as a quick shortcut only for patterns that you will use only once. If you will reuse the pattern, it's worth it to reuse a Regex object, too.

As to whether or not you should specify RegexOptions.Compiled, as suggested by Jon Skeet, that's another story. The answer there is: it depends. For simple patterns or for patterns used only a handful of times, it may well be faster to use a non-compiled instance. You should definitely profile before deciding. The cost of compiling a regular expression object is quite large indeed, and may not be worth it.

Take, as an example, the following:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const int count = 10000;

string pattern = "^[a-z]+[0-9]+$";
string input   = "abc123";

// 1. Static method: the pattern is passed on every call.
Stopwatch sw = Stopwatch.StartNew();
for(int i = 0; i < count; i++)
    Regex.IsMatch(input, pattern);
Console.WriteLine("static took {0} seconds.", sw.Elapsed.TotalSeconds);

// 2. Reused, non-compiled instance (construction is included in the timing).
sw.Reset();
sw.Start();
Regex rx = new Regex(pattern);
for(int i = 0; i < count; i++)
    rx.IsMatch(input);
Console.WriteLine("instance took {0} seconds.", sw.Elapsed.TotalSeconds);

// 3. Reused, compiled instance (the compilation cost is included in the timing).
sw.Reset();
sw.Start();
rx = new Regex(pattern, RegexOptions.Compiled);
for(int i = 0; i < count; i++)
    rx.IsMatch(input);
Console.WriteLine("compiled took {0} seconds.", sw.Elapsed.TotalSeconds);

At count = 10000, as listed, the second output is fastest. Increase count to 100000, and the compiled version wins.

Answer by Jon Skeet

If you're going to reuse the regular expression multiple times, I'd create it with RegexOptions.Compiled and cache it. There's no point in making the framework parse the regex pattern every time you want it.

Answer by Andrew Hare

I agree with Jon and just to clarify it would look something like this:

static Regex regex = new Regex("regex", RegexOptions.Compiled);

It's also worthwhile to look at the RegexOptions enum for other flags that can be helpful at times.
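
As an illustration of combining flags, here is a minimal sketch; the class name, field name, and pattern are made up for the example, and which flags are actually useful depends on your pattern and input. RegexOptions is a flags enum, so values are combined with the bitwise OR operator:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical holder class for shared patterns.
public static class Patterns
{
    // Letters followed by digits, matched case-insensitively.
    // CultureInvariant keeps the case-insensitive comparison independent
    // of the current culture.
    public static readonly Regex LettersThenDigits = new Regex(
        "^[a-z]+[0-9]+$",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
}
```

With IgnoreCase set, Patterns.LettersThenDigits.IsMatch("ABC123") matches even though the character class only lists lowercase letters.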

Answer by Recep

I suggest you read Jeff's post on compiling Regex.

As for the question, if you are asking this question it means that you are going to use it just once. So, it really does not matter as the Reflector's disassembly of Regex.IsMatch is:

public static bool IsMatch(string input, string pattern, RegexOptions options)
{
    return new Regex(pattern, options, true).IsMatch(input);
}

Answer by benPearce

For a WinForms application I was working on, we could define a regex of valid characters that would run on every keystroke, plus a validation of the text for any textbox (it was a data entry application), so I used a cache of compiled regexes such as

  private static Dictionary<string, Regex> regexCache = new Dictionary<string, Regex>(20);

Where the regex expression was the key.

Then I had a static function I could call when validating data:

public static bool RegExValidate(string text, string regex)
{
  if (!regexCache.ContainsKey(regex))
  {
    Regex compiledRegex = new Regex(regex,RegexOptions.Compiled);
    regexCache.Add(regex, compiledRegex);
  }
  return regexCache[regex].IsMatch(text);
}
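
The Dictionary-based cache above is fine when it is only touched from a single thread, such as a UI thread. If the same helper might be called from several threads, Dictionary is not safe for concurrent writes. One possible variant, sketched here under the assumption that .NET 4.0 or later is available (the class and method names are hypothetical), swaps in ConcurrentDictionary:

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

// Hypothetical thread-safe variant of the cache shown above.
public static class RegexValidator
{
    // Keyed by pattern text, as in the original cache.
    private static readonly ConcurrentDictionary<string, Regex> Cache =
        new ConcurrentDictionary<string, Regex>();

    public static bool Validate(string text, string pattern)
    {
        // GetOrAdd returns the cached Regex, or builds and stores one.
        // Under contention the factory may run more than once for the
        // same key, but only one resulting Regex is kept in the cache.
        Regex rx = Cache.GetOrAdd(pattern, p => new Regex(p, RegexOptions.Compiled));
        return rx.IsMatch(text);
    }
}
```

Like the original, this cache never evicts entries, so it suits a small, fixed set of patterns rather than patterns built from arbitrary user input.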

Answer by Ben Lings

There are many things that will affect the performance of using a regular expression. Ultimately, the only way to find out the most performant in your situation is to measure, using as realistic a situation as possible.

The page on compilation and reuse of regular expression objects on MSDN covers this. In summary, it says:

  1. Compiled regular expressions take time to compile, and once compiled will only have their memory released on AppDomain unloads. Whether you should use compilation or not will depend on the number of patterns you are using and how often they are used.

  2. Static Regex methods cache the parsed regular expression representation for the last 15 (by default) patterns. So if you aren't using many different patterns in your application, or your usage is sufficiently clustered, there won't be much difference between you caching the instance or the framework caching it.
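
The size of that static-method cache can be read and changed through the static Regex.CacheSize property. A minimal sketch, assuming an application that uses more than 15 distinct patterns through the static methods (the helper class and method name are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical startup helper.
public static class RegexCacheConfig
{
    // Grows the static-method cache if it is smaller than the requested
    // minimum, and returns the size now in effect.
    public static int EnsureCacheSize(int minimum)
    {
        if (Regex.CacheSize < minimum)
            Regex.CacheSize = minimum;
        return Regex.CacheSize;
    }
}
```

Calling something like RegexCacheConfig.EnsureCacheSize(50) once at startup would be one way to keep frequently used static patterns from being evicted; whether that helps in practice is something to measure.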

Answer by devgeezer

This answer is no longer correct in regard to versions of .NET that I have on my machine. 4.0.30319 & 2.0.50727 both have the following for IsMatch:

public static bool IsMatch(string input, string pattern)
{
  return new Regex(pattern, RegexOptions.None, true).IsMatch(input);
}

The 'true' value is for a constructor parameter called "useCache". All of the Regex constructors ultimately chain through this one, the statics call this one directly - passing in 'true'.

You can read more in the BCL blog post about optimizing Regex performance, which highlights the static methods' use of the cache. That blog post also cites performance measurements. Reading the series of blog posts on optimizing Regex performance is a great place to start.