Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/2962670/

Date: 2020-09-03 14:21:04  Source: igfitidea

Regex.IsMatch vs string.Contains

Tags: .net, regex, string

Question by Pradeep

Is there any difference in speed/memory usage for these two equivalent expressions:

Regex.IsMatch(Message, "1000")

Vs

Message.Contains("1000")

Are there any situations where one is better than the other?

The context of this question is as follows: I was making some changes to legacy code that used a Regex to check whether one string is contained within another. Because it was legacy code, I left that part unchanged, and in the code review somebody suggested that Regex.IsMatch should be replaced by string.Contains. So I was wondering whether the change was worth making.
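
A minimal sketch of the suggested change, assuming the search text is a fixed literal such as "1000". The two calls are only equivalent while that literal contains no regular-expression metacharacters; Regex.Escape turns an arbitrary literal into a pattern that matches it exactly. The variable names and sample strings are illustrative, not taken from the legacy code.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input; not from the original question.
string message = "Error code 1000 returned by service";

// "1000" contains no regex metacharacters, so both checks agree.
bool viaRegex = Regex.IsMatch(message, "1000");
bool viaContains = message.Contains("1000"); // ordinal, case-sensitive
Console.WriteLine($"{viaRegex} {viaContains}");

// With a metacharacter the two are no longer equivalent:
// "." matches any character, so the pattern "1.0" also matches "100".
string other = "value 100";
bool regexDot = Regex.IsMatch(other, "1.0");                 // "." matched a "0"
bool containsDot = other.Contains("1.0");                    // no literal "1.0" present
bool escapedDot = Regex.IsMatch(other, Regex.Escape("1.0")); // literal search, like Contains
Console.WriteLine($"{regexDot} {containsDot} {escapedDot}");
```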

Answer by Andrew Hare

For simple cases String.Contains will give you better performance, but String.Contains will not allow you to do complex pattern matching. Use String.Contains for non-pattern-matching scenarios (like the one in your example) and use regular expressions for scenarios in which you need more complex pattern matching.

A regular expression has a certain amount of overhead associated with it (expression parsing, compilation, execution, etc.) that a simple method like String.Contains simply does not have, which is why String.Contains will outperform a regular expression in examples like yours.
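
A rough sketch of keeping that overhead out of a hot loop, using a hypothetical set of 100,000 short messages: the pattern is parsed once, outside the loop, and the same instance is reused. (Static calls such as Regex.IsMatch(Message, "1000") avoid re-parsing through an internal cache whose size is controlled by Regex.CacheSize.) Single-run Stopwatch figures are indicative only, so no timings are claimed here; the counts from both approaches should match.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical workload: 100,000 short messages; at least every 10th contains "1000".
var messages = new string[100_000];
for (int i = 0; i < messages.Length; i++)
    messages[i] = i % 10 == 0 ? $"msg {i} code 1000" : $"msg {i} ok";

// Parse the pattern once, outside the timed loop, and reuse the instance.
var pattern = new Regex("1000");

int CountMatches(Func<string, bool> test)
{
    int n = 0;
    foreach (var m in messages)
        if (test(m)) n++;
    return n;
}

var sw = Stopwatch.StartNew();
int byContains = CountMatches(m => m.Contains("1000"));
long containsMs = sw.ElapsedMilliseconds;

sw.Restart();
int byRegex = CountMatches(m => pattern.IsMatch(m));
long regexMs = sw.ElapsedMilliseconds;

Console.WriteLine($"Contains: {byContains} matches in {containsMs} ms");
Console.WriteLine($"Regex:    {byRegex} matches in {regexMs} ms");
```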

Answer by user279470

String.Contains is slower when you compare it to a compiled regular expression. Considerably slower, even!

You can test it running this benchmark:

using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
  // Counters accumulate across both passes (warm-up and measured).
  public static int FoundString;
  public static int FoundRegex;

  static void DoLoop(bool show)
  {
    // Verbatim string: in a regular literal, "\f" would be a form-feed escape.
    const string path = @"C:\file.txt";
    const int iterations = 1000000;
    var content = File.ReadAllText(path);

    const string searchString = "this exists in file";
    // Interpreted regex: no RegexOptions.Compiled is passed here.
    var searchRegex = new Regex("this exists in file");

    // Time string.Contains over the whole file content.
    var containsTimer = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
      if (content.Contains(searchString))
      {
        FoundString++;
      }
    }
    containsTimer.Stop();

    // Time Regex.IsMatch with the same literal pattern.
    var regexTimer = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
      if (searchRegex.IsMatch(content))
      {
        FoundRegex++;
      }
    }
    regexTimer.Stop();

    if (!show) return;

    Console.WriteLine("FoundString: {0}", FoundString);
    Console.WriteLine("FoundRegex: {0}", FoundRegex);
    Console.WriteLine("containsTimer: {0}", containsTimer.ElapsedMilliseconds);
    Console.WriteLine("regexTimer: {0}", regexTimer.ElapsedMilliseconds);

    Console.ReadLine();
  }

  static void Main(string[] args)
  {
    // The first pass warms up the JIT; only the second pass prints results.
    DoLoop(false);
    DoLoop(true);
  }
}

Answer by Martin Liversage

To determine which is fastest, you will have to benchmark on your own system. However, regular expressions are complex, and chances are that String.Contains() will be the fastest and, in your case, also the simplest solution.

The implementation of String.Contains() will eventually call the native method IndexOfString(), and the implementation of that is only known by Microsoft. However, a good algorithm for implementing this method is the Knuth–Morris–Pratt algorithm. The complexity of this algorithm is O(m + n), where m is the length of the string you are searching for and n is the length of the string you are searching in, making it a very efficient algorithm.
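
To illustrate the O(m + n) bound, here is a minimal Knuth–Morris–Pratt sketch. It only illustrates the algorithm named above and is not the .NET implementation of IndexOf or Contains; KmpIndexOf is a hypothetical name.

```csharp
using System;

// Knuth–Morris–Pratt substring search: O(m + n) time, O(m) extra space.
// Returns the index of the first occurrence of pattern in text, or -1.
static int KmpIndexOf(string text, string pattern)
{
    if (pattern.Length == 0) return 0;

    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of it.
    var failure = new int[pattern.Length];
    for (int i = 1, k = 0; i < pattern.Length; i++)
    {
        while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
        if (pattern[i] == pattern[k]) k++;
        failure[i] = k;
    }

    // Scan the text once; on a mismatch, fall back via the failure table
    // instead of re-examining earlier text characters.
    for (int i = 0, k = 0; i < text.Length; i++)
    {
        while (k > 0 && text[i] != pattern[k]) k = failure[k - 1];
        if (text[i] == pattern[k]) k++;
        if (k == pattern.Length) return i - pattern.Length + 1;
    }
    return -1;
}

Console.WriteLine(KmpIndexOf("abxabcabcaby", "abcaby")); // 6
Console.WriteLine(KmpIndexOf("log line without it", "1000")); // -1
```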

Actually, depending on the implementation, a regular-expression search can run in as little as O(n) time, so it may still be competitive in some situations. Only a benchmark will be able to determine this.

If you are really concerned about search speed, Christian Charras and Thierry Lecroq have a lot of material about exact string matching algorithms at the Université de Rouen.
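
As one example of the exact string matching algorithms covered in that kind of material, a minimal Boyer–Moore–Horspool sketch follows. The choice of Horspool is illustrative (the answer does not name a specific algorithm), and HorspoolIndexOf is a hypothetical name. Unlike KMP it can skip over parts of the text, at the cost of an O(m * n) worst case.

```csharp
using System;
using System.Collections.Generic;

// Boyer–Moore–Horspool: compare the pattern right-to-left and, on a mismatch,
// shift by an amount determined by the text character under the pattern's last slot.
static int HorspoolIndexOf(string text, string pattern)
{
    int m = pattern.Length;
    if (m == 0) return 0;

    // Shift for each character in pattern[0..m-2] (rightmost occurrence wins);
    // any character not in the table shifts by the full pattern length.
    var shift = new Dictionary<char, int>();
    for (int i = 0; i < m - 1; i++) shift[pattern[i]] = m - 1 - i;

    int pos = 0;
    while (pos <= text.Length - m)
    {
        int j = m - 1;
        while (j >= 0 && text[pos + j] == pattern[j]) j--;
        if (j < 0) return pos;

        char last = text[pos + m - 1];
        pos += shift.TryGetValue(last, out int s) ? s : m;
    }
    return -1;
}

Console.WriteLine(HorspoolIndexOf("this exists in file", "in file")); // 12
```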

Answer by spring1975

@user279470 I was looking for an efficient way to count words, just for fun, and came across this. I gave it the OpenOffice Thesaurus .dat file to iterate through. The total word count came to 1575423.

Now, my end goal didn't have a use for Contains, but what was interesting was seeing the different ways you can call Regex that make it even faster. I created some other methods to compare an instance use of Regex and a static use with RegexOptions.Compiled.

using System.Text.RegularExpressions;

public static class WordCount
{
    /// <summary>
    /// Count words with an instantiated Regex.
    /// </summary>
    public static int CountWords4(string s)
    {
        Regex r = new Regex(@"\S+");
        MatchCollection collection = r.Matches(s);
        return collection.Count;
    }
    /// <summary>
    /// Count words with static compiled Regex.
    /// </summary>
    public static int CountWords1(string s)
    {
        MatchCollection collection = Regex.Matches(s, @"\S+", RegexOptions.Compiled);
        return collection.Count;
    }
    /// <summary>
    /// Count words with static Regex.
    /// </summary>
    public static int CountWords3(string s)
    {
        MatchCollection collection = Regex.Matches(s, @"\S+");
        return collection.Count;
    }

    /// <summary>
    /// Count words with a loop and character tests.
    /// </summary>
    public static int CountWords2(string s)
    {
        int c = 0;
        // Count each letter, digit, or punctuation character that follows whitespace.
        for (int i = 1; i < s.Length; i++)
        {
            if (char.IsWhiteSpace(s[i - 1]))
            {
                if (char.IsLetterOrDigit(s[i]) ||
                    char.IsPunctuation(s[i]))
                {
                    c++;
                }
            }
        }
        // Add one for the first word when the string is longer than two characters.
        if (s.Length > 2)
        {
            c++;
        }
        return c;
    }
}
  • regExCompileTimer.ElapsedMilliseconds 11787
  • regExStaticTimer.ElapsedMilliseconds 12300
  • regExInstanceTimer.ElapsedMilliseconds 13925
  • ContainsTimer.ElapsedMilliseconds 1074
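
A sketch of the approach those timings point toward, assuming the goal is simply to count whitespace-delimited words: build one Regex with RegexOptions.Compiled and reuse it, rather than constructing a new instance per call as CountWords4 does. CountWithRegex and CountWithLoop are hypothetical names; unlike CountWords2, the loop version counts every run of non-whitespace characters, so the two counts agree.

```csharp
using System;
using System.Text.RegularExpressions;

// One Regex instance, constructed (and compiled) once and then reused.
var word = new Regex(@"\S+", RegexOptions.Compiled);

int CountWithRegex(string s) => word.Matches(s).Count;

// Regex-free alternative: one pass counting whitespace-to-non-whitespace transitions.
static int CountWithLoop(string s)
{
    int count = 0;
    bool inWord = false;
    foreach (char c in s)
    {
        if (char.IsWhiteSpace(c)) inWord = false;
        else if (!inWord) { inWord = true; count++; }
    }
    return count;
}

string sample = "  the quick\tbrown\n\nfox  ";
Console.WriteLine($"{CountWithRegex(sample)} {CountWithLoop(sample)}"); // 4 4
```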

Answer by gb2d

My own benchmarks appear to contradict user279470's benchmark results.

In my use case I wanted to compare a simple Regex that uses the OR operator (|) to match any of 4 values against doing 4 x String.Contains().

Even with 4 x String.Contains(), I found that String.Contains() was 5 x faster.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Text.RegularExpressions;

namespace App.Tests.Performance
{
    [TestClass]
    public class PerformanceTesting
    {
        private static Random random = new Random();

        [TestMethod]
        public void RegexVsMultipleContains()
        {
            var matchRegex = new Regex("INFO|WARN|ERROR|FATAL");

            var testStrings = new List<string>();

            int iterator = 1000000 / 4; // div 4 for each of log levels checked

            for (int i = 0; i < iterator; i++)
            {
                for (int j = 0; j < 4; j++)
                {
                    var simulatedTestString = RandomString(50);

                    if (j == 0)
                    {
                        simulatedTestString += "INFO";
                    }
                    else if (j == 1)
                    {
                        simulatedTestString += "WARN";
                    }
                    else if (j == 2)
                    {
                        simulatedTestString += "ERROR";
                    }
                    else if (j == 3)
                    {
                        simulatedTestString += "FATAL";
                    }

                    simulatedTestString += RandomString(50);

                    testStrings.Add(simulatedTestString);
                }
            }

            int cnt;
            Stopwatch sw;

            //////////////////////////////////////////
            // Multiple contains test
            //////////////////////////////////////////

            cnt = 0;
            sw = new Stopwatch();

            sw.Start();

            for (int i = 0; i < testStrings.Count; i++)
            {
                bool isMatch = testStrings[i].Contains("INFO") || testStrings[i].Contains("WARN") || testStrings[i].Contains("ERROR") || testStrings[i].Contains("FATAL");

                if (isMatch)
                {
                    cnt += 1;
                }
            }

            sw.Stop();

            Console.WriteLine("MULTIPLE CONTAINS: " + cnt + " " + sw.ElapsedMilliseconds);

            //////////////////////////////////////////
            // Multiple contains using list test
            //////////////////////////////////////////

            cnt = 0;
            sw = new Stopwatch();

            sw.Start();

            var searchStringList = new List<string> { "INFO", "WARN", "ERROR", "FATAL" };

            for (int i = 0; i < testStrings.Count; i++)
            {
                bool isMatch = searchStringList.Any(x => testStrings[i].Contains(x));

                if (isMatch)
                {
                    cnt += 1;
                }
            }

            sw.Stop();

            Console.WriteLine("MULTIPLE CONTAINS USING LIST: " + cnt + " " + sw.ElapsedMilliseconds);

            //////////////////////////////////////////
            // Regex test
            ////////////////////////////////////////// 

            cnt = 0;
            sw = new Stopwatch();

            sw.Start();

            for (int i = 0; i < testStrings.Count; i++)
            {
                bool isMatch = matchRegex.IsMatch(testStrings[i]);

                if (isMatch)
                {
                    cnt += 1;
                }
            }

            sw.Stop();

            Console.WriteLine("REGEX: " + cnt + " " + sw.ElapsedMilliseconds);
        }

        public static string RandomString(int length)
        {
            const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

            return new string(Enumerable.Repeat(chars, length).Select(s => s[random.Next(s.Length)]).ToArray());
        }
    }
}
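
A sketch of the two approaches from that test side by side, assuming the keywords are plain literals. ContainsAny is a hypothetical helper that stops at the first hit and makes the ordinal comparison explicit with StringComparison.Ordinal; the alternation is built with Regex.Escape so that a metacharacter in a keyword would still be matched literally. Timings are left out; the point is that both give the same answers.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] levels = { "INFO", "WARN", "ERROR", "FATAL" };

// Option 1: one ordinal substring search per keyword, stopping at the first hit.
static bool ContainsAny(string line, string[] keywords)
{
    foreach (var k in keywords)
        if (line.IndexOf(k, StringComparison.Ordinal) >= 0) return true;
    return false;
}

// Option 2: a single alternation pattern built from the same keywords.
var alternation = new Regex(string.Join("|", levels.Select(k => Regex.Escape(k))));

string[] lines = { "2020-09-03 WARN disk low", "2020-09-03 DEBUG ok", "FATAL: crash" };
foreach (var line in lines)
    Console.WriteLine($"{ContainsAny(line, levels)} {alternation.IsMatch(line)}  {line}");
```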

Answer by Matthew Flaschen

Yes, for this task, string.Contains will almost certainly be faster and use less memory. And of course, there's no reason to use regex here.
