C#: Fastest way to compare strings

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/19436440/

Time: 2020-08-10 14:58:58  Source: igfitidea

c# Fastest way to compare strings

Tags: c#, string, performance

Asked by CoolCodeBro

I've noticed that

string1.Length == string2.Length && string1 == string2

on large strings is slightly faster than just

string1 == string2

Is this true? And is it good practice to compare the lengths of large strings before comparing the strings themselves?

Accepted answer by usr

The string equality operator does the length check before comparing the chars, so you do not save the comparison of the contents with this trick. You might still save a few CPU cycles because your length check assumes that the strings are not null, while the BCL must check that. So if the lengths are not equal most of the time, you will short-circuit a few instructions.
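
The null-handling difference mentioned above can be seen in a small sketch. `LengthThenEquals` is a hypothetical helper name that wraps the expression from the question; it is here only for illustration.

```csharp
using System;

static class NullCheckDemo
{
    // Hypothetical helper: the "length first" comparison from the question.
    // It dereferences both arguments, so it assumes neither is null.
    public static bool LengthThenEquals(string a, string b)
    {
        return a.Length == b.Length && a == b;
    }

    static void Main()
    {
        string present = "abc";
        string missing = null;

        // operator == accepts null operands without throwing.
        Console.WriteLine(present == missing);  // False
        Console.WriteLine(missing == null);     // True

        try
        {
            LengthThenEquals(missing, present);
        }
        catch (NullReferenceException)
        {
            // Reading .Length on a null reference throws.
            Console.WriteLine("Length check threw NullReferenceException");
        }
    }
}
```

So the manual length check is only a drop-in replacement for == when both strings are known to be non-null.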

I might just be wrong here, though. Maybe the operator gets inlined and the checks optimized out. Who knows for sure? (He who measures.)

If you care about saving every cycle you can, you should probably use a different strategy in the first place. Maybe managed code is not even the right choice. Given that, I recommend using the shorter form without the additional check.

Answer by Habib

The String.Equality operator (==) internally calls string.Equals, so use string.Equals or == as provided by the framework. It is already optimized enough.

It first compares references, then lengths, and then the actual characters.

You can find the source code here

Code: (Source: http://www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/String@cs/1305376/String@cs)

// Determines whether two strings match.
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public override bool Equals(Object obj) {
    if (this == null)                        //this is necessary to guard against reverse-pinvokes and
        throw new NullReferenceException();  //other callers who do not use the callvirt instruction

    String str = obj as String;
    if (str == null)
        return false;

    if (Object.ReferenceEquals(this, obj))
        return true;

    return EqualsHelper(this, str);
}

and

[System.Security.SecuritySafeCritical]  // auto-generated
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private unsafe static bool EqualsHelper(String strA, String strB)
{
    Contract.Requires(strA != null);
    Contract.Requires(strB != null);
    int length = strA.Length;
    if (length != strB.Length) return false;

    fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
    {
        char* a = ap;
        char* b = bp;

        // unroll the loop
#if AMD64
        // for AMD64 bit platform we unroll by 12 and
        // check 3 qword at a time. This is less code
        // than the 32 bit case and is shorter
        // pathlength

        while (length >= 12)
        {
            if (*(long*)a     != *(long*)b) break;
            if (*(long*)(a+4) != *(long*)(b+4)) break;
            if (*(long*)(a+8) != *(long*)(b+8)) break;
            a += 12; b += 12; length -= 12;
        }
 #else
        while (length >= 10)
        {
            if (*(int*)a != *(int*)b) break;
            if (*(int*)(a+2) != *(int*)(b+2)) break;
            if (*(int*)(a+4) != *(int*)(b+4)) break;
            if (*(int*)(a+6) != *(int*)(b+6)) break;
            if (*(int*)(a+8) != *(int*)(b+8)) break;
            a += 10; b += 10; length -= 10;
        }
  #endif

        // This depends on the fact that the String objects are
        // always zero terminated and that the terminating zero is not included
        // in the length. For odd string sizes, the last compare will include
        // the zero terminator.
        while (length > 0)
        {
            if (*(int*)a != *(int*)b) break;
            a += 2; b += 2; length -= 2;
        }

        return (length <= 0);
    }
}
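
For readers who prefer to avoid unsafe code, the same order of checks (reference, null, length, characters) can be sketched in plain managed C#. `OrdinalEquals` is a hypothetical name, and this is only an illustration of the sequence. It is not the BCL implementation, and it makes no attempt to match its loop unrolling.

```csharp
using System;

static class ManagedEqualsSketch
{
    // Hypothetical helper, NOT the framework code: it only mirrors the
    // order of checks, comparing one char at a time instead of unrolling.
    public static bool OrdinalEquals(string a, string b)
    {
        if (object.ReferenceEquals(a, b)) return true;   // same instance, or both null
        if (a == null || b == null) return false;        // exactly one is null
        if (a.Length != b.Length) return false;          // cheap rejection
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;              // first mismatch ends the scan
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(OrdinalEquals("hello", "hello"));  // True
        Console.WriteLine(OrdinalEquals("hello", "help"));   // False
        Console.WriteLine(OrdinalEquals(null, null));        // True
        Console.WriteLine(OrdinalEquals("hello", null));     // False
    }
}
```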

Answer by JLe

I'd say the first one is faster if the result of string1.Length == string2.Length is false. Thanks to short-circuit evaluation (SCE), the actual comparison between the strings is then not made, which might save you time.

If the strings are equal, however, the first one is slower, since it checks the length first and then does the same thing as the second one.
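
A small sketch can make the short-circuit behaviour visible. The counter `ContentComparisons` and the helpers `CountedEquals` and `LengthThenEquals` are hypothetical names introduced only for this illustration.

```csharp
using System;

static class ShortCircuitDemo
{
    // Hypothetical counter: how many times the content comparison ran.
    public static int ContentComparisons;

    static bool CountedEquals(string a, string b)
    {
        ContentComparisons++;
        return a == b;
    }

    // The right-hand operand of && is evaluated only when the lengths match.
    public static bool LengthThenEquals(string a, string b)
    {
        return a.Length == b.Length && CountedEquals(a, b);
    }

    static void Main()
    {
        LengthThenEquals("short", "much longer");
        Console.WriteLine(ContentComparisons);  // 0: lengths differ, == was skipped

        LengthThenEquals("same!", "same!");
        Console.WriteLine(ContentComparisons);  // 1: lengths match, == ran once
    }
}
```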

See http://msdn.microsoft.com/en-us/library/2a723cdk.aspx for information about the && operator and SCE.

Answer by Ben Voigt

With terminated strings, it makes sense to just start comparing characters, since you can't calculate the string lengths without iterating over all the characters anyway, and the comparison is likely to exit early.

With length-counted strings, comparing the length should be done first, if you are testing for byte-wise equality. You can't even start accessing character data without retrieving the length, since one could be zero-length.

If you are doing a relational comparison, knowing the lengths are different doesn't tell you if the result should be positive or negative. And in a culture-aware comparison, equal strings do not imply equal lengths. So for both of those you need to just compare data.
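
Both points can be illustrated with a short sketch. The sample strings are chosen for this example only. The culture-aware result depends on the runtime's globalization settings, so no expected output is given for that line; on a typical runtime with culture data available, canonically equivalent strings such as these are expected to compare as equal.

```csharp
using System;

static class ComparisonKindsDemo
{
    static void Main()
    {
        // Relational comparison: the shorter string can sort after or
        // before the longer one, so lengths alone do not give the sign.
        Console.WriteLine(Math.Sign(string.CompareOrdinal("b", "aa")));  // 1
        Console.WriteLine(Math.Sign(string.CompareOrdinal("a", "aa")));  // -1

        // Two spellings of the same accented letter with different lengths.
        string precomposed = "\u00E9";   // single code unit
        string decomposed = "e\u0301";   // 'e' followed by a combining acute accent

        Console.WriteLine(precomposed.Length == decomposed.Length);  // False
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal));  // False

        // Culture-aware equality; result depends on globalization settings.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.InvariantCulture));
    }
}
```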

If operator==(string, string) simply delegated to a relational comparison, you wouldn't expect it to compare lengths. Checking the length before doing the comparison could therefore be a benefit. But it seems like the Framework does start with a length check.

Answer by p.s.w.g

According to ILSpy, the string == operator is defined as:

public static bool operator ==(string a, string b)
{
    return string.Equals(a, b);
}

Which is defined as

public static bool Equals(string a, string b)
{
    return a == b || (a != null && b != null && a.Length == b.Length && string.EqualsHelper(a, b));
}

I assume that the first a == b is actually a reference equality check (ILSpy is just rendering it as ==); otherwise this would be an infinitely recursive method.
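
In ordinary C# source, a reference comparison between two strings can be written by casting to object or by calling object.ReferenceEquals. The following sketch shows how that differs from the overloaded == operator. The variable names are illustrative only.

```csharp
using System;

static class ReferenceVsValueDemo
{
    static void Main()
    {
        // Two separately allocated strings with the same contents.
        string first = new string('x', 3);
        string second = new string('x', 3);

        // Casting to object selects reference comparison, not string's operator ==.
        Console.WriteLine((object)first == (object)second);        // False
        Console.WriteLine(object.ReferenceEquals(first, second));  // False

        // The string operator == compares contents.
        Console.WriteLine(first == second);                        // True
    }
}
```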

This means that == already checks the lengths of the strings before actually comparing their characters.

Answer by Misha Zaslavsky

As promised, I wrote a short program that uses a stopwatch. You can copy and paste it, try it on different strings, and see the differences:

using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        string str1 = "put the first value";
        string str2 = "put the second value";
        CompareTwoStringsWithStopWatch(str1, str2); //Print the results.
    }

    private static void CompareTwoStringsWithStopWatch(string str1, string str2)
    {
        Stopwatch stopwatch = new Stopwatch();

        stopwatch.Start();
        for (int i = 0; i < 99999999; i++)
        {
            if (str1.Length == str2.Length && str1 == str2)
            {
                SomeOperation();
            }
        }
        stopwatch.Stop();

        Console.WriteLine("{0}. Time: {1}", "Result for: str1.Length == str2.Length && str1 == str2", stopwatch.Elapsed);
        stopwatch.Reset();

        stopwatch.Start();
        for (int i = 0; i < 99999999; i++)
        {
            if (str1 == str2)
            {
                SomeOperation();
            }
        }
        stopwatch.Stop();

        Console.WriteLine("{0}. Time: {1}", "Result for: str1 == str2", stopwatch.Elapsed);
    }

    private static int SomeOperation()
    {
        var value = 500;
        value += 5;

        return value - 300;
    }
}

My conclusions:

  1. When I checked some strings (short ones and long ones), I saw that all the results were almost the same. So the first if (with the length check) is slower in 2/3.
  2. And you have an Equals method in the Object class, just use it :)
  3. You can try it and give us the results also :)
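
If you do try it, a variation along the following lines may give steadier numbers. It is only a sketch under stated assumptions: the class name, the iteration count, and the test strings are all hypothetical choices. It runs each loop once before timing so JIT compilation is not measured, and it returns a match count so the comparison result is actually used. No timings are claimed here; they depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;

static class CompareTiming
{
    const int Iterations = 1000000;  // hypothetical iteration count

    public static int CountWithLengthCheck(string a, string b, int iterations)
    {
        int matches = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (a.Length == b.Length && a == b) matches++;
        }
        return matches;
    }

    public static int CountPlain(string a, string b, int iterations)
    {
        int matches = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (a == b) matches++;
        }
        return matches;
    }

    static void Main()
    {
        // Hypothetical inputs: same length, differing only in the last char.
        string a = new string('a', 1000);
        string b = new string('a', 999) + "b";

        // Warm-up so JIT compilation is not included in the timings.
        CountWithLengthCheck(a, b, 1000);
        CountPlain(a, b, 1000);

        Stopwatch stopwatch = Stopwatch.StartNew();
        int withCheck = CountWithLengthCheck(a, b, Iterations);
        stopwatch.Stop();
        Console.WriteLine("Length check + ==: {0} ({1} matches)", stopwatch.Elapsed, withCheck);

        stopwatch.Restart();
        int plain = CountPlain(a, b, Iterations);
        stopwatch.Stop();
        Console.WriteLine("== only: {0} ({1} matches)", stopwatch.Elapsed, plain);
    }
}
```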

Answer by Ricardo Pieper

If you expect the strings to differ in length most of the time, you can compare their lengths AND then compare the strings themselves by using string.Compare. I got almost 50% performance improvement by doing this:

if (str1.Length == str2.Length)
{
    if (string.Compare(str1, str2, StringComparison.Ordinal) == 0)
    {
        doSomething();
    }
}

In this case, I expect the strings to be different almost all the time, and I think str1.Length is way cheaper than comparing the actual strings. If they are equal in size, I compare them.

EDIT: Forget what I said. Just use == and be happy.

Answer by user2888973

My test results

Compare 10000 strings to 10000 other strings all the same length (256)

Time (s1 == s2): 32536889 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 37380529 ticks

Compare 10000 strings to 10000 other strings Random length max 256

Time (s1 == s2): 27223517 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 23419529 ticks

Compare 10000 strings to 10000 other strings Random length min 256 max 512

Time (s1 == s2): 28904898 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 25442710 ticks

What I find mind-boggling is that comparing 10000 equal-length strings takes longer than comparing the same number of strings that are larger.

All these tests were done with exactly the same data.

Answer by Free Coder 24

For the geeks among us, here's a page which does a great job at benchmarking numerous ways to compare strings.

In a nutshell, the fastest method appears to be the CompareOrdinal:

if (string.CompareOrdinal(stringsWeWantToSeeIfMatches[x], stringsWeAreComparingAgainst[x]) == 0)
{
//they're equal
}

The second best way seems to be using either a Dictionary or a HashSet, with the string you want to compare as the key.
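
A minimal sketch of the hash-based lookup follows, assuming the goal is to test one string against many known strings. `BuildIndex` and the sample values are hypothetical. StringComparer.Ordinal is passed explicitly so that the lookup uses ordinal, case-sensitive matching.

```csharp
using System;
using System.Collections.Generic;

static class HashLookupDemo
{
    // Hypothetical helper: builds a set of known strings once so that each
    // later membership test is a hash lookup rather than a scan.
    public static HashSet<string> BuildIndex(IEnumerable<string> candidates)
    {
        return new HashSet<string>(candidates, StringComparer.Ordinal);
    }

    static void Main()
    {
        HashSet<string> known = BuildIndex(new[] { "alpha", "beta", "gamma" });

        Console.WriteLine(known.Contains("beta"));  // True
        Console.WriteLine(known.Contains("Beta"));  // False: ordinal comparison is case-sensitive
    }
}
```

This pays off when the same set of strings is queried repeatedly; for a single pairwise comparison, building the set costs more than it saves.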

Makes for an interesting read.
