C# For vs. Linq - Performance vs. Future

Notice: this page is a copy of a popular StackOverflow question (originally presented as a Chinese-English parallel translation), provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14893924/

Time: 2020-08-10 13:22:00  Source: igfitidea

For vs. Linq - Performance vs. Future

c# performance linq

Asked by Jaqq

Very brief question. I have a large, randomly ordered string array (100K+ entries) in which I want to find the first occurrence of a desired string. I have two solutions.

From what I have read, my guess is that the 'for loop' currently gives slightly better performance (though this margin could always change), but I also find the LINQ version much more readable. On balance, which method is generally considered current best coding practice, and why?

string matchString = "dsf897sdf78";
int matchIndex = -1;
for(int i=0; i<array.Length; i++)
{
    if(array[i]==matchString)
    {
        matchIndex = i;
        break;
    }
}

or

int matchIndex = array.Select((r, i) => new { value = r, index = i })
                         .Where(t => t.value == matchString)
                         .Select(s => s.index).First();

Accepted answer by usr

The best practice depends on what you need:

  1. Development speed and maintainability: LINQ
  2. Performance (according to profiling tools): manual code

LINQ really does slow things down with all the indirection. Don't worry about it as 99% of your code does not impact end user performance.

I started with C++ and really learnt how to optimize a piece of code. LINQ is not suited to get the most out of your CPU. So if you measure a LINQ query to be a problem just ditch it. But only then.

For your code sample I'd estimate a 3x slowdown. The allocations (and subsequent GC!) and indirections through the lambdas really hurt.
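
One way to get both a readable call site and loop-level performance is to hide the hand-written loop behind a small helper. The following is only a minimal sketch and not part of the original answer; the names SearchExtensions and IndexOfFirst are hypothetical.

```csharp
using System;

public static class SearchExtensions
{
    // Hypothetical helper: returns the index of the first element equal to
    // 'value', or -1 if there is none. It uses no lambdas and allocates
    // nothing per element.
    public static int IndexOfFirst(this string[] source, string value)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        for (int i = 0; i < source.Length; i++)
        {
            if (source[i] == value)
            {
                return i;
            }
        }
        return -1;
    }
}

public static class HelperDemo
{
    public static void Main()
    {
        string[] array = { "abc", "dsf897sdf78", "xyz", "dsf897sdf78" };
        Console.WriteLine(array.IndexOfFirst("dsf897sdf78")); // prints 1
        Console.WriteLine(array.IndexOfFirst("missing"));     // prints -1
    }
}
```

The call site then reads about as clearly as a LINQ query, while the work done is the same as in the plain loop.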

Answer by dutzu

Well, you gave the answer to your question yourself.

Go with a For loop if you want the best performance, or go with Linq if you want readability.

Also, perhaps keep in mind the possibility of using Parallel.ForEach(), which would benefit from in-line lambda expressions (so, closer to Linq), and which is much more readable than doing the parallelization "manually".
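
As a rough illustration of the parallel approach, here is a minimal sketch that uses Parallel.For rather than Parallel.ForEach, because the search needs the element index. The class and method names are hypothetical, and whether this beats a sequential loop for a given array would have to be measured.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSearch
{
    // Returns the index of the first match, or -1 if there is none.
    public static int IndexOfFirst(string[] source, string value)
    {
        ParallelLoopResult result = Parallel.For(0, source.Length, (i, state) =>
        {
            if (source[i] == value)
            {
                // Break() still lets iterations below i run, so an earlier
                // match can be found; iterations above i may be skipped.
                state.Break();
            }
        });

        // LowestBreakIteration is null when Break() was never called.
        return result.LowestBreakIteration.HasValue
            ? (int)result.LowestBreakIteration.Value
            : -1;
    }
}
```

Using Break() rather than Stop() matters here: the question asks for the first occurrence, and Stop() would only guarantee that some match was found.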

Answer by Lee Dale

I don't think either is considered best practice; some people prefer looking at LINQ and some don't.

If performance is an issue, I would profile both bits of code for your scenario, and if the difference is negligible, go with the one you feel more comfortable with; after all, it will most likely be you who maintains the code.

Also, have you thought about using PLINQ or making the loop run in parallel?
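
A PLINQ version of the search might look like the sketch below. It is an illustration only, with hypothetical class and method names; AsOrdered() is used so that First() returns the lowest matching index instead of whichever match a worker happens to find first.

```csharp
using System;
using System.Linq;

public static class PlinqSearch
{
    // Returns the index of the first match, or -1 if there is none.
    public static int IndexOfFirst(string[] source, string value)
    {
        return ParallelEnumerable.Range(0, source.Length)
            .AsOrdered()                      // preserve index order in the result
            .Where(i => source[i] == value)   // keep only matching indexes
            .DefaultIfEmpty(-1)               // -1 when nothing matched
            .First();
    }
}
```

Order preservation has a cost of its own, so this is something to profile against the sequential loop before adopting it.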

Answer by Sergey Berezovskiy

There is always a dilemma between performance and maintainability. Usually (if there are no specific performance requirements) maintainability should win. Only if you have performance problems should you profile the application, find the source of the problem, and improve its performance (reducing maintainability at the same time; yes, that's the world we live in).

About your sample: Linq is not a very good solution here, because it does not add much maintainability to your code. Actually, to me, projecting, filtering, and projecting again looks even worse than a simple loop. What you need here is a simple Array.IndexOf, which is more maintainable than a loop and has almost the same performance:

Array.IndexOf(array, matchString)
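
One behavioural difference is worth noting when swapping between the variants: Array.IndexOf returns -1 when nothing matches (as does the loop in the question), whereas the question's LINQ query ends in First(), which throws InvalidOperationException on an empty result. A small sketch with made-up sample data:

```csharp
using System;
using System.Linq;

public static class IndexOfDemo
{
    public static void Main()
    {
        string[] array = { "a", "b", "c" };

        Console.WriteLine(Array.IndexOf(array, "b"));       // prints 1
        Console.WriteLine(Array.IndexOf(array, "missing")); // prints -1

        try
        {
            // Same query shape as in the question, but with no match.
            int index = array.Select((r, i) => new { value = r, index = i })
                             .Where(t => t.value == "missing")
                             .Select(s => s.index)
                             .First();
            Console.WriteLine(index);
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("First() threw: no matching element");
        }
    }
}
```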

Answer by Matthew Watson

Slightly better performance? A loop will give SIGNIFICANTLY better performance!

Consider the code below. On my system for a RELEASE (not debug) build, it gives:

Found via loop at index 999999 in 00:00:00.2782047
Found via linq at index 999999 in 00:00:02.5864703
Loop was 9.29700432810805 times faster than linq.

The code is deliberately set up so that the item to be found is right at the end. If it was right at the start, things would be quite different.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace Demo
{
    public static class Program
    {
        private static void Main(string[] args)
        {
            string[] a = new string[1000000];

            for (int i = 0; i < a.Length; ++i)
            {
                a[i] = "Won't be found";
            }

            string matchString = "Will be found";

            a[a.Length - 1] = "Will be found";

            const int COUNT = 100;

            var sw = Stopwatch.StartNew();
            int matchIndex = -1;

            for (int outer = 0; outer < COUNT; ++outer)
            {
                for (int i = 0; i < a.Length; i++)
                {
                    if (a[i] == matchString)
                    {
                        matchIndex = i;
                        break;
                    }
                }
            }

            sw.Stop();
            Console.WriteLine("Found via loop at index " + matchIndex + " in " + sw.Elapsed);
            double loopTime = sw.Elapsed.TotalSeconds;

            sw.Restart();

            for (int outer = 0; outer < COUNT; ++outer)
            {
                matchIndex = a.Select((r, i) => new { value = r, index = i })
                             .Where(t => t.value == matchString)
                             .Select(s => s.index).First();
            }

            sw.Stop();
            Console.WriteLine("Found via linq at index " + matchIndex + " in " + sw.Elapsed);
            double linqTime = sw.Elapsed.TotalSeconds;

            Console.WriteLine("Loop was {0} times faster than linq.", linqTime/loopTime);
        }
    }
}

Answer by Ryszard Dżegan

LINQ, following the declarative paradigm, expresses the logic of a computation without describing its control flow. The query is goal-oriented and self-describing, and thus easy to analyse and understand. It is also concise. Moreover, LINQ code depends heavily on an abstraction of the data structure, which leads to high maintainability and reusability.

The iterative approach follows the imperative paradigm. It gives fine-grained control, which makes it easier to obtain higher performance. The code is also simpler to debug. Sometimes a well-constructed iteration is more readable than a query.

Answer by Nachiket Saggam

The best option is to use the IndexOf method of the Array class. Since it is specialized for arrays, it will be significantly faster than both Linq and a for loop. Improving on Matt Watson's answer:

using System;
using System.Diagnostics;
using System.Linq;


namespace PerformanceConsoleApp
{
    public class LinqVsFor
    {

        private static void Main(string[] args)
        {
            string[] a = new string[1000000];

            for (int i = 0; i < a.Length; ++i)
            {
                a[i] = "Won't be found";
            }

            string matchString = "Will be found";

            a[a.Length - 1] = "Will be found";

            const int COUNT = 100;

            var sw = Stopwatch.StartNew();

            Loop(a, matchString, COUNT, sw);

            First(a, matchString, COUNT, sw);


            Where(a, matchString, COUNT, sw);

            IndexOf(a, sw, matchString, COUNT);

            Console.ReadLine();
        }

        private static void Loop(string[] a, string matchString, int COUNT, Stopwatch sw)
        {
            int matchIndex = -1;
            for (int outer = 0; outer < COUNT; ++outer)
            {
                for (int i = 0; i < a.Length; i++)
                {
                    if (a[i] == matchString)
                    {
                        matchIndex = i;
                        break;
                    }
                }
            }

            sw.Stop();
            Console.WriteLine("Found via loop at index " + matchIndex + " in " + sw.Elapsed);

        }

        private static void IndexOf(string[] a, Stopwatch sw, string matchString, int COUNT)
        {
            int matchIndex = -1;
            sw.Restart();
            for (int outer = 0; outer < COUNT; ++outer)
            {
                matchIndex = Array.IndexOf(a, matchString);
            }
            sw.Stop();
            Console.WriteLine("Found via IndexOf at index " + matchIndex + " in " + sw.Elapsed);

        }

        private static void First(string[] a, string matchString, int COUNT, Stopwatch sw)
        {
            sw.Restart();
            string str = "";
            for (int outer = 0; outer < COUNT; ++outer)
            {
                str = a.First(t => t == matchString);

            }
            sw.Stop();
            Console.WriteLine("Found via linq First at index " + Array.IndexOf(a, str) + " in " + sw.Elapsed);

        }

        private static void Where(string[] a, string matchString, int COUNT, Stopwatch sw)
        {
            sw.Restart();
            string str = "";
            for (int outer = 0; outer < COUNT; ++outer)
            {
                str = a.Where(t => t == matchString).First();

            }
            sw.Stop();
            Console.WriteLine("Found via linq Where at index " + Array.IndexOf(a, str) + " in " + sw.Elapsed);

        }

    }

}

Output:

Found via loop at index 999999 in 00:00:01.1528531
Found via linq First at index 999999 in 00:00:02.0876573
Found via linq Where at index 999999 in 00:00:01.3313111
Found via IndexOf at index 999999 in 00:00:00.7244812

Answer by Paul Westcott

A bit of a non-answer, and really just an extension to https://stackoverflow.com/a/14894589, but I have, on and off, been working on an API-compatible replacement for Linq-to-Objects for a while now. It still doesn't provide the performance of a hand-coded loop, but it is faster for many (most?) linq scenarios. It does create more garbage, and has some slightly heavier up front costs.

The code is available at https://github.com/manofstick/Cistern.Linq

A NuGet package is available at https://www.nuget.org/packages/Cistern.Linq/ (I can't claim this to be battle-hardened; use at your own risk).

Taking the code from Matthew Watson's answer (https://stackoverflow.com/a/14894589) with two slight tweaks, we get the time down to "only" ~3.5 times worse than the hand-coded loop. On my machine it takes about 1/3 of the time of the original System.Linq version.

The two changes are to replace:

using System.Linq;

...

matchIndex = a.Select((r, i) => new { value = r, index = i })
             .Where(t => t.value == matchString)
             .Select(s => s.index).First();

With the following:

// a complete replacement for System.Linq
using Cistern.Linq;

...

// use a value tuple rather than anonymous type
matchIndex = a.Select((r, i) => (value: r, index: i))
             .Where(t => t.value == matchString)
             .Select(s => s.index).First();

So the library itself is a work in progress. It fails a couple of edge cases from corefx's System.Linq test suite. It also still needs a few functions to be converted over (they currently use the corefx System.Linq implementation, which is compatible from an API perspective, if not a performance perspective). But anyone who wants to help, comment, etc. would be appreciated.

Answer by Kevin Waltman

Just an interesting observation. LINQ Lambda queries for sure add a penalty over LINQ Where queries or a for loop. The following code fills a list with 1000001 multi-parameter objects and then searches for a specific item, which in this test will always be the last one, using a LINQ Lambda, a LINQ Where query, and a for loop. Each test iterates 100 times and then averages the times to get the results.

LINQ Lambda Query Average Time: 0.3382 seconds

LINQ Where Query Average Time: 0.238 seconds

For Loop Average Time: 0.2266 seconds

I've run this test over and over, and even increased the iteration count, and the spread is pretty much identical, statistically speaking. Sure, we are talking about 1/10 of a second for what is essentially a million-item search. So in the real world, unless something is that intensive, I'm not sure you would even notice. But if you do, LINQ Lambda vs. LINQ Where query does show a difference in performance. The LINQ Where is nearly the same as the for loop.

private void RunTest()
{
    try
    {
        List<TestObject> mylist = new List<TestObject>();

        for (int i = 0; i <= 1000000; i++)
        {
            TestObject testO = new TestObject(string.Format("Item{0}", i), "1", Guid.NewGuid().ToString());
            mylist.Add(testO);
        }


        mylist.Add(new TestObject("test", "29863", Guid.NewGuid().ToString()));

        string searchtext = "test";

        int iterations = 100;

        // Linq Lambda Test
        List<int> list1 = new List<int>();
        for (int i = 1; i <= iterations; i++)
        {
            DateTime starttime = DateTime.Now;
            TestObject t = mylist.FirstOrDefault(q => q.Name == searchtext);
            int diff = (DateTime.Now - starttime).Milliseconds;
            list1.Add(diff);
        }

        // Linq Where Test
        List<int> list2 = new List<int>();
        for (int i = 1; i <= iterations; i++)
        {
            DateTime starttime = DateTime.Now;
            TestObject t = (from testO in mylist
                            where testO.Name == searchtext
                            select testO).FirstOrDefault();
            int diff = (DateTime.Now - starttime).Milliseconds;
            list2.Add(diff);
        }

        // For Loop Test
        List<int> list3 = new List<int>();
        for (int i = 1; i <= iterations; i++)
        {
            DateTime starttime = DateTime.Now;
            foreach (TestObject testO in mylist)
            {
                if (testO.Name == searchtext)
                {
                    TestObject t = testO;
                    break;
                }
            }
            int diff = (DateTime.Now - starttime).Milliseconds;
            list3.Add(diff);
        }

        // Average() on a List<int> returns a double.
        // Note: the samples are milliseconds, and dividing by 100 (not 1000)
        // makes the printed "seconds" ten times the true figure.
        double diff1 = list1.Average();
        Debug.WriteLine(string.Format("LINQ Lambda Query Average Time: {0} seconds", diff1 / (double)100));

        double diff2 = list2.Average();
        Debug.WriteLine(string.Format("LINQ Where Query Average Time: {0} seconds", diff2 / (double)100));

        double diff3 = list3.Average();
        Debug.WriteLine(string.Format("For Loop Average Time: {0} seconds", diff3 / (double)100));
    }
    catch (Exception ex)
    {
        Debug.WriteLine(ex.ToString());
    }
}

private class TestObject
{
    public TestObject(string _name, string _value, string _guid)
    {
        Name = _name;
        Value = _value;
        GUID = _guid;
    }
    public string Name;
    public string Value;
    public string GUID;
}
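
For timing of this kind, Stopwatch (as used in the earlier answers) is generally a better fit than subtracting DateTime.Now values, and TimeSpan.TotalMilliseconds avoids the pitfall that TimeSpan.Milliseconds is only the 0-999 millisecond component. The helper below is a minimal sketch with a hypothetical name, not the code that produced the figures quoted in this answer.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: runs 'action' the given number of times and
    // returns the average elapsed time per run, in milliseconds.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up run so JIT compilation is not included in the timing

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Assuming the mylist and searchtext variables from the test above, a call could look like Timing.AverageMilliseconds(() => mylist.FirstOrDefault(q => q.Name == searchtext), 100).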