Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/1047218/

Time: 2020-08-06 06:56:26  Source: igfitidea

Benchmarking small code samples in C#, can this implementation be improved?

c# .net performance profiling

Asked by Sam Saffron

Quite often on SO I find myself benchmarking small chunks of code to see which implementation is fastest.

Quite often I see comments that benchmarking code does not take into account jitting or the garbage collector.

I have the following simple benchmarking function which I have slowly evolved:

static void Profile(string description, int iterations, Action func) {
    // warm up
    func();
    // clean up
    GC.Collect();

    var watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}

Usage:

Profile("a description", how_many_iterations_to_run, () =>
{
   // ... code being profiled
});

Does this implementation have any flaws? Is it good enough to show that implementation X is faster than implementation Y over Z iterations? Can you think of any ways you would improve this?

EDIT: It's pretty clear that a time-based approach (as opposed to iterations) is preferred. Does anyone have any implementations where the time checks do not impact performance?

Accepted answer by Sam Saffron

Here is the modified function, as recommended by the community. Feel free to amend this; it's a community wiki.

static double Profile(string description, int iterations, Action func) {
    //Run at highest priority to minimize fluctuations caused by other processes/threads
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
    Thread.CurrentThread.Priority = ThreadPriority.Highest;

    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
    return watch.Elapsed.TotalMilliseconds;
}

Make sure you compile in Release with optimizations enabled, and run the tests outside of Visual Studio. This last part is important because the JIT stints its optimizations with a debugger attached, even in Release mode.

Answer by Jonathan Rupp

If you want to take GC interactions out of the equation, you may want to run your 'warm up' call after the GC.Collect call, not before. That way you know .NET will already have enough memory allocated from the OS for the working set of your function.

Keep in mind that you're making a non-inlined method call for each iteration, so make sure you compare the things you're testing to an empty body. You'll also have to accept that you can only reliably time things that are several times longer than a method call.
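One way to make the comparison against an empty body concrete is sketched below. The names BaselineProfiler, Time and TimeAboveBaseline are hypothetical and not part of the original answer; the sketch assumes that subtracting the empty-delegate time is a good enough estimate of the call overhead.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not from the original answer.
static class BaselineProfiler
{
    // Times `iterations` calls of `func` and returns elapsed milliseconds.
    static double Time(int iterations, Action func)
    {
        func(); // warm up so JIT time is not measured
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            func();
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    // Returns the time for `func` minus the time for an empty delegate,
    // clamped at zero because measurement noise can make the difference negative.
    public static double TimeAboveBaseline(int iterations, Action func)
    {
        double baseline = Time(iterations, () => { });
        double measured = Time(iterations, func);
        return Math.Max(0.0, measured - baseline);
    }
}
```

If the result is close to zero, the code under test is probably too short to be timed reliably this way.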

Also, depending on what kind of stuff you're profiling, you may want to base your timing on running for a certain amount of time rather than for a certain number of iterations. That tends to lead to more easily comparable numbers, without having to have a very short run for the best implementation and/or a very long one for the worst.
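A minimal sketch of a time-based variant follows. TimedProfiler, ProfileFor and the batchSize parameter are assumptions made for illustration; checking the clock only once per batch is one possible way to keep the time checks from dominating the measurement, not a method taken from the thread.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not from the original answer.
static class TimedProfiler
{
    // Runs `func` in batches until at least `duration` has elapsed and
    // returns the average time per call in milliseconds.
    public static double ProfileFor(TimeSpan duration, int batchSize, Action func)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        func(); // warm up

        long calls = 0;
        var watch = Stopwatch.StartNew();
        do
        {
            // The clock is read once per batch, not once per call.
            for (int i = 0; i < batchSize; i++)
            {
                func();
            }
            calls += batchSize;
        } while (watch.Elapsed < duration);
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds / calls;
    }
}
```

A larger batchSize reduces the relative cost of reading the clock, but makes the run overshoot the requested duration by up to one batch.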

Answer by Alex Yakunin

I'd avoid passing the delegate at all:

  1. A delegate call costs about as much as a virtual method call. That is not cheap: roughly 25% of the cost of the smallest memory allocation in .NET. If you're interested in details, see e.g. this link.
  2. Anonymous delegates may lead to the use of closures that you won't even notice. Again, accessing closure fields is noticeably slower than e.g. accessing a variable on the stack.

An example code leading to closure usage:

public void Test()
{
  int someNumber = 1;
  // someNumber is captured by the lambda, so it lives in a closure object
  Profiler.Profile("Closure access", 1000000,
    () => { int sum = someNumber + someNumber; });
}

If you're not aware of closures, take a look at this method in .NET Reflector.
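For readers without a decompiler at hand, the following is a hand-written approximation of what the compiler produces for the lambda in Test(). ClosureSketch and ClosureDemo are hypothetical names; the real generated class has a compiler-chosen name that is not a valid C# identifier, and its exact shape may differ between compiler versions.

```csharp
using System;

// Approximation of the compiler-generated closure class (illustrative names).
sealed class ClosureSketch
{
    // The captured local becomes a field on a heap-allocated object.
    public int someNumber;

    // The lambda body becomes an instance method that reads the field.
    public void Lambda()
    {
        int sum = someNumber + someNumber;
    }
}

static class ClosureDemo
{
    // Roughly what Test() does before it passes the delegate on.
    public static Action BuildAction()
    {
        var closure = new ClosureSketch();
        closure.someNumber = 1;
        return closure.Lambda; // delegate bound to the closure instance
    }
}
```

Every access to someNumber inside the lambda goes through the closure object, which is the extra cost the answer refers to.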

Answer by Alex Yakunin

You must also run a "warm up" pass prior to the actual measurement to exclude the time the JIT compiler spends on jitting your code.

Answer by Paul Alexander

I think the most difficult problem to overcome with benchmarking methods like this is accounting for edge cases and the unexpected. For example: "How do the two code snippets work under high CPU load, network usage, disk thrashing, etc.?" They're great for basic logic checks to see if a particular algorithm works significantly faster than another. But to properly test most code performance you'd have to create a test that measures the specific bottlenecks of that particular code.

I'd still say that testing small blocks of code often has little return on investment and can encourage using overly complex code instead of simple maintainable code. Writing clear code that other developers, or myself 6 months down the line, can understand quickly will have more performance benefits than highly optimized code.

Answer by LukeH

Finalisation won't necessarily be completed before GC.Collect returns. The finalisation is queued and then run on a separate thread. This thread could still be active during your tests, affecting the results.

If you want to ensure that finalisation has completed before starting your tests then you might want to call GC.WaitForPendingFinalizers, which will block until the finalisation queue is cleared:

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

Answer by Alexey Romanov

I'd call func() several times for the warm-up, not just once.

Answer by Edward Brey

Depending on the code you are benchmarking and the platform it runs on, you may need to account for how code alignment affects performance. To do so would probably require an outer wrapper that runs the test multiple times (in separate app domains or processes?), on some of the runs first calling "padding code" to force it to be JIT compiled, so that the code being benchmarked ends up aligned differently. A complete test result would give the best-case and worst-case timings for the various code alignments.

Answer by Joakim

Suggestions for improvement

  1. Detecting if the execution environment is good for benchmarking (such as detecting if a debugger is attached or if jit optimization is disabled which would result in incorrect measurements).

  2. Measuring parts of the code independently (to see exactly where the bottleneck is).

  3. Comparing different versions/components/chunks of code (In your first sentence you say '... benchmarking small chunks of code to see which implementation is fastest.').

Regarding #1:

  • To detect if a debugger is attached, read the property System.Diagnostics.Debugger.IsAttached (remember to also handle the case where the debugger is initially not attached, but is attached after some time).

  • To detect if JIT optimization is disabled, read the property DebuggableAttribute.IsJITOptimizerDisabled of the relevant assemblies:

    private bool IsJitOptimizerDisabled(Assembly assembly)
    {
        return assembly.GetCustomAttributes(typeof (DebuggableAttribute), false)
            .Select(customAttribute => (DebuggableAttribute) customAttribute)
            .Any(attribute => attribute.IsJITOptimizerDisabled);
    }
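
The two checks above could be combined into a single pre-flight test, sketched here. BenchmarkEnvironment and FindProblems are hypothetical names, and the sketch only checks once at the time of the call, so it does not cover a debugger being attached later.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not from the original answer.
static class BenchmarkEnvironment
{
    // True when the assembly was compiled with JIT optimizations disabled.
    static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        return assembly.GetCustomAttributes(typeof(DebuggableAttribute), false)
            .Cast<DebuggableAttribute>()
            .Any(attribute => attribute.IsJITOptimizerDisabled);
    }

    // Returns human-readable problems; an empty array means none were detected.
    public static string[] FindProblems(Assembly assembly)
    {
        var problems = new List<string>();
        if (Debugger.IsAttached)
        {
            problems.Add("A debugger is attached.");
        }
        if (IsJitOptimizerDisabled(assembly))
        {
            problems.Add("JIT optimization is disabled for " + assembly.GetName().Name + ".");
        }
        return problems.ToArray();
    }
}
```

A benchmark runner could call FindProblems for the assembly that contains the code under test and refuse to run, or print a warning, when the array is not empty.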
    

Regarding #2:

This can be done in many ways. One way is to allow several delegates to be supplied and then measure those delegates individually.
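
As a rough sketch of the several-delegates idea, each named part can be warmed up and timed on its own. PartProfiler and ProfileParts are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper, not from the original answer.
static class PartProfiler
{
    // Times each named part separately and returns elapsed milliseconds per part.
    public static Dictionary<string, double> ProfileParts(
        int iterations, IEnumerable<KeyValuePair<string, Action>> parts)
    {
        var results = new Dictionary<string, double>();
        foreach (var part in parts)
        {
            part.Value(); // warm up this part before timing it

            var watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                part.Value();
            }
            watch.Stop();

            results[part.Key] = watch.Elapsed.TotalMilliseconds;
        }
        return results;
    }
}
```

A Dictionary<string, Action> can be passed directly as the parts argument, with the keys used as labels in the result.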

Regarding #3:

This could also be done in many ways, and different use-cases would demand very different solutions. If the benchmark is invoked manually, then writing to the console might be fine. However, if the benchmark is performed automatically by the build system, then writing to the console is probably not so fine.

One way to do this is to return the benchmark result as a strongly typed object that can easily be consumed in different contexts.
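
A strongly typed result might look something like the sketch below. BenchmarkResult and ResultProfiler are hypothetical names and are not the types used by Etimo.Benchmarks; the point is only that the caller, not the profiler, decides whether to print, log, or assert on the numbers.

```csharp
using System;
using System.Diagnostics;

// Hypothetical result type; the name and members are illustrative only.
sealed class BenchmarkResult
{
    public BenchmarkResult(string description, int iterations, TimeSpan elapsed)
    {
        Description = description;
        Iterations = iterations;
        Elapsed = elapsed;
    }

    public string Description { get; }
    public int Iterations { get; }
    public TimeSpan Elapsed { get; }

    // Average cost of one call, in milliseconds.
    public double MillisecondsPerIteration
    {
        get { return Elapsed.TotalMilliseconds / Iterations; }
    }
}

static class ResultProfiler
{
    // Same loop as Profile above, but returns data and writes nothing to the console.
    public static BenchmarkResult Profile(string description, int iterations, Action func)
    {
        func(); // warm up

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            func();
        }
        watch.Stop();

        return new BenchmarkResult(description, iterations, watch.Elapsed);
    }
}
```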



Etimo.Benchmarks

Another approach is to use an existing component to perform the benchmarks. Actually, at my company we decided to release our benchmark tool to the public domain. At its core, it manages the garbage collector, jitter, warm-ups, etc., just like some of the other answers here suggest. It also has the three features I suggested above. It manages several of the issues discussed on Eric Lippert's blog.

This is an example output where two components are compared and the results are written to the console. In this case the two components compared are called 'KeyedCollection' and 'MultiplyIndexedKeyedCollection':

[Image: Etimo.Benchmarks - Sample Console Output]

There is a NuGet package and a sample NuGet package, and the source code is available on GitHub. There is also a blog post.

If you're in a hurry, I suggest you get the sample package and simply modify the sample delegates as needed. If you're not in a hurry, it might be a good idea to read the blog post to understand the details.

Answer by Danny Tuppeny

If you're trying to eliminate garbage collection impact from the benchmark completely, is it worth setting GCSettings.LatencyMode?

If not, and you want the impact of garbage created in func to be part of the benchmark, then shouldn't you also force collection at the end of the test (inside the timer)?
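
Both options raised here can be sketched side by side. GcAwareProfiler and its two methods are hypothetical names. The choice of GCLatencyMode.SustainedLowLatency is an assumption for illustration; it only asks the runtime to avoid blocking collections where it can, so it reduces GC interference without guaranteeing that none occurs.

```csharp
using System;
using System.Diagnostics;
using System.Runtime;

// Hypothetical helper, not from the original answer.
static class GcAwareProfiler
{
    // Option 1: ask the GC to keep pauses low while the timed loop runs.
    public static double ProfileWithLowLatency(int iterations, Action func)
    {
        func(); // warm up

        GCLatencyMode previous = GCSettings.LatencyMode;
        var watch = new Stopwatch();
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            watch.Start();
            for (int i = 0; i < iterations; i++)
            {
                func();
            }
            watch.Stop();
        }
        finally
        {
            // Always restore the process-wide setting.
            GCSettings.LatencyMode = previous;
        }
        return watch.Elapsed.TotalMilliseconds;
    }

    // Option 2: charge the cost of cleaning up func's garbage to the benchmark.
    public static double ProfileIncludingCollection(int iterations, Action func)
    {
        func(); // warm up

        // Start from a clean heap, as in the accepted answer.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            func();
        }
        GC.Collect(); // inside the timed region on purpose
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```

With option 2, a forced collection has a fixed cost of its own, so the iteration count needs to be large enough that this cost does not dominate the result.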
