windows - Recommended Open Source Profilers
Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/860602/
Question by stanigator
I'm trying to find open source profilers rather than using one of the commercial profilers I'd have to pay $$$ for. When I searched on SourceForge, I came across these four C++ profilers that I thought were quite promising:
- Shiny: C++ Profiler
- Low Fat Profiler
- Luke Stackwalker
- FreeProfiler
I'm not sure which of these profilers would be best for learning about the performance of my program. It would be great to hear some suggestions.
Accepted answer by Michael
You could try Windows Performance Toolkit. It's completely free to use. This blog entry has an example of how to do sample-based profiling.
Answer by Larry Gritz
- Valgrind (and related tools like Cachegrind, etc.)
- Google performance tools
Answer by Mike Dunlavey
There's more than one way to do it.
Don't forget the no-profiler method.
Most profilers assume you need 1) high statistical precision of timing (lots of samples), and 2) low precision of problem identification (functions & call-graphs).
Those priorities can be reversed. That is, the problem can be located down to the precise machine address, while cost precision is a function of the number of samples.
Most real problems cost at least 10%, and at that size high precision is not essential.
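To put rough numbers on "cost precision is a function of the number of samples", here is a minimal Python sketch. It assumes each stack sample is an independent Bernoulli trial (a simplification; real samples are taken at intervals and may be correlated) and computes the estimated cost fraction of a line together with its standard error, sqrt(f(1 - f) / n). The function name `cost_estimate` is hypothetical.

```python
import math


def cost_estimate(hits: int, samples: int) -> tuple:
    """Estimate the fraction of time a line is on the stack, plus its standard error.

    Assumes samples are independent Bernoulli trials (a simplification).
    """
    if samples <= 0 or not 0 <= hits <= samples:
        raise ValueError("need samples > 0 and 0 <= hits <= samples")
    f = hits / samples
    se = math.sqrt(f * (1.0 - f) / samples)
    return f, se


if __name__ == "__main__":
    # The same 50% cost seen with few samples vs. many samples.
    for hits, n in [(5, 10), (500, 1000)]:
        f, se = cost_estimate(hits, n)
        print(f"{hits}/{n}: cost ~ {f:.0%} +/- {se:.0%}")
```

With 10 samples the estimate is coarse (about ±16 percentage points) and with 1000 it is about ±2, but in both cases a 50% problem is clearly worth fixing. More samples mainly tighten the error bar; they do not make the problem easier to find.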
Example: If something is making your program take twice as long as it should, that means some code in it is responsible for 50% of the time. If you take 10 samples of the call stack while it is being slow, the precise line(s) of code will be present on roughly 5 of them. The larger the program, the more likely the problem is a function call somewhere mid-stack.
It's counter-intuitive, I know.
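The arithmetic behind the 10-sample example can be checked directly. Under the simplifying assumption that samples are independent, the number of samples containing a line that costs fraction f follows a binomial distribution. This sketch (function names are hypothetical) computes the expected count and the probability of seeing the line on at least k samples, which is what lets you recognize it as a repeat offender.

```python
from math import comb


def expected_hits(f: float, n: int) -> float:
    """Expected number of stack samples (out of n) containing a line that costs fraction f."""
    return f * n


def prob_at_least(k: int, f: float, n: int) -> float:
    """P(line appears on at least k of n samples), assuming independent samples."""
    return sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    print(expected_hits(0.5, 10))                # 5.0
    print(round(prob_at_least(2, 0.5, 10), 3))   # 0.989
    print(round(prob_at_least(2, 0.1, 10), 3))   # 0.264
```

So a line costing 50% shows up on two or more of 10 samples about 99% of the time, while a line costing 10% usually needs more samples before it appears repeatedly.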
NOTE: xPerf is nearly there, but not quite (as far as I can tell). It takes samples of the call stack and saves them, which is good. Here's what I think it needs:
- It should only take samples when you want them. As it is, you have to filter out the irrelevant ones.
- In the stack view, it should show the specific lines or addresses at which calls take place, not just whole functions. (Maybe it can do this; I couldn't tell from the blog.)
- If you click to get the butterfly view, centered on a single call instruction or leaf instruction, it should show you not the CPU fraction but the fraction of stack samples containing that instruction. That would be a direct measure of the cost of that instruction, as a fraction of time. (Maybe it can do this; I couldn't tell.) So, for example, even if an instruction were a call to file-open or something else that idles the thread, it still costs wall-clock time, and you need to know that.
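The measure asked for in the last point, the fraction of stack samples on which a given call site appears, is easy to compute once raw stack samples are kept. This is a minimal sketch, not xPerf's or any other tool's implementation. It assumes each sample is a list of hypothetical (function, line) frames. A site is counted at most once per sample so recursion does not inflate it. The function also treats every sample alike, so if the sampler captures blocked threads, waits such as file opens show up as wall-clock cost.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical frame representation: (function name, source line number).
Frame = Tuple[str, int]


def site_cost(samples: Iterable[List[Frame]]) -> Dict[Frame, float]:
    """For each (function, line) site, return the fraction of stack samples containing it.

    Each site is counted at most once per sample, so recursion does not inflate its cost.
    The function does not distinguish samples taken while the thread was running from
    samples taken while it was blocked; all of them count toward wall-clock cost.
    """
    counts = Counter()
    total = 0
    for stack in samples:
        total += 1
        counts.update(set(stack))
    if total == 0:
        return {}
    return {site: hits / total for site, hits in counts.items()}


if __name__ == "__main__":
    samples = [
        [("main", 10), ("load", 42), ("open_file", 7)],
        [("main", 10), ("load", 42), ("open_file", 7)],
        [("main", 12), ("compute", 3)],
        [("main", 10), ("load", 45)],
    ]
    costs = site_cost(samples)
    for site, frac in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        print(site, f"{frac:.0%}")
```

In this made-up data, the call at ("main", 10) is on 3 of 4 samples (75%). The sites ("load", 42) and ("open_file", 7) are each on 2 of 4 (50%), even if the latter spends its time waiting rather than using the CPU.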
NOTE: I just looked over Luke Stackwalker, and the same remarks apply. I think it is on the right track but needs UI work.
ADDED: Having looked over Luke Stackwalker more carefully, I'm afraid it falls victim to the assumption that measuring functions is more important than locating statements. On each sample of the call stack, it updates the function-level timing info, but all it does with the line-number info is track the minimum and maximum line numbers in each function, and the more samples it takes, the farther apart those get. So it basically throws away the most important information: the line numbers. That matters because if you decide to optimize a function, you need to know which lines in it need work, and those lines were on the stack samples (before they were discarded).
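To make that contrast concrete, here is a sketch of the two bookkeeping strategies side by side. It is not Luke Stackwalker's actual code. It only models what the paragraph describes: a per-function sample count plus the minimum and maximum line seen, next to a per-line count. Frames are hypothetical (function, line) tuples, and the function names are illustrative.

```python
from collections import Counter


def function_minmax(samples):
    """Lossy summary: per function, the number of samples containing it plus min/max line seen."""
    summary = {}
    for stack in samples:
        for func in {f for f, _ in stack}:
            lines = [ln for f, ln in stack if f == func]
            n, lo, hi = summary.get(func, (0, min(lines), max(lines)))
            summary[func] = (n + 1, min(lo, *lines), max(hi, *lines))
    return summary


def line_counts(samples):
    """Per (function, line): number of samples containing it, which keeps the useful detail."""
    counts = Counter()
    for stack in samples:
        counts.update(set(stack))  # each site at most once per sample
    return counts


if __name__ == "__main__":
    # Hypothetical samples: (function, line) frames, outermost first.
    samples = [[("main", 5), ("parse", 20)]] * 3 + [[("main", 5), ("parse", 88)]]
    print(function_minmax(samples)["parse"])    # (4, 20, 88)
    print(line_counts(samples)[("parse", 20)])  # 3
```

The min/max summary only says the time in `parse` is "somewhere between lines 20 and 88", while the per-line counts show that line 20 is on 3 of the 4 samples. The counter grows with the number of distinct lines seen, not with the number of samples.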
One might object that if the line-number information were retained, it would quickly run out of storage. Two answers: 1) only so many distinct lines show up on the samples, and they show up repeatedly; 2) not that many samples are needed. The need for high statistical precision of measurement has always been assumed, but never justified.
I suspect other stack samplers, like xPerf, have similar issues.
Answer by Soo Wei Tan
It's not open source, but AMD CodeAnalyst is free. Despite the name, it also works on Intel CPUs. There are versions available for both Windows (with Visual Studio integration) and Linux.
Answer by Suma
Of those you listed, I found Luke Stackwalker to work best: I liked its GUI, and it was easy to get running.
Another similar one is Very Sleepy: similar functionality, and sampling seems more reliable, but the GUI is perhaps a little harder to use (not as graphical).
After spending some more time with them, I found one quite important drawback. While both try to sample at 1 ms resolution, in practice they don't achieve it, because their sampling method (StackWalk64 on the attached process) is far too slow. For my application, it takes something like 5-20 ms to get a call stack. Not only does this make your results imprecise, it also skews them, because short call stacks are walked faster and therefore tend to get more hits.
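A quick back-of-the-envelope check of how much a slow stack walk reduces the sampling rate. This sketch rests on an assumption of ours, not on anything measured in the answer: the sampler waits the nominal 1 ms interval after each walk completes, so each sample costs the interval plus the walk time. The function name is hypothetical.

```python
def effective_rate_hz(interval_ms: float, walk_ms: float) -> float:
    """Samples per second if each sample costs the nominal interval plus the stack-walk time.

    Hypothetical model: the sampler sleeps interval_ms after each walk completes.
    """
    if interval_ms < 0 or walk_ms < 0 or interval_ms + walk_ms == 0:
        raise ValueError("times must be non-negative and not both zero")
    return 1000.0 / (interval_ms + walk_ms)


if __name__ == "__main__":
    for walk in (0.0, 5.0, 20.0):
        print(f"walk {walk:>4} ms -> {effective_rate_hz(1.0, walk):.0f} samples/s")
```

Under this model, a 5 ms walk drops the rate from the nominal 1000 samples per second to about 167, and a 20 ms walk drops it to about 48.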