Quick and dirty way to profile your code
What method do you use when you want to get performance data about specific code paths?
Solutions
Answer
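This method has several limitations, but I still find it very useful. I'll list the limitations (that I know of) up front and let whoever wants to use it do so at their own risk.
- The original version I posted over-reported the time spent in recursive calls (as pointed out in the comments to the answer).
- It is not thread safe. It wasn't thread safe before I added the code to ignore recursion, and it is even less thread safe now.
- Although it is very efficient when called many times (millions), it has a measurable effect on the outcome, so the scopes you measure will take longer than those you don't.
I use this class when the problem at hand doesn't justify profiling all my code, or when I get some data from a profiler that I want to verify. Basically, it sums up the time spent in a specific block and, at the end of the program, outputs it to the debug stream (viewable with DbgView), including how many times the code was executed (and the average time spent).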
#pragma once
#include <tchar.h>
#include <windows.h>
#include <sstream>
#include <boost/noncopyable.hpp>

namespace scope_timer {
    class time_collector : boost::noncopyable {
        __int64 total;
        LARGE_INTEGER start;
        size_t times;
        const TCHAR* name;

        double cpu_frequency()
        { // cache the CPU frequency, which doesn't change.
            static double ret = 0; // store as double so division later on is floating point and not truncating
            if (ret == 0) {
                LARGE_INTEGER freq;
                QueryPerformanceFrequency(&freq);
                ret = static_cast<double>(freq.QuadPart);
            }
            return ret;
        }
        bool in_use;

    public:
        time_collector(const TCHAR* n)
            : total(0)
            , start(LARGE_INTEGER())
            , times(0)
            , name(n)
            , in_use(false)
        {
        }

        ~time_collector()
        {
            std::basic_ostringstream<TCHAR> msg;
            msg << _T("scope_timer> ") << name << _T(" called: ");

            double seconds = total / cpu_frequency();
            double average = seconds / times;

            msg << times << _T(" times total time: ") << seconds << _T(" seconds ")
                << _T(" (avg ") << average << _T(")\n");
            OutputDebugString(msg.str().c_str());
        }

        void add_time(__int64 ticks)
        {
            total += ticks;
            ++times;
            in_use = false;
        }

        // Returns false if a timer for this scope is already running (recursion).
        bool aquire()
        {
            if (in_use)
                return false;
            in_use = true;
            return true;
        }
    };

    class one_time : boost::noncopyable {
        LARGE_INTEGER start;
        time_collector* collector;
    public:
        one_time(time_collector& tc)
        {
            if (tc.aquire()) {
                collector = &tc;
                QueryPerformanceCounter(&start);
            }
            else
                collector = 0;
        }

        ~one_time()
        {
            if (collector) {
                LARGE_INTEGER end;
                QueryPerformanceCounter(&end);
                collector->add_time(end.QuadPart - start.QuadPart);
            }
        }
    };
}

// Usage TIME_THIS_SCOPE(XX); where XX is a C variable name (can begin with a number)
#define TIME_THIS_SCOPE(name) \
    static scope_timer::time_collector st_time_collector_##name(_T(#name)); \
    scope_timer::one_time st_one_time_##name(st_time_collector_##name)
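For readers who are not on Windows, the same collector/guard idea can be sketched with the standard library. This is only an illustration, assuming C++11 and std::chrono::steady_clock in place of QueryPerformanceCounter and std::cerr in place of OutputDebugString. The namespace scope_timer_sketch, the spelling acquire(), the accessors times() and total(), and the helper depth_sum() are hypothetical names that are not part of the answer. Like the class above, it is not thread safe.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>

namespace scope_timer_sketch {

// Accumulates the total time and the call count for one named scope.
class time_collector {
public:
    using clock = std::chrono::steady_clock;

    explicit time_collector(const char* n) : name_(n) {}
    time_collector(const time_collector&) = delete;
    time_collector& operator=(const time_collector&) = delete;

    // Reports when destroyed, i.e. at program exit for a static instance.
    ~time_collector() {
        const double seconds = std::chrono::duration<double>(total_).count();
        std::cerr << "scope_timer> " << name_ << " called: " << times_
                  << " times, total time: " << seconds << " seconds";
        if (times_ != 0) std::cerr << " (avg " << seconds / times_ << ")";
        std::cerr << '\n';
    }

    // Returns false while an outer scope is already being timed (recursion).
    bool acquire() {
        if (in_use_) return false;
        in_use_ = true;
        return true;
    }

    void add_time(clock::duration d) {
        total_ += d;
        ++times_;
        in_use_ = false;
    }

    std::size_t times() const { return times_; }
    clock::duration total() const { return total_; }

private:
    const char* name_;
    clock::duration total_{};
    std::size_t times_ = 0;
    bool in_use_ = false;
};

// Times one pass through a scope and reports it to the collector.
class one_time {
public:
    explicit one_time(time_collector& tc)
        : collector_(tc.acquire() ? &tc : nullptr),
          start_(time_collector::clock::now()) {}
    one_time(const one_time&) = delete;
    one_time& operator=(const one_time&) = delete;

    ~one_time() {
        if (collector_)
            collector_->add_time(time_collector::clock::now() - start_);
    }

private:
    time_collector* collector_;  // null when this is a nested (recursive) entry
    time_collector::clock::time_point start_;
};

}  // namespace scope_timer_sketch

// Hypothetical recursive function: only the outermost call is counted.
inline int depth_sum(scope_timer_sketch::time_collector& tc, int n) {
    scope_timer_sketch::one_time timer(tc);
    return n == 0 ? 0 : n + depth_sum(tc, n - 1);
}
```

Because nested entries fail to acquire the collector, a recursive call such as depth_sum(tc, 5) adds one sample, not six.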
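Answer
Well, I have two code snippets. In pseudocode they look like this (it's a simplified version; I'm actually using QueryPerformanceFrequency):
First snippet:
Timer timer = new Timer
timer.Start
Second snippet:
timer.Stop
show elapsed time
A bit of hot-key kung fu, and I can say how much time this piece of code stole from my CPU.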
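The two snippets can be sketched in portable C++ as a small start/stop stopwatch. This is a minimal sketch that assumes std::chrono::steady_clock where the answer uses the Windows performance counter. The names Stopwatch and time_busy_loop are hypothetical stand-ins for the Timer in the pseudocode.

```cpp
#include <chrono>

// Hypothetical stand-in for the "Timer" in the pseudocode.
class Stopwatch {
public:
    using clock = std::chrono::steady_clock;

    // First snippet: remember when the measured region begins.
    void start() {
        begin_ = clock::now();
        running_ = true;
    }

    // Second snippet: stop and return the elapsed time in seconds.
    // Calling stop() without a matching start() returns the last result (0 initially).
    double stop() {
        if (running_) {
            elapsed_ = clock::now() - begin_;
            running_ = false;
        }
        return std::chrono::duration<double>(elapsed_).count();
    }

private:
    clock::time_point begin_{};
    clock::duration elapsed_{};
    bool running_ = false;
};

// Example: time a simple loop. The volatile sink keeps the optimizer
// from removing the loop entirely.
inline double time_busy_loop(long iterations) {
    Stopwatch sw;
    sw.start();
    volatile long sink = 0;
    for (long i = 0; i < iterations; ++i) sink = sink + i;
    return sw.stop();
}
```

Printing the value returned by stop() corresponds to the "show elapsed time" step.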
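Answer
The article Code profiler and optimizations has a lot of information about C++ code profiling, and it also has a free download link to a program/class that shows a graphical representation of the different code paths/methods.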
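Answer
I do my profiling by creating two classes: cProfile and cProfileManager.
cProfileManager holds all the data produced by cProfile.
cProfile has the following requirements:
- cProfile has a constructor that records the current time.
- cProfile has a destructor that sends the total time the object was alive to cProfileManager.
To use these profile classes, I first create an instance of cProfileManager. Then I put the code block I want to profile inside curly braces. Inside the curly braces, I create a cProfile instance. When the code block ends, cProfile sends the time the block took to cProfileManager.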
Example Code
Here is an example of the code (simplified):
// GetTime() and ProfileManager are assumed to be defined elsewhere.
class cProfile
{
public:
    cProfile()
    {
        TimeStart = GetTime();
    }

    ~cProfile()
    {
        ProfileManager->AddProfile(GetTime() - TimeStart);
    }

private:
    float TimeStart;
};
To use cProfile, I would do something like this:
int main()
{
    printf("Start test");
    {
        cProfile Profile;
        Calculate();
    }
    ProfileManager->OutputData();
}
or this:
void foobar()
{
    cProfile ProfileFoobar;

    foo();
    {
        cProfile ProfileBarCheck;
        while (bar())
        {
            cProfile ProfileSpam;
            spam();
        }
    }
}
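Technical Note
This code is actually an abuse of the way scoping, constructors and destructors work in C++. cProfile exists only inside the block scope (the code block we want to test). Once the program leaves the block scope, cProfile records the result.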
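Additional Enhancements
- You can add a string parameter to the constructor so you can do something like this: cProfile Profile("Profile for complicated calculation");
- You can use a macro to make the code look cleaner (be careful not to abuse this; unlike our other abuses of the language, macros can be dangerous). Example (the object is declared inside the braces, and without empty parentheses, so that it is a real object destroyed at END_PROFILE):
  #define START_PROFILE { cProfile Profile;
  #define END_PROFILE }
- cProfileManager can check how many times a block of code is called, but you need an identifier for the block. The first enhancement can help identify the block. This is useful when the code you want to profile is inside a loop (like the second example above). You can also add the average, fastest and longest execution time of the block.
- Don't forget to add a check to skip profiling if you are in debug mode.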
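The enhancements listed above could be combined roughly as follows. This is a hedged sketch and not the author's implementation: it assumes std::chrono::steady_clock for GetTime(), passes the manager by reference instead of using a global ProfileManager pointer, and invents the BlockStats struct and the Get() accessor for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-block statistics kept by the manager.
struct BlockStats {
    std::size_t calls = 0;
    double total = 0.0;    // seconds
    double fastest = 0.0;  // seconds, valid once calls > 0
    double longest = 0.0;  // seconds
    double average() const { return calls ? total / calls : 0.0; }
};

class cProfileManager {
public:
    // Called by cProfile's destructor with the block's label and duration.
    void AddProfile(const std::string& label, double seconds) {
        BlockStats& s = stats_[label];
        s.fastest = s.calls ? std::min(s.fastest, seconds) : seconds;
        s.longest = std::max(s.longest, seconds);
        s.total += seconds;
        ++s.calls;
    }

    // Returns the statistics for a label (an empty record if it was never seen).
    BlockStats Get(const std::string& label) const {
        const auto it = stats_.find(label);
        return it == stats_.end() ? BlockStats() : it->second;
    }

    void OutputData(std::ostream& out = std::cout) const {
        for (const auto& entry : stats_) {
            const BlockStats& s = entry.second;
            out << entry.first << ": " << s.calls << " calls, total " << s.total
                << " s, avg " << s.average() << " s, fastest " << s.fastest
                << " s, longest " << s.longest << " s\n";
        }
    }

private:
    std::map<std::string, BlockStats> stats_;
};

class cProfile {
public:
    using clock = std::chrono::steady_clock;

    cProfile(cProfileManager& manager, std::string label)
        : manager_(manager), label_(std::move(label)), start_(clock::now()) {}
    cProfile(const cProfile&) = delete;
    cProfile& operator=(const cProfile&) = delete;

    // Leaving the enclosing block reports the elapsed time to the manager.
    ~cProfile() {
        const double seconds =
            std::chrono::duration<double>(clock::now() - start_).count();
        manager_.AddProfile(label_, seconds);
    }

private:
    cProfileManager& manager_;
    std::string label_;
    clock::time_point start_;
};
```

With the label as the block identifier, a cProfile created inside a loop body contributes one sample per iteration to the same record.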
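Answer
I have a quick-and-dirty profiling class that can be used even in the tightest inner loops. The emphasis is on extremely light weight and simple code. The class allocates a two-dimensional array of fixed size. I then add "checkpoint" calls all over the place. When checkpoint N is reached immediately after checkpoint M, I add the elapsed time (in microseconds) to the array item [M,N]. Since this is designed for profiling tight loops, I also have a "start of iteration" call that resets the "last checkpoint" variable. At the end of the test, the dumpResults() call produces the list of all pairs of checkpoints that followed each other, together with the total time accounted for and unaccounted for.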
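The answer does not include its code, so the following is only a sketch of how such a checkpoint profiler might look. Everything here is an assumption: the name CheckpointProfiler, the use of std::chrono::steady_clock, the hit counters, and in particular the definition of "unaccounted" time as the time between the start of an iteration and its first checkpoint.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>

// Hypothetical fixed-size checkpoint profiler; checkpoint ids are 0..MaxCheckpoints-1.
template <std::size_t MaxCheckpoints>
class CheckpointProfiler {
public:
    using clock = std::chrono::steady_clock;

    // "Start of iteration": forget the previous checkpoint.
    void startIteration() {
        last_ = MaxCheckpoints;  // MaxCheckpoints doubles as the "no checkpoint yet" marker
        lastTime_ = clock::now();
    }

    // Adds the time since the previous checkpoint M to the cell [M][n].
    void checkpoint(std::size_t n) {
        if (n >= MaxCheckpoints) return;  // ignore out-of-range ids
        const clock::time_point now = clock::now();
        const std::int64_t us =
            std::chrono::duration_cast<std::chrono::microseconds>(now - lastTime_).count();
        if (last_ < MaxCheckpoints) {
            micros_[last_][n] += us;
            ++hits_[last_][n];
        } else {
            unaccounted_ += us;  // assumption: time before the first checkpoint
        }
        last_ = n;
        lastTime_ = now;
    }

    std::int64_t micros(std::size_t m, std::size_t n) const { return micros_[m][n]; }
    std::size_t hits(std::size_t m, std::size_t n) const { return hits_[m][n]; }

    // Lists every pair of checkpoints that followed each other at least once.
    void dumpResults(std::ostream& out = std::cout) const {
        std::int64_t accounted = 0;
        for (std::size_t m = 0; m < MaxCheckpoints; ++m) {
            for (std::size_t n = 0; n < MaxCheckpoints; ++n) {
                if (hits_[m][n] == 0) continue;
                accounted += micros_[m][n];
                out << m << " -> " << n << ": " << micros_[m][n] << " us in "
                    << hits_[m][n] << " transitions\n";
            }
        }
        out << "accounted: " << accounted << " us, unaccounted: "
            << unaccounted_ << " us\n";
    }

private:
    std::array<std::array<std::int64_t, MaxCheckpoints>, MaxCheckpoints> micros_{};
    std::array<std::array<std::size_t, MaxCheckpoints>, MaxCheckpoints> hits_{};
    std::int64_t unaccounted_ = 0;
    std::size_t last_ = MaxCheckpoints;
    clock::time_point lastTime_ = clock::now();
};
```

A loop body would call startIteration() once at the top and checkpoint(0), checkpoint(1), and so on at the points of interest. The per-call cost is one clock read and two array updates.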
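Answer
Note that all of the following is written specifically for Windows.
I also have a timer class that I wrote for quick-and-dirty profiling. It uses QueryPerformanceCounter() to get high-precision timings, but with a slight difference: my timer class does not dump the elapsed time when the Timer object goes out of scope. Instead, it accumulates the elapsed times in a collection. I added a static member function, Dump(), which creates a table of elapsed times, sorted by timing category (specified as a string in Timer's constructor), along with some statistical analysis such as mean elapsed time, standard deviation, max and min. I also added a Clear() static member function (named Reset() in the code listed here), which clears the collection and lets you start over.
How to use the Timer class (pseudocode):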
int CInsertBuffer::Read(char* pBuf)
{
    // TIMER NOTES: Avg Execution Time = ~1 ms
    Timer timer("BufferRead");
    :
    :
    return -1;
}
Sample output:
Timer Precision = 418.0095 ps

=== Item               Trials    Ttl Time  Avg Time  Mean Time StdDev    ===
    AddTrade           500       7 ms      14 us     12 us     24 us
    BufferRead         511       1:19.25   0.16 s    621 ns    2.48 s
    BufferWrite        516       511 us    991 ns    482 ns    11 us
    ImportPos Loop     1002      18.62 s   19 ms     77 us     0.51 s
    ImportPosition     2         18.75 s   9.38 s    16.17 s   13.59 s
    Insert             515       4.26 s    8 ms      5 ms      27 ms
    recv               101       18.54 s   0.18 s    2603 ns   1.63 s
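The per-category statistics in such a table can be computed from the collected samples along these lines. This is a self-contained sketch, not the Timer class itself. The names Summary and summarize are hypothetical, and the choice of the sample standard deviation (divisor n - 1) around the arithmetic average is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of the elapsed times recorded for one timing category.
struct Summary {
    std::size_t trials;
    double total;
    double average;
    double median;
    double stddev;
};

inline Summary summarize(std::vector<double> samples) {
    Summary s{samples.size(), 0.0, 0.0, 0.0, 0.0};
    if (samples.empty()) return s;

    std::sort(samples.begin(), samples.end());
    for (double v : samples) s.total += v;

    const std::size_t n = samples.size();
    s.average = s.total / static_cast<double>(n);

    // Median: middle element, or the mean of the two middle elements for even n.
    const std::size_t mid = n / 2;
    s.median = (n % 2 == 0) ? (samples[mid - 1] + samples[mid]) / 2.0
                            : samples[mid];

    // Sample standard deviation around the average; 0 for a single sample.
    if (n > 1) {
        double sumSquares = 0.0;
        for (double v : samples) sumSquares += (v - s.average) * (v - s.average);
        s.stddev = std::sqrt(sumSquares / static_cast<double>(n - 1));
    }
    return s;
}
```

A large gap between the average and the median, as in the BufferRead and recv rows, suggests that a few very slow calls dominate the total.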
File Timer.inl:
#include <windows.h>    // QueryPerformanceCounter and the clipboard API
#include <algorithm>
#include <functional>
#include <string>
#include <map>
#include "x:\utils\stlext\stringext.h"
#include <iterator>
#include <set>
#include <vector>
#include <numeric>
#include "x:\utils\stlext\algorithmext.h"
#include <math.h>

class Timer
{
public:
    Timer(const char* name)
    {
        // std::safe_string is a non-standard helper from the author's library (not shown)
        label = std::safe_string(name);
        QueryPerformanceCounter(&startTime);
    }

    virtual ~Timer()
    {
        QueryPerformanceCounter(&stopTime);
        __int64 clocks = stopTime.QuadPart-startTime.QuadPart;
        double elapsed = (double)clocks/(double)TimerFreq();
        TimeMap().insert(std::make_pair(label,elapsed));
    };

    static std::string Dump(bool ClipboardAlso=true)
    {
        static const std::string loc = "Timer::Dump";

        if( TimeMap().empty() )
        {
            return "No trials\r\n";
        }

        std::string ret = std::formatstr("\r\n\r\nTimer Precision = %s\r\n\r\n", format_elapsed(1.0/(double)TimerFreq()).c_str());

        // get a list of keys
        typedef std::set<std::string> keyset;
        keyset keys;
        std::transform(TimeMap().begin(), TimeMap().end(), std::inserter(keys, keys.begin()), extract_key());

        size_t maxrows = 0;

        typedef std::vector<std::string> strings;
        strings lines;

        static const size_t tabWidth = 9;

        std::string head = std::formatstr("=== %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s ===",
            tabWidth*2, tabWidth*2, "Item",
            tabWidth, tabWidth, "Trials",
            tabWidth, tabWidth, "Ttl Time",
            tabWidth, tabWidth, "Avg Time",
            tabWidth, tabWidth, "Mean Time",
            tabWidth, tabWidth, "StdDev");
        ret += std::formatstr("\r\n%s\r\n", head.c_str());
        if( ClipboardAlso )
            lines.push_back("Item\tTrials\tTtl Time\tAvg Time\tMean Time\tStdDev\r\n");

        // dump the values for each key
        {for( keyset::iterator key = keys.begin(); keys.end() != key; ++key )
        {
            time_type ttl = 0;
            ttl = std::accumulate(TimeMap().begin(), TimeMap().end(), ttl, accum_key(*key));
            size_t num = std::count_if( TimeMap().begin(), TimeMap().end(), match_key(*key));
            if( num > maxrows )
                maxrows = num;
            time_type avg = ttl / num;

            // compute the median; it is what the "Mean Time" column reports
            std::vector<time_type> sortedTimes;
            std::transform_if(TimeMap().begin(), TimeMap().end(), std::inserter(sortedTimes, sortedTimes.begin()), extract_val(), match_key(*key));
            std::sort(sortedTimes.begin(), sortedTimes.end());
            size_t mid = (size_t)floor((double)num/2.0);
            // even count: average the two middle values; odd count: take the middle value
            double mean = ( (num % 2) == 0 ) ? (sortedTimes[mid-1]+sortedTimes[mid])/2.0 : sortedTimes[mid];

            // compute variance (sum of squared deviations from the value computed above)
            double sum = 0.0;
            if( num > 1 )
            {
                for( std::vector<time_type>::iterator timeIt = sortedTimes.begin(); sortedTimes.end() != timeIt; ++timeIt )
                    sum += pow(*timeIt-mean,2.0);
            }

            // compute std dev
            double stddev = num > 1 ? sqrt(sum/((double)num-1.0)) : 0.0;

            ret += std::formatstr("    %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s\r\n",
                tabWidth*2, tabWidth*2, key->c_str(),
                tabWidth, tabWidth, std::formatstr("%d",num).c_str(),
                tabWidth, tabWidth, format_elapsed(ttl).c_str(),
                tabWidth, tabWidth, format_elapsed(avg).c_str(),
                tabWidth, tabWidth, format_elapsed(mean).c_str(),
                tabWidth, tabWidth, format_elapsed(stddev).c_str());

            if( ClipboardAlso )
                lines.push_back(std::formatstr("%s\t%s\t%s\t%s\t%s\t%s\r\n",
                    key->c_str(),
                    std::formatstr("%d",num).c_str(),
                    format_elapsed(ttl).c_str(),
                    format_elapsed(avg).c_str(),
                    format_elapsed(mean).c_str(),
                    format_elapsed(stddev).c_str()));
        }
        }
        ret += std::formatstr("%s\r\n", std::string(head.length(),'=').c_str());

        if( ClipboardAlso )
        {
            // dump header row of data block
            lines.push_back("");
            {
                std::string s;
                for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                {
                    if( key != keys.begin() )
                        s.append("\t");
                    s.append(*key);
                }
                s.append("\r\n");
                lines.push_back(s);
            }

            // blow out the flat map of time values to a separate vector of times for each key
            typedef std::map<std::string, std::vector<time_type> > nodematrix;
            nodematrix nodes;
            for( Times::iterator time = TimeMap().begin(); time != TimeMap().end(); ++time )
                nodes[time->first].push_back(time->second);

            // dump each data point
            for( size_t row = 0; row < maxrows; ++row )
            {
                std::string rowDump;
                for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                {
                    if( key != keys.begin() )
                        rowDump.append("\t");
                    if( nodes[*key].size() > row )
                        rowDump.append(std::formatstr("%f", nodes[*key][row]));
                }
                rowDump.append("\r\n");
                lines.push_back(rowDump);
            }

            // dump to the clipboard
            std::string dump;
            for( strings::iterator s = lines.begin(); s != lines.end(); ++s )
            {
                dump.append(*s);
            }

            OpenClipboard(0);
            EmptyClipboard();
            HGLOBAL hg = GlobalAlloc(GMEM_MOVEABLE, dump.length()+1);
            if( hg != 0 )
            {
                char* buf = (char*)GlobalLock(hg);
                if( buf != 0 )
                {
                    std::copy(dump.begin(), dump.end(), buf);
                    buf[dump.length()] = 0;
                    GlobalUnlock(hg);
                    SetClipboardData(CF_TEXT, hg);
                }
            }
            CloseClipboard();
        }

        return ret;
    }

    static void Reset()
    {
        TimeMap().clear();
    }

    static std::string format_elapsed(double d)
    {
        if( d < 0.00000001 )
        {
            // show in ps with 4 digits
            return std::formatstr("%0.4f ps", d * 1000000000000.0);
        }
        if( d < 0.00001 )
        {
            // show in ns
            return std::formatstr("%0.0f ns", d * 1000000000.0);
        }
        if( d < 0.001 )
        {
            // show in us
            return std::formatstr("%0.0f us", d * 1000000.0);
        }
        if( d < 0.1 )
        {
            // show in ms
            return std::formatstr("%0.0f ms", d * 1000.0);
        }
        if( d <= 60.0 )
        {
            // show in seconds
            return std::formatstr("%0.2f s", d);
        }
        if( d < 3600.0 )
        {
            // show in min:sec
            return std::formatstr("%01.0f:%02.2f", floor(d/60.0), fmod(d,60.0));
        }
        // show in h:min:sec
        return std::formatstr("%01.0f:%02.0f:%02.2f", floor(d/3600.0), floor(fmod(d,3600.0)/60.0), fmod(d,60.0));
    }

private:
    static __int64 TimerFreq()
    {
        static __int64 freq = 0;
        static bool init = false;
        if( !init )
        {
            LARGE_INTEGER li;
            QueryPerformanceFrequency(&li);
            freq = li.QuadPart;
            init = true;
        }
        return freq;
    }
    LARGE_INTEGER startTime, stopTime;
    std::string label;

    typedef std::string key_type;
    typedef double time_type;
    typedef std::multimap<key_type, time_type> Times;
//  static Times times;
    static Times& TimeMap()
    {
        static Times times_;
        return times_;
    }

    struct extract_key : public std::unary_function<Times::value_type, key_type>
    {
        std::string operator()(Times::value_type const & r) const
        {
            return r.first;
        }
    };

    struct extract_val : public std::unary_function<Times::value_type, time_type>
    {
        time_type operator()(Times::value_type const & r) const
        {
            return r.second;
        }
    };

    struct match_key : public std::unary_function<Times::value_type, bool>
    {
        match_key(key_type const & key_) : key(key_) {};
        bool operator()(Times::value_type const & rhs) const
        {
            return key == rhs.first;
        }
    private:
        match_key& operator=(match_key&) { return * this; }
        const key_type key;
    };

    struct accum_key : public std::binary_function<time_type, Times::value_type, time_type>
    {
        accum_key(key_type const & key_) : key(key_), n(0) {};
        time_type operator()(time_type const & v, Times::value_type const & rhs) const
        {
            if( key == rhs.first )
            {
                ++n;
                return rhs.second + v;
            }
            return v;
        }
    private:
        accum_key& operator=(accum_key&) { return * this; }
        const Times::key_type key;
        mutable size_t n;
    };
};
File stringext.h (provides the formatstr() function):
#include <stdarg.h>
#include <stdio.h>
#include <string>

namespace std
{
    /*  ---

    Formatted Print

        template<class C>
        int strprintf(basic_string<C>* pString, const C* pFmt, ...);

        template<class C>
        int vstrprintf(basic_string<C>* pString, const C* pFmt, va_list args);

    Returns :

        # characters printed to output

    Effects :

        Writes formatted data to a string.  strprintf() works exactly the same as sprintf(); see your
        documentation for sprintf() for details of operation.  vstrprintf() also works the same as sprintf(),
        but instead of accepting a variable parameter list it accepts a va_list argument.

    Requires :

        pString is a pointer to a basic_string<>

    --- */

    template<class char_type> int vprintf_generic(char_type* buffer, size_t bufferSize, const char_type* format, va_list argptr);

    template<> inline int vprintf_generic<char>(char* buffer, size_t bufferSize, const char* format, va_list argptr)
    {
#   ifdef SECURE_VSPRINTF
        return _vsnprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#   else
        return _vsnprintf(buffer, bufferSize-1, format, argptr);
#   endif
    }

    template<> inline int vprintf_generic<wchar_t>(wchar_t* buffer, size_t bufferSize, const wchar_t* format, va_list argptr)
    {
#   ifdef SECURE_VSPRINTF
        return _vsnwprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#   else
        return _vsnwprintf(buffer, bufferSize-1, format, argptr);
#   endif
    }

    template<class Type, class Traits>
    inline int vstringprintf(basic_string<Type,Traits> & outStr, const Type* format, va_list args)
    {
        // prologue
        static const size_t ChunkSize = 1024;
        size_t curBufSize = 0;
        outStr.erase();

        if( !format )
        {
            return 0;
        }

        // keep trying to write the string to an ever-increasing buffer until
        // either we get the string written or we run out of memory
        while( bool cont = true )
        {
            // allocate a local buffer
            // (std::ref_ptr is a non-standard smart pointer from the author's library, not shown)
            curBufSize += ChunkSize;
            std::ref_ptr<Type> localBuffer = new Type[curBufSize];
            if( localBuffer.get() == 0 )
            {
                // we ran out of memory -- nice goin'!
                return -1;
            }
            // format output to local buffer
            // (the size is a character count, not a byte count, so no sizeof(Type) factor)
            int i = vprintf_generic(localBuffer.get(), curBufSize, format, args);
            if( -1 == i )
            {
                // the buffer wasn't big enough -- try again
                continue;
            }
            else if( i < 0 )
            {
                // something weird happened -- bail
                return i;
            }
            // if we get to this point the string was written completely -- stop looping
            outStr.assign(localBuffer.get(),i);
            return i;
        }
        // unreachable code
        return -1;
    };

    // provided for backward-compatibility
    template<class Type, class Traits>
    inline int vstrprintf(basic_string<Type,Traits> * outStr, const Type* format, va_list args)
    {
        return vstringprintf(*outStr, format, args);
    }

    template<class Char, class Traits>
    inline int stringprintf(std::basic_string<Char, Traits> & outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(outString, format, args);
        va_end(args);
        return retval;
    }

    // old function provided for backward-compatibility
    template<class Char, class Traits>
    inline int strprintf(std::basic_string<Char, Traits> * outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(*outString, format, args);
        va_end(args);
        return retval;
    }

    /*  ---

    Inline Formatted Print

        string formatstr(const char* Format, ...);

    Returns :

        Formatted string

    Effects :

        Writes formatted data to a string.  formatstr() works the same as sprintf(); see your
        documentation for sprintf() for details of operation.

    --- */

    template<class Char>
    inline std::basic_string<Char> formatstr(const Char * format, ...)
    {
        // use the caller's character type so the wchar_t instantiation also compiles
        std::basic_string<Char> outString;

        va_list args;
        va_start(args, format);
        vstringprintf(outString, format, args);
        va_end(args);
        return outString;
    }
};
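Outside of MSVC, a printf-style formatter that returns a std::string can be sketched with standard facilities only. This is not a port of the header above: it assumes C++11, where std::vsnprintf reports the required length when given a null buffer, so no retry loop is needed. The names format_string and vformat_string are hypothetical, and only the char version is shown.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats into a std::string using a va_list. Returns an empty string on error.
inline std::string vformat_string(const char* format, std::va_list args) {
    if (!format) return std::string();

    // First pass: measure. A copy is needed because a va_list must not be reused
    // after a v*printf call has consumed it.
    std::va_list probe;
    va_copy(probe, args);
    const int needed = std::vsnprintf(nullptr, 0, format, probe);
    va_end(probe);
    if (needed < 0) return std::string();

    // Second pass: write into a buffer with room for the terminating null.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), format, args);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}

// Variadic convenience wrapper, analogous in spirit to formatstr().
inline std::string format_string(const char* format, ...) {
    std::va_list args;
    va_start(args, format);
    std::string out = vformat_string(format, args);
    va_end(args);
    return out;
}
```

For example, format_string("%0.2f s", 1.5) yields "1.50 s", the kind of call format_elapsed() makes.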
File algorithmext.h (provides the transform_if() function):
/*  ---

Transform
25.2.3

    template<class InputIterator, class OutputIterator, class UnaryOperation, class Predicate>
        OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation op, Predicate pred)

    template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
        OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, BinaryOperation binary_op, Predicate pred)

Requires:

    T is of type EqualityComparable (20.1.1)
    op and binary_op have no side effects

Effects :

    Assigns through every iterator i in the range [result, result + (last1-first1)) a new corresponding value equal to one of:
        1:  op( *(first1 + (i - result))
        2:  binary_op( *(first1 + (i - result), *(first2 + (i - result))

Returns :

    result + (last1 - first1)

Complexity :

    At most last1 - first1 applications of op or binary_op

--- */

template<class InputIterator, class OutputIterator, class UnaryFunction, class Predicate>
OutputIterator transform_if(InputIterator first,
                            InputIterator last,
                            OutputIterator result,
                            UnaryFunction f,
                            Predicate pred)
{
    for (; first != last; ++first)
    {
        if( pred(*first) )
            *result++ = f(*first);
    }
    return result;
}

template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
OutputIterator transform_if(InputIterator1 first1,
                            InputIterator1 last1,
                            InputIterator2 first2,
                            OutputIterator result,
                            BinaryOperation binary_op,
                            Predicate pred)
{
    for (; first1 != last1 ; ++first1, ++first2)
    {
        if( pred(*first1) )
            *result++ = binary_op(*first1,*first2);
    }
    return result;
}
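As a small usage illustration, the unary transform_if() can be exercised like this. To keep the example self-contained, the template is repeated inside a hypothetical namespace sketch. The helper squares_of_evens and the lambdas are invented for the example and assume C++11.

```cpp
#include <iterator>
#include <vector>

namespace sketch {

// Same shape as the unary transform_if(): apply f only to elements that satisfy pred.
template <class InputIterator, class OutputIterator, class UnaryFunction, class Predicate>
OutputIterator transform_if(InputIterator first, InputIterator last,
                            OutputIterator result, UnaryFunction f, Predicate pred) {
    for (; first != last; ++first) {
        if (pred(*first))
            *result++ = f(*first);
    }
    return result;
}

}  // namespace sketch

// Hypothetical example: square the even numbers and skip the odd ones.
inline std::vector<int> squares_of_evens(const std::vector<int>& in) {
    std::vector<int> out;
    sketch::transform_if(in.begin(), in.end(), std::back_inserter(out),
                         [](int v) { return v * v; },
                         [](int v) { return v % 2 == 0; });
    return out;
}
```

Timer::Dump() uses the same pattern with match_key as the predicate and extract_val as the transformation, to pull out the times that belong to one label.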