快速而肮脏的方式来配置代码

时间:2020-03-05 18:52:58  来源:igfitidea点击:

当我们要获取有关特定代码路径的性能数据时,使用哪种方法?

解决方案

回答

这种方法有一些局限性,但我仍然发现它非常有用。我会先列出限制(我知道),让任何想使用它的人自负风险。

  • 我发布的原始版本在递归调用中花费了过多的时间(如答案注释中所指出)。
  • 在我添加代码以忽略递归之前,它不是线程安全的,也不是线程安全的,现在它甚至更不安全了。
  • 尽管多次调用(百万次)非常有效,但是它将对结果产生可测量的影响,因此我们测量的范围将比不使用的范围更长。

当手头的问题不能证明我对所有代码进行性能分析是合理的,或者我从要验证的探查器中获取了一些数据时,我就使用此类。基本上,它会汇总我们在特定块中花费的时间,并在程序结束时将其输出到调试流(可通过DbgView查看),包括执行代码的次数(以及平均花费的时间)。

#pragma once
#include <tchar.h>
#include <windows.h>
#include <sstream>
#include <boost/noncopyable.hpp>

namespace scope_timer {
    class time_collector : boost::noncopyable {
        __int64 total;
        LARGE_INTEGER start;
        size_t times;
        const TCHAR* name;

        double cpu_frequency()
        { // cache the CPU frequency, which doesn't change.
            static double ret = 0; // store as double so devision later on is floating point and not truncating
            if (ret == 0) {
                LARGE_INTEGER freq;
                QueryPerformanceFrequency(&freq);
                ret = static_cast<double>(freq.QuadPart);
            }
            return ret;
        }
        bool in_use;

    public:
        time_collector(const TCHAR* n)
            : times(0)
            , name(n)
            , total(0)
            , start(LARGE_INTEGER())
            , in_use(false)
        {
        }

        ~time_collector()
        {
            std::basic_ostringstream<TCHAR> msg;
            msg << _T("scope_timer> ") <<  name << _T(" called: ");

            double seconds = total / cpu_frequency();
            double average = seconds / times;

            msg << times << _T(" times total time: ") << seconds << _T(" seconds  ")
                << _T(" (avg ") << average <<_T(")\n");
            OutputDebugString(msg.str().c_str());
        }

        void add_time(__int64 ticks)
        {
            total += ticks;
            ++times;
            in_use = false;
        }

        bool aquire()
        {
            if (in_use)
                return false;
            in_use = true;
            return true;
        }
    };

    class one_time : boost::noncopyable {
        LARGE_INTEGER start;
        time_collector* collector;
    public:
        one_time(time_collector& tc)
        {
            if (tc.aquire()) {
                collector = &tc;
                QueryPerformanceCounter(&start);
            }
            else
                collector = 0;
        }

        ~one_time()
        {
            if (collector) {
                LARGE_INTEGER end;
                QueryPerformanceCounter(&end);
                collector->add_time(end.QuadPart - start.QuadPart);
            }
        }
    };
}

// Usage TIME_THIS_SCOPE(XX); where XX is a C variable name (can begin with a number)
#define TIME_THIS_SCOPE(name) \
    static scope_timer::time_collector st_time_collector_##name(_T(#name)); \
    scope_timer::one_time st_one_time_##name(st_time_collector_##name)

回答

好吧,我有两个代码段。在伪代码中,它们看起来像(它是简化版本,实际上我在使用QueryPerformanceFrequency):

第一个片段:

Timer timer = new Timer
timer.Start

第二段:

timer.Stop
show elapsed time

一点热键功夫,我可以说这段代码从我的CPU中偷走了多少时间。

回答

代码分析器和优化文章中有很多有关C ++代码概要分析的信息,并且还具有指向程序/类的免费下载链接,该链接将为我们显示不同代码路径/方法的图形表示。

回答

我通过创建两个类来完成我的配置文件:cProfile和cProfileManager。

cProfileManager将保存cProfile产生的所有数据。

具有以下要求的cProfile

  • cProfile有一个构造函数来初始化当前时间。
  • cProfile具有一个解构函数,该函数将类存活的总时间发送给cProfileManager

要使用这些配置文件类,我首先创建cProfileManager的实例。然后,将要分析的代码块放在花括号内。在花括号内,我创建一个cProfile实例。当代码块结束时,cProfile会将完成代码块所花费的时间发送给cProfileManager

范例程式码
这是代码示例(简化):

class cProfile
{
    cProfile()
    {
        TimeStart = GetTime();
    };

    ~cProfile()
    {
        ProfileManager->AddProfile (GetTime() - TimeStart);
    }

    float TimeStart;
}

要使用cProfile,我会做这样的事情:

int main()
{
    printf("Start test");
    {
        cProfile Profile;
        Calculate();
    }
    ProfileManager->OutputData();
}

或者这个:

void foobar()
{
    cProfile ProfileFoobar;

    foo();
    {
        cProfile ProfileBarCheck;
        while (bar())
        {
            cProfile ProfileSpam;
            spam();
        }
    }
}

技术说明

这段代码实际上是对C ++中作用域,构造函数和反构造函数工作方式的滥用。 cProfile仅存在于块范围内(我们要测试的代码块)。一旦程序离开了块范围,cProfile记录结果。

其他增强功能

  • 我们可以将字符串参数添加到构造函数中,以便执行以下操作:cProfile Profile("复杂计算的配置文件");
  • 我们可以使用宏使代码看起来更简洁(请注意不要滥用它。与我们对语言的其他滥用不同,使用宏可能会很危险)。示例:#define START_PROFILE cProfile Profile(); {#define END_PROFILE}
  • cProfileManager可以检查代码块被调用了多少次。但是我们需要一个代码块标识符。第一个增强功能可以帮助识别块。如果要分析的代码在循环内(例如第二个示例aboe),这可能会很有用。我们还可以添加代码块花费的平均,最快和最长执行时间。
  • 如果我们处于调试模式,请不要忘记添加检查以跳过分析。

回答

我有一个快速且肮脏的分析类,即使在最紧密的内部循环中也可以用于分析。重点在于极轻的重量和简单的代码。该类分配一个固定大小的二维数组。然后,我在各处添加"检查点"调用。当在检查点M之后立即到达检查点N时,我将经过的时间(以微秒为单位)添加到数组项[M,N]。由于这是为了分析紧密循环而设计的,所以我也有"迭代开始"调用,它可以重置"最后一个检查点"变量。在测试结束时,dumpResults()调用将生成紧随其后的所有检查点对的列表,以及已计和未计的总时间。

回答

请注意,以下所有内容都是专门为Windows编写的。

我还编写了一个计时器类,以使用QueryPerformanceCounter()进行高精度的性能分析,以获取高精度时序,但略有不同。当Timer对象超出范围时,我的计时器类不会转储经过的时间。而是将经过的时间累积到一个集合中。我添加了一个静态成员函数Dump(),该函数创建一个经过时间表,按计时类别(在Timer的构造函数中指定为字符串)进行排序,并进行一些统计分析,例如平均经过时间,标准偏差,最大值和最小值。我还添加了一个Clear()静态成员函数,该函数清除集合并让我们重新开始。

如何使用Timer类(伪代码):

int CInsertBuffer::Read(char* pBuf)
{
       // TIMER NOTES: Avg Execution Time = ~1 ms
       Timer timer("BufferRead");
       :      :
       return -1;
}

输出示例:

Timer Precision = 418.0095 ps

=== Item               Trials    Ttl Time  Avg Time  Mean Time StdDev    ===
    AddTrade           500       7 ms      14 us     12 us     24 us
    BufferRead         511       1:19.25   0.16 s    621 ns    2.48 s
    BufferWrite        516       511 us    991 ns    482 ns    11 us
    ImportPos Loop     1002      18.62 s   19 ms     77 us     0.51 s
    ImportPosition     2         18.75 s   9.38 s    16.17 s   13.59 s
    Insert             515       4.26 s    8 ms      5 ms      27 ms
    recv               101       18.54 s   0.18 s    2603 ns   1.63 s

文件Timer.inl:

#include <map>
#include "x:\utils\stlext\stringext.h"
#include <iterator>
#include <set>
#include <vector>
#include <numeric>
#include "x:\utils\stlext\algorithmext.h"
#include <math.h>

    class Timer
    {
    public:
        Timer(const char* name)
        {
            label = std::safe_string(name);
            QueryPerformanceCounter(&startTime);
        }

        virtual ~Timer()
        {
            QueryPerformanceCounter(&stopTime);
            __int64 clocks = stopTime.QuadPart-startTime.QuadPart;
            double elapsed = (double)clocks/(double)TimerFreq();
            TimeMap().insert(std::make_pair(label,elapsed));
        };

        static std::string Dump(bool ClipboardAlso=true)
        {
            static const std::string loc = "Timer::Dump";

            if( TimeMap().empty() )
            {
                return "No trials\r\n";
            }

            std::string ret = std::formatstr("\r\n\r\nTimer Precision = %s\r\n\r\n", format_elapsed(1.0/(double)TimerFreq()).c_str());

            // get a list of keys
            typedef std::set<std::string> keyset;
            keyset keys;
            std::transform(TimeMap().begin(), TimeMap().end(), std::inserter(keys, keys.begin()), extract_key());

            size_t maxrows = 0;

            typedef std::vector<std::string> strings;
            strings lines;

            static const size_t tabWidth = 9;

            std::string head = std::formatstr("=== %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s ===", tabWidth*2, tabWidth*2, "Item", tabWidth, tabWidth, "Trials", tabWidth, tabWidth, "Ttl Time", tabWidth, tabWidth, "Avg Time", tabWidth, tabWidth, "Mean Time", tabWidth, tabWidth, "StdDev");
            ret += std::formatstr("\r\n%s\r\n", head.c_str());
            if( ClipboardAlso ) 
                lines.push_back("Item\tTrials\tTtl Time\tAvg Time\tMean Time\tStdDev\r\n");
            // dump the values for each key
            {for( keyset::iterator key = keys.begin(); keys.end() != key; ++key )
            {
                time_type ttl = 0;
                ttl = std::accumulate(TimeMap().begin(), TimeMap().end(), ttl, accum_key(*key));
                size_t num = std::count_if( TimeMap().begin(), TimeMap().end(), match_key(*key));
                if( num > maxrows ) 
                    maxrows = num;
                time_type avg = ttl / num;

                // compute mean
                std::vector<time_type> sortedTimes;
                std::transform_if(TimeMap().begin(), TimeMap().end(), std::inserter(sortedTimes, sortedTimes.begin()), extract_val(), match_key(*key));
                std::sort(sortedTimes.begin(), sortedTimes.end());
                size_t mid = (size_t)floor((double)num/2.0);
                double mean = ( num > 1 && (num % 2) != 0 ) ? (sortedTimes[mid]+sortedTimes[mid+1])/2.0 : sortedTimes[mid];
                // compute variance
                double sum = 0.0;
                if( num > 1 )
                {
                    for( std::vector<time_type>::iterator timeIt = sortedTimes.begin(); sortedTimes.end() != timeIt; ++timeIt )
                        sum += pow(*timeIt-mean,2.0);
                }
                // compute std dev
                double stddev = num > 1 ? sqrt(sum/((double)num-1.0)) : 0.0;

                ret += std::formatstr("    %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s\r\n", tabWidth*2, tabWidth*2, key->c_str(), tabWidth, tabWidth, std::formatstr("%d",num).c_str(), tabWidth, tabWidth, format_elapsed(ttl).c_str(), tabWidth, tabWidth, format_elapsed(avg).c_str(), tabWidth, tabWidth, format_elapsed(mean).c_str(), tabWidth, tabWidth, format_elapsed(stddev).c_str()); 
                if( ClipboardAlso )
                    lines.push_back(std::formatstr("%s\t%s\t%s\t%s\t%s\t%s\r\n", key->c_str(), std::formatstr("%d",num).c_str(), format_elapsed(ttl).c_str(), format_elapsed(avg).c_str(), format_elapsed(mean).c_str(), format_elapsed(stddev).c_str())); 

            }
            }
            ret += std::formatstr("%s\r\n", std::string(head.length(),'=').c_str());

            if( ClipboardAlso )
            {
                // dump header row of data block
                lines.push_back("");
                {
                    std::string s;
                    for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                    {
                        if( key != keys.begin() )
                            s.append("\t");
                        s.append(*key);
                    }
                    s.append("\r\n");
                    lines.push_back(s);
                }

                // blow out the flat map of time values to a seperate vector of times for each key
                typedef std::map<std::string, std::vector<time_type> > nodematrix;
                nodematrix nodes;
                for( Times::iterator time = TimeMap().begin(); time != TimeMap().end(); ++time )
                    nodes[time->first].push_back(time->second);

                // dump each data point
                for( size_t row = 0; row < maxrows; ++row )
                {
                    std::string rowDump;
                    for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                    {
                        if( key != keys.begin() )
                            rowDump.append("\t");
                        if( nodes[*key].size() > row )
                            rowDump.append(std::formatstr("%f", nodes[*key][row]));
                    }
                    rowDump.append("\r\n");
                    lines.push_back(rowDump);
                }

                // dump to the clipboard
                std::string dump;
                for( strings::iterator s = lines.begin(); s != lines.end(); ++s )
                {
                    dump.append(*s);
                }

                OpenClipboard(0);
                EmptyClipboard();
                HGLOBAL hg = GlobalAlloc(GMEM_MOVEABLE, dump.length()+1);
                if( hg != 0 )
                {
                    char* buf = (char*)GlobalLock(hg);
                    if( buf != 0 )
                    {
                        std::copy(dump.begin(), dump.end(), buf);
                        buf[dump.length()] = 0;
                        GlobalUnlock(hg);
                        SetClipboardData(CF_TEXT, hg);
                    }
                }
                CloseClipboard();
            }

            return ret;
        }

        static void Reset()
        {
            TimeMap().clear();
        }

        static std::string format_elapsed(double d) 
        {
            if( d < 0.00000001 )
            {
                // show in ps with 4 digits
                return std::formatstr("%0.4f ps", d * 1000000000000.0);
            }
            if( d < 0.00001 )
            {
                // show in ns
                return std::formatstr("%0.0f ns", d * 1000000000.0);
            }
            if( d < 0.001 )
            {
                // show in us
                return std::formatstr("%0.0f us", d * 1000000.0);
            }
            if( d < 0.1 )
            {
                // show in ms
                return std::formatstr("%0.0f ms", d * 1000.0);
            }
            if( d <= 60.0 )
            {
                // show in seconds
                return std::formatstr("%0.2f s", d);
            }
            if( d < 3600.0 )
            {
                // show in min:sec
                return std::formatstr("%01.0f:%02.2f", floor(d/60.0), fmod(d,60.0));
            }
            // show in h:min:sec
            return std::formatstr("%01.0f:%02.0f:%02.2f", floor(d/3600.0), floor(fmod(d,3600.0)/60.0), fmod(d,60.0));
        }

    private:
        static __int64 TimerFreq()
        {
            static __int64 freq = 0;
            static bool init = false;
            if( !init )
            {
                LARGE_INTEGER li;
                QueryPerformanceFrequency(&li);
                freq = li.QuadPart;
                init = true;
            }
            return freq;
        }
        LARGE_INTEGER startTime, stopTime;
        std::string label;

        typedef std::string key_type;
        typedef double time_type;
        typedef std::multimap<key_type, time_type> Times;
//      static Times times;
        static Times& TimeMap()
        {
            static Times times_;
            return times_;
        }

        struct extract_key : public std::unary_function<Times::value_type, key_type>
        {
            std::string operator()(Times::value_type const & r) const
            {
                return r.first;
            }
        };

        struct extract_val : public std::unary_function<Times::value_type, time_type>
        {
            time_type operator()(Times::value_type const & r) const
            {
                return r.second;
            }
        };
        struct match_key : public std::unary_function<Times::value_type, bool>
        {
            match_key(key_type const & key_) : key(key_) {};
            bool operator()(Times::value_type const & rhs) const
            {
                return key == rhs.first;
            }
        private:
            match_key& operator=(match_key&) { return * this; }
            const key_type key;
        };

        struct accum_key : public std::binary_function<time_type, Times::value_type, time_type>
        {
            accum_key(key_type const & key_) : key(key_), n(0) {};
            time_type operator()(time_type const & v, Times::value_type const & rhs) const
            {
                if( key == rhs.first )
                {
                    ++n;
                    return rhs.second + v;
                }
                return v;
            }
        private:
            accum_key& operator=(accum_key&) { return * this; }
            const Times::key_type key;
            mutable size_t n;
        };
    };

文件stringext.h(提供formatstr()函数):

namespace std
{
    /*  ---

    Formatted Print

        template<class C>
        int strprintf(basic_string<C>* pString, const C* pFmt, ...);

        template<class C>
        int vstrprintf(basic_string<C>* pString, const C* pFmt, va_list args);

    Returns :

        # characters printed to output

    Effects :

        Writes formatted data to a string.  strprintf() works exactly the same as sprintf(); see your
        documentation for sprintf() for details of peration.  vstrprintf() also works the same as sprintf(), 
        but instead of accepting a variable paramater list it accepts a va_list argument.

    Requires :

        pString is a pointer to a basic_string<>

    --- */

    template<class char_type> int vprintf_generic(char_type* buffer, size_t bufferSize, const char_type* format, va_list argptr);

    template<> inline int vprintf_generic<char>(char* buffer, size_t bufferSize, const char* format, va_list argptr)
    {
#       ifdef SECURE_VSPRINTF
        return _vsnprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#       else
        return _vsnprintf(buffer, bufferSize-1, format, argptr);
#       endif
    }

    template<> inline int vprintf_generic<wchar_t>(wchar_t* buffer, size_t bufferSize, const wchar_t* format, va_list argptr)
    {
#       ifdef SECURE_VSPRINTF
        return _vsnwprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#       else
        return _vsnwprintf(buffer, bufferSize-1, format, argptr);
#       endif
    }

    template<class Type, class Traits>
    inline int vstringprintf(basic_string<Type,Traits> & outStr, const Type* format, va_list args)
    {
        // prologue
        static const size_t ChunkSize = 1024;
        size_t curBufSize = 0;
        outStr.erase(); 

        if( !format )
        {
            return 0;
        }

        // keep trying to write the string to an ever-increasing buffer until
        // either we get the string written or we run out of memory
        while( bool cont = true )
        {
            // allocate a local buffer
            curBufSize += ChunkSize;
            std::ref_ptr<Type> localBuffer = new Type[curBufSize];
            if( localBuffer.get() == 0 )
            {
                // we ran out of memory -- nice goin'!
                return -1;
            }
            // format output to local buffer
            int i = vprintf_generic(localBuffer.get(), curBufSize * sizeof(Type), format, args);
            if( -1 == i )
            {
                // the buffer wasn't big enough -- try again
                continue;
            }
            else if( i < 0 )
            {
                // something wierd happened -- bail
                return i;
            }
            // if we get to this point the string was written completely -- stop looping
            outStr.assign(localBuffer.get(),i);
            return i;
        }
        // unreachable code
        return -1;
    };

    // provided for backward-compatibility
    template<class Type, class Traits>
    inline int vstrprintf(basic_string<Type,Traits> * outStr, const Type* format, va_list args)
    {
        return vstringprintf(*outStr, format, args);
    }

    template<class Char, class Traits>
    inline int stringprintf(std::basic_string<Char, Traits> & outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(outString, format, args);
        va_end(args);
        return retval;
    }

    // old function provided for backward-compatibility
    template<class Char, class Traits>
    inline int strprintf(std::basic_string<Char, Traits> * outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(*outString, format, args);
        va_end(args);
        return retval;
    }

    /*  ---

    Inline Formatted Print

        string strprintf(const char* Format, ...);

    Returns :

        Formatted string

    Effects :

        Writes formatted data to a string.  formatstr() works the same as sprintf(); see your
        documentation for sprintf() for details of operation.  

    --- */

    template<class Char>
    inline std::basic_string<Char> formatstr(const Char * format, ...)
    {
        std::string outString;

        va_list args;
        va_start(args, format);
        vstringprintf(outString, format, args);
        va_end(args);
        return outString;
    }
};

文件algorithmext.h(提供transform_if()函数):

/*  ---

Transform
25.2.3

    template<class InputIterator, class OutputIterator, class UnaryOperation, class Predicate>
        OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation op, Predicate pred)

    template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
        OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, BinaryOperation binary_op, Predicate pred)

Requires:   

    T is of type EqualityComparable (20.1.1) 
    op and binary_op have no side effects

Effects :

    Assigns through every iterator i in the range [result, result + (last1-first1)) a new corresponding value equal to one of:
        1:  op( *(first1 + (i - result)) 
        2:  binary_op( *(first1 + (i - result), *(first2 + (i - result))

Returns :

    result + (last1 - first1)

Complexity :

    At most last1 - first1 applications of op or binary_op

--- */

template<class InputIterator, class OutputIterator, class UnaryFunction, class Predicate>
OutputIterator transform_if(InputIterator first, 
                            InputIterator last, 
                            OutputIterator result, 
                            UnaryFunction f, 
                            Predicate pred)
{
    for (; first != last; ++first)
    {
        if( pred(*first) )
            *result++ = f(*first);
    }
    return result; 
}

template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
OutputIterator transform_if(InputIterator1 first1, 
                            InputIterator1 last1, 
                            InputIterator2 first2, 
                            OutputIterator result, 
                            BinaryOperation binary_op, 
                            Predicate pred)
{
    for (; first1 != last1 ; ++first1, ++first2)
    {
        if( pred(*first1) )
            *result++ = binary_op(*first1,*first2);
    }
    return result;
}