Math optimization in C#

Question

Asked by hb.

I've been profiling an application all day long and, having optimized a couple of bits of code, I'm left with this on my todo list. It's the activation function for a neural network, which gets called over 100 million times. According to dotTrace, it amounts to about 60% of the overall function time.

How would you optimize this?

public static float Sigmoid(double value) {
    return (float) (1.0 / (1.0 + Math.Pow(Math.E, -value)));
}

Answer 1

Accepted answer by Sophie Alpert

Try:

public static float Sigmoid(double value) {
    return 1.0f / (1.0f + (float) Math.Exp(-value));
}

EDIT: I did a quick benchmark. On my machine, the above code is about 43% faster than your method, and this mathematically equivalent code is the teeniest bit faster (46% faster than the original):

public static float Sigmoid(double value) {
    float k = (float) Math.Exp(value);
    return k / (1.0f + k);
}

EDIT 2: I'm not sure how much overhead C# functions have, but if you #include <math.h> in your source code, you should be able to use this, which uses a float-exp function. It might be a little faster.

public static float Sigmoid(double value) {
    float k = expf((float) value);
    return k / (1.0f + k);
}

Also if you're doing millions of calls, the function-calling overhead might be a problem. Try making an inline function and see if that's any help.
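
C# itself has no #include or expf, so the two suggestions above need a C# equivalent. A minimal sketch, assuming .NET Core 2.0 or later (where MathF is available) and assuming the caller can pass a float: MathF.Exp is the single-precision exponential, and MethodImplOptions.AggressiveInlining asks the JIT to inline the call. The class name FastSigmoid is made up for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class FastSigmoid
{
    // Single-precision sigmoid: MathF.Exp works on float directly,
    // so there is no float -> double -> float round trip.
    // AggressiveInlining is only a hint; the JIT may still decline to inline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static float Sigmoid(float value)
    {
        return 1.0f / (1.0f + MathF.Exp(-value));
    }
}
```

Whether either change helps depends on the runtime and JIT version, so it is worth timing both variants rather than assuming a gain.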

Answer 2

Answer by Vilx-

Idea: Perhaps you can make a (large) lookup table with the values pre-calculated?

Answer 3

Answer by Shog9

At 100 million calls, I'd start to wonder if profiler overhead isn't skewing your results. Replace the calculation with a no-op and see if it is still reported to consume 60% of the execution time...

Or better yet, create some test data and use a stopwatch timer to profile a million or so calls.
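
A rough sketch of such a stopwatch test, using System.Diagnostics.Stopwatch. The class name, the input range of [-10, 10) and the fixed random seed are assumptions made for illustration, not details from the question.

```csharp
using System;
using System.Diagnostics;

public static class SigmoidBenchmark
{
    public static float Sigmoid(double value)
    {
        return 1.0f / (1.0f + (float)Math.Exp(-value));
    }

    // Times `calls` invocations over precomputed inputs and returns elapsed milliseconds.
    // The running sum is handed back through `checksum` so the JIT cannot discard the calls.
    public static long TimeSigmoid(int calls, out float checksum)
    {
        var inputs = new double[1024];
        var rng = new Random(12345);                    // fixed seed for repeatable runs
        for (int i = 0; i < inputs.Length; i++)
            inputs[i] = rng.NextDouble() * 20.0 - 10.0; // assumed input range [-10, 10)

        float sum = 0f;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < calls; i++)
            sum += Sigmoid(inputs[i & 1023]);           // cycle through the test data
        sw.Stop();

        checksum = sum;
        return sw.ElapsedMilliseconds;
    }
}
```

Calling TimeSigmoid(1000000, out var sum) a few times and discarding the first run (which includes JIT compilation) should give a number that can be compared against what the profiler reports.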

Answer 4

Answer by Haacked

Doing a Google search, I found an alternative implementation of the Sigmoid function.

public double Sigmoid(double x)
{
   return 2 / (1 + Math.Exp(-2 * x)) - 1;
}

Is that correct for your needs? Is it faster?

http://dynamicnotions.blogspot.com/2008/09/sigmoid-function-in-c.html
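
On the "is that correct" question, it may help to note that the expression above is the hyperbolic tangent rather than the logistic function from the question. The two are related by a shift and rescale, so the ranges differ: (-1, 1) here versus (0, 1) for the original.

```latex
\frac{2}{1 + e^{-2x}} - 1 = \frac{1 - e^{-2x}}{1 + e^{-2x}} = \tanh(x),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}} = \frac{1 + \tanh(x/2)}{2}.
```

So it is a drop-in replacement only if the network is happy with outputs in (-1, 1), and it still costs one exponential per call.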

Answer 5

Answer by Neil Coffey

If it's for an activation function, does it matter terribly much if the calculation of e^x is completely accurate?

For example, if you use the approximation (1+x/256)^256, on my Pentium testing in Java (I'm assuming C# essentially compiles to the same processor instructions) this is about 7-8 times faster than e^x (Math.exp()), and is accurate to 2 decimal places up to about x of +/-1.5, and within the correct order of magnitude across the range you stated. (Obviously, to raise to the 256, you actually square the number 8 times -- don't use Math.Pow for this!) In Java:

double eapprox = (1d + x / 256d);
eapprox *= eapprox;
eapprox *= eapprox;
eapprox *= eapprox;
eapprox *= eapprox;
eapprox *= eapprox;
eapprox *= eapprox;
eapprox *= eapprox;
eapprox *= eapprox;

Keep doubling or halving 256 (and adding/removing a multiplication) depending on how accurate you want the approximation to be. Even with n=4, it still gives about 1.5 decimal places of accuracy for values of x between -0.5 and 0.5 (and appears a good 15 times faster than Math.exp()).

P.S. I forgot to mention -- you should obviously not really divide by 256: multiply by a constant 1/256. Java's JIT compiler makes this optimisation automatically (at least, Hotspot does), and I was assuming that C# must do too.
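
A possible C# port of the Java snippet above, plugged into the sigmoid from the question. This is a sketch only; the class and method names are invented, and the speed-up reported for Java has not been measured for C# here.

```csharp
using System;

public static class ApproxSigmoid
{
    // Approximates e^x as (1 + x/256)^256 using eight squarings.
    public static double ExpApprox(double x)
    {
        double e = 1.0 + x * (1.0 / 256.0); // multiply by the constant 1/256 instead of dividing
        e *= e; e *= e; e *= e; e *= e;
        e *= e; e *= e; e *= e; e *= e;
        return e;
    }

    // Sigmoid built on the approximate exponential.
    public static float Sigmoid(double value)
    {
        return (float)(1.0 / (1.0 + ExpApprox(-value)));
    }
}
```

The approximation is only meaningful for moderate inputs: once x drops below -256 the base 1 + x/256 goes negative and the even power turns it into a large positive number, so very large positive arguments to Sigmoid should be clamped first.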

Answer 6

Answer by Stobor

First thought: How about some stats on the value variable?

Are the values of "value" typically small (-10 <= value <= 10)?

If not, you can probably get a boost by testing for out-of-bounds values:

if(value < -10)  return 0;
if(value > 10)  return 1;

Are the values repeated often?

If so, you can probably get some benefit from memoization (probably not, but it doesn't hurt to check...):

if(sigmoidCache.containsKey(value)) return sigmoidCache.get(value);

If neither of these can be applied, then as some others have suggested, maybe you can get away with lowering the accuracy of your sigmoid...
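
Put together in C#, the two checks above might look like the sketch below. The Dictionary-based cache and the class name are assumptions for illustration; the cache here is unbounded and not thread-safe, and a dictionary lookup may well cost more than Math.Exp, so it only pays off if measurement says so.

```csharp
using System;
using System.Collections.Generic;

public static class GuardedSigmoid
{
    // Hypothetical cache keyed on the exact input; only useful if inputs repeat often.
    private static readonly Dictionary<double, float> sigmoidCache =
        new Dictionary<double, float>();

    public static float Sigmoid(double value)
    {
        // Outside [-10, 10] the true result is within about 5e-5 of 0 or 1.
        if (value < -10) return 0f;
        if (value > 10) return 1f;

        // Memoization: reuse a previously computed result for the same input.
        float cached;
        if (sigmoidCache.TryGetValue(value, out cached)) return cached;

        float result = (float)(1.0 / (1.0 + Math.Exp(-value)));
        sigmoidCache[value] = result;
        return result;
    }
}
```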

Answer 7

Answer by Henrik Gustafsson

(Updated with performance measurements) (Updated again with real results :)

I think a lookup table solution would get you very far when it comes to performance, at a negligible memory and precision cost.

The following snippet is an example implementation in C (I don't speak C# fluently enough to dry-code it). It runs and performs well enough, but I'm sure there's a bug in it :)

#include <math.h>
#include <stdio.h>
#include <time.h>

#define SCALE 320.0f
#define RESOLUTION 2047
#define MIN (-RESOLUTION / SCALE)  /* parenthesized so the macros expand safely */
#define MAX (RESOLUTION / SCALE)

static float sigmoid_lut[RESOLUTION + 1];

void init_sigmoid_lut(void) {
    int i;    
    for (i = 0; i < RESOLUTION + 1; i++) {
        sigmoid_lut[i] =  (1.0 / (1.0 + exp(-i / SCALE)));
    }
}

static float sigmoid1(const float value) {
    return (1.0f / (1.0f + expf(-value)));
}

static float sigmoid2(const float value) {
    if (value <= MIN) return 0.0f;
    if (value >= MAX) return 1.0f;
    if (value >= 0) return sigmoid_lut[(int)(value * SCALE + 0.5f)];
    return 1.0f-sigmoid_lut[(int)(-value * SCALE + 0.5f)];
}

float test_error() {
    float x;
    float emax = 0.0;

    for (x = -10.0f; x < 10.0f; x+=0.00001f) {
        float v0 = sigmoid1(x);
        float v1 = sigmoid2(x);
        float error = fabsf(v1 - v0);
        if (error > emax) { emax = error; }
    } 
    return emax;
}

int sigmoid1_perf() {
    clock_t t0, t1;
    int i;
    float x, y = 0.0f;

    t0 = clock();
    for (i = 0; i < 10; i++) {
        for (x = -5.0f; x <= 5.0f; x+=0.00001f) {
            y = sigmoid1(x);
        }
    }
    t1 = clock();
    printf("", y); /* To avoid sigmoidX() calls being optimized away */
    return (t1 - t0) / (CLOCKS_PER_SEC / 1000);
}

int sigmoid2_perf() {
    clock_t t0, t1;
    int i;
    float x, y = 0.0f;
    t0 = clock();
    for (i = 0; i < 10; i++) {
        for (x = -5.0f; x <= 5.0f; x+=0.00001f) {
            y = sigmoid2(x);
        }
    }
    t1 = clock();
    printf("", y); /* To avoid sigmoidX() calls being optimized away */
    return (t1 - t0) / (CLOCKS_PER_SEC / 1000);
}

int main(void) {
    init_sigmoid_lut();
    printf("Max deviation is %0.6f\n", test_error());
    printf("10^7 iterations using sigmoid1: %d ms\n", sigmoid1_perf());
    printf("10^7 iterations using sigmoid2: %d ms\n", sigmoid2_perf());

    return 0;
}

Previous results were due to the optimizer doing its job and optimizing away the calculations. Making it actually execute the code yields slightly different and much more interesting results (on my way-slow MacBook Air):

$ gcc -O2 test.c -o test && ./test
Max deviation is 0.001664
10^7 iterations using sigmoid1: 571 ms
10^7 iterations using sigmoid2: 113 ms

TODO:

There are things to improve and ways to remove weaknesses; how to do so is left as an exercise to the reader :)

Tune the range of the function to avoid the jump where the table starts and ends.
Add a slight noise function to hide the aliasing artifacts.
As Rex said, interpolation could get you quite a bit further precision-wise while being rather cheap performance-wise.
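
As a rough idea of what the table lookup could look like in C# with the interpolation item from the list above applied, here is a sketch that reuses the same SCALE and RESOLUTION values as the C program. It is an untimed port, not the author's code; the class and member names are made up.

```csharp
using System;

public static class LutSigmoid
{
    private const float Scale = 320.0f;   // table entries per unit of input
    private const int Resolution = 2047;  // highest table index

    private static readonly float[] Table = BuildTable();

    // Precomputes sigmoid(i / Scale) for i = 0..Resolution (non-negative inputs only).
    private static float[] BuildTable()
    {
        var table = new float[Resolution + 1];
        for (int i = 0; i <= Resolution; i++)
            table[i] = (float)(1.0 / (1.0 + Math.Exp(-i / (double)Scale)));
        return table;
    }

    public static float Sigmoid(float value)
    {
        float pos = Math.Abs(value) * Scale;
        float y;
        if (!(pos < Resolution))
        {
            y = 1.0f;                     // beyond the table (this branch also catches NaN)
        }
        else
        {
            int i = (int)pos;             // lower neighbour, at most Resolution - 1
            float frac = pos - i;         // position between the two neighbours
            // Linear interpolation between adjacent table entries.
            y = Table[i] + (Table[i + 1] - Table[i]) * frac;
        }
        return value >= 0 ? y : 1.0f - y; // sigmoid(-x) = 1 - sigmoid(x)
    }
}
```

The small jump where the table ends (the first item in the list) is still present in this sketch; widening the range or lowering Scale would be one way to shrink it.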

Answer 8

Answer by Jeremy

Soprano had some nice optimizations for your call:

public static float Sigmoid(double value) 
{
    float k = (float) Math.Exp(value);
    return k / (1.0f + k);
}

If you try a lookup table and find it uses too much memory, you could always look at the value of your parameter across successive calls and employ some caching technique.

For example, try caching the last value and result. If the next call has the same value as the previous one, you don't need to recalculate it, because you already have the cached result. If the current call matched the previous call even 1 time out of 100, you could potentially save yourself 1 million calculations.

Or, you may find that within 10 successive calls, the value parameter is on average the same 2 times, so you could then try caching the last 10 values/answers.
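
The single-entry version of this idea could be sketched as follows. The class name and the Hits counter are invented for illustration; the counter is there only to check how often consecutive calls actually repeat. An instance class is used so that each thread or network can hold its own cache.

```csharp
using System;

public sealed class CachedSigmoid
{
    private double lastValue = double.NaN; // NaN never compares equal, so the first call misses
    private float lastResult;

    // Number of calls answered from the cache.
    public int Hits { get; private set; }

    public float Sigmoid(double value)
    {
        if (value == lastValue)
        {
            Hits++;
            return lastResult;             // same input as the previous call
        }
        lastValue = value;
        lastResult = (float)(1.0 / (1.0 + Math.Exp(-value)));
        return lastResult;
    }
}
```

If Hits stays near zero on real data, the extra comparison is pure overhead and the cache should be dropped.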

Answer 9

Answer by Jeremy

1) Do you call this from only one place? If so, you may gain a small amount of performance by moving the code out of that function and just putting it right where you would normally have called the Sigmoid function. I don't like this idea in terms of code readability and organization but when you need to get every last performance gain, this might help because I think function calls require a push/pop of registers on the stack, which could be avoided if your code was all inline.

2) I have no idea if this might help, but try making your function parameter a ref parameter. See if it's faster. I would have suggested making it const (which would have been an optimization if this were in C++), but C# doesn't support const parameters.

Answer 10

Answer by joel.neely

You might also consider experimenting with alternative activation functions which are cheaper to evaluate. For example:

f(x) = (3x - x**3)/2

(which could be factored as

f(x) = x*(3 - x*x)/2

for one less multiplication). This function has odd symmetry, and its derivative is trivial. Using it for a neural network requires normalizing the sum-of-inputs by dividing by the total number of inputs (limiting the domain to [-1..1], which is also the range).
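
In C#, the factored form and its derivative, f'(x) = 3(1 - x^2)/2, might be written as below. The class and method names are illustrative, and the code assumes the caller has already normalized x into [-1, 1].

```csharp
using System;

public static class CubicActivation
{
    // f(x) = x * (3 - x*x) / 2, intended for x in [-1, 1].
    public static float Activate(float x)
    {
        return x * (3.0f - x * x) * 0.5f;
    }

    // f'(x) = 3 * (1 - x*x) / 2, useful for backpropagation.
    public static float Derivative(float x)
    {
        return 1.5f * (1.0f - x * x);
    }
}
```

Outside [-1, 1] the cubic turns back toward zero and then changes sign, which is why the normalization step matters.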
