How to make a custom activation function with only Python in Tensorflow?

Warning: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/39921607/
Asked by patapouf_ai
Suppose you need to make an activation function that is not possible using only the pre-defined tensorflow building blocks. What can you do?
So in Tensorflow it is possible to make your own activation function. But it is quite complicated: you have to write it in C++ and recompile the whole of tensorflow [1][2].
Is there a simpler way?
Answered by patapouf_ai
Yes, there is!
Credit: It was hard to find the information and get it working, but here is an example based on the principles and code found here and here.
Requirements: Before we start, there are two requirements for this to succeed. First, you need to be able to write your activation as a function on numpy arrays. Second, you have to be able to write the derivative of that function either as a function in Tensorflow (easier) or, in the worst case scenario, as a function on numpy arrays.
Writing Activation function:
So let's take for example this function which we would want to use as an activation function:
def spiky(x):
    r = x % 1
    if r <= 0.5:
        return r
    else:
        return 0
The first step is making it into a numpy function; this is easy:
import numpy as np
np_spiky = np.vectorize(spiky)
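As a quick sanity check (illustrative values; the output is shown approximately):

print(np_spiky(np.array([0.2, 0.7, 1.2, 1.7])))
# -> [ 0.2  0.   0.2  0. ]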
Now we should write its derivative.
Gradient of Activation: In our case it is easy: it is 1 if x mod 1 <= 0.5 and 0 otherwise. So:
def d_spiky(x):
    r = x % 1
    if r <= 0.5:
        return 1
    else:
        return 0

np_d_spiky = np.vectorize(d_spiky)
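Optionally, we can check the hand-written derivative against a numerical one away from the discontinuities (a quick illustrative check, not part of the original recipe):

eps = 1e-6
xs = np.array([0.2, 0.4, 0.7, 1.2])
numerical = (np_spiky(xs + eps) - np_spiky(xs - eps)) / (2 * eps)
print(numerical)       # -> approximately [ 1.  1.  0.  1.]
print(np_d_spiky(xs))  # -> [1 1 0 1]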
Now for the hard part of making a TensorFlow function out of it.
Making a numpy fct into a tensorflow fct: We will start by making np_d_spiky into a tensorflow function. There is a function in tensorflow, tf.py_func(func, inp, Tout, stateful=stateful, name=name) [doc], which transforms any numpy function into a tensorflow function, so we can use it:
import tensorflow as tf
from tensorflow.python.framework import ops

np_d_spiky_32 = lambda x: np_d_spiky(x).astype(np.float32)

def tf_d_spiky(x, name=None):
    with tf.name_scope(name, "d_spiky", [x]) as name:
        y = tf.py_func(np_d_spiky_32,
                       [x],
                       [tf.float32],
                       name=name,
                       stateful=False)
        return y[0]
tf.py_func acts on lists of tensors (and returns a list of tensors), which is why we have [x] (and return y[0]). The stateful option tells tensorflow whether the function always gives the same output for the same input (stateful=False), in which case tensorflow can simplify the graph; this is our case and will probably be the case in most situations. One thing to be careful of at this point is that numpy uses float64 but tensorflow uses float32, so you need to convert your function to use float32 before you can convert it to a tensorflow function, otherwise tensorflow will complain. This is why we need to make np_d_spiky_32 first.
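To check that the wrapper behaves, we can already evaluate it in a session (a minimal sketch in the TF 1.x style used throughout this answer):

with tf.Session():
    print(tf_d_spiky(tf.constant([0.2, 0.7, 1.2, 1.7])).eval())
    # -> [ 1.  0.  1.  0.]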
What about the Gradients? The problem with only doing the above is that even though we now have tf_d_spiky, which is the tensorflow version of np_d_spiky, we couldn't use it as an activation function if we wanted to, because tensorflow doesn't know how to calculate the gradients of that function.
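We can see this directly: asking for gradients through the wrapped op fails because no gradient is registered for the underlying PyFunc op (a sketch; the exact error depends on the TF version):

x = tf.constant([0.2, 0.7])
y = tf_d_spiky(x)
try:
    tf.gradients(y, [x])
except Exception as e:
    print("no gradient registered:", e)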
Hack to get Gradients: As explained in the sources mentioned above, there is a hack to define the gradients of a function using tf.RegisterGradient [doc] and tf.Graph.gradient_override_map [doc]. Copying the code from harpone, we can modify the tf.py_func function to make it define the gradient at the same time:
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))

    tf.RegisterGradient(rnd_name)(grad)  # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
Now we are almost done. The only thing left is that the grad function we need to pass to the above py_func function must take a special form: it needs to take in an operation and the gradients flowing in from after the operation, and propagate the gradients backward through the operation.
Gradient Function: So for our spiky activation function, this is how we would do it:
def spikygrad(op, grad):
    x = op.inputs[0]
    n_gr = tf_d_spiky(x)
    return grad * n_gr
The activation function has only one input, which is why x = op.inputs[0]. If the operation had many inputs, we would need to return a tuple, one gradient for each input. For example, if the operation was a-b, the gradient with respect to a is +1 and with respect to b is -1, so we would have return +1*grad, -1*grad (see the short sketch after the next code block). Notice that we need to return tensorflow functions of the input; that is why we need tf_d_spiky: np_d_spiky would not have worked because it cannot act on tensorflow tensors. Alternatively, we could have written the derivative using tensorflow functions:
def spikygrad2(op, grad):
    x = op.inputs[0]
    r = tf.mod(x, 1)
    n_gr = tf.to_float(tf.less_equal(r, 0.5))
    return grad * n_gr
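As an aside, here is what a gradient function for the hypothetical two-input a-b operation mentioned above would look like (illustrative only; subtract_grad is a made-up name):

def subtract_grad(op, grad):
    # one gradient per input, in order: d(a-b)/da = +1, d(a-b)/db = -1
    return +1 * grad, -1 * grad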
Combining it all together: Now that we have all the pieces, we can combine them all together:
np_spiky_32 = lambda x: np_spiky(x).astype(np.float32)

def tf_spiky(x, name=None):
    with tf.name_scope(name, "spiky", [x]) as name:
        y = py_func(np_spiky_32,
                    [x],
                    [tf.float32],
                    name=name,
                    grad=spikygrad)  # <-- here's the call to the gradient
        return y[0]
And now we are done. And we can test it.
Test:
with tf.Session() as sess:
    x = tf.constant([0.2, 0.7, 1.2, 1.7])
    y = tf_spiky(x)
    tf.initialize_all_variables().run()
    print(x.eval(), y.eval(), tf.gradients(y, [x])[0].eval())
[ 0.2 0.69999999 1.20000005 1.70000005] [ 0.2 0. 0.20000005 0.] [ 1. 0. 1. 0.]
Success!
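From here, tf_spiky can be dropped into a network like any built-in activation. A minimal sketch (hypothetical layer sizes, TF 1.x):

inputs = tf.placeholder(tf.float32, [None, 4])
W = tf.Variable(tf.random_normal([4, 3]))
b = tf.Variable(tf.zeros([3]))
hidden = tf_spiky(tf.matmul(inputs, W) + b)  # custom activation in a dense layer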
Answered by Mr Tsjolder
Why not simply use the functions that are already available in tensorflow to build your new function?
For the spiky function in your answer, this could look as follows:
def spiky(x):
    # fractional part of x (floormod needs matching float dtypes)
    r = tf.floormod(x, tf.constant(1.0))
    cond = tf.less_equal(r, tf.constant(0.5))
    # r where r <= 0.5, zero elsewhere (tf.where needs matching shapes)
    return tf.where(cond, r, tf.zeros_like(r))
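A quick check mirroring the test in the accepted answer (a sketch, TF 1.x; the gradients here come for free from tensorflow's own registered gradients for these ops, assuming your version defines one for floormod):

with tf.Session():
    x = tf.constant([0.2, 0.7, 1.2, 1.7])
    y = spiky(x)
    print(y.eval(), tf.gradients(y, [x])[0].eval())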
I would consider this substantially easier (there is not even a need to compute any gradients by hand), and unless you want to do really exotic things, I can hardly imagine that tensorflow does not provide the building blocks for building highly complex activation functions.


