Python: one-hot encoding using numpy
Original source: http://stackoverflow.com/questions/38592324/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
One Hot Encoding using numpy
Asked by Abhijay Ghildyal
If the input is zero I want to make an array which looks like this:
[1,0,0,0,0,0,0,0,0,0]
and if the input is 5:
[0,0,0,0,0,1,0,0,0,0]
For the above I wrote:
np.put(np.zeros(10),5,1)
but it did not work.
Is there any way to implement this in one line?
Answered by Martin Thoma
Usually, when you want to get a one-hot encoding for classification in machine learning, you have an array of indices.
import numpy as np
nb_classes = 6
targets = np.array([[2, 3, 4, 0]]).reshape(-1)
one_hot_targets = np.eye(nb_classes)[targets]
one_hot_targets is now:
array([[0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [1., 0., 0., 0., 0., 0.]])
The .reshape(-1) is there to make sure the labels have the right format (you might also have [[2], [3], [4], [0]]). The -1 is a special value which means "put all remaining elements in this dimension". As there is only one dimension, it flattens the array.
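To make the effect of .reshape(-1) concrete, here is a small sketch. The names column_targets, flat_targets, and one_hot are illustrative only and do not come from the answer; nb_classes = 6 is taken from the example above.

```python
import numpy as np

nb_classes = 6  # number of classes, as in the example above

# Labels stored as a column vector, shape (4, 1)
column_targets = np.array([[2], [3], [4], [0]])

# reshape(-1) flattens the labels to shape (4,)
flat_targets = column_targets.reshape(-1)

# Each index selects one row of the identity matrix
one_hot = np.eye(nb_classes)[flat_targets]

print(flat_targets.shape)  # (4,)
print(one_hot.shape)       # (4, 6)

# Without the reshape, the result keeps the extra axis
print(np.eye(nb_classes)[column_targets].shape)  # (4, 1, 6)
```

Skipping the flattening step is how you end up with an unexpected extra dimension in the encoded array.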
Copy-Paste solution
def get_one_hot(targets, nb_classes):
    # Accept lists and scalars as well as arrays
    targets = np.asarray(targets)
    res = np.eye(nb_classes)[targets.reshape(-1)]
    return res.reshape(list(targets.shape) + [nb_classes])
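As a usage sketch, the function can be called on a multi-dimensional label array or on a single integer, as in the question. The version below adds an np.asarray conversion so that plain lists and scalars work too; that conversion, the variable batch, and its values are assumptions made for illustration.

```python
import numpy as np

def get_one_hot(targets, nb_classes):
    # np.asarray lets the function accept lists and scalars (an added assumption)
    targets = np.asarray(targets)
    res = np.eye(nb_classes)[targets.reshape(-1)]
    # Restore the original shape and append the class axis
    return res.reshape(list(targets.shape) + [nb_classes])

batch = np.array([[0, 2], [1, 3]])  # hypothetical 2x2 grid of labels
encoded = get_one_hot(batch, 4)
print(encoded.shape)  # (2, 2, 4)
print(encoded[1, 0])  # [0. 1. 0. 0.]

# The single-index case from the question
single = get_one_hot(5, 10)
print(single)  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

The final reshape is what keeps the leading dimensions of the input intact, so a (2, 2) label grid becomes a (2, 2, 4) encoding.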
Package
You can use mpu.ml.indices2one_hot. It's tested and simple to use:
import mpu.ml
one_hot = mpu.ml.indices2one_hot([1, 3, 0], nb_classes=5)
Answered by HolyDanna
Something like:
np.array([int(i == 5) for i in range(10)])
should do the trick. But I suppose there are other solutions using numpy.
Edit: your expression does not work because np.put does not return anything; it modifies, in place, the array given as its first argument. The correct way to use np.put() is:
a = np.zeros(10)
np.put(a,5,1)
The problem is that this can't be done in one line, as you need to define the array before passing it to np.put().
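That limitation applies to np.put specifically. If a one-liner is the goal, other NumPy expressions return the array directly. The sketch below shows two options; the names size, hot, via_compare, and via_eye are made up for illustration.

```python
import numpy as np

size, hot = 10, 5  # array length and hot index from the question

# Compare every position with the hot index, then cast the booleans to ints
via_compare = (np.arange(size) == hot).astype(int)

# Take a single row of the integer identity matrix
via_eye = np.eye(size, dtype=int)[hot]

print(via_compare)  # [0 0 0 0 0 1 0 0 0 0]
print(via_eye)      # [0 0 0 0 0 1 0 0 0 0]
```

Both expressions build and return a new array, so there is nothing to name beforehand.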
Answered by Sung Kim
Use np.identity or np.eye. You can try something like this with your input i and the array size s:
np.identity(s)[i:i+1]
For example, print(np.identity(5)[0:1]) will print:
[[1. 0. 0. 0. 0.]]
If you are using TensorFlow, you can use tf.one_hot: https://www.tensorflow.org/api_docs/python/array_ops/slicing_and_joining#one_hot
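One detail worth noting about the np.identity approach: slicing with i:i+1 keeps the row axis, so the result is 2-D, while indexing with a plain integer gives the 1-D array shown in the question. A short sketch, with s = 10 and i = 5 assumed from the question:

```python
import numpy as np

s, i = 10, 5  # array size and hot index, taken from the question

row_2d = np.identity(s)[i:i + 1]  # slice keeps the row axis
row_1d = np.identity(s)[i]        # integer index drops it

print(row_2d.shape)  # (1, 10)
print(row_1d.shape)  # (10,)
print(row_1d)        # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Which form is preferable depends on whether the downstream code expects a single vector or a batch of one row.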
Answered by Rikku Porta
You could use a list comprehension:
[0 if i != 5 else 1 for i in range(10)]
evaluates to
[0,0,0,0,0,1,0,0,0,0]
Answered by m00am
The problem here is that you never store your array anywhere. The put function works in place on the array and returns nothing. Since you never give your array a name, you cannot refer to it later. So this
one_pos = 5
x = np.zeros(10)
np.put(x, one_pos, 1)
would work, but then you could just use indexing:
one_pos = 5
x = np.zeros(10)
x[one_pos] = 1
In my opinion, that is the right way to do this unless there is a special reason to write it as a one-liner. It is probably also easier to read, and readable code is good code.
Answered by Mad Physicist
Taking a quick look at the manual, you will see that np.put does not return a value. While your technique is fine, you are accessing None instead of your result array.
For a 1-D array it is better to just use direct indexing, especially for such a simple case.
Here is how to rewrite your code with minimal modification:
arr = np.zeros(10)
np.put(arr, 5, 1)
Here is how to do the second line with indexing instead of put:
arr[5] = 1
Answered by PM 2Ring
np.put mutates its array argument in place. It's conventional in Python for functions and methods that perform in-place mutation to return None; np.put adheres to that convention. So if a is a 1D array and you do
a = np.put(a, 5, 1)
then a will be replaced by None.
Your code is similar to that, but it passes an unnamed array to np.put.
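The behaviour described here is easy to check directly. In this small sketch, the names result and returned are illustrative:

```python
import numpy as np

# The question's expression: the temporary array is modified, then discarded
result = np.put(np.zeros(10), 5, 1)
print(result)  # None

# With a named array, the in-place change is visible afterwards
a = np.zeros(10)
returned = np.put(a, 5, 1)
print(returned is None)  # True
print(a)                 # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```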
A compact and efficient way to do what you want is with a simple function, e.g.:
import numpy as np

def one_hot(i):
    a = np.zeros(10, 'uint8')
    a[i] = 1
    return a

a = one_hot(5)
print(a)
output
[0 0 0 0 0 1 0 0 0 0]
Answered by Ken Chan
I'm not sure about the performance, but the following code works and is neat.
x = np.array([0, 5])
x_onehot = np.identity(6)[x]
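On the performance question: np.identity(6) builds a full 6x6 matrix before the rows are selected. A commonly used alternative, sketched below as an option rather than a measured improvement, is to allocate a zero array of the final shape and set the hot positions with integer-array indexing. The name nb_classes is an assumption made for illustration.

```python
import numpy as np

x = np.array([0, 5])
nb_classes = 6  # assumed to match the identity size used above

# Allocate the output directly, one row per label
x_onehot = np.zeros((x.size, nb_classes))

# Row k gets a 1 in column x[k]
x_onehot[np.arange(x.size), x] = 1

print(x_onehot)
```

This avoids materialising an nb_classes x nb_classes matrix, which may matter when the number of classes is large.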
Answered by Abhijay Ghildyal
import time
import numpy as np

# Approach 1: fill each row with np.put
start_time = time.time()
z = []
for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
    a = np.repeat(0, 10)
    np.put(a, l, 1)
    z.append(a)
print("--- %s seconds ---" % (time.time() - start_time))
#--- 0.00174784660339 seconds ---

# Approach 2: build each row with a list comprehension
start_time = time.time()
z = []
for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
    z.append(np.array([int(i == l) for i in range(10)]))
print("--- %s seconds ---" % (time.time() - start_time))
#--- 0.000400066375732 seconds ---
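A single time.time() measurement of a run this short can be noisy, so the comparison may be more reliable when repeated with timeit. The sketch below is one possible setup; the function names are made up for illustration, with_eye is an additional variant based on the np.eye answers, and no timings are claimed here since they depend on the machine and NumPy version.

```python
import timeit
import numpy as np

labels = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]

def with_put():
    # One np.put call per label
    out = []
    for label in labels:
        a = np.zeros(10, dtype=int)
        np.put(a, label, 1)
        out.append(a)
    return out

def with_comprehension():
    # One list comprehension per label
    return [np.array([int(i == label) for i in range(10)]) for label in labels]

def with_eye():
    # Select all rows at once from the identity matrix
    return np.eye(10, dtype=int)[labels]

for fn in (with_put, with_comprehension, with_eye):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds:.4f} s for 1000 runs")
```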