Python: One Hot Encoding using numpy

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/38592324/

Time: 2020-08-19 21:09:29  Source: igfitidea

One Hot Encoding using numpy

python numpy one-hot-encoding

Asked by Abhijay Ghildyal

If the input is zero I want to make an array which looks like this:

[1,0,0,0,0,0,0,0,0,0]

and if the input is 5:

[0,0,0,0,0,1,0,0,0,0]

For the above I wrote:

np.put(np.zeros(10),5,1)

but it did not work.

Is there any way to implement this in one line?

Answer by Martin Thoma

Usually, when you want to get a one-hot encoding for classification in machine learning, you have an array of indices.

import numpy as np
nb_classes = 6
targets = np.array([[2, 3, 4, 0]]).reshape(-1)
one_hot_targets = np.eye(nb_classes)[targets]

one_hot_targets is now:

array([[0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [1., 0., 0., 0., 0., 0.]])

The .reshape(-1) is there to make sure the labels have the right format (you might also have [[2], [3], [4], [0]]). The -1 is a special value which means "put all remaining elements in this dimension". As there is only one dimension, it flattens the array.
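
To make the effect of .reshape(-1) concrete, here is a minimal sketch assuming the labels arrive as a column vector. The names column_labels and flat_labels are illustrative and not part of the original answer.

```python
import numpy as np

# Labels as a column vector, shape (4, 1), as some data loaders produce them.
column_labels = np.array([[2], [3], [4], [0]])

# reshape(-1) collapses all axes into a single one, giving shape (4,).
flat_labels = column_labels.reshape(-1)

nb_classes = 6
one_hot_flat = np.eye(nb_classes)[flat_labels]      # shape (4, 6)
one_hot_column = np.eye(nb_classes)[column_labels]  # shape (4, 1, 6): the extra axis is kept

print(flat_labels.shape, one_hot_flat.shape, one_hot_column.shape)
```

Without the flattening step, the extra axis of the label array carries over into the result.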

Copy-Paste solution

def get_one_hot(targets, nb_classes):
    # Accept lists as well as arrays; .shape below requires an ndarray.
    targets = np.asarray(targets)
    res = np.eye(nb_classes)[targets.reshape(-1)]
    return res.reshape(list(targets.shape) + [nb_classes])
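
A short usage sketch of this function, restated so the block runs on its own. The np.asarray conversion is an assumption added here so that plain lists are accepted, and the label arrays are made-up examples.

```python
import numpy as np

def get_one_hot(targets, nb_classes):
    # np.asarray is an addition in this sketch so that plain lists work too.
    targets = np.asarray(targets)
    res = np.eye(nb_classes)[targets.reshape(-1)]
    # Restore the original label shape, with one extra axis for the classes.
    return res.reshape(list(targets.shape) + [nb_classes])

labels_1d = np.array([2, 3, 4, 0])
labels_2d = np.array([[0, 1], [2, 3]])

print(get_one_hot(labels_1d, 6).shape)  # (4, 6)
print(get_one_hot(labels_2d, 4).shape)  # (2, 2, 4)
print(get_one_hot(labels_2d, 4)[1, 0])  # one-hot row for label 2
```

The final reshape is what lets the function handle label arrays of any dimensionality: the output keeps the input shape and appends a class axis.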

Package

You can use mpu.ml.indices2one_hot. It's tested and simple to use:

import mpu.ml
one_hot = mpu.ml.indices2one_hot([1, 3, 0], nb_classes=5)

Answer by HolyDanna

Something like:

np.array([int(i == 5) for i in range(10)])

should do the trick, though I suppose there are other solutions using numpy.
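
One such numpy-only possibility, offered as a sketch rather than as part of the original answer, is to compare np.arange against the target index and cast the boolean result to integers. The variable names are illustrative.

```python
import numpy as np

# Compare every position 0..9 with the target index; exactly one comparison is True.
one_hot = (np.arange(10) == 5).astype(int)
print(one_hot)  # [0 0 0 0 0 1 0 0 0 0]

# The same idea extends to several indices at once through broadcasting.
indices = np.array([0, 5])
batch = (indices[:, None] == np.arange(10)).astype(int)  # shape (2, 10)
print(batch.shape)
```

This is the vectorized counterpart of the list comprehension: the element-wise comparison replaces the Python-level loop.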

Edit: the reason your formula does not work is that np.put does not return anything; it just modifies the array passed as its first argument. The correct way to use np.put() is:

a = np.zeros(10)
np.put(a,5,1)

The problem is that it can't be done in one line, because you need to define the array before passing it to np.put().
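
The behaviour described above can be checked directly. This is a minimal sketch; the name result is only there to capture the return value.

```python
import numpy as np

# The one-liner from the question: the temporary array is modified and then discarded.
result = np.put(np.zeros(10), 5, 1)
print(result)  # None

# Naming the array first keeps a reference to the modified data.
a = np.zeros(10)
np.put(a, 5, 1)
print(a[5], a.sum())
```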

Answer by Sung Kim

Use np.identity or np.eye. You can try something like this with your input i and the array size s:

np.identity(s)[i:i+1]

For example, print(np.identity(10)[0:1]) will print:

[[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
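
Note that the slice i:i+1 keeps a leading axis, which is why the printed result has double brackets. A small sketch with illustrative values for s and i shows the difference from plain integer indexing:

```python
import numpy as np

s, i = 10, 5  # illustrative size and index

row_2d = np.identity(s)[i:i+1]  # slicing keeps the row axis: shape (1, 10)
row_1d = np.identity(s)[i]      # integer indexing drops it: shape (10,)
row_eye = np.eye(s)[i]          # np.eye(s) gives the same square identity matrix

print(row_2d.shape, row_1d.shape)
```

If a flat array like the one in the question is wanted, integer indexing is the form to use.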

If you are using TensorFlow, you can use tf.one_hot: https://www.tensorflow.org/api_docs/python/array_ops/slicing_and_joining#one_hot

Answer by Rikku Porta

You could use a list comprehension:

[0 if i != 5 else 1 for i in range(10)]

evaluates to

[0,0,0,0,0,1,0,0,0,0]

Answer by m00am

The problem here is that you never save your array anywhere. The put function works in place on the array and returns nothing. Since you never give your array a name, you cannot refer to it later. So this

one_pos = 5
x = np.zeros(10)
np.put(x, one_pos, 1)

would work, but then you could just use indexing:

one_pos = 5
x = np.zeros(10)
x[one_pos] = 1

In my opinion, that is the correct way to do this unless there is a special reason to write it as a one-liner. It may also be easier to read, and readable code is good code.

Answer by Mad Physicist

Taking a quick look at the manual, you will see that np.put does not return a value. While your technique is fine, you are accessing None instead of your result array.

For a 1-D array it is better to just use direct indexing, especially for such a simple case.

Here is how to rewrite your code with minimal modification:

arr = np.zeros(10)
np.put(arr, 5, 1)

Here is how to do the second line with indexing instead of put:

arr[5] = 1

Answer by PM 2Ring

np.put mutates its array argument in place. It's conventional in Python for functions and methods that perform in-place mutation to return None; np.put adheres to that convention. So if a is a 1D array and you do

a = np.put(a, 5, 1)

then a will get replaced by None.

Your code is similar to that, but it passes an unnamed array to np.put.

A compact and efficient way to do what you want is with a simple function, e.g.:

import numpy as np

def one_hot(i):
    a = np.zeros(10, 'uint8')
    a[i] = 1
    return a

a = one_hot(5) 
print(a)

Output:

[0 0 0 0 0 1 0 0 0 0]
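
The same zero-then-assign idea can be extended to several indices at once. The following is a sketch under the assumption that the input is a 1-D sequence of integer indices; one_hot_batch and its size parameter are hypothetical names, not part of the original answer.

```python
import numpy as np

def one_hot_batch(indices, size=10):
    """Hypothetical helper: one row per index, with a single 1 in each row."""
    indices = np.asarray(indices)  # assumed to be 1-D integers in range(size)
    out = np.zeros((indices.size, size), dtype='uint8')
    # Pair row k with column indices[k] and set those cells to 1.
    out[np.arange(indices.size), indices] = 1
    return out

batch = one_hot_batch([0, 5])
print(batch)
```

Allocating the zeros once and setting the ones with a single indexed assignment avoids building a full identity matrix when the number of classes is large.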

Answer by Ken Chan

I'm not sure about the performance, but the following code works and it's neat.

x = np.array([0, 5])
x_onehot = np.identity(6)[x]

Answer by Abhijay Ghildyal

import time
import numpy as np

# Variant 1: np.put on a fresh zero array for each label
start_time = time.time()
z = []
for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
    a = np.repeat(0, 10)
    np.put(a, l, 1)
    z.append(a)
print("--- %s seconds ---" % (time.time() - start_time))

#--- 0.00174784660339 seconds ---

# Variant 2: a list comprehension for each label
start_time = time.time()
z = []
for l in [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]:
    z.append(np.array([int(i == l) for i in range(10)]))
print("--- %s seconds ---" % (time.time() - start_time))

#--- 0.000400066375732 seconds ---
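
A single time.time() measurement of such short snippets is noisy, so a repeated measurement may give a steadier comparison. The sketch below uses the standard timeit module and adds an np.eye variant for reference; the function names are illustrative, and no particular ranking is implied, since the results depend on the machine and the numpy version.

```python
import timeit
import numpy as np

labels = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 4, 6]

def with_put():
    rows = []
    for label in labels:
        a = np.zeros(10, dtype=int)
        np.put(a, label, 1)  # in-place: sets position `label` to 1
        rows.append(a)
    return rows

def with_comprehension():
    return [np.array([int(i == label) for i in range(10)]) for label in labels]

def with_eye():
    # Select one identity-matrix row per label in a single indexing step.
    return np.eye(10, dtype=int)[labels]

for fn in (with_put, with_comprehension, with_eye):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds:.4f} s for 1000 runs")
```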