Python: How to convert one-hot encodings into integers?

Disclaimer: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/42497340/

Time: 2020-08-19 21:47:21  Source: igfitidea

How to convert one-hot encodings into integers?

Tags: python, numpy, tensorflow

Asked by Hyman

I have a NumPy array data set with shape (100, 10). Each row is a one-hot encoding. I want to convert it into an ndarray with shape (100,), so that each row vector becomes an integer denoting the index of its nonzero element. Is there a quick way of doing this using NumPy or TensorFlow?

Accepted answer by JawguyChooser

As pointed out by Franck Dernoncourt, since a one-hot encoding has a single 1 and the rest are zeros, you can use argmax for this particular example. In general, if you want to find a value in a NumPy array, you'll probably want to consult numpy.where. Also, see this Stack Exchange question:

Is there a NumPy function to return the first index of something in an array?

Since a one-hot vector is a vector with all 0s and a single 1, you can do something like this:

>>> import numpy as np
>>> a = np.array([[0,1,0,0],[1,0,0,0],[0,0,0,1]])
>>> [np.where(r==1)[0][0] for r in a]
[1, 0, 3]

This just builds a list of the index of the 1 in each row. The [0][0] indexing is just to ditch the structure (a tuple containing an array) returned by np.where, which is more than you asked for.

For any particular row, you just index into a. For example, in the zeroth row the 1 is found at index 1.

>>> np.where(a[0]==1)[0][0]
1
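The per-row loop above can also be written as a single vectorized call. This is only a minimal sketch, and it assumes every row contains exactly one 1 (otherwise the result no longer lines up with the rows):

```python
import numpy as np

a = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])

# np.nonzero returns one index array per dimension, in row-major order,
# so `cols` holds the column of the single 1 in each row.
rows, cols = np.nonzero(a)
print(cols)  # [1 0 3]
```

Here `rows` is simply 0, 1, 2, which is a cheap way to confirm that each row contributed exactly one entry.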

Answer by Franck Dernoncourt

You can use numpy.argmax or tf.argmax. Example:

import numpy as np
a = np.array([[0,1,0,0],[1,0,0,0],[0,0,0,1]])
print('np.argmax(a, axis=1): {0}'.format(np.argmax(a, axis=1)))

output:

np.argmax(a, axis=1): [1 0 3]

You may also want to look at sklearn.preprocessing.LabelBinarizer.inverse_transform.

Answer by user9114146

Simply use np.argmax(x, axis=1)

Example:

import numpy as np
array = np.array([[0, 1, 0, 0], [0, 0, 0, 1]])
print(np.argmax(array, axis=1))
[1 3]
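One caveat worth keeping in mind: np.argmax returns an index for any row, including rows that are not valid one-hot vectors. The sketch below wraps it in a hypothetical helper, `one_hot_to_int` (the name and the validation rule are assumptions for illustration, not part of any library), that rejects such input first:

```python
import numpy as np

def one_hot_to_int(x):
    """Return the index of the 1 in each row, checking that rows are one-hot."""
    x = np.asarray(x)
    # A valid one-hot row contains only 0s and 1s and sums to exactly 1.
    is_binary = np.isin(x, [0, 1]).all()
    if not is_binary or not (x.sum(axis=1) == 1).all():
        raise ValueError("input rows are not one-hot encoded")
    return np.argmax(x, axis=1)

print(one_hot_to_int([[0, 1, 0, 0], [0, 0, 0, 1]]))  # [1 3]
print(np.argmax([[0, 0, 0, 0]], axis=1))  # [0], even though the row has no 1
```

If the data is known to be clean, the plain np.argmax call is enough and skips the extra pass over the array.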

Answer by Martin Thoma

While I strongly suggest using numpy for speed, mpu.ml.one_hot2indices(one_hots) shows how to do it without numpy. Simply run pip install mpu --user --upgrade.

Then you can do

>>> from mpu.ml import one_hot2indices
>>> one_hot2indices([[1, 0], [1, 0], [0, 1]])
[0, 0, 1]

Answer by Iván Sánchez

def int_to_onehot(n, n_classes):
    v = [0] * n_classes
    v[n] = 1
    return v

def onehot_to_int(v):
    return v.index(1)


>>> v = int_to_onehot(2, 5)
>>> v
[0, 0, 1, 0, 0]


>>> i = onehot_to_int(v)
>>> i
2
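The same round trip can be done for a whole batch with NumPy. This is a sketch rather than part of the answer above; the `labels` values and `n_classes` are made up for illustration:

```python
import numpy as np

labels = np.array([2, 0, 4])
n_classes = 5

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels encodes them all at once.
one_hots = np.eye(n_classes, dtype=int)[labels]

# argmax along each row recovers the original labels.
decoded = one_hots.argmax(axis=1)
print(decoded)  # [2 0 4]
```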

Answer by Emre Tatbak

You can use this simple code:

a = [[0, 0, 0, 0, 0, 1, 0, 0, 0, 0]]
# Walk the first row and print the position of the 1.
for j, value in enumerate(a[0]):
    if value == 1:
        print(j)

5

Answer by Pando MM

What I do in these cases is something like this. The idea is to use the one-hot vector as a mask that selects entries from a 0,1,2,3,4... array.

# Define stuff
import numpy as np
one_hots = np.zeros([100,10])
for k in range(100):
    one_hots[k,:] = np.random.permutation([1,0,0,0,0,0,0,0,0,0])

# Finally, the trick
ramp = np.tile(np.arange(0,10),[100,1])
integers = ramp[one_hots==1].ravel()

I prefer this trick because I feel np.argmax and other suggested solutions may be slower than indexing (although indexing may consume more memory).
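Whether indexing is actually faster depends on the array size and the NumPy build, so it is worth measuring rather than assuming. A rough sketch using timeit follows; the seed, the array sizes, and the repetition count are arbitrary choices, and no particular outcome is implied:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=100)
one_hots = np.eye(10)[labels]            # shape (100, 10), one 1 per row
ramp = np.tile(np.arange(10), (100, 1))  # same ramp as in the answer

by_argmax = one_hots.argmax(axis=1)
by_mask = ramp[one_hots == 1]
# Both approaches must agree with each other and with the source labels.
assert (by_argmax == by_mask).all() and (by_argmax == labels).all()

t_argmax = timeit.timeit(lambda: one_hots.argmax(axis=1), number=10_000)
t_mask = timeit.timeit(lambda: ramp[one_hots == 1], number=10_000)
print(f"argmax: {t_argmax:.4f}s  mask indexing: {t_mask:.4f}s")
```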
