Original source: http://stackoverflow.com/questions/33681517/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Tensorflow One Hot Encoder?
Asked by Robert Graves
Does tensorflow have something similar to scikit-learn's one hot encoder for processing categorical data? Would using a placeholder of tf.string behave as categorical data?
I realize I can manually pre-process the data before sending it to tensorflow, but having it built in is very convenient.
Accepted answer by dga
As of TensorFlow 0.8, there is now a native one-hot op, tf.one_hot, that can convert a set of sparse labels to a dense one-hot representation. This is in addition to tf.nn.sparse_softmax_cross_entropy_with_logits, which can in some cases let you compute the cross entropy directly on the sparse labels instead of converting them to one-hot.
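For instance, a minimal sketch (hypothetical logits and labels, using the TF 1.x-style session API shown elsewhere on this page) of computing the loss directly from integer labels, with no one-hot conversion:

import tensorflow as tf

# Unnormalized scores, shape [batch_size, num_classes].
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
# Integer class IDs, shape [batch_size] -- no one-hot tensor needed.
labels = tf.constant([0, 1])

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)

with tf.Session() as sess:
    print(sess.run(loss))  # per-example cross entropy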
Previous answer, in case you want to do it the old way: @Salvador's answer is correct - there used to be no native op to do it. Instead of doing it in numpy, though, you can do it natively in tensorflow using the sparse-to-dense operators:
# Note: this uses the pre-1.0 TensorFlow API (tf.concat takes the axis
# as its first argument, and tf.pack was later renamed tf.stack).
num_labels = 10

# label_batch is a tensor of numeric labels to process
# 0 <= label < num_labels

sparse_labels = tf.reshape(label_batch, [-1, 1])
derived_size = tf.shape(label_batch)[0]
indices = tf.reshape(tf.range(0, derived_size, 1), [-1, 1])
concated = tf.concat(1, [indices, sparse_labels])
outshape = tf.pack([derived_size, num_labels])
labels = tf.sparse_to_dense(concated, outshape, 1.0, 0.0)
The output, labels, is a one-hot matrix of batch_size x num_labels.
Note also that as of 2016-02-12 (which I assume will eventually be part of a 0.7 release), TensorFlow also has the tf.nn.sparse_softmax_cross_entropy_with_logits op, which in some cases can let you do training without needing to convert to a one-hot encoding.
Edited to add: At the end, you may need to explicitly set the shape of labels. The shape inference doesn't recognize the size of the num_labels component. If you don't need a dynamic batch size with derived_size, this can be simplified.
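A minimal sketch of that explicit shape-setting, continuing the snippet above (the [None, num_labels] shape is an assumption for a dynamic batch size):

# Pin down the static label dimension so downstream shape inference
# knows num_labels; the batch dimension can stay dynamic (None).
labels.set_shape([None, num_labels])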
Edited 2016-02-12 to change the assignment of outshape per comment below.
Answer by Salvador Dali
tf.one_hot() is available in TF and easy to use.
Let's assume you have 4 possible categories (cat, dog, bird, human) and 2 instances (cat, human). So your depth=4 and your indices=[0, 3].
import tensorflow as tf

res = tf.one_hot(indices=[0, 3], depth=4)
with tf.Session() as sess:
    print(sess.run(res))
Keep in mind that if you provide index=-1 you will get all zeros in your one-hot vector.
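For example (a minimal sketch of that edge case):

import tensorflow as tf

res = tf.one_hot(indices=[0, -1], depth=4)
with tf.Session() as sess:
    print(sess.run(res))
    # [[1. 0. 0. 0.]
    #  [0. 0. 0. 0.]]  <- index -1 yields an all-zero row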
Old answer, when this function was not available.
After looking through the python documentation, I have not found anything similar. One thing that strengthens my belief that it does not exist is that in their own example they write one_hot manually.
import numpy

def dense_to_one_hot(labels_dense, num_classes=10):
    """Convert class labels from scalars to one-hot vectors."""
    num_labels = labels_dense.shape[0]
    index_offset = numpy.arange(num_labels) * num_classes
    labels_one_hot = numpy.zeros((num_labels, num_classes))
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot
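For example, calling it on a small label array (illustrative values):

labels = numpy.array([1, 3, 0])
print(dense_to_one_hot(labels, num_classes=4))
# [[0. 1. 0. 0.]
#  [0. 0. 0. 1.]
#  [1. 0. 0. 0.]]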
You can also do this in scikit-learn.
Answer by Markus
Take a look at tf.nn.embedding_lookup. It maps from categorical IDs to their embeddings.
For an example of how it's used for input data, see here.
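A minimal sketch of the idea (the embedding table and IDs below are made up for illustration):

import tensorflow as tf

# A hypothetical table: 4 categories, each with a 3-dimensional embedding.
embeddings = tf.constant([[0.1, 0.2, 0.3],
                          [0.4, 0.5, 0.6],
                          [0.7, 0.8, 0.9],
                          [1.0, 1.1, 1.2]])
ids = tf.constant([0, 3])  # categorical IDs to look up

looked_up = tf.nn.embedding_lookup(embeddings, ids)

with tf.Session() as sess:
    print(sess.run(looked_up))  # rows 0 and 3 of the table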
Answer by CFB
Maybe it's due to changes to Tensorflow since Nov 2015, but @dga's answer produced errors. I did get it to work with the following modifications:
# Same pre-1.0 API as above; the changes are taking derived_size from
# sparse_labels and building outshape with tf.concat instead of tf.pack.
sparse_labels = tf.reshape(label_batch, [-1, 1])
derived_size = tf.shape(sparse_labels)[0]
indices = tf.reshape(tf.range(0, derived_size, 1), [-1, 1])
concated = tf.concat(1, [indices, sparse_labels])
outshape = tf.concat(0, [tf.reshape(derived_size, [1]), tf.reshape(num_labels, [1])])
labels = tf.sparse_to_dense(concated, outshape, 1.0, 0.0)
Answer by Josh11b
You can use tf.sparse_to_dense:

The sparse_indices argument indicates where the ones should go, output_shape should be set to the number of possible outputs (e.g. the number of labels), and sparse_values should be 1 with the desired type (the type of sparse_values determines the type of the output).
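A minimal sketch, assuming three labels [0, 2, 1] and four possible classes:

import tensorflow as tf

# Row i gets a 1.0 at column label[i]; everything else is 0.0.
sparse_indices = tf.constant([[0, 0], [1, 2], [2, 1]])  # [row, label] pairs
output_shape = [3, 4]  # batch_size x num_labels
one_hot = tf.sparse_to_dense(sparse_indices, output_shape, 1.0, 0.0)

with tf.Session() as sess:
    print(sess.run(one_hot))
    # [[1. 0. 0. 0.]
    #  [0. 0. 1. 0.]
    #  [0. 1. 0. 0.]]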
Answer by Yuan Tang
There's embedding_ops in Scikit Flow and examples that deal with categorical variables, etc.
If you are just beginning to learn TensorFlow, I would suggest trying out the examples in TensorFlow/skflow first; once you are more familiar with TensorFlow, it would be fairly easy for you to insert TensorFlow code to build the custom model you want (there are also examples for this).
Hope those examples for image and text understanding can get you started, and let us know if you encounter any issues! (Post issues or tag skflow on SO.)
Answer by Eugene Brevdo
Recent versions of TensorFlow (nightlies and maybe even 0.7.1) have an op called tf.one_hot that does what you want. Check it out!
On the other hand, if you have a dense matrix and you want to look up and aggregate values in it, you would want to use the embedding_lookup function.
Answer by Peteris
Current versions of tensorflow implement the following function for creating one-hot tensors:
https://www.tensorflow.org/versions/master/api_docs/python/array_ops.html#one_hot
Answer by Rajarshee Mitra
A simple and short way to one-hot encode any integer or list of integers:
import numpy as np
import tensorflow as tf

a = 5
b = [1, 2, 3]

# one-hot encode a single integer (10 classes assumed)
one_hot_a = tf.nn.embedding_lookup(np.identity(10), a)

# one-hot encode a list of integers
one_hot_b = tf.nn.embedding_lookup(np.identity(max(b) + 1), b)
Answer by Prakhar Agrawal
numpy does it!
import numpy as np

# n_labels: number of classes; target_vector: array of integer labels
np.eye(n_labels)[target_vector]