Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/38902433/

Time: 2020-08-19 21:36:22  Source: igfitidea

TensorFlow strings: what they are and how to work with them

python string numpy tensorflow tfrecord

Asked by ckorzhik

When I read a file with tf.read_file, I get something of type tf.string. The documentation says only that it is "Variable length byte arrays. Each element of a Tensor is a byte array." (https://www.tensorflow.org/versions/r0.10/resources/dims_types.html). I have no idea how to interpret this.

I can do nothing with this type. In ordinary Python you can get elements by index, like my_string[:4], but when I run the following code I get an error.

import tensorflow as tf
import numpy as np

x = tf.constant("This is string")
y = x[:4]  # raises the ValueError shown below: x has shape ()

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
result = sess.run(y)
print(result)

It says

  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 621, in assert_has_rank
    raise ValueError("Shape %s must have rank %d" % (self, rank))
ValueError: Shape () must have rank 1

Also, I cannot convert my string to a tf.float32 tensor. It is a .flo file and it has the magic header "PIEH". This numpy code successfully converts such a header into a number (see the example here https://stackoverflow.com/a/28016469/4744283), but I can't do that with TensorFlow. I tried tf.string_to_number(string, out_type=tf.float32) but it says

tensorflow.python.framework.errors.InvalidArgumentError: StringToNumberOp could not correctly convert string: PIEH

So, what is a string? What is its shape? How can I at least get part of the string? I suppose that if I can get part of it, I can just skip the "PIEH" part.

UPD: I forgot to say that tf.slice(string, [0], [4]) also doesn't work, with the same error.

Answered by keveman

Unlike Python, where a string can be treated as a list of characters for the purposes of slicing and such, TensorFlow's tf.strings are indivisible values. For instance, x below is a Tensor with shape (2,), each element of which is a variable-length string.

x = tf.constant(["This is a string", "This is another string"])

However, to achieve what you want, TensorFlow provides the tf.decode_raw operator. It takes a tf.string tensor as input and can decode the string into any other primitive data type. For instance, to interpret the string as a tensor of characters, you can do the following:

import tensorflow as tf

x = tf.constant("This is string")
x = tf.decode_raw(x, tf.uint8)  # reinterpret the string's bytes as a uint8 vector
y = x[:4]
sess = tf.InteractiveSession()
print(y.eval())
# prints [ 84 104 105 115]
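
The same byte-level reinterpretation can be sketched in plain Python, without TensorFlow. This is only an illustration of the idea behind tf.decode_raw, not its implementation: the helper name decode_raw_like is hypothetical, and little-endian byte order is assumed (which, as far as I know, is also the default for tf.decode_raw). It also shows why parsing the "PIEH" header as decimal text fails, while reinterpreting its four bytes as a float32 gives a number.

```python
import struct


def decode_raw_like(data, fmt):
    """Reinterpret `data` as consecutive little-endian values.

    Hypothetical helper for illustration only. `fmt` is a struct
    format character: "B" for uint8, "f" for float32.
    """
    size = struct.calcsize("<" + fmt)   # bytes per element
    count = len(data) // size           # whole elements that fit
    return list(struct.unpack("<%d%s" % (count, fmt), data[:count * size]))


# Bytes viewed as uint8: the first four are the codes of "T", "h", "i", "s".
raw = b"This is string"
first_four = decode_raw_like(raw, "B")[:4]
print(first_four)  # [84, 104, 105, 115]

# Parsing the header as decimal text fails, much like tf.string_to_number.
header = b"PIEH"
try:
    parsed = float(header.decode("ascii"))
except ValueError:
    parsed = None  # "PIEH" is not a textual number

# Reinterpreting the same four bytes as one float32 works.
magic = decode_raw_like(header, "f")[0]
print(magic)  # 202021.25
```

In the TensorFlow version, the analogous step would presumably be to pass tf.float32 instead of tf.uint8 as the output type of tf.decode_raw; the length of the input string then has to be a multiple of 4 bytes.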