What does Keras.io.preprocessing.sequence.pad_sequences do?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/42943291/
Asked by Koffiman
The Keras documentation could be improved here. After reading through this, I still do not understand what this does exactly: Keras.io.preprocessing.sequence.pad_sequences
Could someone illuminate what this function does, and ideally provide an example?
Answered by oscfri
pad_sequences is used to ensure that all sequences in a list have the same length. By default, this is done by padding with 0 at the beginning of each sequence until each sequence has the same length as the longest sequence.
For example:
>>> from keras.preprocessing.sequence import pad_sequences
>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]])
array([[0, 1, 2, 3],
       [3, 4, 5, 6],
       [0, 0, 7, 8]], dtype=int32)
[3, 4, 5, 6] is the longest sequence, so 0 is padded onto the other sequences until their length matches that of [3, 4, 5, 6].
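The default behaviour described above can be mimicked in plain Python. This is only a minimal sketch to show the idea, not Keras's implementation: the helper name pad_pre is hypothetical, and it returns nested lists rather than a NumPy array.

```python
def pad_pre(sequences, value=0):
    """Left-pad every sequence with `value` up to the longest length.

    Hypothetical helper for illustration only; not part of Keras.
    """
    # Length of the longest sequence (0 if there are no sequences at all).
    longest = max((len(seq) for seq in sequences), default=0)
    # Prepend as many fill values as each sequence is missing.
    return [[value] * (longest - len(seq)) + list(seq) for seq in sequences]


padded = pad_pre([[1, 2, 3], [3, 4, 5, 6], [7, 8]])
print(padded)  # [[0, 1, 2, 3], [3, 4, 5, 6], [0, 0, 7, 8]]
```

The printed rows match the array in the example, minus the NumPy wrapper and dtype.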
If you would rather pad at the end of the sequences, you can set padding='post'.
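To make the difference between 'pre' and 'post' concrete, here is a small pure-Python sketch. The function pad_sketch and its signature are assumptions made for illustration; only the padding values 'pre' and 'post' come from the answer itself.

```python
def pad_sketch(sequences, padding="pre", value=0):
    """Pad sequences to the longest length, at the start or at the end.

    Hypothetical stand-in for illustration only; not the Keras function.
    """
    if padding not in ("pre", "post"):
        raise ValueError("padding must be 'pre' or 'post'")
    longest = max((len(seq) for seq in sequences), default=0)
    result = []
    for seq in sequences:
        fill = [value] * (longest - len(seq))
        if padding == "pre":
            result.append(fill + list(seq))   # zeros go in front
        else:
            result.append(list(seq) + fill)   # zeros go at the back
    return result


data = [[1, 2, 3], [3, 4, 5, 6], [7, 8]]
post_padded = pad_sketch(data, padding="post")
print(post_padded)  # [[1, 2, 3, 0], [3, 4, 5, 6], [7, 8, 0, 0]]
```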
If you want to specify the maximum length of each sequence, you can use the maxlen argument. This will truncate all sequences longer than maxlen.
>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]], maxlen=3)
array([[1, 2, 3],
       [4, 5, 6],
       [0, 7, 8]], dtype=int32)
Now each sequence has length 3 instead: [3, 4, 5, 6] lost its first element, and [7, 8] gained a leading 0.
According to the documentation, one can control the truncation with the truncating argument of pad_sequences. By default, truncating is set to 'pre', which truncates the beginning part of the sequence. If you would rather truncate the end part of the sequence, you can set it to 'post'.
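The interaction between maxlen and truncating can be sketched in plain Python as well. The helper pad_and_truncate below is hypothetical and only approximates the behaviour described in this answer (pre-padding with 0, cutting from the front or the back); it is not the Keras source.

```python
def pad_and_truncate(sequences, maxlen, truncating="pre", value=0):
    """Force every sequence to exactly `maxlen` items.

    Hypothetical helper for illustration only; not part of Keras.
    Longer sequences are cut, shorter ones are left-padded with `value`.
    """
    if truncating not in ("pre", "post"):
        raise ValueError("truncating must be 'pre' or 'post'")
    result = []
    for seq in sequences:
        seq = list(seq)
        if truncating == "pre":
            # Drop items from the beginning, keeping the last `maxlen`.
            seq = seq[max(len(seq) - maxlen, 0):]
        else:
            # Drop items from the end, keeping the first `maxlen`.
            seq = seq[:maxlen]
        result.append([value] * (maxlen - len(seq)) + seq)
    return result


data = [[1, 2, 3], [3, 4, 5, 6], [7, 8]]
pre_cut = pad_and_truncate(data, maxlen=3)
post_cut = pad_and_truncate(data, maxlen=3, truncating="post")
print(pre_cut)   # [[1, 2, 3], [4, 5, 6], [0, 7, 8]]
print(post_cut)  # [[1, 2, 3], [3, 4, 5], [0, 7, 8]]
```

Only the second row differs between the two results: with 'pre' the leading 3 is dropped, with 'post' the trailing 6 is dropped.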