Python: What does Keras.io.preprocessing.sequence.pad_sequences do?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/42943291/

Time: 2020-08-19 22:17:35  Source: igfitidea

python deep-learning keras

Asked by Koffiman

The Keras documentation could be improved here. After reading through this, I still do not understand what this does exactly: Keras.io.preprocessing.sequence.pad_sequences

Could someone illuminate what this function does, and ideally provide an example?

Answered by oscfri

pad_sequences is used to ensure that all sequences in a list have the same length. By default this is done by padding with 0 at the beginning of each sequence until it is as long as the longest sequence.

For example:

>>> from keras.preprocessing.sequence import pad_sequences
>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]])
array([[0, 1, 2, 3],
       [3, 4, 5, 6],
       [0, 0, 7, 8]], dtype=int32)

[3, 4, 5, 6] is the longest sequence, so the other sequences are padded with 0 until their length matches that of [3, 4, 5, 6].

If you would rather pad at the end of the sequences, you can set padding='post'.
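
To make the difference between the two padding modes concrete, here is a minimal pure-Python sketch of the behaviour described above. The helper pad_to_longest is hypothetical and written only for illustration; it is not the Keras implementation, and it returns plain lists rather than a NumPy array. Calling pad_sequences with the same padding value should produce the same rows.

```python
def pad_to_longest(sequences, padding="pre", value=0):
    """Pad each sequence with `value` up to the length of the longest one.

    Hypothetical helper for illustration only; not the Keras implementation.
    """
    if padding not in ("pre", "post"):
        raise ValueError("padding must be 'pre' or 'post'")
    longest = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        fill = [value] * (longest - len(seq))
        # 'pre' puts the fill before the data, 'post' puts it after.
        padded.append(fill + list(seq) if padding == "pre" else list(seq) + fill)
    return padded


data = [[1, 2, 3], [3, 4, 5, 6], [7, 8]]
pre_padded = pad_to_longest(data, padding="pre")
post_padded = pad_to_longest(data, padding="post")
print(pre_padded)   # [[0, 1, 2, 3], [3, 4, 5, 6], [0, 0, 7, 8]]
print(post_padded)  # [[1, 2, 3, 0], [3, 4, 5, 6], [7, 8, 0, 0]]
```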

If you want to set the length of every sequence explicitly, you can use the maxlen argument. Sequences longer than maxlen are truncated, and shorter ones are padded up to maxlen.

>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]], maxlen=3)
array([[1, 2, 3],
       [4, 5, 6],
       [0, 7, 8]], dtype=int32)

Now each sequence has length 3 instead.

According to the documentation, one can control the truncation with the truncating argument of pad_sequences. By default truncating is set to 'pre', which removes values from the beginning of the sequence. If you would rather remove values from the end of the sequence, you can set truncating='post'.
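
The interaction between maxlen and truncating can be sketched in plain Python as follows. The helper fit_to_maxlen is hypothetical and only mirrors the behaviour described above for a single sequence; it is not how Keras implements it, and it works on plain lists instead of NumPy arrays.

```python
def fit_to_maxlen(seq, maxlen, padding="pre", truncating="pre", value=0):
    """Truncate or pad one sequence so that it has exactly `maxlen` items.

    Hypothetical helper for illustration only; not the Keras implementation.
    """
    seq = list(seq)
    if len(seq) > maxlen:
        if truncating == "pre":
            # Drop items from the start and keep the last `maxlen` items.
            seq = seq[len(seq) - maxlen:]
        else:
            # Drop items from the end and keep the first `maxlen` items.
            seq = seq[:maxlen]
    fill = [value] * (maxlen - len(seq))
    return fill + seq if padding == "pre" else seq + fill


data = [[1, 2, 3], [3, 4, 5, 6], [7, 8]]
cut_pre = [fit_to_maxlen(s, 3, truncating="pre") for s in data]
cut_post = [fit_to_maxlen(s, 3, truncating="post") for s in data]
print(cut_pre)   # [[1, 2, 3], [4, 5, 6], [0, 7, 8]]
print(cut_post)  # [[1, 2, 3], [3, 4, 5], [0, 7, 8]]
```

Only the second row differs between the two results, because it is the only sequence longer than 3.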
