Python: How to add an attention mechanism in Keras?
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/42918446/
How to add an attention mechanism in keras?
Asked by Aryo Pradipta Gema
I'm currently using this code that I got from one discussion on GitHub. Here's the code of the attention mechanism:
# Imports assumed for this Keras 1.x-era snippet (the functional merge with
# mode='mul' was removed in Keras 2, where multiply is used instead):
from keras.layers import Input, Embedding, LSTM, Dense, Flatten, Activation
from keras.layers import RepeatVector, Permute, Lambda, merge
from keras import backend as K

_input = Input(shape=[max_length], dtype='int32')

# get the embedding layer
embedded = Embedding(
    input_dim=vocab_size,
    output_dim=embedding_size,
    input_length=max_length,
    trainable=False,
    mask_zero=False
)(_input)

activations = LSTM(units, return_sequences=True)(embedded)

# compute importance for each step
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)

# weight the hidden states by the attention and collapse the time dimension
sent_representation = merge([activations, attention], mode='mul')
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

probabilities = Dense(3, activation='softmax')(sent_representation)
Is this the correct way to do it? I was sort of expecting the existence of a TimeDistributed layer, since the attention mechanism is distributed over every time step of the RNN. I need someone to confirm that this implementation (the code) is a correct implementation of the attention mechanism. Thank you.
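On the TimeDistributed point: in Keras 2 and later, a Dense layer applied to a 3D tensor acts on the last axis independently at every time step, so Dense(1) above already produces one score per step; wrapping it in TimeDistributed would compute the same kind of per-timestep score. A minimal sketch (it reuses max_length, embedding_size and units from the code above, which are assumed to be defined):

from keras.layers import Input, LSTM, Dense, TimeDistributed

seq = Input(shape=(max_length, embedding_size))                   # a sequence of embeddings
states = LSTM(units, return_sequences=True)(seq)                  # (batch, max_length, units)

# These two layers perform the same per-timestep scoring in Keras 2
# (they are separate layers, so their learned weights differ):
scores_a = Dense(1, activation='tanh')(states)                    # (batch, max_length, 1)
scores_b = TimeDistributed(Dense(1, activation='tanh'))(states)   # (batch, max_length, 1)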
Accepted answer by Philippe Remy
If you want to have attention along the time dimension, then this part of your code seems correct to me:
activations = LSTM(units, return_sequences=True)(embedded)         # (batch, max_length, units)
# compute importance for each step
attention = Dense(1, activation='tanh')(activations)               # (batch, max_length, 1)
attention = Flatten()(attention)                                    # (batch, max_length)
attention = Activation('softmax')(attention)                        # weights sum to 1 over the time steps
attention = RepeatVector(units)(attention)                          # (batch, units, max_length)
attention = Permute([2, 1])(attention)                              # (batch, max_length, units)
sent_representation = merge([activations, attention], mode='mul')   # (batch, max_length, units)
You've worked out the attention vector of shape (batch_size, max_length):

attention = Activation('softmax')(attention)
I've never seen this code before, so I can't say if this one is actually correct or not:

K.sum(xin, axis=-2)
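For what it's worth, that line just collapses the time dimension: after the element-wise multiplication, summing over axis=-2 yields the attention-weighted sum of the LSTM hidden states. A small NumPy sketch of the same computation (the array names and sizes are made up for illustration):

import numpy as np

batch, max_length, units = 2, 5, 4
activations = np.random.rand(batch, max_length, units)     # LSTM hidden states
weights = np.random.rand(batch, max_length)                # attention weights
weights /= weights.sum(axis=1, keepdims=True)              # normalize, like the softmax

weighted = activations * weights[:, :, None]               # what merge(..., mode='mul') produces
sent_representation = weighted.sum(axis=-2)                # what K.sum(xin, axis=-2) computes
print(sent_representation.shape)                           # (2, 4), i.e. (batch, units)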
Further reading (you might have a look):
Answer by MJeremy
The attention mechanism pays attention to different parts of the sentence:

activations = LSTM(units, return_sequences=True)(embedded)
And it determines the contribution of each hidden state of that sentence by

- Computing the aggregation of each hidden state:
attention = Dense(1, activation='tanh')(activations)
- Assigning weights to the different states:
attention = Activation('softmax')(attention)
And finally it pays attention to the different states:

sent_representation = merge([activations, attention], mode='mul')
I don't quite understand this part: sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)
To understand more, you can refer to this and this, and also this one gives a good implementation; see if you can understand more on your own.
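Putting those steps together, here is a minimal sketch of the same attention-over-time block written against the Keras 2 functional API, where multiply replaces the deprecated merge (vocab_size, embedding_size, max_length and units are assumed to be defined as in the question):

from keras.layers import Input, Embedding, LSTM, Dense, Flatten, Activation
from keras.layers import RepeatVector, Permute, multiply, Lambda
from keras.models import Model
from keras import backend as K

_input = Input(shape=(max_length,), dtype='int32')
embedded = Embedding(vocab_size, embedding_size, input_length=max_length)(_input)
activations = LSTM(units, return_sequences=True)(embedded)          # (batch, max_length, units)

# one score per time step, turned into weights that sum to 1 over time
attention = Dense(1, activation='tanh')(activations)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(units)(attention)
attention = Permute([2, 1])(attention)                              # (batch, max_length, units)

# weight the hidden states and sum over the time axis -> (batch, units)
sent_representation = multiply([activations, attention])
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2))(sent_representation)

probabilities = Dense(3, activation='softmax')(sent_representation)
model = Model(inputs=_input, outputs=probabilities)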
Answer by Abhijay Ghildyal
Recently I was working on applying an attention mechanism on a dense layer, and here is one sample implementation:
# Imports assumed for this Keras 2-style snippet; train_data_X and train_data_Y_
# are the author's own data and are not defined here.
from keras.layers import Input, Dense, multiply
from keras.models import Model
from keras import regularizers

def build_model():
    input_dims = train_data_X.shape[1]
    inputs = Input(shape=(input_dims,))
    dense1800 = Dense(1800, activation='relu', kernel_regularizer=regularizers.l2(0.01))(inputs)
    # attention weights over the 1800 dense features
    attention_probs = Dense(1800, activation='sigmoid', name='attention_probs')(dense1800)
    attention_mul = multiply([dense1800, attention_probs], name='attention_mul')
    dense7 = Dense(7, kernel_regularizer=regularizers.l2(0.01), activation='softmax')(attention_mul)
    model = Model(inputs=inputs, outputs=dense7)
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_model()
model.summary()
model.fit(train_data_X, train_data_Y_, epochs=20, validation_split=0.2, batch_size=600, shuffle=True, verbose=1)