Python: how to check whether a key exists in a trained word2vec model
Notice: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/30301922/
How to check if a key exists in a word2vec trained model or not
Asked by London guy
I have trained a word2vec model on a corpus of documents with Gensim. Once the model is trained, I use the following piece of code to get the raw feature vector of a word, say "view".
myModel["view"]
However, I get a KeyError for the word, probably because it does not exist as a key in the list of keys indexed by word2vec. How can I check whether a key exists in the index before trying to get the raw feature vector?
Accepted answer by rakaT
Convert the model into vectors with:
word_vectors = model.wv
Then we can use:
if 'word' in word_vectors.vocab:
    # Do something
Answer by London guy
Answering my own question here.
Word2Vec provides a method named __contains__('view'), which returns True or False depending on whether the corresponding word has been indexed. Because Python's in operator calls __contains__, the same check can be written as 'view' in myModel.
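As a rough illustration of why a `__contains__` method makes the `in` operator work, here is a minimal sketch. `TinyVectors` is a hypothetical stand-in written for this example, not gensim's class; only the `__contains__` / `__getitem__` protocol is the point.

```python
class TinyVectors:
    """Hypothetical stand-in for a trained model's word index (not gensim)."""

    def __init__(self, vectors):
        self._vectors = dict(vectors)

    def __contains__(self, word):
        # Python calls this method to evaluate `word in obj`.
        return word in self._vectors

    def __getitem__(self, word):
        # Raises KeyError for unknown words, like the lookup in the question.
        return self._vectors[word]


toy = TinyVectors({"view": [0.1, 0.2], "food": [0.3, 0.4]})

for word in ("view", "unseen"):
    if word in toy:  # equivalent to toy.__contains__(word)
        print(word, "->", toy[word])
    else:
        print(word, "is not in the vocabulary")
```

Checking with `in` before indexing avoids the KeyError without needing a try/except around every lookup.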
Answer by Matt Fortier
Word2Vec also provides a 'vocab' member, which you can access directly.
Using a Pythonic approach:
if word in w2v_model.vocab:
    # Do something
EDIT: Since gensim release 2.0, the API for Word2Vec has changed. To access the vocabulary you should now use this:
if word in w2v_model.wv.vocab:
    # Do something
EDIT 2: The attribute 'wv' is being deprecated and will be completely removed in gensim 4.0.0. Now it's back to the original answer by OP:
if word in w2v_model.vocab:
    # Do something
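Since the attribute holding the vocabulary has moved between gensim releases, a version-tolerant check can be sketched as below. This assumes that gensim 4.x exposes the mapping as `key_to_index` on the keyed vectors and that older releases expose `vocab`. The helper name `has_word` and the `SimpleNamespace` objects are hypothetical stand-ins, used so the snippet runs without gensim installed.

```python
from types import SimpleNamespace


def has_word(keyed_vectors, word):
    """Return True if `word` is in the vocabulary of `keyed_vectors`."""
    # Assumption: newer gensim uses `key_to_index`, older gensim uses `vocab`.
    for attr in ("key_to_index", "vocab"):
        mapping = getattr(keyed_vectors, attr, None)
        if mapping is not None:
            return word in mapping
    raise AttributeError("object exposes neither key_to_index nor vocab")


# Stand-ins that mimic the two attribute layouts (not real gensim objects).
new_style = SimpleNamespace(key_to_index={"view": 0, "food": 1})
old_style = SimpleNamespace(vocab={"view": object()})

print(has_word(new_style, "view"))
print(has_word(old_style, "food"))
```

With a real model, the call would presumably look like `has_word(w2v_model.wv, word)`; treat that as an assumption to verify against the installed gensim version.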
Answer by Nomiluks
Hey, I know I am late to this post, but here is a piece of code that handles this issue well. I use it in my own code and it works like a charm :)
size = 300      # word vector size
word = 'food'   # word token
try:
    # Look up the vector and reshape it into a single row
    wordVector = model[word].reshape((1, size))
except KeyError:
    # The word was not seen during training
    print("not found! ", word)
NOTE: I am using the Python Gensim library for word2vec models.
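The same try/except idea can be wrapped in a small helper that returns a default value instead of printing. This is only a sketch: `vector_or_default` is a hypothetical name, and a plain dict stands in for the model here, since both raise KeyError on a missing key.

```python
def vector_or_default(model, word, default=None):
    """Return model[word], or `default` if the word is not indexed."""
    try:
        return model[word]
    except KeyError:
        # Unknown word: fall back instead of propagating the error.
        return default


# A dict stands in for a trained model (illustrative data only).
toy_model = {"food": [0.12, -0.4, 0.9]}

print(vector_or_default(toy_model, "food"))
print(vector_or_default(toy_model, "pizza"))
```

Returning a default keeps the calling code free of repeated try/except blocks, at the cost of having to check for the default afterwards.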
Answer by Prakhar Agarwal
I generally use a filter:
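for doc in labeled_corpus:
    # Keep only the words the model has seen; list() is needed on Python 3, where filter() is lazy
    words = list(filter(lambda x: x in model.vocab, doc.words))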
This is one simple method for getting past the KeyError on unseen words.
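A sketch of the filtering approach applied to a whole corpus, assuming each document is a list of tokens and `vocabulary` is any container that supports `in` (a set stands in for `model.vocab` here). All names and data are illustrative.

```python
def drop_unseen(docs, vocabulary):
    """Return a copy of `docs` with out-of-vocabulary tokens removed."""
    return [[tok for tok in doc if tok in vocabulary] for doc in docs]


# A set stands in for the model's vocabulary (illustrative data only).
vocabulary = {"good", "food", "view"}
corpus = [["good", "food", "here"], ["nice", "view"], ["unknown"]]

filtered = drop_unseen(corpus, vocabulary)
print(filtered)
```

Note that a document made up entirely of unseen words becomes an empty list, so downstream code may need to handle that case.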