Python gensim word2vec: find the number of words in the vocabulary

Question

Question by hlin117

After training a word2vec model using python gensim, how do you find the number of words in the model's vocabulary?

Answer 1

Accepted answer by gojomo

The vocabulary is stored in the vocab field of the Word2Vec model's wv property, as a dictionary whose keys are the tokens (words). So getting its size is just the usual Python way of getting a dictionary's length:

len(w2v_model.wv.vocab)

(In gensim versions before 0.13, vocab appeared directly on the model, so you would use w2v_model.vocab instead of w2v_model.wv.vocab.)
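
A version-agnostic sketch, for code that has to run against several gensim releases. Beyond what the answer above states, it relies on the fact that gensim 4.0 removed the vocab dictionary from KeyedVectors (accessing model.wv.vocab there raises an error) and replaced it with the key_to_index dictionary. The function name vocab_size is hypothetical, not part of gensim.

```python
def vocab_size(model):
    """Return the number of words in a gensim word2vec model's vocabulary.

    Hypothetical helper: accepts a full Word2Vec model or a bare KeyedVectors object.
    """
    # Word2Vec models from gensim 0.13 on keep their vectors in model.wv;
    # older models and bare KeyedVectors objects don't, so fall back to the object itself
    kv = getattr(model, "wv", model)
    # gensim >= 4.0: vocab was removed; key_to_index maps each word to its row index
    if hasattr(kv, "key_to_index"):
        return len(kv.key_to_index)
    # gensim 0.13 to 3.x: model.wv.vocab; before 0.13: model.vocab
    return len(kv.vocab)
```

With a trained model this would be called as vocab_size(w2v_model); it also accepts w2v_model.wv directly.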

Answer 2

Answer by kmario23

Another way to get the vocabulary size is to read it off the embedding matrix itself:

from gensim.models import Word2Vec

# load the pretrained model (pretrained_model is the path to a saved model file)
model = Word2Vec.load(pretrained_model)

# get the shape of the embedding matrix
model.wv.vectors.shape
# -> (662109, 300)

# the vocabulary size is just the number of rows (i.e. axis 0)
model.wv.vectors.shape[0]
# -> 662109
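
If the vectors have also been saved in the original word2vec C format (for example with model.wv.save_word2vec_format(path)), the vocabulary size can be read without loading the model at all: the first line of such a file, in both the text and binary variants, holds the number of words followed by the vector dimensionality. The helper below is a minimal sketch under that assumption; read_word2vec_header is a hypothetical name, not a gensim function.

```python
def read_word2vec_header(path):
    """Return (vocab_size, vector_size) from the header of a word2vec-format file.

    Hypothetical helper, not part of gensim.
    """
    # Open in binary mode so the same code works for text and binary files;
    # the header line itself is plain ASCII digits either way
    with open(path, "rb") as f:
        header = f.readline().decode("utf-8")
    # Header layout: "<vocab_size> <vector_size>"
    vocab_size, vector_size = (int(field) for field in header.split())
    return vocab_size, vector_size
```

For the model in this answer, a file written this way would be expected to start with the header 662109 300, matching model.wv.vectors.shape.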
