Python: importing GoogleNews-vectors-negative300.bin

Question

Asked by Hello World

I am working on code that uses gensim and am having a tough time troubleshooting a ValueError. I was finally able to unzip the GoogleNews-vectors-negative300.bin.gz file so that I could use it in my model. I also tried gzip, but without success. The error occurs on the last line of the code below. What can be done to fix it? Are there any workarounds? Finally, is there a website I could use as a reference?

Thank you respectfully for your assistance!

import gensim
from keras import backend
from keras.layers import Dense, Input, Lambda, LSTM, TimeDistributed
from keras.layers.merge import concatenate
from keras.layers.embeddings import Embedding
from keras.models import Model

pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
word2vec = gensim.models.KeyedVectors.load_word2vec_format(
    pretrained_embeddings_path, binary=True)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-23bd96c1d6ab> in <module>()
      1 pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
----> 2 word2vec = gensim.models.KeyedVectors.load_word2vec_format(pretrained_embeddings_path, binary=True)

C:\Users\green\Anaconda3\envs\py35\lib\site-packages\gensim\models\keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype)
    244                             word.append(ch)
    245                     word = utils.to_unicode(b''.join(word), encoding=encoding, errors=unicode_errors)
--> 246                     weights = fromstring(fin.read(binary_len), dtype=REAL)
    247                     add_word(word, weights)
    248             else:

ValueError: string size must be a multiple of element size

Answer 1

Answered by ohsoifelse

The following commands work:

brew install wget

wget -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"
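The download is a gzip archive (.bin.gz), while the loading code in this thread points at a .bin file, so a decompression step sits in between. Below is a minimal sketch of that step using only the Python standard library; the helper name decompress_gz and its chunk_size parameter are illustrative and not part of gensim. If the archive is incomplete, gzip would normally raise an EOFError while reading, which is a useful hint that the download should be resumed or repeated.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dst=None, chunk_size=1024 * 1024):
    """Stream-decompress a .gz file to disk and return the output path.

    Hypothetical helper for illustration; not part of gensim.
    """
    src = Path(src)
    # Default output name: drop the trailing ".gz" suffix.
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        # Copy in chunks so the multi-gigabyte file never sits in memory at once.
        shutil.copyfileobj(f_in, f_out, chunk_size)
    return dst


if __name__ == "__main__":
    # Assumes the archive was downloaded into the current directory.
    archive = Path("GoogleNews-vectors-negative300.bin.gz")
    if archive.exists():
        print(decompress_gz(archive))
```

Streaming the copy keeps memory use flat regardless of the archive size.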

After decompressing the download to GoogleNews-vectors-negative300.bin, you can load the word vectors with:

from gensim import models

w = models.KeyedVectors.load_word2vec_format(
    '../GoogleNews-vectors-negative300.bin', binary=True)
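If loading still fails with the ValueError from the question, one plausible cause is a truncated or still-compressed file: the traceback shows the error coming from fromstring(fin.read(binary_len), dtype=REAL), which fails when fewer bytes than a whole vector are left to read. The sketch below is a rough sanity check, assuming the usual word2vec binary layout (a text header line with the vocabulary size and vector size, followed by one word plus float32 vector per entry). The function name check_word2vec_bin and the returned keys are hypothetical, and the size it computes is only a lower bound, not the exact expected size.

```python
import os


def check_word2vec_bin(path, float_size=4):
    """Compare a word2vec binary file's size with a lower bound from its header.

    Hypothetical diagnostic helper; assumes float32 vectors (4 bytes each).
    """
    with open(path, "rb") as f:
        # Gzip files start with the magic bytes 1f 8b.
        if f.read(2) == b"\x1f\x8b":
            raise ValueError("file is still gzip-compressed")
        f.seek(0)
        header = f.readline()
    # Header line has the form b"<vocab_size> <vector_size>\n".
    vocab_size, vector_size = (int(x) for x in header.split())
    # Each entry holds a word (at least 1 byte), a space, and the raw vector.
    min_bytes = len(header) + vocab_size * (1 + 1 + vector_size * float_size)
    actual = os.path.getsize(path)
    return {
        "vocab_size": vocab_size,
        "vector_size": vector_size,
        "min_expected_bytes": min_bytes,
        "actual_bytes": actual,
        "looks_truncated": actual < min_bytes,
    }


if __name__ == "__main__":
    # Assumes the decompressed file is in the current directory.
    path = "GoogleNews-vectors-negative300.bin"
    if os.path.exists(path):
        print(check_word2vec_bin(path))
```

A result with looks_truncated set to True suggests downloading and decompressing the file again; a False result does not prove the file is intact, only that it is not obviously too short.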

Answer 2

Answered by Hello World

You have to write the complete path.

Use this path:

https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz
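The answer only gives the download URL, so what "complete path" means on disk is left open. One reading is that the loader should receive the absolute location of the decompressed file rather than a bare file name. A minimal sketch under that assumption follows; resolve_model_path and search_dirs are hypothetical names used for illustration.

```python
from pathlib import Path


def resolve_model_path(name, search_dirs=(".",)):
    """Return the absolute path of `name` in the first directory that contains it.

    Hypothetical helper; raises FileNotFoundError if no directory has the file.
    """
    for directory in search_dirs:
        candidate = (Path(directory).expanduser() / name).resolve()
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        "{} not found in: {}".format(name, ", ".join(map(str, search_dirs)))
    )


if __name__ == "__main__":
    try:
        # The directories listed here are assumptions; adjust to your setup.
        path = resolve_model_path(
            "GoogleNews-vectors-negative300.bin",
            search_dirs=[".", "..", "~/Downloads"],
        )
        print(path)
    except FileNotFoundError as err:
        print(err)
```

The returned path can be converted with str() and passed to load_word2vec_format, which removes any dependence on the notebook's working directory.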

Answer 3

Answered by hansrajSwapnil

Try this:

import gensim.downloader as api

wv = api.load('word2vec-google-news-300')

vec_king = wv['king']

Also, see this tutorial: https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html#sphx-glr-auto-examples-tutorials-run-word2vec-py
