Python 导入 GoogleNews-vectors-negative300.bin
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/46433778/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Import GoogleNews-vectors-negative300.bin
提问by Hello World
I am working on code using the gensim and having a tough time troubleshooting a ValueError within my code. I finally was able to zip GoogleNews-vectors-negative300.bin.gz file so I could implement it in my model. I also tried gzip which the results were unsuccessful. The error in the code occurs in the last line. I would like to know what can be done to fix the error. Is there any workarounds? Finally, is there a website that I could reference?
我正在使用 gensim 处理代码,并且很难解决代码中的 ValueError 问题。我终于能够压缩 GoogleNews-vectors-negative300.bin.gz 文件,这样我就可以在我的模型中实现它。我也试过gzip,结果不成功。代码中的错误发生在最后一行。我想知道可以做些什么来修复错误。有什么解决方法吗?最后,有没有可以参考的网站?
Thank you respectfully for your assistance!
非常感谢您的帮助!
import gensim
from keras import backend
from keras.layers import Dense, Input, Lambda, LSTM, TimeDistributed
from keras.layers.merge import concatenate
from keras.layers.embeddings import Embedding
from keras.models import Mode
pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
word2vec =
gensim.models.KeyedVectors.load_word2vec_format(pretrained_embeddings_path,
binary=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-23bd96c1d6ab> in <module>()
1 pretrained_embeddings_path = "GoogleNews-vectors-negative300.bin"
----> 2 word2vec =
gensim.models.KeyedVectors.load_word2vec_format(pretrained_embeddings_path,
binary=True)
C:\Users\green\Anaconda3\envs\py35\lib\site-
packages\gensim\models\keyedvectors.py in load_word2vec_format(cls, fname,
fvocab, binary, encoding, unicode_errors, limit, datatype)
244 word.append(ch)
245 word = utils.to_unicode(b''.join(word),
encoding=encoding, errors=unicode_errors)
--> 246 weights = fromstring(fin.read(binary_len),
dtype=REAL)
247 add_word(word, weights)
248 else:
ValueError: string size must be a multiple of element size
回答by ohsoifelse
The below commands work.
以下命令有效。
brew install wget
wget -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"
You can then use the below command to get wordVector.
然后您可以使用以下命令获取wordVector。
from gensim import models
w = models.KeyedVectors.load_word2vec_format(
'../GoogleNews-vectors-negative300.bin', binary=True)
回答by Hello World
you have to write the complete path.
你必须写出完整的路径。
use this path:
使用此路径:
https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz
https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz
回答by hansrajSwapnil
try this -
尝试这个 -
import gensim.downloader as api
wv = api.load('word2vec-google-news-300')
vec_king = wv['king']
also, visit this link : https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html#sphx-glr-auto-examples-tutorials-run-word2vec-py
另外,请访问此链接:https: //radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html#sphx-glr-auto-examples-tutorials-run-word2vec-py