Python 类型错误:ufunc 'add' 不包含具有签名匹配类型的循环

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/35013726/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 15:52:28  来源:igfitidea点击:

TypeError: ufunc 'add' did not contain a loop with signature matching types

pythonpython-3.x

提问by Masyaf

I am creating bag of words representation of the sentence. Then taking the words that exist in the sentence to compare to the file "vectors.txt", in order to get their embedding vectors. After getting vectors for each word that exists in the sentence, I am taking average of the vectors of the words in the sentence. This is my code:

我正在创建句子的词袋表示。然后将句子中存在的单词与文件“vectors.txt”进行比较,以获得它们的嵌入向量。在获得句子中存在的每个单词的向量后,我对句子中单词的向量求平均值。这是我的代码:

import nltk
import numpy as np
from nltk import FreqDist
from nltk.corpus import brown


news = brown.words(categories='news') 
news_sents = brown.sents(categories='news') 

fdist = FreqDist(w.lower() for w in news) 
vocabulary = [word for word, _ in fdist.most_common(10)] 
num_sents = len(news_sents) 

def averageEmbeddings(sentenceTokens, embeddingLookupTable):
    listOfEmb=[]
    for token in sentenceTokens:
        embedding = embeddingLookupTable[token] 
        listOfEmb.append(embedding)

return sum(np.asarray(listOfEmb)) / float(len(listOfEmb))

embeddingVectors = {}

with open("D:\Embedding\vectors.txt") as file: 
    for line in file:
       (key, *val) = line.split()
       embeddingVectors[key] = val

for i in range(num_sents): 
    features = {}
    for word in vocabulary: 
        features[word] = int(word in news_sents[i])        
    print(features) 
    print(list(features.values()))  
sentenceTokens = [] 
for key, value in features.items(): 
    if value == 1:
       sentenceTokens.append(key)
sentenceTokens.remove(".")    
print(sentenceTokens)        
print(averageEmbeddings(sentenceTokens, embeddingVectors))

print(features.keys()) 

Not sure why, but I get this error:

不知道为什么,但我收到此错误:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-643ccd012438> in <module>()
 39     sentenceTokens.remove(".")
 40     print(sentenceTokens)
---> 41     print(averageEmbeddings(sentenceTokens, embeddingVectors))
 42 
 43 print(features.keys()) 

<ipython-input-4-643ccd012438> in averageEmbeddings(sentenceTokens, embeddingLookupTable)
 18         listOfEmb.append(embedding)
 19 
---> 20     return sum(np.asarray(listOfEmb)) / float(len(listOfEmb))
 21 
 22 embeddingVectors = {}

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U9') dtype('<U9') dtype('<U9')

P.S. Embedding Vector looks like:

PS嵌入向量看起来像:

the 0.011384 0.010512 -0.008450 -0.007628 0.000360 -0.010121 0.004674 -0.000076 
of 0.002954 0.004546 0.005513 -0.004026 0.002296 -0.016979 -0.011469 -0.009159 
and 0.004691 -0.012989 -0.003122 0.004786 -0.002907 0.000526 -0.006146 -0.003058 
one 0.014722 -0.000810 0.003737 -0.001110 -0.011229 0.001577 -0.007403 -0.005355 
in -0.001046 -0.008302 0.010973 0.009608 0.009494 -0.008253 0.001744 0.003263 

After using np.sum I get this error:

使用 np.sum 后,我收到此错误:

TypeError                                 Traceback (most recent call last)
<ipython-input-13-8a7edbb9d946> in <module>()
 40     sentenceTokens.remove(".")
 41     print(sentenceTokens)
---> 42     print(averageEmbeddings(sentenceTokens, embeddingVectors))
 43 
 44 print(features.keys()) 

<ipython-input-13-8a7edbb9d946> in averageEmbeddings(sentenceTokens, embeddingLookupTable)
 18         listOfEmb.append(embedding)
 19 
---> 20     return np.sum(np.asarray(listOfEmb)) / float(len(listOfEmb))
 21 
 22 embeddingVectors = {}

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in sum(a, axis, dtype, out, keepdims)
   1829     else:
   1830         return _methods._sum(a, axis=axis, dtype=dtype,
-> 1831                              out=out, keepdims=keepdims)
   1832 
   1833 

C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _sum(a, axis, dtype, out, keepdims)
 30 
 31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
---> 32     return umr_sum(a, axis, dtype, out, keepdims)
 33 
 34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: cannot perform reduce with flexible type

采纳答案by Dunes

You have a numpy array of strings, not floats. This is what is meant by dtype('<U9')-- a little endian encoded unicode string with up to 9 characters.

您有一个 numpy 字符串数组,而不是浮点数。这就是dtype('<U9')- 最多 9 个字符的小端编码 unicode 字符串的含义。

try:

尝试:

return sum(np.asarray(listOfEmb, dtype=float)) / float(len(listOfEmb))

However, you don't need numpy here at all. You can really just do:

但是,您根本不需要 numpy 。你真的可以这样做:

return sum(float(embedding) for embedding in listOfEmb) / len(listOfEmb)

Or if you're really set on using numpy.

或者,如果您真的开始使用 numpy.

return np.asarray(listOfEmb, dtype=float).mean()