Python 类型错误：ufunc 'add' 不包含具有签名匹配类型的循环

Question

提问by Masyaf

I am creating bag of words representation of the sentence. Then taking the words that exist in the sentence to compare to the file "vectors.txt", in order to get their embedding vectors. After getting vectors for each word that exists in the sentence, I am taking average of the vectors of the words in the sentence. This is my code:

我正在创建句子的词袋表示。然后将句子中存在的单词与文件“vectors.txt”进行比较，以获得它们的嵌入向量。在获得句子中存在的每个单词的向量后，我对句子中单词的向量求平均值。这是我的代码：

import nltk
import numpy as np
from nltk import FreqDist
from nltk.corpus import brown


news = brown.words(categories='news') 
news_sents = brown.sents(categories='news') 

fdist = FreqDist(w.lower() for w in news) 
vocabulary = [word for word, _ in fdist.most_common(10)] 
num_sents = len(news_sents) 

def averageEmbeddings(sentenceTokens, embeddingLookupTable):
    listOfEmb=[]
    for token in sentenceTokens:
        embedding = embeddingLookupTable[token] 
        listOfEmb.append(embedding)

return sum(np.asarray(listOfEmb)) / float(len(listOfEmb))

embeddingVectors = {}

with open("D:\Embedding\vectors.txt") as file: 
    for line in file:
       (key, *val) = line.split()
       embeddingVectors[key] = val

for i in range(num_sents): 
    features = {}
    for word in vocabulary: 
        features[word] = int(word in news_sents[i])        
    print(features) 
    print(list(features.values()))  
sentenceTokens = [] 
for key, value in features.items(): 
    if value == 1:
       sentenceTokens.append(key)
sentenceTokens.remove(".")    
print(sentenceTokens)        
print(averageEmbeddings(sentenceTokens, embeddingVectors))

print(features.keys())

Not sure why, but I get this error:

不知道为什么，但我收到此错误：

TypeError                                 Traceback (most recent call last)
<ipython-input-4-643ccd012438> in <module>()
 39     sentenceTokens.remove(".")
 40     print(sentenceTokens)
---> 41     print(averageEmbeddings(sentenceTokens, embeddingVectors))
 42 
 43 print(features.keys()) 

<ipython-input-4-643ccd012438> in averageEmbeddings(sentenceTokens, embeddingLookupTable)
 18         listOfEmb.append(embedding)
 19 
---> 20     return sum(np.asarray(listOfEmb)) / float(len(listOfEmb))
 21 
 22 embeddingVectors = {}

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U9') dtype('<U9') dtype('<U9')

P.S. Embedding Vector looks like:

PS嵌入向量看起来像：

the 0.011384 0.010512 -0.008450 -0.007628 0.000360 -0.010121 0.004674 -0.000076 
of 0.002954 0.004546 0.005513 -0.004026 0.002296 -0.016979 -0.011469 -0.009159 
and 0.004691 -0.012989 -0.003122 0.004786 -0.002907 0.000526 -0.006146 -0.003058 
one 0.014722 -0.000810 0.003737 -0.001110 -0.011229 0.001577 -0.007403 -0.005355 
in -0.001046 -0.008302 0.010973 0.009608 0.009494 -0.008253 0.001744 0.003263

After using np.sum I get this error:

使用 np.sum 后，我收到此错误：

TypeError                                 Traceback (most recent call last)
<ipython-input-13-8a7edbb9d946> in <module>()
 40     sentenceTokens.remove(".")
 41     print(sentenceTokens)
---> 42     print(averageEmbeddings(sentenceTokens, embeddingVectors))
 43 
 44 print(features.keys()) 

<ipython-input-13-8a7edbb9d946> in averageEmbeddings(sentenceTokens, embeddingLookupTable)
 18         listOfEmb.append(embedding)
 19 
---> 20     return np.sum(np.asarray(listOfEmb)) / float(len(listOfEmb))
 21 
 22 embeddingVectors = {}

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in sum(a, axis, dtype, out, keepdims)
   1829     else:
   1830         return _methods._sum(a, axis=axis, dtype=dtype,
-> 1831                              out=out, keepdims=keepdims)
   1832 
   1833 

C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _sum(a, axis, dtype, out, keepdims)
 30 
 31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
---> 32     return umr_sum(a, axis, dtype, out, keepdims)
 33 
 34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: cannot perform reduce with flexible type

Answer 1

采纳答案by Dunes

You have a numpy array of strings, not floats. This is what is meant by dtype('<U9')-- a little endian encoded unicode string with up to 9 characters.

您有一个 numpy 字符串数组，而不是浮点数。这就是dtype('<U9')- 最多 9 个字符的小端编码 unicode 字符串的含义。

try:

尝试：

return sum(np.asarray(listOfEmb, dtype=float)) / float(len(listOfEmb))

However, you don't need numpy here at all. You can really just do:

但是，您根本不需要 numpy 。你真的可以这样做：

return sum(float(embedding) for embedding in listOfEmb) / len(listOfEmb)

Or if you're really set on using numpy.

或者，如果您真的开始使用 numpy.

return np.asarray(listOfEmb, dtype=float).mean()

Python 类型错误：ufunc 'add' 不包含具有签名匹配类型的循环

提问by Masyaf

采纳答案by Dunes

相关推荐

最近更新

标签

Python 类型错误：ufunc 'add' 不包含具有签名匹配类型的循环

提问by Masyaf

采纳答案by Dunes

相关推荐

Python 只有整数、切片 (`:`)、省略号 (`...`)、numpy.newaxis (`None`) 和整数或布尔数组是有效索引

如何让 Python 复利计算器给出正确答案

Python numpy中“在less_equal中遇到无效值”的原因可能是什么

Python 使用 plotly offline 将图形生成为图像

相关推荐

最近更新

标签