Python CountVectorizer: AttributeError: 'numpy.ndarray' object has no attribute 'lower'
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/26367075/
CountVectorizer: AttributeError: 'numpy.ndarray' object has no attribute 'lower'
Asked by ashu
I have a one-dimensional array with a large string in each element. I am trying to use a CountVectorizer to convert the text data into numerical vectors. However, I am getting the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
mealarray contains a large string in each element. There are 5000 such samples. I am trying to vectorize it as shown below:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # ngram_range=(1, 1) is the default
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)
The full stack trace:
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
Answered by Warren Weckesser
Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array. (That is, mealarray.shape must be of the form (n,).) For example, you'll get the "no attribute" error if mealarray has a shape such as (n, 1).
You could try something like:
data = vectorizer.fit_transform(mealarray.ravel())
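To see why the shape matters, here is a minimal sketch using only NumPy. The sample strings are made up for illustration and stand in for the real mealarray. It shows that iterating over an (n, 1) array yields rows that are themselves arrays, which have no lower method, while the flattened array yields strings.

```python
import numpy as np

# Hypothetical stand-in for mealarray: three documents stored as a column, shape (3, 1).
mealarray = np.array([["Grilled chicken salad"], ["Tomato soup"], ["Apple pie"]])
print(mealarray.shape)  # (3, 1)

# Iterating over a 2-D array yields rows, and each row is itself an ndarray.
first_row = next(iter(mealarray))
print(type(first_row).__name__, hasattr(first_row, "lower"))

# ravel() flattens to shape (3,), so each element is a string with a .lower() method.
flat = mealarray.ravel()
first_doc = next(iter(flat))
print(flat.shape, hasattr(first_doc, "lower"))
```

Printing mealarray.shape before calling fit_transform is a quick way to tell whether this is the cause.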
Answered by ashu
Got the answer to my question. Basically, CountVectorizer takes a list (of strings) as its argument rather than an array. Passing a list solved my problem.
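A sketch of that conversion, assuming the data starts out as an (n, 1) NumPy array of strings. The sample documents are invented for illustration, and the vectorizer settings are trimmed down from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample data stored as an (n, 1) column of strings.
mealarray = np.array([["apple pie with cream"], ["tomato soup"], ["apple and tomato salad"]])

# Flatten to one dimension, then convert to a plain Python list of str.
documents = mealarray.ravel().tolist()

vectorizer = CountVectorizer(stop_words="english")
data = vectorizer.fit_transform(documents)

print(data.shape)                      # one row per document
print(sorted(vectorizer.vocabulary_))  # the terms the vectorizer learned
```

The list itself is probably not what fixes the error; what matters is that each item the vectorizer iterates over is a single string.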
Answered by Max Kleiner
A better solution is to explicitly select a pandas Series and pass it to CountVectorizer():
>>> tex = df4['Text']
>>> type(tex)
<class 'pandas.core.series.Series'>
>>> X_train_counts = count_vect.fit_transform(tex)  # count_vect is assumed to be a CountVectorizer instance
The next one won't work, because it is a DataFrame and not a Series:
>>> tex2 = df4.iloc[0:, [11]]  # the original used .ix, which was removed in pandas 1.0
>>> type(tex2)
<class 'pandas.core.frame.DataFrame'>
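The difference between the two selections can be sketched with a small, made-up frame. The column name 'Text' follows the answer; the contents and the 'Label' column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame; only the column name 'Text' comes from the answer above.
df4 = pd.DataFrame({"Text": ["apple pie", "tomato soup"], "Label": [1, 0]})

series = df4["Text"]   # a single label selects a Series, shape (2,)
frame = df4[["Text"]]  # a list of labels selects a DataFrame, shape (2, 1)

print(type(series).__name__, series.shape)
print(type(frame).__name__, frame.shape)

# Iterating over a Series yields the strings; iterating over a DataFrame yields column labels.
print(list(series))  # ['apple pie', 'tomato soup']
print(list(frame))   # ['Text']
```

Because a vectorizer consumes its input by iteration, the Series gives it one document per row, while the DataFrame gives it only the column labels.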
Answered by Mr. Sigma.
The error message should be enough to track down the bug. Check whether your DataFrame or Series contains any non-string elements. Also check specifically for NaN values.
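One way to run that check, sketched on a made-up Series. The sample values and the cleanup step are assumptions; whether to drop, fill, or cast the offending entries depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value and a number mixed in among the strings.
texts = pd.Series(["apple pie", np.nan, "tomato soup", 42])

# Flag every entry that is not a Python string (NaN is a float, so it is flagged too).
not_string = ~texts.map(lambda value: isinstance(value, str))
print(texts[not_string])

# One possible cleanup: drop missing values, then cast what remains to str.
cleaned = texts.dropna().astype(str)
print(cleaned.tolist())  # ['apple pie', 'tomato soup', '42']
```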

