pandas / SciPy hstack results in "TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))"
Original question: http://stackoverflow.com/questions/22273242/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Scipy hstack results in "TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))"
Asked by Simon Kiely
I am trying to run hstack to join a column of integer values to a list of columns created by a TF-IDF (so I can eventually use all of these columns/features in a classifier).
I'm reading in the column using pandas, checking for any NA values, and converting them to the largest value in the dataframe, like so:
  # read the single column, treating '?' as missing
  OtherColumn = p.read_csv('file.csv', delimiter=";", na_values=['?'])[["OtherColumn"]]
  # replace missing values with the column maximum
  OtherColumn = OtherColumn.fillna(OtherColumn.max())
  # coerce to a numeric dtype (convert_objects is deprecated in later pandas;
  # pd.to_numeric is the modern replacement)
  OtherColumn = OtherColumn.convert_objects(convert_numeric=True)
Then I read in my text column and run TF-IDF to create loads of features:
  X = list(np.array(p.read_csv('file.csv', delimiter=";"))[:,2])
  tfv = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode',
                        analyzer='word', token_pattern=r'\w{1,}', ngram_range=(1, 2),
                        use_idf=1, smooth_idf=1, sublinear_tf=1)
  tfv.fit(X)
Finally, I want to join them all together. This is where the error occurs and the program cannot run; I am also unsure whether I am using the StandardScaler appropriately here:
  X =  sp.sparse.hstack((X, OtherColumn.values)) #error here
  sc = preprocessing.StandardScaler().fit(X)
  X = sc.transform(X)
  X_test = sc.transform(X_test)
Full error message:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-79d1e70bc1bc> in <module>()
---> 47 X =  sp.sparse.hstack((X, OtherColumn.values))
     48 sc = preprocessing.StandardScaler().fit(X)
     49 X = sc.transform(X)
C:\Users\Simon\Anaconda\lib\site-packages\scipy\sparse\construct.pyc in hstack(blocks, format, dtype)
    421 
    422     """
--> 423     return bmat([blocks], format=format, dtype=dtype)
    424 
    425 
C:\Users\Simon\Anaconda\lib\site-packages\scipy\sparse\construct.pyc in bmat(blocks, format, dtype)
    537     nnz = sum([A.nnz for A in blocks[block_mask]])
    538     if dtype is None:
--> 539         dtype = upcast(*tuple([A.dtype for A in blocks[block_mask]]))
    540 
    541     row_offsets = np.concatenate(([0], np.cumsum(brow_lengths)))
C:\Users\Simon\Anaconda\lib\site-packages\scipy\sparse\sputils.pyc in upcast(*args)
     58             return t
     59 
---> 60     raise TypeError('no supported conversion for types: %r' % (args,))
     61 
     62 
TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))
Answered by hpaulj
As discussed in Numpy hstack - "ValueError: all the input arrays must have same number of dimensions" - but they do, you may need to explicitly cast the inputs to sparse.hstack. The sparse code is not as robust as the core numpy code.
If X is a sparse array with dtype=float, and A is dense with dtype=object, several options are possible:
sparse.hstack((X, A))                 # error: no supported conversion for (float64, object)
sparse.hstack((X.astype(object), A))  # cast X to object; result has dtype object
sparse.hstack((X, A.astype(float)))   # cast A to float; result has dtype float
np.hstack((X.A, A))                   # make X dense; result will be a dense array of dtype object
A.astype(float) will work if A contains some NaN. See http://pandas.pydata.org/pandas-docs/stable/gotchas.html regarding NaN. If A is object for some other reason (e.g. ragged lists), then we'll have to revisit the issue.
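A minimal, self-contained sketch of the cast-to-float option (the shapes and values below are invented for illustration; only the casting step mirrors the answer):

  import numpy as np
  from scipy import sparse

  # X: a sparse float matrix standing in for the TF-IDF output (3 rows, 4 features)
  X = sparse.csr_matrix(np.arange(12, dtype=np.float64).reshape(3, 4))

  # A: a dense column that came back from pandas with dtype=object
  A = np.array([[1.0], [2.5], [3.0]], dtype=object)

  # sparse.hstack((X, A)) would raise the TypeError from the question;
  # casting A to float first lets the blocks share a common dtype
  combined = sparse.hstack((X, A.astype(float)))
  print(combined.shape, combined.dtype)   # (3, 5) float64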
Another possibility is to use Pandas's concat: http://pandas.pydata.org/pandas-docs/stable/merging.html. I assume Pandas has paid more attention to these issues than the sparse coders.
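A rough sketch of that alternative, assuming the feature block is small enough to hold as a dense DataFrame (the column names and values are invented for illustration, and it uses the modern pd.to_numeric rather than convert_objects):

  import pandas as pd

  # small, invented stand-ins for the TF-IDF features and the extra column
  tfidf_df = pd.DataFrame([[0.1, 0.0], [0.0, 0.7]], columns=['tok_a', 'tok_b'])
  other = pd.DataFrame({'OtherColumn': ['3', '?']})

  # make the extra column numeric before combining (errors='coerce' turns '?' into NaN)
  other['OtherColumn'] = pd.to_numeric(other['OtherColumn'], errors='coerce')
  other = other.fillna(other.max())

  # column-wise concatenation; every column keeps a numeric dtype
  combined = pd.concat([tfidf_df, other], axis=1)
  print(combined.dtypes)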

