Python: Convert scipy sparse csr to pandas?

Question

Asked by KillerSnail

I have used sklearn.preprocessing.OneHotEncoder to transform some data. The output is a scipy.sparse.csr.csr_matrix. How can I merge it back into my original dataframe along with the other columns?

I tried to use pd.concat but I get:

TypeError: cannot concatenate a non-NDFrame object

Thanks

Answer 1

Answered by Stefan

If A is a csr_matrix, you can use .toarray() (there is also .todense(), which produces a numpy matrix and also works with the DataFrame constructor):

df = pd.DataFrame(A.toarray())

You can then use this with pd.concat().

import pandas as pd
from scipy.sparse import csr_matrix

A = csr_matrix([[1, 0, 2], [0, 3, 0]])
print(A)

  (0, 0)    1
  (0, 2)    2
  (1, 1)    3

print(type(A))

<class 'scipy.sparse.csr.csr_matrix'>

df = pd.DataFrame(A.todense())
print(df)

   0  1  2
0  1  0  2
1  0  3  0

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
0    2 non-null int64
1    2 non-null int64
2    2 non-null int64
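To connect this back to the original question, here is a minimal sketch of merging the densified matrix with the remaining columns through pd.concat. The frame, its column names, and the stand-in encoder output are all made up for illustration; the one point worth noting is that the new frame reuses the original index so that concat aligns rows correctly.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical original frame; column names are illustrative only.
df = pd.DataFrame({"price": [9.5, 12.0], "color": ["red", "blue"]})

# Stand-in for the encoder output: one row per row of df.
encoded = csr_matrix([[0, 1], [1, 0]])

# Densify and reuse df's index so pd.concat lines the rows up.
encoded_df = pd.DataFrame(
    encoded.toarray(),
    index=df.index,
    columns=["color_blue", "color_red"],
)

# Drop the raw categorical column and append the encoded ones side by side.
merged = pd.concat([df.drop(columns="color"), encoded_df], axis=1)
print(merged)
```

If the original frame has a non-default index and the new frame does not reuse it, pd.concat aligns on index labels and may introduce NaN rows.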

In version 0.20, pandas introduced sparse data structures, including the SparseDataFrame.
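Note that SparseDataFrame was removed in pandas 1.0. In pandas 0.25 and later, a rough equivalent is the DataFrame.sparse.from_spmatrix constructor, which keeps the data sparse instead of densifying it. The following is a sketch under that version assumption, reusing the small matrix A from the example above:

```python
import pandas as pd
from scipy.sparse import csr_matrix

A = csr_matrix([[1, 0, 2], [0, 3, 0]])

# Requires pandas >= 0.25; each column is backed by a sparse array.
sparse_df = pd.DataFrame.sparse.from_spmatrix(A)

print(sparse_df.dtypes)
# Fraction of values that are actually stored (not the fill value).
print(sparse_df.sparse.density)
```

The resulting frame can be converted to a regular one with sparse_df.sparse.to_dense() if a later step needs dense data.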

Alternatively, you can pass sparse matrices to sklearn to avoid running out of memory when converting back to pandas. Just convert your other data to sparse format by passing a numpy array to the scipy.sparse.csr_matrix constructor and use scipy.sparse.hstack to combine (see docs).
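A minimal sketch of that approach might look like the following. The array `other` stands in for the remaining numeric columns and `encoded` for the encoder output; both are invented for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Hypothetical numeric columns from the original data (2 rows, 2 features).
other = np.array([[9.5, 1.0], [12.0, 0.0]])

# Stand-in for the one-hot encoder output.
encoded = csr_matrix([[0, 1], [1, 0]])

# Convert the dense part to sparse and stack column-wise;
# format="csr" requests CSR output explicitly.
combined = hstack([csr_matrix(other), encoded], format="csr")
print(combined.shape)
```

The combined matrix can then be passed directly to sklearn estimators that accept sparse input, without ever building a dense DataFrame.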

Answer 2

Answered by scriptator

You could also avoid getting back a sparse matrix in the first place by setting the parameter sparse to False when creating the encoder.

The OneHotEncoder documentation states:

sparse : boolean, default=True
Will return sparse matrix if set True else will return an array.

Then you can again call the DataFrame constructor to transform the numpy array to a DataFrame.
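As a rough sketch of this route: in scikit-learn 1.2 and later the parameter is named sparse_output, and the older sparse name was subsequently removed, so the example below tries the new name first and falls back to the old one. The frame and its column names are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({"price": [9.5, 12.0, 7.0], "color": ["red", "blue", "red"]})

try:
    encoder = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2
except TypeError:
    encoder = OneHotEncoder(sparse=False)  # older releases

# With dense output enabled, fit_transform returns a plain numpy array.
dense = encoder.fit_transform(df[["color"]])

# Name the new columns after the categories the encoder learned.
columns = [f"color_{c}" for c in encoder.categories_[0]]
encoded_df = pd.DataFrame(dense, index=df.index, columns=columns)

merged = pd.concat([df.drop(columns="color"), encoded_df], axis=1)
print(merged)
```

This is convenient for small category counts; with many categories the dense array can become large, which is where the sparse approaches from the first answer help.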
