Python: transform scipy sparse csr to pandas?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/36967666/

Time: 2020-08-19 18:37:07  Source: igfitidea

transform scipy sparse csr to pandas?

Tags: python, pandas, machine-learning, scipy, scikit-learn

Asked by KillerSnail

I have used sklearn.preprocessing.OneHotEncoder to transform some data. The output is a scipy.sparse.csr.csr_matrix. How can I merge it back into my original dataframe along with the other columns?

I tried to use pd.concat, but I get:

TypeError: cannot concatenate a non-NDFrame object

Thanks


Answer by Stefan

If A is a csr_matrix, you can use .toarray() (there is also .todense(), which produces a numpy matrix that also works with the DataFrame constructor):

df = pd.DataFrame(A.toarray())

You can then pass the resulting DataFrame to pd.concat().

A = csr_matrix([[1, 0, 2], [0, 3, 0]])

  (0, 0)    1
  (0, 2)    2
  (1, 1)    3

<class 'scipy.sparse.csr.csr_matrix'>

pd.DataFrame(A.todense())

   0  1  2
0  1  0  2
1  0  3  0

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
0    2 non-null int64
1    2 non-null int64
2    2 non-null int64
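To tie this back to the question, here is a minimal sketch of the merge step. The frame called original and its column names, as well as the enc_ prefix, are made up for illustration; A stands in for the encoder output.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical original frame; column names are assumptions for illustration.
original = pd.DataFrame({"price": [9.5, 3.0], "qty": [4, 7]})

# Stand-in for the sparse encoder output from the question.
A = csr_matrix([[1, 0, 2], [0, 3, 0]])

# Densify, reuse the original index so rows align, and name the new columns.
encoded = pd.DataFrame(
    A.toarray(),
    index=original.index,
    columns=[f"enc_{i}" for i in range(A.shape[1])],
)

# axis=1 places the encoded columns next to the existing ones.
merged = pd.concat([original, encoded], axis=1)
print(merged)
```

Reusing the original index matters: pd.concat aligns on index labels, so a mismatched index would introduce NaN rows instead of a side-by-side merge.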

In version 0.20, pandas introduced sparse data structures, including the SparseDataFrame.
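Note that SparseDataFrame was later removed (pandas 1.0 dropped it). In current pandas, the sparse accessor is the usual route. The following is a sketch assuming a pandas version that provides pd.DataFrame.sparse.from_spmatrix (0.25 or later); the variable names are illustrative.

```python
import pandas as pd
from scipy.sparse import csr_matrix

A = csr_matrix([[1, 0, 2], [0, 3, 0]])

# Build a DataFrame whose columns are sparse arrays, without densifying A.
sparse_df = pd.DataFrame.sparse.from_spmatrix(A)

# Fraction of explicitly stored values across the frame.
density = sparse_df.sparse.density

# Convert to a regular dense frame only when it is known to fit in memory.
dense_df = sparse_df.sparse.to_dense()
print(sparse_df.dtypes)
```

A frame built this way can be combined with other columns via pd.concat while the encoded part stays sparse.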

Alternatively, you can pass sparse matrices to sklearn to avoid running out of memory when converting back to pandas. Just convert your other data to sparse format by passing a numpy array to the scipy.sparse.csr_matrix constructor and use scipy.sparse.hstack to combine them (see docs).
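A rough sketch of that sparse-only route follows. The dense array called other is a hypothetical stand-in for the remaining columns, and encoded stands in for the encoder output.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Hypothetical dense numeric features (the "other columns").
other = np.array([[9.5, 4.0], [3.0, 7.0]])

# Stand-in for the sparse encoder output.
encoded = csr_matrix([[1, 0, 2], [0, 3, 0]])

# Wrap the dense block as CSR, then stack column-wise. format="csr" asks
# for a CSR result instead of leaving the output format to the default.
combined = hstack([csr_matrix(other), encoded], format="csr")
print(combined.shape)
```

The combined matrix can then be handed to an estimator that accepts sparse input, so the data is never densified.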

Answer by scriptator

You could also avoid getting back a sparse matrix in the first place by setting the parameter sparse to False when creating the encoder.

The documentation of the OneHotEncoder states:

sparse : boolean, default=True

Will return sparse matrix if set True else will return an array.


Then you can again call the DataFrame constructor to transform the numpy array to a DataFrame.
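As a sketch of this approach: newer scikit-learn releases (1.2 and later) rename the parameter to sparse_output, so the example below tries that name first and falls back to sparse. The sample frame and its column names are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data; the column names are made up for illustration.
df = pd.DataFrame({"color": ["red", "blue", "red"], "qty": [4, 7, 1]})

try:
    # scikit-learn >= 1.2 names the parameter sparse_output.
    encoder = OneHotEncoder(sparse_output=False)
except TypeError:
    # Older releases use the name quoted in the documentation excerpt.
    encoder = OneHotEncoder(sparse=False)

# With sparse output disabled, fit_transform returns a plain numpy array.
dense = encoder.fit_transform(df[["color"]])

# Wrap the array in a DataFrame and merge it with the remaining columns.
encoded = pd.DataFrame(dense, index=df.index)
result = pd.concat([df.drop(columns="color"), encoded], axis=1)
print(result)
```

This is convenient for small category sets; with many distinct categories the dense array can become large, which is the memory concern the sparse default is meant to avoid.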
