Populate a Pandas SparseDataFrame from a SciPy Sparse Matrix
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17818783/
Asked by Will
I noticed Pandas now has support for Sparse Matrices and Arrays.  Currently, I create DataFrame()s like this:
return DataFrame(matrix.toarray(), columns=features, index=observations)
Is there a way to create a SparseDataFrame() from a scipy.sparse.csc_matrix() or csr_matrix()? Converting to dense format uses far too much RAM. Thanks!
Accepted answer by Jeff
A direct conversion is not supported at the moment. Contributions are welcome!
Try this. It should be OK on memory, because a SparseSeries is much like a csc_matrix (for 1 column) and is pretty space efficient:
In [36]: row = np.array([0,2,2,0,1,2])
In [37]: col = np.array([0,0,1,2,2,2])
In [38]: data = np.array([1,2,3,4,5,6],dtype='float64')
In [39]: m = csc_matrix( (data,(row,col)), shape=(3,3) )
In [40]: m
Out[40]: 
<3x3 sparse matrix of type '<type 'numpy.float64'>'
        with 6 stored elements in Compressed Sparse Column format>
In [46]: pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel()) 
                              for i in np.arange(m.shape[0]) ])
Out[46]: 
   0  1  2
0  1  0  4
1  0  0  5
2  2  3  6
In [47]: df = pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel()) 
                                   for i in np.arange(m.shape[0]) ])
In [48]: type(df)
Out[48]: pandas.sparse.frame.SparseDataFrame
Answer by Alex
As of pandas v0.20.0 you can use the SparseDataFrame constructor.
An example from the pandas docs:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
arr = np.random.random(size=(1000, 5))
arr[arr < .9] = 0
sp_arr = csr_matrix(arr)
sdf = pd.SparseDataFrame(sp_arr)
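For readers on newer pandas: SparseDataFrame was removed in pandas 1.0, and from 0.25 onward a regular DataFrame with sparse columns can be built through the DataFrame.sparse accessor. The following is a minimal sketch of the same docs example using that route, assuming pandas 0.25 or later; the `features` labels are hypothetical stand-ins for the column names in the question.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Build a mostly-zero matrix, as in the docs example above.
rng = np.random.default_rng(0)
arr = rng.random(size=(1000, 5))
arr[arr < 0.9] = 0
sp_arr = csr_matrix(arr)

# Hypothetical column labels (assumption, not from the original answer).
features = ["f0", "f1", "f2", "f3", "f4"]

# from_spmatrix keeps the data sparse; each column gets a Sparse dtype.
sdf = pd.DataFrame.sparse.from_spmatrix(sp_arr, columns=features)

# Fraction of stored (non-fill) values across the whole frame.
density = sdf.sparse.density

# Convert back to a SciPy COO matrix when needed.
round_trip = sdf.sparse.to_coo()
```

The result is an ordinary DataFrame, so an index can be passed the same way via the `index` argument of from_spmatrix.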
Answer by Boris Gorelik
A much shorter version:
df = pd.DataFrame(m.toarray())
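Note that m.toarray() materializes the full dense array, which is the memory cost the question wanted to avoid. As a rough sketch for checking the difference on your own data, the snippet below compares the footprint of the dense route with a sparse-backed DataFrame. It assumes pandas 0.25 or later for DataFrame.sparse.from_spmatrix, and the test matrix `m` here is a hypothetical stand-in generated with scipy.sparse.random.

```python
import numpy as np
import pandas as pd
from scipy.sparse import random as sparse_random

# Hypothetical test matrix: 1000 x 50 with about 1% non-zero entries.
m = sparse_random(1000, 50, density=0.01, format="csc",
                  dtype=np.float64, random_state=0)

# Dense route from the answer above: every cell is stored.
dense_df = pd.DataFrame(m.toarray())

# Sparse route: only the non-zero values and their positions are stored.
sparse_df = pd.DataFrame.sparse.from_spmatrix(m)

# Compare memory used by the column data (index excluded).
dense_bytes = dense_df.memory_usage(index=False).sum()
sparse_bytes = sparse_df.memory_usage(index=False).sum()
print(dense_bytes, sparse_bytes)
```

Both frames hold the same values; the gap between the two byte counts grows as the matrix gets larger and sparser.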

