Python: converting a scipy sparse CSR matrix to pandas
Original source: http://stackoverflow.com/questions/36967666/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
transform scipy sparse csr to pandas?
Asked by KillerSnail
I have used sklearn.preprocessing.OneHotEncoder to transform some data. The output is a scipy.sparse.csr.csr_matrix. How can I merge it back into my original dataframe along with the other columns?
I tried to use pd.concat, but I get:
TypeError: cannot concatenate a non-NDFrame object
Thanks
Answered by Stefan
If A is a csr_matrix, you can use .toarray() (there is also .todense(), which produces a numpy matrix and also works with the DataFrame constructor):
df = pd.DataFrame(A.toarray())
You can then use this with pd.concat().
A = csr_matrix([[1, 0, 2], [0, 3, 0]])

  (0, 0)    1
  (0, 2)    2
  (1, 1)    3

<class 'scipy.sparse.csr.csr_matrix'>

pd.DataFrame(A.todense())

   0  1  2
0  1  0  2
1  0  3  0

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
0    2 non-null int64
1    2 non-null int64
2    2 non-null int64
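To connect this back to the original question, here is a minimal sketch of merging a densified sparse matrix into an existing dataframe. The dataframe, its column names, and the matrix contents are hypothetical stand-ins for the asker's data and encoder output. The key assumption is that the matrix has one row per dataframe row, in the same order, so reusing the dataframe's index makes pd.concat line the rows up.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical original dataframe; the column names are made up for illustration.
df = pd.DataFrame({"price": [10.0, 12.5], "city": ["Paris", "Rome"]})

# Stand-in for the encoder output: one row per row of df, one column per category.
A = csr_matrix([[1, 0], [0, 1]])

# Densify the matrix and reuse df's index so that concat aligns the rows.
encoded = pd.DataFrame(
    A.toarray(),
    index=df.index,
    columns=["city_Paris", "city_Rome"],  # hypothetical column labels
)

# Replace the raw categorical column with its encoded columns.
merged = pd.concat([df.drop(columns="city"), encoded], axis=1)
print(merged)
```

Passing index=df.index matters when the original dataframe does not use a default RangeIndex. Without it, pd.concat aligns on mismatched labels and fills the gaps with NaN.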
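In version 0.20, pandas added support for building its sparse data structures, including the SparseDataFrame, directly from scipy sparse matrices. Note that SparseDataFrame was removed in pandas 1.0; newer versions provide pd.DataFrame.sparse.from_spmatrix() instead.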
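In recent pandas releases (0.25 and later), the sparse accessor on DataFrame can wrap a scipy sparse matrix without densifying it. The following is a minimal sketch, assuming such a version is installed. The matrix is the same toy example used above.

```python
import pandas as pd
from scipy.sparse import csr_matrix

A = csr_matrix([[1, 0, 2], [0, 3, 0]])

# Each column becomes a sparse-typed column; the data is not densified.
sdf = pd.DataFrame.sparse.from_spmatrix(A)

print(sdf.dtypes)          # every column has a Sparse[...] dtype
print(sdf.sparse.density)  # fraction of stored (non-fill) values
```

The resulting frame can be passed to pd.concat like any other DataFrame, and sdf.sparse.to_dense() converts it to ordinary columns if needed.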
Alternatively, you can pass sparse matrices to sklearn to avoid running out of memory when converting back to pandas. Just convert your other data to sparse format by passing a numpy array to the scipy.sparse.csr_matrix constructor and use scipy.sparse.hstack to combine (see docs).
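A minimal sketch of that sparse-only route follows. The arrays are hypothetical stand-ins for the remaining numeric columns and the encoder output, and both must have the same number of rows.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Hypothetical dense numeric features (2 rows, 2 columns).
other = np.array([[10.0, 1.5], [12.5, 0.0]])

# Stand-in for the sparse one-hot output with the same number of rows.
encoded = csr_matrix([[1, 0, 0], [0, 0, 1]])

# hstack returns a COO matrix by default; ask for CSR explicitly.
X = hstack([csr_matrix(other), encoded], format="csr")
print(X.shape)
```

Most scikit-learn estimators accept X in this form directly, so the data never has to be expanded into a dense array.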
Answered by scriptator
You could also avoid getting back a sparse matrix in the first place by setting the parameter sparse to False when creating the encoder.
The documentation of the OneHotEncoder states:
sparse : boolean, default=True
Will return sparse matrix if set True else will return an array.
Then you can again call the DataFrame constructor to transform the numpy array to a DataFrame.
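A minimal sketch of this approach is shown below, with a hypothetical dataframe and column names. One caveat not covered in the answer: scikit-learn 1.2 renamed the sparse parameter to sparse_output, and later releases dropped the old name, so the sketch tries the new keyword first and falls back to the old one.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical dataframe; the column names are made up for illustration.
df = pd.DataFrame({"price": [10.0, 12.5, 9.0], "city": ["Paris", "Rome", "Paris"]})

try:
    encoder = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2
except TypeError:
    encoder = OneHotEncoder(sparse=False)  # older releases, as in the answer

# With sparse output disabled, fit_transform returns a plain numpy array.
dense = encoder.fit_transform(df[["city"]])

# Label the new columns with the categories the encoder discovered.
encoded = pd.DataFrame(dense, index=df.index, columns=encoder.categories_[0])

merged = pd.concat([df.drop(columns="city"), encoded], axis=1)
print(merged)
```

This is convenient for small data. With many categories the dense array can become large, in which case one of the sparse routes above may be the better fit.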