Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/20459536/

Time: 2020-08-18 20:29:50  Source: igfitidea

Convert Pandas dataframe to Sparse Numpy Matrix directly

Tags: python, numpy, pandas, scipy

Asked by user7289

I am creating a matrix from a Pandas dataframe as follows:

dense_matrix = np.array(df.as_matrix(columns = None), dtype=bool).astype(np.int)

I then convert it into a sparse matrix with:

sparse_matrix = scipy.sparse.csr_matrix(dense_matrix)

Is there any way to go from a df straight to a sparse matrix?

Thanks in advance.
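
For readers on current library versions: DataFrame.as_matrix was removed in pandas 1.0 and the np.int alias was removed in NumPy 1.24, so the two lines in the question no longer run as written. The following is a minimal sketch of the same two steps, assuming df.to_numpy() and the built-in int as stand-ins; the example DataFrame is hypothetical and not part of the original question.

```python
import numpy as np
import pandas as pd
import scipy.sparse

# Hypothetical example data (not from the original question).
df = pd.DataFrame({"a": [0, 2, 0], "b": [1, 0, 0]})

# Step 1: dense 0/1 matrix marking the non-zero entries.
# df.to_numpy() stands in for the removed df.as_matrix(),
# and the built-in int stands in for the removed np.int alias.
dense_matrix = np.array(df.to_numpy(), dtype=bool).astype(int)

# Step 2: compress the dense array into CSR format.
sparse_matrix = scipy.sparse.csr_matrix(dense_matrix)
```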

Accepted answer by Dan Allan

df.values is a NumPy array, and accessing the values that way is always faster than going through np.array.

scipy.sparse.csr_matrix(df.values)

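You might need to take the transpose first, like df.values.T, if you want the DataFrame's columns to become the rows of the sparse matrix. In a DataFrame, the index (rows) is axis 0 and the columns are axis 1.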
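
A minimal sketch of the direct conversion, with and without the transpose. The example DataFrame is hypothetical; the printed shapes show which orientation each call produces, which is a quick way to check whether the transpose is needed for a given use.

```python
import pandas as pd
import scipy.sparse

# Hypothetical example data: 3 rows, 2 columns.
df = pd.DataFrame({"a": [0, 2, 0], "b": [1, 0, 0]})

# Direct conversion: rows of the DataFrame become rows of the sparse matrix.
sparse_matrix = scipy.sparse.csr_matrix(df.values)

# Transposed variant: columns of the DataFrame become rows of the sparse matrix.
sparse_transposed = scipy.sparse.csr_matrix(df.values.T)

print(sparse_matrix.shape)      # (3, 2)
print(sparse_transposed.shape)  # (2, 3)
```

Note that this keeps the original values (here, the 2 in column "a"), whereas the code in the question first maps every non-zero entry to 1.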