Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/36587702/

Time: 2020-09-14 01:02:22  Source: igfitidea

Convert sparse matrix (csc_matrix) to pandas dataframe

python pandas dataframe text-analysis word-frequency

Asked by Miya Wang

I want to convert this matrix into a pandas dataframe. csc_matrix

The first number in the brackets should be the index, the second number the column, and the number at the end the data.

I want to do this for feature selection in text analysis: the first number represents the document, the second the word feature, and the last the TF-IDF score.

Getting a dataframe helps me turn the text analysis problem into a data analysis problem.

Answered by Alexander

import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

csc = csc_matrix(np.array(
    [[0, 0, 4, 0, 0, 0],
     [1, 0, 0, 0, 2, 0],
     [2, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [4, 0, 3, 2, 0, 0]]))

# Return a Coordinate (coo) representation of the Compressed Sparse Column (csc) matrix.
coo = csc.tocoo(copy=False)

# Build the dataframe from the `row`, `col` and `data` attributes of the coo matrix.
>>> pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data}
                 )[['index', 'col', 'data']].sort_values(['index', 'col']
                 ).reset_index(drop=True)
   index  col  data
0      0    2     4
1      1    0     1
2      1    4     2
3      2    0     2
4      2    3     1
5      3    5     1
6      4    0     4
7      4    2     3
8      4    3     2
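
For the text-analysis use case in the question, the same coo-based approach could be wrapped in a small helper that also attaches the word for each column index. This is only a minimal sketch: the helper name `sparse_to_long_frame`, the column names `doc`, `feature`, `score` and `word`, and the sample scores and vocabulary are all made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix


def sparse_to_long_frame(matrix, feature_names=None):
    """Return one row per stored entry: (doc, feature, score).

    Hypothetical helper for illustration; `feature_names` is an optional
    sequence that maps a column index to its word.
    """
    coo = matrix.tocoo()
    frame = pd.DataFrame({"doc": coo.row, "feature": coo.col, "score": coo.data})
    if feature_names is not None:
        # Look up the word for each column index.
        frame["word"] = np.asarray(feature_names)[coo.col]
    return frame.sort_values(["doc", "feature"]).reset_index(drop=True)


# Made-up TF-IDF-like scores: 3 documents x 4 features.
scores = csc_matrix(np.array(
    [[0.0, 0.5, 0.0, 0.8],
     [0.3, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.9, 0.1]]))
words = ["apple", "banana", "cherry", "date"]  # assumed vocabulary

long_frame = sparse_to_long_frame(scores, words)
print(long_frame)
```

If a wide document-by-feature table is preferred over this long format, pandas also provides `pd.DataFrame.sparse.from_spmatrix`, which keeps the columns sparse instead of densifying them.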