Convert sparse matrix (csc_matrix) to pandas dataframe
Note: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/36587702/
Asked by Miya Wang
I want to convert this matrix (a csc_matrix) into a pandas DataFrame.
The first number in the brackets should be the index, the second number the column, and the number at the end the data.
I want to do this for feature selection in text analysis: the first number represents the document, the second the word feature, and the last the TF-IDF score.
Getting a DataFrame would let me turn the text analysis problem into a data analysis problem.
Answered by Alexander
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

csc = csc_matrix(np.array(
    [[0, 0, 4, 0, 0, 0],
     [1, 0, 0, 0, 2, 0],
     [2, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [4, 0, 3, 2, 0, 0]]))

# Return a Coordinate (coo) representation of the Compressed Sparse Column (csc) matrix.
coo = csc.tocoo(copy=False)

# Access the `row`, `col` and `data` attributes of the coo matrix.
>>> pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data}
...              )[['index', 'col', 'data']].sort_values(['index', 'col']
...              ).reset_index(drop=True)
   index  col  data
0      0    2     4
1      1    0     1
2      1    4     2
3      2    0     2
4      2    3     1
5      3    5     1
6      4    0     4
7      4    2     3
8      4    3     2
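
For the feature-selection use case in the question, it may help to wrap the conversion in a small function and attach a readable feature name to each column index. The following is only a sketch under assumptions: the helper name `sparse_to_long_frame` and the `vocab` list are hypothetical and not part of the original answer, and a real vocabulary would come from whatever produced the TF-IDF matrix.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix


def sparse_to_long_frame(matrix, feature_names=None):
    """Return one row per stored entry (index, col, data), sorted by index, then col.

    Hypothetical helper for illustration; not part of the original answer.
    """
    coo = matrix.tocoo()
    frame = pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data})
    frame = frame.sort_values(['index', 'col']).reset_index(drop=True)
    if feature_names is not None:
        # Look up the name of each column position (assumes one name per column).
        frame['feature'] = np.asarray(feature_names)[frame['col'].to_numpy()]
    return frame


# Hypothetical vocabulary: one made-up word per column of the matrix.
vocab = ['apple', 'banana', 'cherry', 'date', 'elder', 'fig']

csc = csc_matrix(np.array(
    [[0, 0, 4, 0, 0, 0],
     [1, 0, 0, 0, 2, 0],
     [2, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [4, 0, 3, 2, 0, 0]]))

long_frame = sparse_to_long_frame(csc, feature_names=vocab)
```

With the data in this long format, ordinary DataFrame operations apply. For example, `long_frame[long_frame['data'] > 1]` keeps only the entries whose score exceeds 1.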