
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/16505670/

Date: 2020-08-18 22:50:51  Source: igfitidea

Generating a dense matrix from a sparse matrix in numpy python

Tags: python, arrays, numpy, scipy, sparse-matrix

Question

I have a SQLite database with the following schema:

termcount(doc_num, term, count)

This table contains terms with their respective counts in each document, for example:

(doc1, term1, 12)
(doc1, term22, 2)
...
(docn, term1, 10)

This table can be considered a sparse matrix, since each document contains very few terms with a non-zero count.

How can I create a dense matrix from this sparse matrix using numpy? I need it to calculate the similarity between documents using cosine similarity.

The dense matrix should look like a table with the doc ids in the first column and all the terms listed in the first row. The remaining cells contain the counts.
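To make the target layout concrete, here is a minimal sketch that reads (doc_num, term, count) rows from SQLite and fills a dense numpy array. The in-memory database, the sample rows, and the helper names `doc_pos` and `term_pos` are assumptions made for illustration; only the `termcount` schema comes from the question.

```python
import sqlite3
import numpy as np

# Hypothetical in-memory database mirroring the termcount schema from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE termcount (doc_num TEXT, term TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO termcount VALUES (?, ?, ?)",
    [("doc1", "term1", 12), ("doc1", "term22", 2), ("doc2", "term1", 10)],
)
triples = conn.execute("SELECT doc_num, term, count FROM termcount").fetchall()
conn.close()

# Map each document and term to a row/column position (sorted for a stable order).
docs = sorted({d for d, _, _ in triples})
terms = sorted({t for _, t, _ in triples})
doc_pos = {d: i for i, d in enumerate(docs)}
term_pos = {t: j for j, t in enumerate(terms)}

# Start from all zeros and fill in only the stored counts.
dense = np.zeros((len(docs), len(terms)), dtype=np.int64)
for d, t, c in triples:
    dense[doc_pos[d], term_pos[t]] = c

print(docs)
print(terms)
print(dense)
```

The `docs` and `terms` lists play the role of the first column and first row of the table described above, while `dense` holds the counts.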

Accepted answer by Rachel Gallen

I solved this problem using Pandas, because we want to keep the document ids and term ids.

from pandas import DataFrame

# A sparse matrix in dictionary form (could come from a SQLite database).
# Keys are (doc_id, term_id) tuples; values are counts.
doc_term_dict = {('d1', 't1'): 12, ('d2', 't3'): 10, ('d3', 't2'): 5}

# Extract all unique document and term ids and initialize a zero-filled DataFrame.
# Sorted lists are used because sets have no defined order.
rows = sorted({d for (d, t) in doc_term_dict})
cols = sorted({t for (d, t) in doc_term_dict})
df = DataFrame(0, index=rows, columns=cols)

# Assign all nonzero values in the DataFrame.
for (doc_id, term_id), value in doc_term_dict.items():
    df.loc[doc_id, term_id] = value

print(df)

Output:

    t1  t2  t3
d1  12   0   0
d2   0   0  10
d3   0   5   0
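Since the question's goal is cosine similarity between documents, the following is a minimal sketch of that step on a dense count matrix. The function name `cosine_similarity_matrix` is hypothetical, and treating all-zero rows as having similarity 0 is an assumption, not something stated in the original answer.

```python
import numpy as np

def cosine_similarity_matrix(counts):
    """Return the pairwise cosine similarity between the rows of `counts`."""
    counts = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    # Assumption: a document with no terms gets similarity 0 instead of NaN.
    norms[norms == 0] = 1.0
    unit = counts / norms
    return unit @ unit.T

# Same counts as the DataFrame above, with rows d1..d3 and columns t1..t3.
counts = np.array([[12, 0, 0], [0, 0, 10], [0, 5, 0]])
sim = cosine_similarity_matrix(counts)
print(sim)
```

In this sample no two documents share a term, so every off-diagonal similarity is zero. With a DataFrame such as `df`, the array passed in would probably be `df.to_numpy()`.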

Answer by Rachel Gallen

>>> from scipy.sparse import csr_matrix
>>> A = csr_matrix([[1,0,2],[0,3,0]])
>>> A
<2x3 sparse matrix of type '<type 'numpy.int64'>'
    with 3 stored elements in Compressed Sparse Row format>
>>> A.todense()
matrix([[1, 0, 2],
        [0, 3, 0]])
>>> A.toarray()
array([[1, 0, 2],
       [0, 3, 0]])

This example, taken from scipy, shows how to convert a sparse matrix to a dense matrix.
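To connect this to the (doc, term, count) rows in the question, here is a hedged sketch that builds a scipy sparse matrix from such triples and then converts it with `toarray()`. The sample triples and the index dictionaries `doc_pos` and `term_pos` are assumptions for illustration.

```python
from scipy.sparse import coo_matrix

# Hypothetical triples in the same (doc, term, count) form as the termcount table.
triples = [("doc1", "term1", 12), ("doc1", "term22", 2), ("docn", "term1", 10)]

# Assign each document a row index and each term a column index.
docs = sorted({d for d, _, _ in triples})
terms = sorted({t for _, t, _ in triples})
doc_pos = {d: i for i, d in enumerate(docs)}
term_pos = {t: j for j, t in enumerate(terms)}

row_idx = [doc_pos[d] for d, _, _ in triples]
col_idx = [term_pos[t] for _, t, _ in triples]
data = [c for _, _, c in triples]

# COO format takes (values, (row indices, column indices)) directly.
sparse = coo_matrix((data, (row_idx, col_idx)), shape=(len(docs), len(terms))).tocsr()
dense = sparse.toarray()
print(dense)
```

The COO constructor is used here only because it accepts coordinate triples as they are; converting to CSR afterwards matches the format shown in the answer.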