Generating a dense matrix from a sparse matrix in NumPy (Python)

Question

I have a SQLite database with the following schema:

termcount(doc_num, term, count)

This table stores each term together with its count in a given document, for example:

(doc1, term1, 12)
(doc1, term22, 2)
.
.
(docn, term1, 10)

This data can be treated as a sparse matrix, because each document contains only a few of the terms, so most counts are zero.

How can I create a dense matrix from this sparse matrix using numpy? I need it to calculate the similarity between documents using cosine similarity.

The dense matrix should look like a table with the doc id as the first column and all the terms listed in the first row. The remaining cells contain the counts.

Answer 1

Accepted answer by Rachel Gallen

I solved this problem using Pandas, because we want to keep the document ids and term ids.

from pandas import DataFrame

# A sparse matrix in dictionary form (could come from a SQLite database).
# Each key is a (doc_id, term_id) tuple; each value is the count.
doc_term_dict = {('d1', 't1'): 12, ('d2', 't3'): 10, ('d3', 't2'): 5}

# Extract all unique document and term ids (sorted, since a set has no
# defined order) and initialize a zero-filled DataFrame.
rows = sorted({d for (d, t) in doc_term_dict})
cols = sorted({t for (d, t) in doc_term_dict})
df = DataFrame(0, index=rows, columns=cols)

# Assign all nonzero values in the DataFrame.
for (doc_id, term_id), value in doc_term_dict.items():
    df.loc[doc_id, term_id] = value

print(df)

Output:

    t1  t2  t3
d1  12   0   0
d2   0   0  10
d3   0   5   0
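
If the triples are read straight from the `termcount` table described in the question, a pivot may be a shorter route to the same doc-by-term table. The following is only a sketch under assumptions: it uses an in-memory SQLite database as a stand-in for the real one, the sample rows are made up for illustration, and the names `long_df`, `dense_df` and `dense` are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the real database; the rows below are
# illustrative sample data, not taken from the original post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE termcount (doc_num TEXT, term TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO termcount VALUES (?, ?, ?)",
    [("doc1", "term1", 12), ("doc1", "term22", 2), ("doc2", "term1", 10)],
)

# Load the (doc, term, count) triples in long format.
long_df = pd.read_sql_query("SELECT doc_num, term, count FROM termcount", conn)
conn.close()

# Pivot to one row per document and one column per term.
# Pairs that never occur in the table are filled with 0.
dense_df = long_df.pivot_table(
    index="doc_num", columns="term", values="count", aggfunc="sum", fill_value=0
)

# Plain NumPy array, if the row and column labels are no longer needed.
dense = dense_df.to_numpy()

print(dense_df)
```

The labels stay available through `dense_df.index` and `dense_df.columns`, so a row of `dense` can still be mapped back to its document id.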

Answer 2

Answered by Rachel Gallen

>>> from scipy.sparse import csr_matrix
>>> A = csr_matrix([[1, 0, 2], [0, 3, 0]])
>>> A
<2x3 sparse matrix of type '<type 'numpy.int64'>'
    with 3 stored elements in Compressed Sparse Row format>
>>> A.todense()
matrix([[1, 0, 2],
        [0, 3, 0]])
>>> A.toarray()
array([[1, 0, 2],
       [0, 3, 0]])

This is an example, taken from SciPy, of how to convert a sparse matrix to a dense matrix.
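
To connect this to the (doc, term, count) triples in the question, one possible approach is to map ids to integer positions, build a `scipy.sparse.coo_matrix` from the triples, and densify it with `toarray()`. The sketch below also computes cosine similarity between the rows with plain NumPy, since that was the stated goal. The sample triples and all variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Illustrative (doc, term, count) triples; made-up sample data.
triples = [("d1", "t1", 12), ("d2", "t3", 10), ("d3", "t2", 5), ("d3", "t1", 3)]

# Map each document and term id to a row or column position.
docs = sorted({d for d, _, _ in triples})
terms = sorted({t for _, t, _ in triples})
doc_idx = {d: i for i, d in enumerate(docs)}
term_idx = {t: j for j, t in enumerate(terms)}

rows = [doc_idx[d] for d, _, _ in triples]
cols = [term_idx[t] for _, t, _ in triples]
vals = [c for _, _, c in triples]

# Build the sparse matrix in coordinate format, then densify it.
sparse = coo_matrix((vals, (rows, cols)), shape=(len(docs), len(terms)))
dense = sparse.toarray()

# Cosine similarity between documents: normalize each row to unit
# length, then take all pairwise dot products.
norms = np.linalg.norm(dense, axis=1, keepdims=True)
norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
unit = dense / norms
similarity = unit @ unit.T

print(dense)
print(similarity)
```

Entry `similarity[i, j]` is the cosine similarity between `docs[i]` and `docs[j]`. For a large vocabulary the dense array can become very big, so it may be worth checking whether the similarity can be computed on the sparse matrix instead.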
