Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/20583381/

Building and updating a sparse matrix in python using scipy

Tags: python, python-2.7, matrix, scipy, sparse-matrix

Asked by syllogismos

I'm trying to build and update a sparse matrix as I read data from a file. The matrix has shape 100000 x 40000.

What is the most efficient way to update multiple entries of the sparse matrix? Specifically, I need to increment each entry by 1.

Let's say I have row indices [2, 236, 246, 389, 1691]

and column indices [117, 3, 34, 2757, 74, 1635, 52]

so all the following entries must be incremented by one:

(2,117) (2,3) (2,34) (2,2757) ...

(236,117) (236,3) (236, 34) (236,2757) ...

and so on.

I'm already using lil_matrix, because SciPy gave me a warning suggesting it when I tried to update a single entry.

However, the lil_matrix format does not support updating multiple entries at once: matrix[1:3,0] += [2,3] gives me a NotImplemented error.

I can do this naively by incrementing every entry individually. I was wondering whether there is a better way to do this, or a better sparse matrix implementation that I could use.

My computer is an average i5 machine with 4GB of RAM, so I have to be careful not to blow it up :)

Accepted answer by Jaime

Creating a second matrix with 1s in your new coordinates and adding it to the existing one is a possible way of doing this:

>>> import numpy as np
>>> import scipy.sparse as sps
>>> rows, cols = 1000, 2000  # matrix dimensions
>>> sps_acc = sps.coo_matrix((rows, cols)) # empty matrix
>>> for j in range(100): # add 100 sets of 100 1's
...     r = np.random.randint(rows, size=100)
...     c = np.random.randint(cols, size=100)
...     d = np.ones((100,))
...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
...
>>> sps_acc
<1000x2000 sparse matrix of type '<type 'numpy.float64'>'
    with 9985 stored elements in Compressed Sparse Row format>
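
The loop above re-adds the accumulator on every batch. One possible variation, sketched here on the assumption that all coordinates fit in memory at once, is to collect the indices first and build a single COO matrix at the end; duplicate coordinates are summed when a COO matrix is converted to CSR. The names n_rows, n_cols, row_chunks and col_chunks are illustrative and not part of the answer, and the seeded generator is only there to make the sketch repeatable.

```python
import numpy as np
import scipy.sparse as sps

n_rows, n_cols = 1000, 2000
rng = np.random.default_rng(0)  # seeded only for repeatability

# Collect the coordinates of every batch instead of adding matrices in the loop.
row_chunks, col_chunks = [], []
for _ in range(100):  # 100 batches of 100 coordinates each
    row_chunks.append(rng.integers(n_rows, size=100))
    col_chunks.append(rng.integers(n_cols, size=100))

r = np.concatenate(row_chunks)
c = np.concatenate(col_chunks)
d = np.ones(r.size)

# Build once; converting COO to CSR sums entries that share a coordinate.
acc = sps.coo_matrix((d, (r, c)), shape=(n_rows, n_cols)).tocsr()
print(acc.shape, acc.nnz, acc.sum())
```

The total of all entries equals the number of collected coordinates, while the number of stored elements can be smaller because repeated coordinates collapse into one entry.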

Answer by Ray

import scipy.sparse

rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]
prod = [(x, y) for x in rows for y in cols]  # Cartesian product of rows and cols
r = [x for (x, y) in prod]  # row index of each pair
c = [y for (x, y) in prod]  # column index of each pair
data = [1] * len(r)
m = scipy.sparse.coo_matrix((data, (r, c)), shape=(100000, 40000))

I think this works well and doesn't need explicit loops. I am following the documentation directly.

<100000x40000 sparse matrix of type '<type 'numpy.int32'>'
    with 35 stored elements in COOrdinate format>
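
The snippet above builds one matrix of ones; to turn it into an update, that matrix can be added to an existing accumulator. The following is a rough sketch of that step, using np.repeat and np.tile in place of the list comprehensions. The helper name outer_update and the accumulator acc are hypothetical names introduced for illustration.

```python
import numpy as np
import scipy.sparse as sps

shape = (100000, 40000)
rows = np.array([2, 236, 246, 389, 1691])
cols = np.array([117, 3, 34, 2757, 74, 1635, 52])


def outer_update(rows, cols, shape):
    """Return a CSR matrix with a 1 at every (row, col) combination."""
    r = np.repeat(rows, cols.size)  # each row index repeated once per column
    c = np.tile(cols, rows.size)    # the column list repeated once per row
    data = np.ones(r.size, dtype=np.int32)
    return sps.coo_matrix((data, (r, c)), shape=shape).tocsr()


acc = sps.csr_matrix(shape, dtype=np.int32)          # empty accumulator
acc = acc + outer_update(rows, cols, shape)          # first batch: 5 x 7 entries
acc = acc + outer_update(rows[:2], cols[:3], shape)  # second batch overlaps the first
print(acc.nnz, acc[2, 117], acc[1691, 52])
```

Entries touched by both batches, such as (2, 117), end up at 2, while the number of stored elements stays at 35 because the second batch only revisits existing coordinates.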

Answer by Warren Weckesser

This answer expands on the comment by @behzad.nouri. To increment the values at the "outer product" of your lists of row and column indices, just create them as numpy arrays configured for broadcasting. In this case, that means putting the row indices into a column, i.e. an array of shape (n, 1). For example,

In [57]: import numpy as np

In [58]: from scipy.sparse import lil_matrix

In [59]: a = lil_matrix((4,4), dtype=int)

In [60]: a.A
Out[60]: 
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [61]: rows = np.array([1,3]).reshape(-1, 1)

In [62]: rows
Out[62]: 
array([[1],
       [3]])

In [63]: cols = np.array([0, 2, 3])

In [64]: a[rows, cols] += np.ones((rows.size, cols.size))

In [65]: a.A
Out[65]: 
array([[0, 0, 0, 0],
       [1, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 0, 1, 1]])

In [66]: rows = np.array([0, 1]).reshape(-1,1)

In [67]: cols = np.array([1, 2])

In [68]: a[rows, cols] += np.ones((rows.size, cols.size))

In [69]: a.A
Out[69]: 
array([[0, 1, 1, 0],
       [1, 1, 2, 1],
       [0, 0, 0, 0],
       [1, 0, 1, 1]])
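
For the index lists from the question, the same broadcasting idea might look like the sketch below. It assumes a SciPy version in which lil_matrix accepts broadcast index arrays for both reading and writing, and it spells out the read-modify-write step rather than relying on +=. The reduced shape and the helper name increment_block are illustrative choices, not part of the answer.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Smaller than the 100000 x 40000 matrix in the question, to keep the sketch light.
a = lil_matrix((2000, 3000), dtype=np.int32)


def increment_block(m, row_idx, col_idx):
    """Add 1 to every (row, col) combination of the two index lists."""
    r = np.asarray(row_idx).reshape(-1, 1)  # column vector, shape (n, 1)
    c = np.asarray(col_idx)                 # shape (k,), broadcasts against r
    # Read the n x k block as a dense array, add 1, and write it back.
    m[r, c] = m[r, c].toarray() + 1


rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]
increment_block(a, rows, cols)
increment_block(a, [2, 236], [117, 3])  # overlaps part of the first block
print(a.nnz, a[2, 117], a[1691, 52])
```

Only the small n x k block is ever made dense, so memory use depends on the size of each index batch rather than on the full matrix.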