Python: expanding (adding a row or column) a scipy.sparse matrix

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4695337/

Date: 2020-08-18 16:58:47  Source: igfitidea

Tags: python, scipy, sparse-matrix

Asked by RandomGuy

Suppose I have an NxN matrix M (lil_matrix or csr_matrix) from scipy.sparse, and I want to make it (N+1)xN, where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j) and M_modified[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the remainder of the matrix. Is there a way to do this without copying the data?

Accepted answer by Justin Peel

I don't think there is any way to really escape the copying. Both of those sparse matrix types store their data internally as NumPy arrays (in the data and indices attributes for csr, and in the data and rows attributes for lil), and NumPy arrays can't be extended in place.

Update with more information:

LIL does stand for LInked List, but the current implementation doesn't quite live up to the name. The NumPy arrays used for data and rows are both of type object. Each of the objects in these arrays is actually a Python list (an empty list when all values in a row are zero). Python lists aren't exactly linked lists, but they are kind of close and, quite frankly, a better choice due to O(1) look-up. Personally, I don't immediately see the point of using a NumPy array of objects here rather than just a Python list. You could fairly easily change the current lil implementation to use Python lists instead, which would allow you to add a row without copying the whole matrix.
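
To make that storage layout concrete, here is a small sketch that inspects the rows and data attributes of a lil_matrix. The matrix size and the values are arbitrary and chosen only for illustration.

```python
from scipy.sparse import lil_matrix

# A 3x4 LIL matrix with two stored values; row 1 stays empty.
m = lil_matrix((3, 4))
m[0, 1] = 5.0
m[2, 3] = 7.0

# Both attributes are NumPy arrays of dtype object, one entry per row.
print(m.rows.dtype, m.data.dtype)

# Each entry is a plain Python list: column indices in rows, values in data.
print(m.rows[0], m.data[0])

# A row with no stored values is an empty list.
print(m.rows[1], m.data[1])
```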

Answer by Siddhant

Not sure if you're still looking for a solution, but maybe others can look into hstack and vstack: http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html. I think we can define a csr_matrix for the single additional row and then vstack it with the previous matrix.
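
A minimal sketch of that suggestion, assuming a small identity matrix as the starting point (the names M, zero_row and zero_col are only illustrative). Note that vstack and hstack build a new matrix, so the data is copied.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, vstack

N = 4
M = csr_matrix(np.eye(N))

# An empty 1 x N CSR matrix acts as the row of zeros.
zero_row = csr_matrix((1, N), dtype=M.dtype)
M_rows = vstack([M, zero_row], format="csr")
print(M_rows.shape)

# The same idea with hstack adds a column of zeros instead.
zero_col = csr_matrix((N, 1), dtype=M.dtype)
M_cols = hstack([M, zero_col], format="csr")
print(M_cols.shape)
```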

Answer by JakeM

SciPy doesn't have a way to do this without copying the data, but you can do it yourself by changing the attributes that define the sparse matrix.

There are 4 attributes that make up the csr_matrix:

data: An array containing the actual values in the matrix

indices: An array containing the column index corresponding to each value in data

indptr: An array of row offsets into data and indices. The values for row i are data[indptr[i]:indptr[i+1]], so indptr has one more entry than the matrix has rows. If a row is empty, its offset is the same as the next one, i.e. indptr[i] == indptr[i+1].

shape: A tuple containing the shape of the matrix
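
As an illustration of these four attributes, here is a small sketch on a 3x3 matrix whose middle row is empty. The matrix values are made up for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 2],
                  [0, 0, 0],
                  [0, 3, 0]])
m = csr_matrix(dense)

print(m.data)     # stored values, row by row: 1, 2, 3
print(m.indices)  # column of each stored value: 0, 2, 1
print(m.indptr)   # row offsets: 0, 2, 2, 3 (the repeated 2 marks the empty row)
print(m.shape)    # (3, 3)

# Row i can be read back by slicing with consecutive indptr entries.
i = 2
print(m.data[m.indptr[i]:m.indptr[i + 1]])
```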

If you are simply adding a row of zeros to the bottom, all you have to do is change the shape and indptr of your matrix.

import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.ones((3, 5)))
x.toarray()
# array([[ 1.,  1.,  1.,  1.,  1.],
#        [ 1.,  1.,  1.,  1.,  1.],
#        [ 1.,  1.,  1.,  1.,  1.]])

# reshape is not implemented for csr_matrix, but you can cheat and set the
# private _shape attribute yourself.
x._shape = (4, 5)

# Update indptr to let it know we added a row with nothing in it: just append
# the last value in indptr to the end.
# Note that you are still copying the indptr array.
x.indptr = np.hstack((x.indptr, x.indptr[-1]))
x.toarray()
# array([[ 1.,  1.,  1.,  1.,  1.],
#        [ 1.,  1.,  1.,  1.,  1.],
#        [ 1.,  1.,  1.,  1.,  1.],
#        [ 0.,  0.,  0.,  0.,  0.]])
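
As an aside, newer SciPy releases provide a resize method on CSR matrices that performs the same kind of in-place update without touching private attributes. A minimal sketch, assuming the installed SciPy has csr_matrix.resize (older versions, such as the one this answer was written against, may not):

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.ones((3, 5)))

# Assumption: csr_matrix.resize is available in the installed SciPy.
# It changes the shape in place; the new bottom row holds no stored values.
x.resize((4, 5))

print(x.shape)
print(x.toarray())
```

The data and indices arrays are left as they are; only indptr has to grow by one entry, just as in the manual version.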

Here is a function to handle the more general case of vstacking any two csr_matrix objects. You still end up copying the underlying NumPy arrays, but it is significantly faster than the scipy.sparse.vstack method.

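import numpy as np

def csr_vappend(a, b):
    """Append csr_matrix b to the bottom of csr_matrix a, in place.

    Much faster than scipy.sparse.vstack, but assumes both inputs are CSR with
    the same number of columns, and overwrites a instead of copying it.
    The data, indices, and indptr arrays still get copied.
    """
    if a.shape[1] != b.shape[1]:
        raise ValueError("a and b must have the same number of columns")

    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    # Shift b's row offsets by the number of values already stored in a
    # (a.indptr is still unchanged here), and drop b's leading 0.
    a.indptr = np.hstack((a.indptr, (b.indptr + a.indptr[-1])[1:]))
    a._shape = (a.shape[0] + b.shape[0], b.shape[1])
    return a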
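
A short usage sketch, checking the result against scipy.sparse.vstack on two small made-up matrices. The function is repeated here only so the snippet runs on its own; the names top and bottom are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

def csr_vappend(a, b):
    # Same logic as the function above, repeated so this sketch is runnable.
    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    a.indptr = np.hstack((a.indptr, (b.indptr + a.nnz)[1:]))
    a._shape = (a.shape[0] + b.shape[0], b.shape[1])
    return a

top = csr_matrix(np.array([[1.0, 0.0, 2.0],
                           [0.0, 0.0, 0.0]]))
bottom = csr_matrix(np.array([[0.0, 3.0, 0.0]]))

# Compute the reference result first, because csr_vappend overwrites top.
expected = vstack([top, bottom]).toarray()
result = csr_vappend(top, bottom)

print(result.shape)
print(np.array_equal(result.toarray(), expected))
```

Because the first argument is modified in place, result and top refer to the same object afterwards; make a copy first if the original matrix is still needed.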