Python: extend (add rows or columns to) a scipy.sparse matrix

Question

Asked by RandomGuy

Suppose I have an NxN matrix M (lil_matrix or csr_matrix) from scipy.sparse, and I want to make it (N+1)xN, where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j) and M_modified[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the rest of the matrix. Is there a way to do this without copying the data?

Answer 1

Accepted answer by Justin Peel

I don't think there is any way to really avoid the copying. Both of those sparse matrix types store their data internally as NumPy arrays (in the data, indices and indptr attributes for csr, and in the data and rows attributes for lil), and a NumPy array can't simply be extended the way a Python list can.
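To illustrate the point about NumPy arrays, here is a minimal sketch (the 3x3 identity matrix is an arbitrary example). The CSR internals are ordinary ndarrays, and "appending" to one with np.append allocates a brand-new array and copies the old contents into it, rather than growing the existing buffer.

```python
import numpy as np
from scipy.sparse import csr_matrix

M = csr_matrix(np.eye(3))
# The CSR internals are ordinary NumPy arrays.
print(type(M.data), type(M.indices), type(M.indptr))

# np.append always returns a new array; the original buffer is left as it was.
old_indptr = M.indptr
new_indptr = np.append(old_indptr, old_indptr[-1])
print(new_indptr is old_indptr)                  # False
print(np.shares_memory(new_indptr, old_indptr))  # False
```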

Update with more information:

LIL does stand for LInked List, but the current implementation doesn't quite live up to the name. The NumPy arrays used for data and rows are both of dtype object, and each element of these arrays is actually a Python list (an empty list when every value in that row is zero). Python lists aren't linked lists, but they are close enough, and quite frankly a better choice because indexing them is O(1). Personally, I don't immediately see the point of using a NumPy array of objects here rather than a plain Python list. You could fairly easily change the current lil implementation to use Python lists instead, which would let you add a row without copying the whole matrix.
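A small sketch of what those lil internals look like (the 3x4 shape and the two stored entries are arbitrary examples). The last part calls lil_matrix.resize, which assumes a reasonably recent SciPy release; how much it reallocates internally is a SciPy implementation detail, not something this sketch checks.

```python
import numpy as np
from scipy.sparse import lil_matrix

L = lil_matrix((3, 4))
L[0, 1] = 5.0
L[2, 3] = 7.0

# Both internal arrays have dtype=object; every element is a Python list,
# and the all-zero middle row is stored as an empty list.
print(L.rows.dtype, L.data.dtype)  # object object
print(L.rows.tolist())
print(L.data.tolist())

# Assumes a SciPy version with lil_matrix.resize: grow to 4 rows in place.
# The new bottom row starts out as empty lists, i.e. all zeros.
L.resize((4, 4))
print(L.shape)                     # (4, 4)
print(L.toarray())
```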

Answer 2

Answer by Siddhant

Not sure if you're still looking for a solution, but others may want to look into hstack and vstack: http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html. I think we can define a csr_matrix for the single additional row and then vstack it with the previous matrix.
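A minimal sketch of that suggestion, assuming an arbitrary 3x3 example matrix M. csr_matrix((1, N)) builds an all-zero 1xN matrix that stores no entries, and format="csr" asks vstack for a CSR result. Note that vstack builds a new matrix, so M's data is copied; it answers the "how" but not the "without copying" part of the question.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

N = 3
M = csr_matrix(np.arange(1, 10, dtype=float).reshape(N, N))

# An all-zero row with the same number of columns; it stores no entries.
zero_row = csr_matrix((1, N))

# vstack returns a new (N+1)xN matrix; M itself is left unchanged.
M_modified = vstack([M, zero_row], format="csr")
print(M_modified.shape)  # (4, 3)
print(M_modified.toarray())
```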

Answer 3

Answer by JakeM

Scipy doesn't have a way to do this without copying the data but you can do it yourself by changing the attributes that define the sparse matrix.

There are 4 attributes that make up the csr_matrix:

data: An array containing the actual values in the matrix

indices: An array containing the column index corresponding to each value in data

indptr: An array of length (number of rows + 1). The values of row i are data[indptr[i]:indptr[i+1]], and their column indices are the matching slice of indices. If row i is empty, indptr[i] equals indptr[i+1].

shape: A tuple containing the shape of the matrix
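A small worked example of these attributes (the 3x3 matrix is an arbitrary choice with an empty middle row). It shows that row i owns the slice indptr[i]:indptr[i+1] of data and indices, and that an empty row simply repeats the previous indptr value.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 2],
                  [0, 0, 0],   # an empty row
                  [3, 4, 0]])
A = csr_matrix(dense)

print(A.data)     # [1 2 3 4]
print(A.indices)  # [0 2 0 1]
print(A.indptr)   # [0 2 2 4]
print(A.shape)    # (3, 3)

# Walk the rows using indptr as slice boundaries into data and indices.
for i in range(A.shape[0]):
    start, stop = A.indptr[i], A.indptr[i + 1]
    print(i, A.indices[start:stop], A.data[start:stop])
```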

If you are simply adding a row of zeros to the bottom all you have to do is change the shape and indptr for your matrix.

import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.ones((3, 5)))
x.toarray()
# array([[1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.]])

# reshape() can't do this (it must keep the total number of elements), so cheat
# and set the private _shape attribute yourself. This relies on SciPy internals.
x._shape = (4, 5)
# Update indptr to record an added row with nothing in it: append a copy of the
# last indptr value. Note that this still copies the indptr array.
x.indptr = np.hstack((x.indptr, x.indptr[-1]))
x.toarray()
# array([[1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.],
#        [0., 0., 0., 0., 0.]])
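Newer SciPy releases also offer a public resize method on sparse matrices that changes the shape in place, which avoids touching the private _shape attribute. A sketch, assuming a SciPy version where csr_matrix.resize exists; the expected indptr follows from the definition above (the new empty row repeats the last value).

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.ones((3, 5)))

# Assumes csr_matrix.resize is available: grow from 3 to 4 rows in place.
# Existing entries keep their positions; the new bottom row is empty.
x.resize((4, 5))
print(x.shape)    # (4, 5)
print(x.indptr)
print(x.toarray())
```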

Here is a function for the more general case of vertically stacking any two csr_matrix objects. It still copies the underlying NumPy arrays, but it is still significantly faster than scipy.sparse.vstack.

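import numpy as np

def csr_vappend(a, b):
    """Take two csr_matrices and append the second one to the bottom of the first.
    Much faster than scipy.sparse.vstack, but assumes both are csr and overwrites
    the first matrix instead of copying it. The data, indices and indptr arrays
    still get copied."""
    if a.shape[1] != b.shape[1]:
        raise ValueError("both matrices must have the same number of columns")
    offset = a.indptr[-1]  # entries stored in a, read before a is modified
    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    # Shift b's row pointers past a's entries; drop b's leading 0, which would
    # duplicate a's final indptr value.
    a.indptr = np.hstack((a.indptr, (b.indptr + offset)[1:]))
    a._shape = (a.shape[0] + b.shape[0], b.shape[1])
    return a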
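The question's title also mentions columns. A minimal sketch of that case, using an arbitrary 2x2 example: scipy.sparse.hstack with an all-zero column builds a new, wider matrix. Because an empty column on the right adds no stored entries to a CSR matrix, an in-place resize (again assuming a SciPy version that provides csr_matrix.resize) works as well, without changing data, indices or indptr.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

M = csr_matrix(np.array([[1.0, 0.0],
                         [0.0, 2.0]]))
n_rows = M.shape[0]

# An all-zero column with a matching row count; it stores no entries.
zero_col = csr_matrix((n_rows, 1))

# hstack copies M into a new 2x3 matrix.
M_wide = hstack([M, zero_col], format="csr")
print(M_wide.shape)  # (2, 3)

# Alternative: widen M itself; its indptr stays [0, 1, 2].
M.resize((n_rows, 3))
print(M.toarray())
```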
