Python: How can I efficiently remove a column from a sparse matrix?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2368544/

Time: 2020-11-04 00:28:36  Source: igfitidea

How can I remove a column from a sparse matrix efficiently?

Tags: python, matrix, numpy, scipy, algebra

Asked by Brandon Pelfrey

If I am using the sparse.lil_matrix format, how can I remove a column from the matrix easily and efficiently?

Accepted answer by Justin Peel

I've been wanting this myself, and in truth there isn't a great built-in way to do it yet. Here's one way. I chose to make a subclass of lil_matrix and add a removecol function. If you want, you can instead add the removecol function to the lil_matrix class in your lib/site-packages/scipy/sparse/lil.py file. Here's the code:

from bisect import bisect_left

from scipy import sparse


class lil2(sparse.lil_matrix):
    def removecol(self, j):
        """Remove column j in place."""
        if j < 0:
            j += self.shape[1]

        if j < 0 or j >= self.shape[1]:
            raise IndexError('column index out of bounds')

        rows = self.rows
        data = self.data
        for i in range(self.shape[0]):
            # rows[i] holds the sorted column indices of the entries stored in row i.
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue
            elif rows[i][pos] == j:
                # Drop the entry stored in column j.
                rows[i].pop(pos)
                data[i].pop(pos)
                if pos == len(rows[i]):
                    continue
            # Shift every column index to the right of j down by one.
            for pos2 in range(pos, len(rows[i])):
                rows[i][pos2] -= 1

        self._shape = (self._shape[0], self._shape[1] - 1)

I have tried it out and don't see any bugs. I certainly think that it is better than slicing the column out, which just creates a new matrix as far as I know.

I decided to make a removerow function as well, but I don't think that it is as good as removecol. I'm limited by not being able to remove one row from an ndarray in the way that I would like. Here is removerow, which can be added to the above class:

    # Needs "import numpy" at the top of the module.
    def removerow(self, i):
        """Remove row i; numpy.delete builds new rows/data arrays."""
        if i < 0:
            i += self.shape[0]

        if i < 0 or i >= self.shape[0]:
            raise IndexError('row index out of bounds')

        self.rows = numpy.delete(self.rows, i, 0)
        self.data = numpy.delete(self.data, i, 0)
        self._shape = (self._shape[0] - 1, self._shape[1])

Perhaps I should submit these functions to the Scipy repository.

Answer by Newmu

Much simpler and faster. You might not even need the conversion to csr, but I know for sure that it works with csr sparse matrices, and converting between the formats shouldn't be an issue. Here col_list holds the indices of the columns to keep.

from scipy import sparse

x_new = sparse.lil_matrix(sparse.csr_matrix(x)[:,col_list])
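
The one-liner above leaves x and col_list undefined. A minimal sketch of how it might be used follows; the 3x4 matrix and the drop variable are made up for illustration and are not part of the original answer.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x4 example matrix in LIL format.
x = sparse.lil_matrix(np.array([[1, 0, 2, 0],
                                [0, 3, 0, 4],
                                [5, 0, 0, 6]]))

drop = 2  # index of the column to remove (illustrative)

# col_list lists the columns to KEEP, in their original order.
col_list = [c for c in range(x.shape[1]) if c != drop]

# Slice in CSR format, then convert the result back to LIL.
x_new = sparse.lil_matrix(sparse.csr_matrix(x)[:, col_list])
```

Note that this builds a new matrix rather than modifying x in place.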

Answer by JRun

For a sparse csr matrix (X) and a list of indices to drop (index_to_drop):

to_keep = sorted(set(range(X.shape[1])) - set(index_to_drop))
new_X = X[:, to_keep]

It is easy to convert lil_matrices to csr_matrices. Check tocsr() in the lil_matrix documentation.

Note, however, that going from csr to lil matrices using tolil() is expensive. So this choice is good when you do not need your matrix to be in lil format.
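
As a variation on the same idea, the columns to keep can be derived from a NumPy boolean mask instead of a set difference, which keeps the column order explicit. This is only a sketch; the example matrix and the name keep_mask are assumptions for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x4 CSR matrix: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]].
X = sparse.csr_matrix(np.arange(12).reshape(3, 4))
index_to_drop = [3, 1]  # order of this list does not matter

# Start with "keep everything", then switch off the columns to drop.
keep_mask = np.ones(X.shape[1], dtype=bool)
keep_mask[index_to_drop] = False

# Integer indices of the kept columns, in ascending order.
to_keep = np.flatnonzero(keep_mask)
new_X = X[:, to_keep]  # still a CSR matrix
```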

Answer by nobody

I'm new to python so my answer is probably wrong, but I was wondering why something like the following won't be efficient?

Let's say your lil_matrix is called mat and that you want to remove the i-th column:

from scipy.sparse import hstack

mat = hstack([mat[:, 0:i], mat[:, i+1:]])

After that the matrix will be a coo_matrix, but you can convert it back to lil_matrix.

OK, I understand that hstack has to build the two slices before the result is assigned to mat, so for a moment the original matrix and roughly one more copy exist at the same time. But if the matrix is sparse enough, I don't think this should cause any memory problems (saving memory and time is the whole reason for using sparse matrices).
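
A small sketch of the hstack approach, using hstack's format argument to get a lil_matrix back directly instead of converting from coo afterwards. The 2x3 matrix and the value of i are illustrative assumptions.

```python
import numpy as np
from scipy import sparse

# Hypothetical 2x3 example matrix in LIL format.
mat = sparse.lil_matrix(np.array([[1, 0, 2],
                                  [0, 3, 0]]))
i = 1  # column to remove (illustrative)

# Stack the columns left of i and the columns right of i side by side.
# format="lil" asks hstack to return the result as a lil_matrix.
mat = sparse.hstack([mat[:, :i], mat[:, i + 1:]], format="lil")
```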

Answer by nobody

Looking at the notes for each sparse matrix format, the csc matrix, which is the relevant one in our case, has the following advantages listed in the documentation [1]:

  • efficient arithmetic operations CSC + CSC, CSC * CSC, etc.
  • efficient column slicing
  • fast matrix vector products (CSR, BSR may be faster)

If you have the column indices you want to remove, just use slicing on a csc matrix. For removing rows, use a csr matrix, since it is efficient at row slicing.
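
The advice above might look like the following in practice. This is a sketch under assumed inputs: the matrix A and the names cols_to_drop and rows_to_drop are made up for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x4 example matrix in LIL format.
A = sparse.lil_matrix(np.array([[1, 0, 2, 0],
                                [0, 3, 0, 4],
                                [5, 0, 0, 6]]))

cols_to_drop = {1}  # illustrative
rows_to_drop = {0}  # illustrative

keep_cols = [c for c in range(A.shape[1]) if c not in cols_to_drop]
keep_rows = [r for r in range(A.shape[0]) if r not in rows_to_drop]

# Column selection on the CSC form, row selection on the CSR form.
without_cols = A.tocsc()[:, keep_cols]
without_rows = A.tocsr()[keep_rows, :]
```

Both results are new matrices; A itself is left unchanged.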

Answer by Michał Meina


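from bisect import bisect_left


def removecols(W, col_list):
    """Remove the columns listed in col_list from the lil_matrix W, in place."""
    if min(col_list) < 0 or max(col_list) >= W.shape[1]:
        raise IndexError('column index out of bounds')
    # Drop duplicates and work from the highest index down, so that removing
    # one column does not shift the columns that are still to be removed.
    cols = sorted(set(col_list), reverse=True)
    rows = W.rows
    data = W.data
    for i in range(W.shape[0]):
        for j in cols:
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue
            elif rows[i][pos] == j:
                rows[i].pop(pos)
                data[i].pop(pos)
                if pos == len(rows[i]):
                    continue
            # Shift every column index to the right of j down by one.
            for pos2 in range(pos, len(rows[i])):
                rows[i][pos2] -= 1
    W._shape = (W._shape[0], W._shape[1] - len(cols))
    return W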

Just rewrote your code to work with col_list as input - maybe this will be helpful for somebody.
