Python: Counting non-zero elements within each row and within each column of a 2D NumPy array

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3797158/

Time: 2020-08-18 12:47:11  Source: igfitidea

Counting non-zero elements within each row and within each column of a 2D NumPy array

python arrays count numpy

Asked by MedicalMath

I have a NumPy matrix that contains mostly non-zero values, but occasionally will contain a zero value. I need to be able to:

  1. Count the non-zero values in each row and put that count into a variable that I can use in subsequent operations, perhaps by iterating through row indices and performing the calculations during the iterative process.

  2. Count the non-zero values in each column and put that count into a variable that I can use in subsequent operations, perhaps by iterating through column indices and performing the calculations during the iterative process.

For example, one thing I need to do is to sum each row and then divide each row sum by the number of non-zero values in each row, reporting a separate result for each row index. And then I need to sum each column and then divide the column sum by the number of non-zero values in the column, also reporting a separate result for each column index. I need to do other things as well, but they should be easy after I figure out how to do the things that I am listing here.

The code I am working with is below. You can see that I am creating an array of zeros and then populating it from a csv file. Some of the rows will contain values for all the columns, but other rows will still have some zeros remaining in some of the last columns, thus creating the problem described above.

The last five lines of the code below are from another posting on this forum. These last five lines of code return a printed list of row/column indices for the zeros. However, I do not know how to use that resulting information to create the non-zero row counts and non-zero column counts described above.

ANOVAInputMatrixValuesArray=zeros([len(TestIDs),9],float)
j=0
for j in range(0,len(TestIDs)):
    TestID=str(TestIDs[j])
    ReadOrWrite='Read'
    fileName=inputFileName
    directory=GetCurrentDirectory(arguments that return correct directory)
    inputfile=open(directory,'r')
    reader=csv.reader(inputfile)
    m=0
    for row in reader:
        if m<9:
            if row[0]!='TestID':
                ANOVAInputMatrixValuesArray[(j-1),m]=row[2]
                m+=1
    inputfile.close()

IndicesOfZeros = indices(ANOVAInputMatrixValuesArray.shape) 
locs = IndicesOfZeros[:,ANOVAInputMatrixValuesArray == 0]
pts = hsplit(locs, len(locs[0]))
for pt in pts:
    print(', '.join(str(p[0]) for p in pt))

Can anyone help me with this?

Answered by eumiro

import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

columns = (a != 0).sum(0)
rows    = (a != 0).sum(1)

The expression (a != 0) is an array of the same shape as the original a, and it contains True for all non-zero elements.

The .sum(x) function sums the elements over the axis x. The sum of True/False elements is the number of True elements.

The variables columns and rows contain the number of non-zero (element != 0) values in each column/row of your original array:

columns = np.array([2, 1, 3])
rows    = np.array([2, 3, 1])
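
For comparison, the same counts can also be obtained with np.count_nonzero, which accepts an axis argument in NumPy 1.12 and later. A minimal sketch, reusing the array a from above (the names columns_cn and rows_cn are only illustrative):

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

# Boolean-mask approach from the answer: True counts as 1 when summed
columns = (a != 0).sum(0)
rows = (a != 0).sum(1)

# Built-in equivalent (assumes NumPy >= 1.12 for the axis argument)
columns_cn = np.count_nonzero(a, axis=0)
rows_cn = np.count_nonzero(a, axis=1)

print(columns, rows)        # [2 1 3] [2 3 1]
print(columns_cn, rows_cn)  # [2 1 3] [2 3 1]
```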

EDIT: The whole code could look like this (with a few simplifications in your original code):

ANOVAInputMatrixValuesArray = zeros([len(TestIDs), 9], float)
for j, TestID in enumerate(TestIDs):
    ReadOrWrite = 'Read'
    fileName = inputFileName
    directory = GetCurrentDirectory(arguments that return correct directory)
    # use directory or filename to get the CSV file?
    with open(directory, 'r') as csvfile:
        ANOVAInputMatrixValuesArray[j,:] = loadtxt(csvfile, comments='TestId', delimiter=';', usecols=(2,))[:9]

nonZeroCols = (ANOVAInputMatrixValuesArray != 0).sum(0)
nonZeroRows = (ANOVAInputMatrixValuesArray != 0).sum(1)
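
Note that assigning a shorter result to a full row raises a shape error, so if some files hold fewer values than there are columns it may help to fill only the leading columns. A rough sketch using in-memory text in place of the real files; the file contents, the comma delimiter, skiprows=1 and the three-column width are all assumptions about the data layout, not taken from the question:

```python
import io
import numpy as np

# Hypothetical file contents: a header line, then TestID, index, value
fake_files = [
    "TestID,Index,Value\n1,0,0.5\n1,1,1.5\n1,2,2.5\n",
    "TestID,Index,Value\n2,0,4.0\n2,1,6.0\n",  # one value short
]
n_cols = 3  # the question uses 9

values = np.zeros((len(fake_files), n_cols))
for j, text in enumerate(fake_files):
    # ndmin=1 keeps the result one-dimensional even for a single data row
    col = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1,
                     usecols=(2,), ndmin=1)[:n_cols]
    values[j, :len(col)] = col  # trailing columns stay zero

non_zero_cols = (values != 0).sum(0)
non_zero_rows = (values != 0).sum(1)
print(non_zero_cols, non_zero_rows)  # [2 2 1] [3 2]
```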

EDIT 2:

To get the mean of the non-zero values in each column/row, use the following:

colMean = a.sum(0) / (a != 0).sum(0)
rowMean = a.sum(1) / (a != 0).sum(1)

What do you want to do if there are no non-zero elements in a column/row? Then we can adapt the code to solve such a problem.
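
One possible adaptation, sketched under the assumption that a row or column with no non-zero entries should report nan instead of triggering a division warning, is to divide only where the count is positive. The array b, the helper nonzero_mean and the choice of nan as the fill value are illustrative:

```python
import numpy as np

b = np.array([[1.0, 0.0, 3.0],
              [0.0, 0.0, 0.0],   # a row with no non-zero entries
              [2.0, 0.0, 4.0]])

def nonzero_mean(arr, axis):
    """Mean of the non-zero entries along axis; nan where there are none."""
    totals = arr.sum(axis)
    counts = (arr != 0).sum(axis)
    out = np.full(totals.shape, np.nan)
    # Divide only where counts > 0; other positions keep the nan fill value
    np.divide(totals, counts, out=out, where=counts > 0)
    return out

row_mean = nonzero_mean(b, 1)  # [2.0, nan, 3.0]
col_mean = nonzero_mean(b, 0)  # [1.5, nan, 3.5]
```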

Answered by Finn Årup Nielsen

(a != 0) does not work for sparse matrices (scipy.sparse.lil_matrix) in my present version of scipy.

For sparse matrices I did:

    (i,j) = X.nonzero()
    column_sums = np.zeros(X.shape[1])
    for n in np.asarray(j).ravel():
        column_sums[n] += 1.

I wonder if there is a more elegant way.
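
One candidate, offered as a sketch and not as a definitive answer, is to replace the Python loop with np.bincount over the index arrays returned by nonzero(). The small matrix X below is illustrative, and minlength is passed so that empty trailing columns or rows still get a count of zero:

```python
import numpy as np
from scipy import sparse

X = sparse.lil_matrix(np.array([[1, 0, 1],
                                [2, 3, 4],
                                [0, 0, 7]]))

i, j = X.nonzero()  # row and column indices of the non-zero entries

# Count how often each column (or row) index occurs
column_counts = np.bincount(j, minlength=X.shape[1])
row_counts = np.bincount(i, minlength=X.shape[0])

print(column_counts, row_counts)  # [2 1 3] [2 3 1]
```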

Answered by Marat Zaynutdinoff

A faster way is to clone your matrix with ones in place of the real values. Then just sum by rows or columns:

X_clone = X.tocsc()
X_clone.data = np.ones( X_clone.data.shape )
NumNonZeroElementsByColumn = X_clone.sum(0)
NumNonZeroElementsByRow = X_clone.sum(1)

That worked 50 times faster for me than Finn Årup Nielsen's solution (1 second versus 53).

edit: You may need to convert NumNonZeroElementsByColumn into a 1-dimensional array with:

np.array(NumNonZeroElementsByColumn)[0]
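
A self-contained sketch of this approach, assuming the scipy.sparse matrix classes (whose sum returns a 2D np.matrix). Here np.asarray(...).ravel() is used for the conversion because it flattens both the 1 x n column sums and the n x 1 row sums. The copy=True argument is a precaution: if X were already in CSC format, tocsc() could hand back X itself, and overwriting data would then alter the original values.

```python
import numpy as np
from scipy import sparse

X = sparse.csr_matrix(np.array([[1, 0, 1],
                                [2, 3, 4],
                                [0, 0, 7]]))

X_clone = X.tocsc(copy=True)                # work on a copy, not on X
X_clone.data = np.ones(X_clone.data.shape)  # every stored value becomes 1

# Sums of ones are counts; flatten the 2D results to 1D arrays
by_column = np.asarray(X_clone.sum(0)).ravel()
by_row = np.asarray(X_clone.sum(1)).ravel()

print(by_column, by_row)  # [2. 1. 3.] [2. 3. 1.]
```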

Answered by joeln

A fast way to count nonzero elements per row in a scipy sparse matrix m is:

np.diff(m.tocsr().indptr)

The indptr attribute of a CSR matrix indicates the indices within the data corresponding to the boundaries between rows. So calculating the difference between each entry will provide the number of non-zero elements in each row.
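
To make the role of indptr concrete, here is a small sketch with the three CSR arrays written out by hand for an illustrative 3 x 3 matrix. No scipy is needed for the illustration, and the variable names simply mirror the CSR attribute names:

```python
import numpy as np

# CSR arrays written out by hand for the dense matrix
# [[1, 0, 1],
#  [2, 3, 4],
#  [0, 0, 7]]
data = np.array([1, 1, 2, 3, 4, 7])     # stored values, row by row
indices = np.array([0, 2, 0, 1, 2, 2])  # column index of each stored value
indptr = np.array([0, 2, 5, 6])         # row r is data[indptr[r]:indptr[r + 1]]

# Consecutive differences give the number of stored entries per row
row_counts = np.diff(indptr)
print(row_counts)  # [2 3 1]
```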

Similarly, for the number of nonzero elements in each column, use:

np.diff(m.tocsc().indptr)

If the data is already in the appropriate form, these will run in O(m.shape[0]) and O(m.shape[1]) respectively, rather than the O(m.getnnz()) of Marat's and Finn's solutions.

If you need both row and column nonzero counts and, say, m is already a CSR matrix, you might use:

row_nonzeros = np.diff(m.indptr)
col_nonzeros = np.bincount(m.indices)

which is not asymptotically faster than first converting to CSC (which is O(m.getnnz())) to get col_nonzeros, but is faster because of implementation details.
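
One detail worth keeping in mind: np.bincount only returns entries up to the largest index it sees, so if the last columns of m are empty the result is shorter than m.shape[1]. A sketch of the same two lines with minlength added, using an illustrative matrix whose last column is all zeros:

```python
import numpy as np
from scipy import sparse

m = sparse.csr_matrix(np.array([[1, 0, 0],
                                [2, 3, 0]]))  # last column is all zeros

row_nonzeros = np.diff(m.indptr)

# Without minlength the empty trailing column is silently dropped
short = np.bincount(m.indices)
col_nonzeros = np.bincount(m.indices, minlength=m.shape[1])

print(row_nonzeros)  # [1 2]
print(short)         # [2 1]
print(col_nonzeros)  # [2 1 0]
```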

Answered by sandyp

For sparse matrices, use the getnnz() function supported by CSR/CSC matrices.

E.g.

a = scipy.sparse.csr_matrix([[0, 1, 1], [0, 1, 0]])
a.getnnz(axis=0)

array([0, 2, 1])
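
A caveat that may matter here: getnnz() counts stored entries, so a zero that was stored explicitly is included in the count. The sketch below builds such a matrix on purpose (the values and positions are illustrative) and uses eliminate_zeros() to drop the stored zeros before counting again:

```python
import numpy as np
from scipy import sparse

vals = np.array([1.0, 0.0, 2.0])  # the 0.0 is stored explicitly
rows = np.array([0, 0, 1])
cols = np.array([0, 1, 1])
m = sparse.csr_matrix((vals, (rows, cols)), shape=(2, 3))

stored_per_row = m.getnnz(axis=1)   # includes the stored 0.0
m.eliminate_zeros()                 # removes explicit zeros in place
nonzero_per_row = m.getnnz(axis=1)

print(stored_per_row)   # [2 1]
print(nonzero_per_row)  # [1 1]
```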