Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/26053849/

Time: 2020-08-18 23:58:10  Source: igfitidea

Counting non zero values in each column of a dataframe in python

Tags: python, pandas, dataframe

Asked by Harsh Singal

I have a pandas DataFrame in which the first column is user_id and the rest of the columns are tags (tag_0 to tag_122). I have the data in the following format:

UserId   Tag_0  Tag_1
7867688  0      5
7867688  0      3
7867688  3      0
7867688  3.5    3.5
7867688  4      4
7867688  3.5    0

My aim is to compute Sum(Tag)/Count(NonZero(Tags)) for each user_id.

df.groupby('user_id').sum() gives me sum(tag), but I don't know how to count the non-zero values.

Is it possible to compute Sum(Tag)/Count(NonZero(Tags)) in one command?

In MySQL I could achieve this as follows:

select user_id, sum(tag)/count(nullif(tag,0)) from table group by 1

Any help would be appreciated.

Answered by BrenBarn

To count nonzero values, just do (column != 0).sum(), where column is the data you want to count over. column != 0 returns a boolean array, and True is 1 and False is 0, so summing this gives you the number of elements that match the condition.

So to get your desired result, do

df.groupby('user_id').apply(lambda column: column.sum()/(column != 0).sum())
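As a rough sketch of the same idea on the question's sample data, the sum and the non-zero count can also be computed separately and then divided. The column name user_id is an assumption here (the sample table labels it UserId), and the variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question; the column name "user_id" is assumed.
df = pd.DataFrame({
    "user_id": [7867688] * 6,
    "Tag_0": [0, 0, 3, 3.5, 4, 3.5],
    "Tag_1": [5, 3, 0, 3.5, 4, 0],
})

# Move user_id into the index so only the tag columns are aggregated.
tags = df.set_index("user_id")

# Sum(Tag) per user.
sums = tags.groupby(level=0).sum()

# Count(NonZero(Tag)) per user: the comparison yields booleans,
# and summing booleans counts the True values.
nonzero_counts = (tags != 0).groupby(level=0).sum()

result = sums / nonzero_counts
print(result)
```

For this sample, Tag_0 sums to 14 over 4 non-zero entries and Tag_1 sums to 15.5 over 4 non-zero entries. A tag column that is all zeros for some user would divide 0 by 0, which should come out as NaN rather than raise.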

Answered by The Unfun Cat

My favorite way of getting the number of nonzeros in each column is

df.astype(bool).sum(axis=0)

For the number of nonzeros in each row, use

df.astype(bool).sum(axis=1)

(Thanks to Skulas)

If you have NaNs in your df, you should set them to zero first; otherwise each NaN will be counted as nonzero.

df.fillna(0).astype(bool).sum(axis=1)

(Thanks to SirC)
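A small check of the behaviour described in this answer, using a made-up frame (the data and variable names are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one NaN in Tag_0.
df = pd.DataFrame({
    "Tag_0": [0, 3, np.nan],
    "Tag_1": [5, 0, 0],
})

# Casting to bool maps 0 to False and everything else, including NaN, to True.
per_column = df.astype(bool).sum(axis=0)   # one count per column
per_row = df.astype(bool).sum(axis=1)      # one count per row

# Replacing NaN with 0 first keeps missing values out of the count.
per_column_clean = df.fillna(0).astype(bool).sum(axis=0)

print(per_column)
print(per_row)
print(per_column_clean)
```

Without fillna(0), the NaN in Tag_0 is counted as a nonzero entry; with it, only the value 3 is counted.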

Answered by Sarah

Why not use np.count_nonzero?

  1. To count the non-zeros of an entire dataframe, use np.count_nonzero(df)
  2. To count the non-zeros in each column (counting down the rows), use np.count_nonzero(df, axis=0)
  3. To count the non-zeros in each row (counting across the columns), use np.count_nonzero(df, axis=1)

It works with dates too.
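To see what the axis argument does, here is a minimal sketch on the tag columns from the question's sample data (the user_id column is left out so it does not add to the counts; variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Tag columns from the question's sample data.
df = pd.DataFrame({
    "Tag_0": [0, 0, 3, 3.5, 4, 3.5],
    "Tag_1": [5, 3, 0, 3.5, 4, 0],
})

total = np.count_nonzero(df)               # single count over the whole frame
per_column = np.count_nonzero(df, axis=0)  # collapses the rows: one count per column
per_row = np.count_nonzero(df, axis=1)     # collapses the columns: one count per row

print(total)
print(per_column)
print(per_row)
```

With an axis given, the result is a plain NumPy array, so the column or row labels of the dataframe are not carried over.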
