Original question: http://stackoverflow.com/questions/26053849/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Counting non zero values in each column of a dataframe in python
Asked by Harsh Singal
I have a pandas dataframe in which the first column is user_id and the rest of the columns are tags (tag_0 to tag_122). The data is in the following format:
UserId   Tag_0  Tag_1
7867688  0      5
7867688  0      3
7867688  3      0
7867688  3.5    3.5
7867688  4      4
7867688  3.5    0
My aim is to compute Sum(Tag)/Count(NonZero(Tags)) for each user_id.
df.groupby('user_id').sum() gives me sum(tag), but I don't know how to count the non-zero values.
Is it possible to compute Sum(Tag)/Count(NonZero(Tags)) in one command?
In MySQL I could achieve this as follows:
select user_id, sum(tag)/count(nullif(tag,0)) from table group by 1
Any help would be appreciated.
Answer by BrenBarn
To count nonzero values, just do (column != 0).sum(), where column is the data you want to do it for. column != 0 returns a boolean array, and True is 1 and False is 0, so summing this gives you the number of elements that match the condition.
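As a small illustration of the boolean-sum idea, here is a sketch that uses the Tag_0 values from the question. The variable names are only for this example.

```python
import pandas as pd

# Tag_0 values from the sample data in the question
column = pd.Series([0, 0, 3, 3.5, 4, 3.5])

# Comparing against 0 gives a boolean Series: True where the value is nonzero
is_nonzero = column != 0

# Summing booleans treats True as 1 and False as 0
nonzero_count = is_nonzero.sum()
print(nonzero_count)
```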
So to get your desired result, do:
df.groupby('user_id').apply(lambda column: column.sum()/(column != 0).sum())
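For reference, here is a sketch of the same calculation on the sample data from the question, written without apply. It assumes lowercase column names (user_id, tag_0, tag_1) and moves user_id into the index so that only the tag columns are summed and counted. This is one possible arrangement, not the only one.

```python
import pandas as pd

# Sample data from the question (column names lowercased as an assumption)
df = pd.DataFrame({
    "user_id": [7867688] * 6,
    "tag_0": [0, 0, 3, 3.5, 4, 3.5],
    "tag_1": [5, 3, 0, 3.5, 4, 0],
})

# Keep user_id out of the arithmetic by making it the index
tags = df.set_index("user_id")

# Per-user sum of each tag column
sums = tags.groupby(level="user_id").sum()

# Per-user count of nonzero entries in each tag column
nonzero_counts = tags.ne(0).groupby(level="user_id").sum()

# Sum(Tag) / Count(NonZero(Tag)) for each user and tag
result = sums / nonzero_counts
print(result)
```

Note that a user whose values for some tag are all zero would produce 0/0 here, which pandas reports as NaN.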
Answer by The Unfun Cat
My favorite way of getting the number of nonzeros in each column is:
df.astype(bool).sum(axis=0)
For the number of non-zeros in each row, use:
df.astype(bool).sum(axis=1)
(Thanks to Skulas)
If you have NaNs in your df, you should set them to zero first; otherwise each NaN converts to True and is counted as 1.
df.fillna(0).astype(bool).sum(axis=1)
(Thanks to SirC)
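A minimal sketch of the NaN caveat, using a small made-up frame (the column names a and b and the values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" holds one zero, one nonzero value and one NaN
df = pd.DataFrame({"a": [0, 1, np.nan], "b": [2, 0, 3]})

# NaN converts to True, so it is counted as if it were nonzero
naive = df.astype(bool).sum(axis=0)

# Replacing NaN with 0 first leaves only the genuine nonzero values
fixed = df.fillna(0).astype(bool).sum(axis=0)

print(naive)
print(fixed)
```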
Answer by Sarah
Why not use np.count_nonzero?
- To count the number of non-zeros in an entire dataframe: np.count_nonzero(df)
- To count the non-zeros down the rows (one count per column): np.count_nonzero(df, axis=0)
- To count the non-zeros across the columns (one count per row): np.count_nonzero(df, axis=1)
It works with dates too.
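To make the axis behaviour concrete, here is a sketch that applies the three calls to the two tag columns from the question. The frame construction and variable names are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Tag columns from the question's sample data
df = pd.DataFrame({
    "tag_0": [0, 0, 3, 3.5, 4, 3.5],
    "tag_1": [5, 3, 0, 3.5, 4, 0],
})

# Nonzero count over the whole frame
total = np.count_nonzero(df)

# axis=0 collapses the rows: one count per column
per_column = np.count_nonzero(df, axis=0)

# axis=1 collapses the columns: one count per row
per_row = np.count_nonzero(df, axis=1)

print(total, per_column, per_row)
```

With an axis argument the result is a plain NumPy array, so the column labels and row index are not carried over.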

