Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/47044183/

Date: 2020-09-14 04:43:15  Source: igfitidea

Count non-null values in pandas

python, pandas

Asked by Petr Petrov

I have a dataframe:

            site1                time1  site2                time2  site3                time3  site4                time4  site5                time5  ...                time6  site7                time7  site8                time8  site9                time9  site10               time10  target
session_id
21669          56  2013-01-12 08:05:57   55.0  2013-01-12 08:05:57    NaN                  NaT    NaN                  NaT    NaN                  NaT  ...                  NaT    NaN                  NaT    NaN                  NaT    NaN                  NaT     NaN                  NaT       0
54843          56  2013-01-12 08:37:23   55.0  2013-01-12 08:37:23   56.0  2013-01-12 09:07:07   55.0  2013-01-12 09:07:09    NaN                  NaT  ...                  NaT    NaN                  NaT    NaN                  NaT    NaN                  NaT     NaN                  NaT       0
77292         946  2013-01-12 08:50:13  946.0  2013-01-12 08:50:14  951.0  2013-01-12 08:50:15  946.0  2013-01-12 08:50:15  946.0  2013-01-12 08:50:16  ...  2013-01-12 08:50:16  948.0  2013-01-12 08:50:16  784.0  2013-01-12 08:50:16  949.0  2013-01-12 08:50:17   946.0  2013-01-12 08:50:17       0
114021        945  2013-01-12 08:50:17  948.0  2013-01-12 08:50:17  949.0  2013-01-12 08:50:18  948.0  2013-01-12 08:50:18  945.0  2013-01-12 08:50:18  ...  2013-01-12 08:50:18  947.0  2013-01-12 08:50:19  945.0  2013-01-12 08:50:19  946.0  2013-01-12 08:50:19   946.0  2013-01-12 08:50:20       0

I need to count, for each row, the number of site columns that are not NaN. I tried using

df[['site%s' % i for i in range(1, 11)]].count(axis=1)

but it returns 10 for every id.

I have also tried

train_df[sites].notnull().count(axis=1)

but that didn't help either.

Desired output:

21669    2
54843    4
77292    10
114021   10

Answer by cs95

I'd do this with just count:

train_df[sites].count(axis=1)

count specifically counts non-null values. The issue with your current implementation is that notnull yields boolean values, and booleans are never null, so every one of them is counted.
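To see the difference on data shaped like the question's, here is a minimal sketch. The frame is hypothetical: it keeps only the site columns of the first two sessions, and site6 (hidden behind "..." in the printout) is assumed to be NaN, which matches the desired counts of 2 and 4. It also shows that summing the booleans from notnull(), instead of counting them, gives the same per-row result as count.

```python
import numpy as np
import pandas as pd

# Hypothetical, trimmed-down version of the question's frame: only the
# site columns of sessions 21669 and 54843. site6 is assumed to be NaN.
sites = ['site%s' % i for i in range(1, 11)]
train_df = pd.DataFrame(
    [
        [56, 55.0] + [np.nan] * 8,
        [56, 55.0, 56.0, 55.0] + [np.nan] * 6,
    ],
    index=pd.Index([21669, 54843], name='session_id'),
    columns=sites,
)

# count() skips missing values, so it gives the number of sites per row.
print(train_df[sites].count(axis=1).tolist())            # [2, 4]

# notnull() returns booleans with no missing values,
# so count() just reports the number of columns for every row.
print(train_df[sites].notnull().count(axis=1).tolist())  # [10, 10]

# Summing the booleans counts only the True values instead.
print(train_df[sites].notnull().sum(axis=1).tolist())    # [2, 4]
```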

df

        one       two     three four   five
a -0.166778  0.501113 -0.355322  bar  False
b       NaN       NaN       NaN  NaN    NaN
c -0.337890  0.580967  0.983801  bar  False
d       NaN       NaN       NaN  NaN    NaN
e  0.057802  0.761948 -0.712964  bar   True
f -0.443160 -0.974602  1.047704  bar  False
g       NaN       NaN       NaN  NaN    NaN
h -0.717852 -1.053898 -0.019369  bar  False

df.count(axis=1)

a    5
b    0
c    5
d    0
e    5
f    5
g    0
h    5
dtype: int64
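In the example above, the string column four and the boolean column five are counted like any other non-null value. The question's frame also mixes float site columns with datetime time columns, so it may help to see that count treats NaN, NaT and None alike as missing. A small sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing dtypes: float (NaN), datetime (NaT), object (None).
df = pd.DataFrame({
    'site1': [56.0, np.nan],
    'time1': pd.to_datetime(['2013-01-12 08:05:57', None]),
    'label': ['bar', None],
})

# NaN, NaT and None are all skipped by count().
print(df.count(axis=1).tolist())  # [3, 0]
```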

And...

df.notnull().count(axis=1)


a    5
b    5
c    5
d    5
e    5
f    5
g    5
h    5
dtype: int64
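The reason every row reports 5 can be checked directly: notnull() returns a boolean frame of the same shape with no missing values, so count(axis=1) always equals the number of columns. A sketch using a hypothetical stand-in for df, with the same NaN pattern (rows b, d and g empty) but fixed values instead of random ones:

```python
import pandas as pd

# Hypothetical stand-in for df: fixed values, rows b, d and g missing.
full = pd.DataFrame(
    {'one': 0.1, 'two': 0.2, 'three': 0.3, 'four': 'bar', 'five': False},
    index=list('acefh'),
)
df = full.reindex(list('abcdefgh'))  # the added rows are filled with NaN

mask = df.notnull()

# The boolean mask itself contains no missing values...
print(mask.isna().any().any())      # False
# ...so counting it returns the number of columns for every row.
print(mask.count(axis=1).tolist())  # [5, 5, 5, 5, 5, 5, 5, 5]
# Summing the booleans counts the True values, matching df.count(axis=1).
print(mask.sum(axis=1).tolist())    # [5, 0, 5, 0, 5, 5, 0, 5]
print(df.count(axis=1).tolist())    # [5, 0, 5, 0, 5, 5, 0, 5]
```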