Disclaimer: this page is derived from a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/16562080/

Time: 2020-09-13 20:49:23  Source: igfitidea

How to count number of index or Null values in Pandas dataframe group

pandas

Asked by user1911866

It's always the things that seem easy that bug me. I am trying to get a count of the number of non-null values of some variables in a DataFrame, grouped by month and year. I can do this, which works fine:

counts_by_month = df[[variable1, variable2]].groupby([lambda x: x.year, lambda x: x.month]).count()

But what I REALLY want to know is how many of the values in each group are NaNs. So I want to count the NaNs in each variable too, so that I can calculate the percentage of data missing in each group. I cannot find a function to do this. Or maybe I could get to the same end by counting the total items in the group; the NaNs would then be Total - 'Non-Null values'.

I have been trying to find out whether I can somehow count the index values, but I haven't been able to do so. Any assistance on this would be greatly appreciated. Best wishes, Jason
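
The "Total minus non-null values" idea described in the question can be sketched roughly as follows. This is only an illustration under assumptions: the sample frame, its dates, and the column names variable1 and variable2 are made up, and the index is assumed to be a DatetimeIndex. It relies on size() counting every row in a group (NaNs included) while count() counts only non-null values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four consecutive days spanning January and February 2013.
idx = pd.date_range("2013-01-30", periods=4, freq="D")
df = pd.DataFrame(
    {
        "variable1": [1.0, np.nan, 3.0, np.nan],
        "variable2": [np.nan, np.nan, 1.0, 2.0],
    },
    index=idx,
)

# Group by (year, month) taken from the DatetimeIndex.
grouped = df.groupby([df.index.year, df.index.month])

non_null = grouped.count()   # non-null values per group and column
total = grouped.size()       # rows per group, NaNs included

# total - non_null, aligned on the (year, month) group index.
missing = non_null.rsub(total, axis=0)

# Percentage of values missing per group and column.
pct_missing = 100 * missing.div(total, axis=0)

print(missing)
print(pct_missing)
```

With this made-up data, the January group has one missing value in variable1 and two in variable2, and the February group has one missing value in variable1 and none in variable2.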

Accepted answer by Wouter Overmeire

In [279]: df
Out[279]:
     A         B         C         D         E
a  foo       NaN  1.115320 -0.528363 -0.046242
b  bar  0.991114 -1.978048 -1.204268  0.676268
c  bar  0.293008 -0.708600       NaN -0.388203
d  foo  0.408837 -0.012573  1.019361  1.774965
e  foo  0.127372       NaN       NaN       NaN

In [280]: def count_missing(frame):
   .....:     return (frame.shape[0] * frame.shape[1]) - frame.count().sum()
   .....:

In [281]: df.groupby('A').apply(count_missing)
Out[281]:
A
bar    1
foo    4
dtype: int64
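
For readers who want to run the session above as a script, here is a minimal self-contained sketch. The frame is rebuilt by hand from the printed values. Selecting the value columns explicitly before apply() is my own choice, not part of the original answer; it is meant to keep the result independent of whether the grouping column A is passed to the function, which may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the values printed in the session above.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "bar", "foo", "foo"],
        "B": [np.nan, 0.991114, 0.293008, 0.408837, 0.127372],
        "C": [1.115320, -1.978048, -0.708600, -0.012573, np.nan],
        "D": [-0.528363, -1.204268, np.nan, 1.019361, np.nan],
        "E": [-0.046242, 0.676268, -0.388203, 1.774965, np.nan],
    },
    index=list("abcde"),
)


def count_missing(frame):
    # Total number of cells in the group minus the non-null cells.
    return frame.size - frame.count().sum()


# Assumption: restrict to the value columns so the grouping column
# never enters the calculation.
result = df.groupby("A")[["B", "C", "D", "E"]].apply(count_missing)
print(result)
```

Since column A contains no NaNs, including or excluding it does not change the counts, so the result should match the output shown in the answer.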

Answer by GrimSqueaker

df.isnull().sum()

Faster, and doesn't need a custom function :)
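
Note that df.isnull().sum() on its own counts NaNs per column over the whole frame, not per group. One possible way to combine it with grouping is sketched below. The sample frame and the names mask, per_group, and pct_per_group are hypothetical, and this is only one of several ways to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a grouping column A.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "bar", "foo"],
        "B": [np.nan, 1.0, 2.0, np.nan],
        "C": [1.0, np.nan, 3.0, 4.0],
    }
)

# Whole-frame NaN count per column (the one-liner from the answer).
per_column = df.isnull().sum()

# Per-group variant: build the boolean NaN mask for the value columns,
# then group that mask by the key taken from the original frame.
mask = df[["B", "C"]].isnull()
per_group = mask.groupby(df["A"]).sum()              # NaN count per group
pct_per_group = 100 * mask.groupby(df["A"]).mean()   # percent missing per group

print(per_column)
print(per_group)
print(pct_per_group)
```

Taking the mean of a boolean mask gives the fraction of True values, so multiplying by 100 yields the percentage of missing data that the question asks about.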