Python Pandas: count the number of elements in each column less than x

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/23833763/

Time: 2020-08-19 03:29:37  Source: igfitidea

Pandas count number of elements in each column less than x

Tags: python, matplotlib, pandas

Asked by marillion

I have a DataFrame that looks like the one shown below. I am trying to count the number of elements less than 2.0 in each column, and then visualize the result in a bar plot. I did it using a list and a loop, but I wonder if there is a "Pandas way" to do this quickly. Thanks!

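x = []
for i in range(6):
    # keep the rows where column i is below 2.0, then count that column
    # (.ix was removed in pandas 1.0, so positional access uses .iloc)
    x.append(df[df.iloc[:, i] < 2.0].count().iloc[i])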

Then I can draw a bar plot from the list x.

          A          B          C          D          E          F 
0       2.142      1.929      1.674      1.547      3.395      2.382  
1       2.077      1.871      1.614      1.491      3.110      2.288  
2       2.098      1.889      1.610      1.487      3.020      2.262    
3       1.990      1.760      1.479      1.366      2.496      2.128    
4       1.935      1.765      1.656      1.530      2.786      2.433

Accepted answer by EdChum

In [96]:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10), 'c': np.random.randn(10)})
df
Out[96]:
          a         b         c
0 -0.849903  0.944912  1.285790
1 -1.038706  1.445381  0.251002
2  0.683135 -0.539052 -0.622439
3 -1.224699 -0.358541  1.361618
4 -0.087021  0.041524  0.151286
5 -0.114031 -0.201018 -0.030050
6  0.001891  1.601687 -0.040442
7  0.024954 -1.839793  0.917328
8 -1.480281  0.079342 -0.405370
9  0.167295 -1.723555 -0.033937

[10 rows x 3 columns]
In [97]:

df[df > 1.0].count()

Out[97]:
a    0
b    2
c    2
dtype: int64
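
To make it clearer why the expression above yields per-column counts, here is a minimal sketch that splits it into its steps. The small frame and the names `mask`, `masked` and `counts` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small hand-made frame (illustrative values, not taken from the question)
df = pd.DataFrame({"a": [0.5, 1.5, 2.5], "b": [1.2, 3.0, 0.1]})

mask = df > 1.0          # boolean DataFrame with the same shape as df
masked = df[mask]        # entries where the mask is False become NaN
counts = masked.count()  # count() skips NaN, so only matching entries are counted

print(mask)
print(masked)
print(counts)
```

Each column has two values above 1.0 here, so `counts` holds 2 for both `a` and `b`.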

So in your case:

df[df < 2.0].count()

should work.
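
As a rough check, the suggestion can be applied to the five sample rows from the question. The frame below is retyped from the question's table. The `plot_counts` helper is a hypothetical name added for illustration, and it assumes matplotlib is installed.

```python
import pandas as pd

# The sample data shown in the question
df = pd.DataFrame({
    "A": [2.142, 2.077, 2.098, 1.990, 1.935],
    "B": [1.929, 1.871, 1.889, 1.760, 1.765],
    "C": [1.674, 1.614, 1.610, 1.479, 1.656],
    "D": [1.547, 1.491, 1.487, 1.366, 1.530],
    "E": [3.395, 3.110, 3.020, 2.496, 2.786],
    "F": [2.382, 2.288, 2.262, 2.128, 2.433],
})

# One count per column, indexed by column name; no explicit loop needed
counts = df[df < 2.0].count()
print(counts)


def plot_counts(counts):
    """Hypothetical helper: draw the per-column counts as a bar chart."""
    import matplotlib.pyplot as plt  # imported lazily; assumes matplotlib is available
    ax = counts.plot(kind="bar")
    ax.set_xlabel("column")
    ax.set_ylabel("values below 2.0")
    plt.show()
```

For these rows the counts are A 2, B 5, C 5, D 5, E 0, F 0. Calling `plot_counts(counts)` would then draw the bar plot the question asks for, replacing the list `x` built in the loop.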

EDIT

Some timings:

In [3]:

%timeit df[df < 1.0 ].count() 
%timeit (df < 1.0).sum()
%timeit (df < 1.0).apply(np.count_nonzero)
1000 loops, best of 3: 1.47 ms per loop
1000 loops, best of 3: 560 us per loop
1000 loops, best of 3: 529 us per loop

So @DSM's suggestions are correct, and in the timings above they run roughly 2.6 to 2.8 times faster than my df[df < 1.0].count().
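
A short sketch to confirm that the three timed expressions return the same counts, including when a value is missing. The seeded random frame and the variable names are assumptions made for this illustration. The speed difference presumably comes from the first form building a full masked copy of the frame before counting, while the other two only reduce the boolean frame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["a", "b", "c"])
df.iloc[0, 0] = np.nan  # a missing value; NaN < 1.0 is False, so it is never counted

via_mask = df[df < 1.0].count()                   # mask, then count the non-NaN entries
via_sum = (df < 1.0).sum()                        # each True contributes 1 to the sum
via_nonzero = (df < 1.0).apply(np.count_nonzero)  # count the True values per column

# Side-by-side view of the three results
print(pd.DataFrame({
    "mask+count": via_mask,
    "sum": via_sum,
    "count_nonzero": via_nonzero,
}))
```

All three columns of the printed frame should be identical, so the choice between them comes down to speed and readability.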