Python 根据条件获取数据框行数

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/17322109/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 07:53:27  来源:igfitidea点击:

get dataframe row count based on conditions

pythonpandas

提问by Nilani Algiriyage

I want to get the count of dataframe rows based on conditional selection. I tried the following code.

我想根据条件选择获取数据帧行数。我尝试了以下代码。

print df[(df.IP == head.idxmax()) & (df.Method == 'HEAD') & (df.Referrer == '"-"')].count()

output:

输出:

IP          57
Time        57
Method      57
Resource    57
Status      57
Bytes       57
Referrer    57
Agent       57
dtype: int64

The output shows the count for each an every column in the dataframe. Instead I need to get a single count where all of the above conditions satisfied? How to do this? If you need more explanation about my dataframe please let me know.

输出显示数据帧中每一列的计数。相反,我需要在满足上述所有条件的情况下进行一次计数?这该怎么做?如果您需要有关我的数据框的更多解释,请告诉我。

采纳答案by Jeff

You are asking for the condition where all the conditions are true, so len of the frame is the answer, unless I misunderstand what you are asking

你要求的条件是所有条件都为真,所以框架的 len 就是答案,除非我误解了你在问什么

In [17]: df = DataFrame(randn(20,4),columns=list('ABCD'))

In [18]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
Out[18]: 
           A         B         C         D
12  0.491683  0.137766  0.859753 -1.041487
13  0.376200  0.575667  1.534179  1.247358
14  0.428739  1.539973  1.057848 -1.254489

In [19]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)].count()
Out[19]: 
A    3
B    3
C    3
D    3
dtype: int64

In [20]: len(df[(df['A']>0) & (df['B']>0) & (df['C']>0)])
Out[20]: 3

回答by Enias Cailliau

For increased performance you should not evaluate the dataframe using your predicate. You can just use the outcome of your predicate directly as illustrated below:

为了提高性能,您不应使用谓词评估数据帧。您可以直接使用谓词的结果,如下所示:

In [1]: import pandas as pd
        import numpy as np
        df = pd.DataFrame(np.random.randn(20,4),columns=list('ABCD'))


In [2]: df.head()
Out[2]:
          A         B         C         D
0 -2.019868  1.227246 -0.489257  0.149053
1  0.223285 -0.087784 -0.053048 -0.108584
2 -0.140556 -0.299735 -1.765956  0.517803
3 -0.589489  0.400487  0.107856  0.194890
4  1.309088 -0.596996 -0.623519  0.020400

In [3]: %time sum((df['A']>0) & (df['B']>0))
CPU times: user 1.11 ms, sys: 53 μs, total: 1.16 ms
Wall time: 1.12 ms
Out[3]: 4

In [4]: %time len(df[(df['A']>0) & (df['B']>0)])
CPU times: user 1.38 ms, sys: 78 μs, total: 1.46 ms
Wall time: 1.42 ms
Out[4]: 4

Keep in mind that this technique only works for counting the number of rows that comply with your predicate.

请记住,此技术仅适用于计算符合谓词的行数。