Python: get the number of DataFrame rows that match a condition

Question

Asked by Nilani Algiriyage

I want to count the rows of a DataFrame that satisfy a conditional selection. I tried the following code:

print(df[(df.IP == head.idxmax()) & (df.Method == 'HEAD') & (df.Referrer == '"-"')].count())

Output:

IP          57
Time        57
Method      57
Resource    57
Status      57
Bytes       57
Referrer    57
Agent       57
dtype: int64

The output shows a count for each and every column in the DataFrame. Instead, I need a single count of the rows where all of the above conditions are satisfied. How can I do this? If you need more details about my DataFrame, please let me know.

Answer 1

Accepted answer by Jeff

You are asking for the rows where all of the conditions are true, so the len of the filtered frame is the answer, unless I misunderstand what you are asking.

In [16]: from pandas import DataFrame
         from numpy.random import randn

In [17]: df = DataFrame(randn(20,4),columns=list('ABCD'))

In [18]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
Out[18]: 
           A         B         C         D
12  0.491683  0.137766  0.859753 -1.041487
13  0.376200  0.575667  1.534179  1.247358
14  0.428739  1.539973  1.057848 -1.254489

In [19]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)].count()
Out[19]: 
A    3
B    3
C    3
D    3
dtype: int64

In [20]: len(df[(df['A']>0) & (df['B']>0) & (df['C']>0)])
Out[20]: 3
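
To make the difference between count() and len concrete, here is a minimal sketch. It assumes a small made-up frame with one missing value (the data and column names are hypothetical, not the asker's log file). count() reports the non-null values per column, so its numbers can differ between columns, while len and shape[0] report the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the asker's log file; one missing value in column B.
df = pd.DataFrame({
    "A": [1.0, 2.0, -1.0, 3.0],
    "B": [0.5, np.nan, 2.0, 1.5],
})

selected = df[df["A"] > 0]      # keeps the rows at index 0, 1 and 3

per_column = selected.count()   # non-null values per column (a Series)
n_rows = len(selected)          # number of rows, a plain int
n_rows_alt = selected.shape[0]  # same number, read from the shape tuple

print(per_column)
print(n_rows, n_rows_alt)
```

In the question every column shows 57 only because none of the selected rows has missing values. If what you want is the row count, len (or shape[0]) is the safer choice.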

Answer 2

Answered by Enias Cailliau

For better performance, you do not need to index the DataFrame with your predicate just to count rows. The predicate already produces a boolean result, and you can sum that directly, as illustrated below:

In [1]: import pandas as pd
        import numpy as np
        df = pd.DataFrame(np.random.randn(20,4),columns=list('ABCD'))


In [2]: df.head()
Out[2]:
          A         B         C         D
0 -2.019868  1.227246 -0.489257  0.149053
1  0.223285 -0.087784 -0.053048 -0.108584
2 -0.140556 -0.299735 -1.765956  0.517803
3 -0.589489  0.400487  0.107856  0.194890
4  1.309088 -0.596996 -0.623519  0.020400

In [3]: %time sum((df['A']>0) & (df['B']>0))
CPU times: user 1.11 ms, sys: 53 μs, total: 1.16 ms
Wall time: 1.12 ms
Out[3]: 4

In [4]: %time len(df[(df['A']>0) & (df['B']>0)])
CPU times: user 1.38 ms, sys: 78 μs, total: 1.46 ms
Wall time: 1.42 ms
Out[4]: 4

Keep in mind that this technique only gives you the number of rows that comply with your predicate. If you need the rows themselves, you still have to index the DataFrame.
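
As a rough sketch of the same idea applied to conditions like those in the question, the example below builds the boolean mask once and calls the Series .sum() method on it instead of the built-in sum. The rows and the IP literal are made up for illustration (the question uses head.idxmax() there, which is not reproduced here). The method form stays vectorized, so it is likely to scale better on large frames, though that is an assumption you should time on your own data.

```python
import pandas as pd

# Hypothetical rows shaped like the question's log data; all values are made up.
df = pd.DataFrame({
    "IP": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "Method": ["HEAD", "GET", "HEAD", "HEAD"],
    "Referrer": ['"-"', '"-"', '"-"', '"http://example.com"'],
})

# Boolean Series: True where every condition holds for that row.
mask = (
    (df["IP"] == "10.0.0.1")
    & (df["Method"] == "HEAD")
    & (df["Referrer"] == '"-"')
)

# True counts as 1, so summing the mask yields the number of matching rows.
n_matching = int(mask.sum())

print(n_matching)
```

Keeping the mask in a variable also lets you reuse it later with df[mask] if the matching rows turn out to be needed after all.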
