Note: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/31583151/

Time: 2020-09-13 23:39:59  Source: igfitidea

Count number of rows when row contains certain text

python pandas

Asked by F1990

Probably a simple question, but I could not find a simple answer. Take, for example, the following column Status in a dataframe df1:

**Status**
Planned
Unplanned
Missing
Corrected

I would like to count the rows where a cell contains Planned or Missing. I tried the following:

test1 = df1['Status'].str.contains('Planned|Missing').value_counts()

The column Status has dtype object. What is wrong with my line of code?

Answered by EdChum

You can just filter the df with your boolean condition and then call len:

In [155]:
len(df[df['Status'].str.contains('Planned|Missing')])

Out[155]:
2
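
To check this end to end, the filter-and-len approach can be reproduced on the four sample rows from the question. This is a minimal sketch: the frame is rebuilt from the question's Status column, and the names `mask` and `matches` are illustrative, not taken from the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question's Status column.
df = pd.DataFrame({"Status": ["Planned", "Unplanned", "Missing", "Corrected"]})

# Boolean mask: True where the cell contains "Planned" or "Missing".
# The pattern is a case-sensitive regex, so "Unplanned" does not match "Planned".
mask = df["Status"].str.contains("Planned|Missing")

# Keep only the matching rows, then count them.
matches = df[mask]
count = len(matches)
print(count)
```

Because the match is case-sensitive, only the rows Planned and Missing are kept here.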

Or index the result of your value_counts with True:

In [158]:   
df['Status'].str.contains('Planned|Missing').value_counts()[True]

Out[158]:
2
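
One caveat with indexing by True: if no row matches, value_counts has no True label and the lookup fails with a KeyError. A hedged sketch of a more defensive variant uses `Series.get` with a default of 0. The pattern "Cancelled" is a hypothetical value chosen only because it does not occur in the sample data.

```python
import pandas as pd

# Sample data rebuilt from the question's Status column.
df = pd.DataFrame({"Status": ["Planned", "Unplanned", "Missing", "Corrected"]})

# Counts of True/False for the pattern from the question.
counts = df["Status"].str.contains("Planned|Missing").value_counts()
matched = counts.get(True, 0)  # falls back to 0 if the True label is absent

# Hypothetical pattern that matches no row: only the False label exists.
none_counts = df["Status"].str.contains("Cancelled").value_counts()
none_matched = none_counts.get(True, 0)

print(matched, none_matched)
```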

Answered by Scotty

Try the following:

df["Status"].value_counts()[['Planned','Missing']].sum()
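
Note that value_counts tallies exact cell values, so this counts cells equal to Planned or Missing rather than cells that merely contain those strings, and selecting a label that is absent from the column raises a KeyError. A minimal sketch of a variant that tolerates absent labels uses `reindex` with `fill_value=0`. The label "Cancelled" is hypothetical and included only to show the absent case.

```python
import pandas as pd

# Sample data rebuilt from the question's Status column.
df = pd.DataFrame({"Status": ["Planned", "Unplanned", "Missing", "Corrected"]})

# Count each distinct value in the column (exact matches only).
counts = df["Status"].value_counts()

# "Cancelled" is a hypothetical label that does not occur in the data;
# reindex fills its count with 0 instead of raising KeyError.
wanted = ["Planned", "Missing", "Cancelled"]
total = counts.reindex(wanted, fill_value=0).sum()
print(total)
```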

Answered by jpp

pd.Series.str.contains, when coupled with na=False, guarantees you have a Boolean series. Note also that True / False act like 1 / 0 in numeric computations. You can now use pd.Series.sum directly:

count = df['Status'].str.contains('Planned|Missing', na=False).sum()

This avoids unnecessary and expensive dataframe indexing operations.
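
The effect of na=False is easiest to see on a column with a missing entry. The sketch below adds a hypothetical None row to the question's sample data; that row is an assumption for illustration and is not part of the original question. With na=False, the missing cell is treated as a non-match, so the mask stays purely Boolean and can be summed directly.

```python
import pandas as pd

# Question's sample data plus one hypothetical missing value (None).
df = pd.DataFrame({"Status": ["Planned", "Unplanned", None, "Missing", "Corrected"]})

# na=False maps the missing cell to False instead of propagating a missing value.
mask = df["Status"].str.contains("Planned|Missing", na=False)

# True counts as 1 and False as 0, so the sum is the number of matching rows.
count = int(mask.sum())
print(count)
```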
