pandas: count number of rows when row contains certain text
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/31583151/
Asked by F1990
This is probably a simple question, but I could not find a simple answer. Take, for example, the following column Status in a dataframe df1:
**Status**
Planned
Unplanned
Missing
Corrected
I would like to count the rows in which the cell contains Planned or Missing. I tried the following:
test1 = df1['Status'].str.contains('Planned|Missing').value_counts()
The column Status has dtype object. What's wrong with my line of code?
Answered by EdChum
You can filter the df with your boolean condition and then call len:
In [155]:
len(df[df['Status'].str.contains('Planned|Missing')])
Out[155]:
2
Or index the result of your value_counts with True:
In [158]:
df['Status'].str.contains('Planned|Missing').value_counts()[True]
Out[158]:
2
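Both approaches above can be reproduced with a minimal, self-contained sketch. The sample dataframe is reconstructed from the Status column in the question; the variable names (mask, count_len, count_true, count_ci) are illustrative and not from the original answer. Using .get(True, 0) instead of [True] is a defensive variation, assumed here so that the lookup does not fail when nothing matches.

```python
import pandas as pd

# Sample data reconstructed from the Status column in the question.
df = pd.DataFrame({"Status": ["Planned", "Unplanned", "Missing", "Corrected"]})

# Boolean mask: True where the cell contains "Planned" or "Missing".
# Matching is case-sensitive by default, so "Unplanned" is not matched.
mask = df["Status"].str.contains("Planned|Missing")

count_len = len(df[mask])                      # filter the rows, then count them
count_true = mask.value_counts().get(True, 0)  # falls back to 0 if nothing matched

# With case=False, "Unplanned" matches as well.
count_ci = df["Status"].str.contains("Planned|Missing", case=False).sum()
```

Note that str.contains does substring (regex) matching, so whether "Unplanned" should count depends on the case parameter and on what the data actually looks like.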
Answered by Scotty
Try the following:
df["Status"].value_counts()[['Planned','Missing']].sum()
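Unlike str.contains, this approach matches whole cell values exactly, and in recent pandas versions indexing with a list that contains a label absent from the data raises a KeyError. A hedged sketch of two variations that tolerate missing labels follows; the label "Cancelled" is hypothetical and was added only to show the missing-label case.

```python
import pandas as pd

df = pd.DataFrame({"Status": ["Planned", "Unplanned", "Missing", "Corrected"]})

# "Cancelled" is a hypothetical label that does not occur in the data.
wanted = ["Planned", "Missing", "Cancelled"]

# reindex fills labels that are absent from value_counts() with 0.
exact_count = df["Status"].value_counts().reindex(wanted, fill_value=0).sum()

# isin builds a boolean mask of exact matches; summing it counts the True values.
isin_count = df["Status"].isin(wanted).sum()
```

Both variations count only cells that equal one of the labels exactly, so they are not interchangeable with the substring match used in the question.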
Answered by jpp
pd.Series.str.contains, when coupled with na=False, guarantees that you get a Boolean series. Note also that True / False act like 1 / 0 in numeric computations. You can now use pd.Series.sum directly:
count = df['Status'].str.contains('Planned|Missing', na=False).sum()
This avoids unnecessary and expensive dataframe indexing operations.
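To illustrate why na=False matters, here is a small sketch with a missing value in the column. The series below is made-up sample data with an explicit object dtype, matching the dtype mentioned in the question; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing value, stored as object dtype.
s = pd.Series(["Planned", np.nan, "Missing", "Corrected"], dtype=object, name="Status")

# Without na=False, the missing entry is not given a True/False answer,
# so the result is not guaranteed to be a clean Boolean series.
with_na = s.str.contains("Planned|Missing")

# With na=False, missing entries are treated as non-matches.
no_na = s.str.contains("Planned|Missing", na=False)

# True counts as 1 and False as 0, so sum() gives the number of matches.
count = int(no_na.sum())
```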

