Select rows in Pandas that do not contain a specific character
Source: http://stackoverflow.com/questions/41754313/ (StackOverflow, licensed under CC BY-SA 4.0; attribute to the original authors)
Asked by Arnold Klein
I need something similar to .str.startswith() and .str.endswith(), but for the middle part of a string.
For example, given the following pd.DataFrame:
str_name
0 aaabaa
1 aabbcb
2 baabba
3 aacbba
4 baccaa
5 ababaa
I need to drop rows 1, 3 and 4, which each contain at least one letter 'c'.
The position of the letter 'c' within the string is not known.
The task is to remove all rows that contain at least one occurrence of a specific letter.
Answered by juanpa.arrivillaga
You want df['string_column'].str.contains('c'):
>>> df
str_name
0 aaabaa
1 aabbcb
2 baabba
3 aacbba
4 baccaa
5 ababaa
>>> df['str_name'].str.contains('c')
0 False
1 True
2 False
3 True
4 True
5 False
Name: str_name, dtype: bool
Now you can "delete" the matching rows by keeping only the rows where the mask is False:
>>> df = df[~df['str_name'].str.contains('c')]
>>> df
str_name
0 aaabaa
2 baabba
5 ababaa
>>>
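For reference, the steps above can be collected into a self-contained script. This is a minimal sketch that rebuilds the example frame by hand; the names has_c and filtered are illustrative, and regex=False is an addition not in the original answer, used so that the pattern is treated as a literal substring rather than a regular expression.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {"str_name": ["aaabaa", "aabbcb", "baabba", "aacbba", "baccaa", "ababaa"]}
)

# True where the string contains 'c' anywhere; regex=False treats 'c' literally.
has_c = df["str_name"].str.contains("c", regex=False)

# Keep only the rows that do NOT contain 'c'.
filtered = df[~has_c]
print(filtered)
```

Assigning the result to a new name leaves the original df untouched, which can be convenient while experimenting.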
Edited to add:
If you only want to check the first k characters, you can slice. Suppose k=3:
>>> df.str_name.str.slice(0,3)
0 aaa
1 aab
2 baa
3 aac
4 bac
5 aba
Name: str_name, dtype: object
>>> df.str_name.str.slice(0,3).str.contains('c')
0 False
1 False
2 False
3 True
4 True
5 False
Name: str_name, dtype: bool
Note that Series.str.slice does not behave like a typical Python slice of the Series: it slices each string element, not the Series itself.
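To illustrate the slicing step, the sketch below checks only the first k characters of each string. It assumes the same example data; the names k, prefix and prefix_has_c are illustrative. The .str[:k] indexing form is shown alongside .str.slice as an equivalent element-wise spelling.

```python
import pandas as pd

s = pd.Series(
    ["aaabaa", "aabbcb", "baabba", "aacbba", "baccaa", "ababaa"],
    name="str_name",
)
k = 3

# Slice each string element to its first k characters.
prefix = s.str.slice(0, k)

# .str[:k] performs the same element-wise slice.
assert prefix.equals(s.str[:k])

# Flag rows whose first k characters contain 'c', then keep the rest.
prefix_has_c = prefix.str.contains("c", regex=False)
print(s[~prefix_has_c])
```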
Answered by piRSquared
You can use numpy:
df[np.core.chararray.find(df.str_name.values.astype(str), 'c') < 0]
str_name
0 aaabaa
2 baabba
5 ababaa
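The same idea can also be written with the public np.char namespace rather than np.core. A minimal sketch, assuming the example frame from the question; the names values, positions and no_c are illustrative. np.char.find returns the lowest index of the substring in each element, or -1 when it is absent, so comparing with `< 0` selects the rows without 'c'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"str_name": ["aaabaa", "aabbcb", "baabba", "aacbba", "baccaa", "ababaa"]}
)

# Convert to a NumPy unicode array so the np.char functions apply.
values = df["str_name"].to_numpy().astype(str)

# Index of the first 'c' in each string, or -1 if there is none.
positions = np.char.find(values, "c")

# Keep rows where 'c' was not found.
no_c = df[positions < 0]
print(no_c)
```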
Answered by Vaishali
You can use str.contains():
str_name = pd.Series(['aaabaa', 'aabbcb', 'baabba', 'aacbba', 'baccaa','ababaa'])
str_name.str.contains('c')
This returns a boolean Series.
The following returns the inverse of the above:
~str_name.str.contains('c')
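One case the examples above do not cover is missing data. As a hedged sketch: str.contains accepts an na argument that sets the result for missing elements, so the mask stays boolean and can be negated safely. The None entry and the names mask and kept are made up for illustration.

```python
import pandas as pd

# The None entry is a made-up missing value for illustration.
s = pd.Series(["aaabaa", "aabbcb", None])

# na=False makes missing entries count as "does not contain 'c'".
mask = s.str.contains("c", regex=False, na=False)

# Negating the mask keeps rows without 'c', including the missing one.
kept = s[~mask]
print(kept)
```

Whether missing rows should be kept or dropped depends on the data; passing na=True instead would treat them as matches and drop them.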