Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/41754313/

Date: 2020-09-14 02:49:36  Source: igfitidea

Select rows in Pandas that do not contain a specific character

Tags: python, pandas

Asked by Arnold Klein

I need something similar to

.str.startswith() 
.str.endswith()

but for the middle part of a string.

For example, given the following pd.DataFrame:

      str_name
   0    aaabaa
   1    aabbcb
   2    baabba
   3    aacbba
   4    baccaa
   5    ababaa

I need to drop rows 1, 3 and 4, which each contain at least one letter 'c'.
The position of the letter 'c' within the string is not known.
The task is to remove all rows that contain at least one occurrence of a specific letter.

Answered by juanpa.arrivillaga

You want df['string_column'].str.contains('c')

>>> df
  str_name
0   aaabaa
1   aabbcb
2   baabba
3   aacbba
4   baccaa
5   ababaa
>>> df['str_name'].str.contains('c')
0    False
1     True
2    False
3     True
4     True
5    False
Name: str_name, dtype: bool

Now, you can "delete" like this

>>> df = df[~df['str_name'].str.contains('c')]
>>> df
  str_name
0   aaabaa
2   baabba
5   ababaa
>>>
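
The session above can also be written as a short standalone script. This is only a sketch: the extra `None` row and the names `has_c` and `without_c` are illustrative assumptions, not part of the original answer. Passing `regex=False` treats 'c' as a literal substring, and `na=False` makes missing values count as "does not contain", so the mask stays boolean and can be negated with `~`.

```python
import pandas as pd

# Same data as the question, plus one missing value (an assumption for illustration).
df = pd.DataFrame(
    {"str_name": ["aaabaa", "aabbcb", "baabba", "aacbba", "baccaa", "ababaa", None]}
)

# regex=False: match 'c' literally; na=False: treat missing values as "no match".
has_c = df["str_name"].str.contains("c", regex=False, na=False)

# Keep only the rows whose string does not contain 'c'.
without_c = df[~has_c]
print(without_c)
```

Whether a row with a missing value should be kept or dropped depends on the data; with `na=True` it would be dropped instead.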

Edited to add:

If you only want to check the first k characters, you can slice. Suppose k=3:

>>> df.str_name.str.slice(0,3)
0    aaa
1    aab
2    baa
3    aac
4    bac
5    aba
Name: str_name, dtype: object
>>> df.str_name.str.slice(0,3).str.contains('c')
0    False
1    False
2    False
3     True
4     True
5    False
Name: str_name, dtype: bool

Note: Series.str.slice does not behave like a typical Python slice.
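
One possible reading of that note is that the slice is applied to each string element rather than to the Series as a sequence of rows. The sketch below contrasts the two; the variable names are illustrative and not from the original answer.

```python
import pandas as pd

s = pd.Series(
    ["aaabaa", "aabbcb", "baabba", "aacbba", "baccaa", "ababaa"], name="str_name"
)
k = 3

rows = s.iloc[0:k]            # slices the Series: first k rows, strings untouched
prefixes = s.str.slice(0, k)  # slices each string: first k characters of every row
shorthand = s.str[:k]         # indexing form of the same element-wise slice

# Check only the first k characters of each string for 'c'.
mask = prefixes.str.contains("c", regex=False)
print(s[~mask])
```

Here `s[~mask]` keeps row 1 ('aabbcb'), because its 'c' sits outside the first three characters.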

Answered by piRSquared

You can use numpy:

# np.char.find returns -1 where 'c' is absent (assumes `import numpy as np`)
df[np.char.find(df.str_name.values.astype(str), 'c') < 0]

  str_name
0   aaabaa
2   baabba
5   ababaa
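
A self-contained sketch of the numpy approach follows. It uses the public `np.char.find` function, which returns the index of the first match in each string or -1 when there is none; the names `values`, `positions` and `result` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"str_name": ["aaabaa", "aabbcb", "baabba", "aacbba", "baccaa", "ababaa"]}
)

# Convert the column to a fixed-width NumPy string array.
values = df["str_name"].to_numpy().astype(str)

# Position of the first 'c' in each string, or -1 if 'c' does not occur.
positions = np.char.find(values, "c")

# Keep the rows where 'c' was not found.
result = df[positions < 0]
print(result)
```

Note that `astype(str)` turns missing values into literal strings such as 'nan' or 'None', so this variant is best suited to columns without missing data.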

Answered by Vaishali

You can use str.contains()

import pandas as pd

str_name = pd.Series(['aaabaa', 'aabbcb', 'baabba', 'aacbba', 'baccaa', 'ababaa'])
str_name.str.contains('c')

This returns a boolean Series that is True wherever the string contains 'c'.

The following returns the negation of the above:

~str_name.str.contains('c')