Notice: This page reproduces a popular StackOverFlow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/28679930/

Time: 2020-08-19 03:36:37  Source: igfitidea

How to drop rows from a pandas data frame that contain a particular string in a particular column?

Tags: python, pandas

Asked by London guy

I have a very large data frame in python and I want to drop all rows that have a particular string inside a particular column.

For example, I want to drop all rows that have the string "XYZ" as a substring in column C of the data frame.

Can this be done efficiently using the .drop() method?

Accepted answer by Brian from QuantRocket

pandas has vectorized string operations, so you can just filter out the rows that contain the string you don't want:

In [90]: import pandas as pd

In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))

In [92]: df
Out[92]:
   A          C
0  5        foo
1  3        bar
2  5  fooXYZbar
3  6        bat

In [93]: df[~df.C.str.contains("XYZ")]
Out[93]:
   A    C
0  5  foo
1  3  bar
3  6  bat
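
Since the question asks about .drop(), here is a minimal sketch that runs the same filter as a script and compares it with a .drop()-based version. It reuses the sample data from the session above; the variable names (mask, filtered, dropped) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 3, 5, 6], "C": ["foo", "bar", "fooXYZbar", "bat"]})

# Boolean mask: True where column C contains the substring "XYZ".
# regex=False treats "XYZ" as a literal string rather than a pattern.
mask = df["C"].str.contains("XYZ", regex=False)

# Option 1: keep the rows where the mask is False.
filtered = df[~mask]

# Option 2: look up the index labels of the matching rows and drop them.
dropped = df.drop(df[mask].index)

print(filtered)
print(filtered.equals(dropped))
```

Both options should give the same frame. The mask version skips the extra step of collecting index labels, which is probably why the answer filters instead of calling .drop().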

Answer by Kenan

If you need to exclude several values rather than a single string, you can drop the corresponding rows with:

df = df[~df['your column'].isin(['list of strings'])]

The above drops every row whose value in that column exactly equals one of the strings in the list.
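
As a small illustration of what isin() matches, the sketch below applies it to the sample frame from the accepted answer. The list ["foo", "XYZ"] is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 3, 5, 6], "C": ["foo", "bar", "fooXYZbar", "bat"]})

# isin() compares whole cell values against the list, so "foo" is removed
# but "fooXYZbar" stays: it merely contains "XYZ", it does not equal it.
exact = df[~df["C"].isin(["foo", "XYZ"])]

print(exact["C"].tolist())  # ['bar', 'fooXYZbar', 'bat']
```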

Answer by Rupert Schiessl

The isin() approach only works if you want to compare exact strings. It will not work if you want to check whether the column value contains any of the strings in the list as a substring.

To match any of several substrings, join them into a single regular expression:

searchfor = ['john', 'doe']
df = df[~df.col.str.contains('|'.join(searchfor))]
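
One caveat with joining on '|': the joined string is interpreted as a regular expression, so search terms containing characters such as "." or "(" can match more than intended. A hedged sketch of one way to guard against that with re.escape follows; the sample data and the extra term "a.b" are assumptions made for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({"col": ["john smith", "jane doe", "a.b", "axb", "mary"]})

searchfor = ["john", "doe", "a.b"]  # hypothetical search terms

# Escape each term so that "." means a literal dot, then join with "|"
# so the pattern matches if any one of the terms occurs in the value.
pattern = "|".join(re.escape(term) for term in searchfor)

result = df[~df["col"].str.contains(pattern)]

print(result["col"].tolist())  # ['axb', 'mary']
```

Without the escaping step, the unescaped pattern "a.b" would also match "axb" and that row would be dropped as well.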

Answer by Zhou Ruohua

If the column contains NaN and you want to keep those rows, compare the result with True before negating it:

df[~(df.C.str.contains("XYZ") == True)]
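
A minimal sketch of the NaN-tolerant comparison, assuming a copy of the sample frame in which one value of C is missing. It uses != True, which selects the same rows as negating an == True comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [5, 3, 5, 6], "C": ["foo", np.nan, "fooXYZbar", "bat"]})

contains = df["C"].str.contains("XYZ")

# "!= True" is True both for False and for a missing result, so rows with
# NaN in column C are kept and only the real matches are dropped.
kept = df[contains != True]

print(kept["A"].tolist())  # [5, 3, 6]
```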

Answer by ak3191

The code below returns all rows whose value in column C is not exactly 'XYZ' (an exact comparison, not a substring match):

df[df['C'] != 'XYZ']

To store the result in a new data frame:

newdf = df[df['C'] != 'XYZ']

Answer by Devarshi Mandal

Slight modification to the code. Passing na=False treats missing values as non-matches, so those rows are kept. Otherwise you can get the error TypeError: bad operand type for unary ~: float

df[~df.C.str.contains("XYZ", na=False)]

Source: TypeError: bad operand type for unary ~: float
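
To show how the na argument decides what happens to rows with missing values, here is a short sketch. It assumes a copy of the sample frame with one NaN in column C; the names keep_missing and drop_missing are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [5, 3, 5, 6], "C": ["foo", np.nan, "fooXYZbar", "bat"]})

# na=False: a missing value counts as "does not contain XYZ",
# so the NaN row survives the filter.
keep_missing = df[~df["C"].str.contains("XYZ", na=False)]

# na=True: a missing value counts as a match,
# so the NaN row is dropped together with the real matches.
drop_missing = df[~df["C"].str.contains("XYZ", na=True)]

print(keep_missing["A"].tolist())  # [5, 3, 6]
print(drop_missing["A"].tolist())  # [5, 6]
```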
