Python: How to drop rows from a pandas DataFrame that contain a particular string in a particular column?

Question

Asked by London guy

I have a very large data frame in Python and I want to drop all rows that have a particular string inside a particular column.

For example, I want to drop all rows that have the string "XYZ" as a substring in column C of the data frame.

Can this be implemented efficiently using the .drop() method?

Answer 1

Accepted answer by Brian from QuantRocket

pandas has vectorized string operations, so you can just filter out the rows that contain the string you don't want:

In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))

In [92]: df
Out[92]:
   A          C
0  5        foo
1  3        bar
2  5  fooXYZbar
3  6        bat

In [93]: df[~df.C.str.contains("XYZ")]
Out[93]:
   A    C
0  5  foo
1  3  bar
3  6  bat
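
Since the question asks about .drop() specifically, here is a minimal sketch (reusing the example frame above) showing that the same rows can be removed either by boolean indexing or by passing the index labels of the matching rows to DataFrame.drop. Building the mask once is the vectorized part in both cases; regex=False is an optional choice made here so that "XYZ" is matched as a literal substring rather than as a regular expression.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[5, 3, 5, 6], C=["foo", "bar", "fooXYZbar", "bat"]))

# Boolean mask: True where column C contains the literal substring "XYZ".
mask = df["C"].str.contains("XYZ", regex=False)

# Option 1: boolean indexing keeps only the rows where the mask is False.
kept = df[~mask]

# Option 2: .drop() with the index labels of the matching rows.
dropped = df.drop(df[mask].index)

print(kept.equals(dropped))  # True
```

Both forms return a new DataFrame; the boolean-indexing form skips the extra step of collecting index labels, which makes it the more direct of the two.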

Answer 2

Answer by Kenan

If you want to exclude more than one string, you can drop the corresponding rows with:

df = df[~df['your column'].isin(['list of strings'])]

The above will drop all rows whose value in that column is exactly equal to one of the strings in your list.
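
To make the exact-match behaviour concrete, here is a small sketch with made-up values: isin compares each whole cell against the list, so a cell that merely contains one of the strings is kept.

```python
import pandas as pd

df = pd.DataFrame({"C": ["foo", "XYZ", "fooXYZbar", "bar"]})

# isin() is True only where a cell equals one of the listed values exactly.
to_drop = ["XYZ", "bar"]
result = df[~df["C"].isin(to_drop)]

# "XYZ" and "bar" are dropped; "fooXYZbar" stays because it is not an exact match.
print(result["C"].tolist())  # ['foo', 'fooXYZbar']
```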

Answer 3

Answer by Rupert Schiessl

This only works if you want to compare exact strings. It will not work if you want to check whether the column string contains any of the strings in the list.

The right way to check against a list of substrings is:

searchfor = ['john', 'doe']
df = df[~df.col.str.contains('|'.join(searchfor))]
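
A caveat, assuming some search terms may contain regular-expression metacharacters such as +, . or (: str.contains interprets the joined pattern as a regex by default, so escaping each term with re.escape keeps it literal. The data below is illustrative only.

```python
import re

import pandas as pd

df = pd.DataFrame({"col": ["john smith", "jane doe", "a+b", "mary"]})

searchfor = ["john", "doe", "a+b"]

# Escape each term so characters like "+" are matched literally,
# then join with "|" so a row matches if it contains ANY of the terms.
pattern = "|".join(re.escape(s) for s in searchfor)

df = df[~df["col"].str.contains(pattern)]
print(df["col"].tolist())  # ['mary']
```

Without re.escape, the term a+b would mean "one or more a followed by b", and the row "a+b" would not be dropped.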

Answer 4

Answer by Amy Annine

new_df = df[df.C != 'XYZ']
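
Note that != compares whole values, so this drops only cells that are exactly "XYZ", not cells that contain it as a substring. A quick comparison on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"C": ["XYZ", "fooXYZbar", "abc"]})

# Element-wise inequality: True for every cell that is not exactly "XYZ".
exact = df[df.C != "XYZ"]
print(exact["C"].tolist())  # ['fooXYZbar', 'abc']

# Substring test for comparison: drops any cell containing "XYZ".
substring = df[~df.C.str.contains("XYZ", regex=False)]
print(substring["C"].tolist())  # ['abc']
```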

Reference: https://chrisalbon.com/python/data_wrangling/pandas_dropping_column_and_rows/

Answer 5

Answer by Zhou Ruohua

If you do not want to drop the rows where column C is NaN, use:

df[~(df.C.str.contains("XYZ") == True)]
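
Operator precedence matters here: ~ binds more tightly than ==, so the comparison with True has to be parenthesized before negating. Comparing with True maps missing values to False, which keeps the NaN rows and leaves a purely boolean mask. A minimal sketch with illustrative data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"C": ["foo", np.nan, "fooXYZbar"]})

# On an object-dtype column, str.contains can return True/False/NaN (NaN for missing cells).
matches = df.C.str.contains("XYZ")

# "== True" turns NaN into False, so ~ applies to a clean boolean mask
# and the row with the missing value is kept.
result = df[~(matches == True)]
print(result.index.tolist())  # [0, 1]
```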

Answer 6

Answer by ak3191

The code below returns all rows whose value in column C is not exactly "XYZ":

df[df['C'] != 'XYZ']

To store the result in a new DataFrame:

newdf = df[df['C'] != 'XYZ']
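
If you intend to modify newdf afterwards, one common precaution (an addition here, not part of the original answer) is to take an explicit copy, so that newdf clearly owns its data and later assignments do not trigger pandas' SettingWithCopyWarning. Illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "C": ["XYZ", "abc", "def"]})

# .copy() gives newdf its own data, so assigning to it later is unambiguous.
newdf = df[df["C"] != "XYZ"].copy()

# Modifying the copy leaves the original DataFrame untouched.
newdf["A"] = newdf["A"] * 10
print(newdf["A"].tolist())  # [20, 30]
print(df["A"].tolist())     # [1, 2, 3]
```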

Answer 7

Answer by Devarshi Mandal

A slight modification to the code: passing na=False makes str.contains treat empty (NaN) values as non-matches, so those rows are kept. Otherwise you can get the error TypeError: bad operand type for unary ~: 'float'

df[~df.C.str.contains("XYZ", na=False)]
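
To see what the na argument does, here is a small sketch with one missing value (illustrative data). na=False treats the missing cell as a non-match, so its row survives the ~ filter; na=True treats it as a match, so its row is dropped along with the real matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"C": ["foo", np.nan, "fooXYZbar"]})

# na=False: a missing value counts as "does not contain", so the NaN row is kept.
keep_nan = df[~df.C.str.contains("XYZ", na=False)]
print(keep_nan.index.tolist())  # [0, 1]

# na=True: a missing value counts as "contains", so the NaN row is dropped too.
drop_nan = df[~df.C.str.contains("XYZ", na=True)]
print(drop_nan.index.tolist())  # [0]
```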

Source: TypeError: bad operand type for unary ~: float