pandas: drop rows vs filter

Question

Asked by ojon

I have a pandas dataframe and want to get rid of rows in which the column 'A' is negative. I know 2 ways to do this:

df = df[df['A'] >= 0]

or

selRows = df[df['A'] < 0].index
df = df.drop(selRows, axis=0)

What is the recommended solution? Why?

Answer 1

Accepted answer by VaM

The recommended solution is the more efficient one, which in this case is the first:

df = df[df['A'] >= 0]

In the second solution

selRows = df[df['A'] < 0].index
df = df.drop(selRows, axis=0)

you are repeating the slicing process. But let's break it into pieces to understand why.

When you write

df['A'] >= 0

you are creating a mask: a Boolean Series with one entry for each index label of df, whose value is True or False according to a condition (in this case, whether the value of column 'A' at that label is greater than or equal to 0).

When you write

df[df['A'] >= 0]

you are accessing the rows for which your mask (df['A'] >= 0) is True. This is Boolean indexing, a selection method supported by pandas: you pass a Boolean Series and get back a new DataFrame containing only the rows for which the Series was True.
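
To make the two steps concrete, here is a small sketch. The sample values are made up for illustration; only the column name 'A' comes from the question.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"A": [3, -1, 0, -7, 5]})

# Step 1: build the mask, one True/False per row.
mask = df["A"] >= 0
print(mask.tolist())        # [True, False, True, False, True]

# Step 2: index with the mask to keep only the True rows.
kept = df[mask]
print(kept["A"].tolist())   # [3, 0, 5]
print(kept.index.tolist())  # [0, 2, 4]
```

Note that the surviving rows keep their original index labels.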

Finally, when you write this

selRows = df[df['A'] < 0].index
df = df.drop(selRows, axis=0)

you are repeating the process because

df[df['A'] < 0]

is already slicing your DataFrame (in this case, selecting the rows you want to drop). You are then taking the index labels of those rows, going back to the original DataFrame, and explicitly dropping them. There is no need for this: you already sliced the DataFrame in the first step.
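
A quick way to check this yourself is to run both versions on the same frame and compare the results. This is only a sketch on made-up data (the column 'B' and all values are hypothetical), and it assumes unique index labels.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'A' comes from the question.
df = pd.DataFrame({"A": [3, -1, 0, -7, 5], "B": list("vwxyz")})

# Approach 1: keep the rows where the mask is True.
filtered = df[df["A"] >= 0]

# Approach 2: find the labels of the unwanted rows, then drop them.
sel_rows = df[df["A"] < 0].index
dropped = df.drop(sel_rows, axis=0)

same = filtered.equals(dropped)
print(same)  # True

# Caveat: NaN compares False both ways, so the two approaches diverge.
df_nan = pd.DataFrame({"A": [1.0, float("nan"), -2.0]})
n_filtered = len(df_nan[df_nan["A"] >= 0])
n_dropped = len(df_nan.drop(df_nan[df_nan["A"] < 0].index))
print(n_filtered, n_dropped)  # 1 2
```

So on clean data the two give the same frame, but if 'A' can contain NaN, the filter removes those rows while the drop version keeps them.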

Answer 2

Answer by Alex

df = df[df['A'] >= 0]

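is indeed the faster solution. Just be aware that pandas may treat the result as derived from the original data frame rather than as an independent one (the filter does copy the data, but pandas cannot always tell that you mean to modify the result on its own). This can lead you into trouble, for example when you later want to change its values, as pandas may give you the SettingWithCopyWarning.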

The simple fix, of course, is what Wen-Ben recommended:

df = df[df['A'] >= 0].copy()
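
As a rough illustration of why the explicit copy helps, the sketch below filters, copies, and then adds a column to the result. The sample values and the names `clean` and `A_doubled` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"A": [3, -1, 0, -7, 5]})

# Filter and take an explicit copy, so the result is clearly independent.
clean = df[df["A"] >= 0].copy()

# Modifying the copy does not touch the original frame.
clean["A_doubled"] = clean["A"] * 2

print(list(df.columns))             # ['A']
print(clean["A_doubled"].tolist())  # [6, 0, 10]
```

The extra `.copy()` costs a little memory, but it makes the intent explicit.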

Answer 3

Answer by cs95

Your question is like this: "I have two identical cakes, but one has icing. Which has more calories?"

The second solution does the same thing, but twice. A single filtering step is enough; there's no need to filter and then redundantly call a function that does exactly what the filtering operation in the previous step already did.

To clarify: whichever operation you choose, you are still doing the same thing: generating a boolean mask and then indexing with it.
