Pandas drop rows vs filter
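Statement: this page is a Chinese-English parallel translation of a popular StackOverflow question, released under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverFlow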
Original URL: http://stackoverflow.com/questions/50689823/
Question by ojon
I have a pandas dataframe and want to get rid of rows in which the column 'A' is negative. I know 2 ways to do this:
df = df[df['A'] >= 0]
or
selRows = df[df['A'] < 0].index
df = df.drop(selRows, axis=0)
What is the recommended solution? Why?
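As a quick sanity check, here is a minimal sketch that runs both approaches from the question on a small, hypothetical DataFrame. The values and the extra column 'B' are made up for illustration, and the frame uses a default, unique index. The sketch confirms that both approaches keep the same rows:

```python
import pandas as pd

# Hypothetical sample data; only the column name 'A' comes from the question.
df = pd.DataFrame({"A": [3, -1, 0, -7, 5], "B": list("abcde")})

# Approach 1: keep only the rows whose 'A' value is non-negative.
kept = df[df["A"] >= 0]

# Approach 2: collect the labels of the negative rows, then drop them.
sel_rows = df[df["A"] < 0].index
dropped = df.drop(sel_rows, axis=0)

# With a unique index, both approaches return the same rows in the same order.
print(kept.equals(dropped))  # True
print(kept.index.tolist())   # [0, 2, 4]
```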
Accepted answer by VaM
The recommended solution is the most efficient one, which in this case is the first:
df = df[df['A'] >= 0]
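You can check the efficiency claim on your own machine with a rough timing harness like the following sketch. The data size, the random values, and the helper names filter_keep and filter_drop are hypothetical. Absolute timings depend on hardware and pandas version, so no numbers are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 200,000 random values in column 'A'.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.standard_normal(200_000)})

def filter_keep():
    # One mask, one indexing step.
    return df[df["A"] >= 0]

def filter_drop():
    # Mask + slice just to find the labels, then a second pass to drop them.
    return df.drop(df[df["A"] < 0].index, axis=0)

for name, fn in [("filter", filter_keep), ("drop", filter_drop)]:
    # Best of 5 repeats of 10 calls each; results vary from machine to machine.
    best = min(timeit.repeat(fn, number=10, repeat=5))
    print(f"{name}: {best / 10 * 1000:.2f} ms per call")
```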
In the second solution,
selRows = df[df['A'] < 0].index
df = df.drop(selRows, axis=0)
you are repeating the slicing process. Let's break it into pieces to understand why.
When you write
df['A'] >= 0
you are creating a mask: a Boolean Series with one entry for each index label of df, whose value is True or False according to a condition (in this case, whether the value of column 'A' at that index is greater than or equal to 0).
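Here is a minimal sketch of that mask on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [3, -1, 0, -7, 5]})

# Comparing a column to a scalar yields a boolean Series aligned with df.index.
mask = df["A"] >= 0

print(mask.dtype)                   # bool
print(mask.index.equals(df.index))  # True
print(mask.tolist())                # [True, False, True, False, True]
```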
When you write
df[df['A'] >= 0]
you are accessing the rows for which your mask (df['A'] >= 0) is True. This is a form of slicing supported by Pandas (boolean indexing): you select rows by passing a Boolean Series, and Pandas returns a new DataFrame containing only the rows for which the Series is True.
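The following sketch uses the same kind of hypothetical data. It applies the mask and shows that the original frame keeps all of its rows. It also shows that df.loc[mask] is an equivalent way to write the same selection:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [3, -1, 0, -7, 5], "B": list("abcde")})
mask = df["A"] >= 0

# Passing a boolean Series inside [] selects the rows where it is True.
result = df[mask]

print(result["B"].tolist())         # ['a', 'c', 'e']
print(result.equals(df.loc[mask]))  # True
print(len(df))                      # 5: the original DataFrame is untouched
```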
Finally, when you write this
selRows = df[df['A'] < 0].index
df = df.drop(selRows, axis=0)
you are repeating the process because
df[df['A'] < 0]
is already slicing your DataFrame (in this case, selecting the rows you want to drop). You then take those indices, go back to the original DataFrame, and drop them explicitly. There is no need for this: you already sliced the DataFrame in the first step.
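Spelling out the second approach step by step makes the repeated work visible. This is a sketch on hypothetical data, and the variable names neg_mask and neg_rows are made up for illustration. The intermediate frame built in step 2 is used only for its index labels:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [3, -1, 0, -7, 5], "B": list("abcde")})

# Step 1: build a mask for the rows to remove.
neg_mask = df["A"] < 0
# Step 2: slice the DataFrame with it; this intermediate frame is only
# needed for its index labels.
neg_rows = df[neg_mask]
sel_rows = neg_rows.index
# Step 3: go back to the original frame and drop those labels.
result = df.drop(sel_rows, axis=0)

print(sel_rows.tolist())             # [1, 3]
print(result.index.tolist())         # [0, 2, 4]
# A single filter with the inverted mask gives the same rows directly.
print(result.equals(df[~neg_mask]))  # True
```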
Answer by Alex
df = df[df['A'] >= 0]
is indeed the faster solution. Just be aware that pandas does not treat the result as a fully independent data frame: it remembers that the result was sliced from the original one. This can lead you into trouble, for example when you want to change its values, as pandas may then give you the SettingWithCopyWarning.
The simple fix of course is what Wen-Ben recommended:
df = df[df['A'] >= 0].copy()
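Here is a minimal sketch of the .copy() fix on a small hypothetical frame. It turns every warning into an error around the assignment, which shows that modifying the copied result is silent. Whether leaving out .copy() actually triggers SettingWithCopyWarning depends on the pandas version and on how the result is used, so the sketch shows only the .copy() path.

```python
import warnings

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [3, -1, 0, -7, 5]})

# .copy() makes the filtered result an independent DataFrame.
filtered = df[df["A"] >= 0].copy()

# Turn any warning into an error to show that this assignment is silent.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    filtered["A_doubled"] = filtered["A"] * 2

print(filtered["A_doubled"].tolist())  # [6, 0, 10]
print("A_doubled" in df.columns)       # False: the original is unchanged
```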
Answer by cs95
Your question is like this: "I have two identical cakes, but one has icing. Which has more calories?"
The second solution is doing the same thing, but twice. A single filtering step is enough; there's no need to filter and then redundantly call a function that does exactly what the filtering operation from the previous step already did.
To clarify: regardless of the operation, you are still doing the same thing: generating a boolean mask, and then subsequently indexing.