Original question: http://stackoverflow.com/questions/19507088/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow, not to this page.
Filtering a Pandas DataFrame Without Removing Rows
Asked by mclark1129
I'm trying to use where() on my Pandas DataFrame to replace all cells that don't meet my criteria with NaN. However, I'd like to do it in a way that always preserves the shape of my original DataFrame and does not remove any rows from the resulting DataFrame.
Given the following DataFrame:
     A  B  C  D
1/1  0  1  0  1
1/2  2  1  1  1
1/3  3  0  1  0
1/4  1  0  1  2
1/5  1  0  1  1
1/6  2  0  2  1
1/7  3  5  2  3
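To experiment with the snippets in this thread, the table above can be rebuilt roughly as follows. This is only a sketch: the original index type is not shown in the question, so plain string labels such as "1/1" are an assumption here.

```python
import pandas as pd

# Example data copied from the table above.
# Assumption: the index is made of plain strings; the original may have used dates.
df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

print(df)
```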
I would like to search the DataFrame for all cells that meet a certain criterion when column D ALSO meets a particular criterion. In this case, my criteria are:
Find all cells that are greater than the previous value, when column D is also > 1
I accomplish this by using the following syntax:
matches = df[df > df.shift(1)]
matches = matches[df.D > 1]
I have to split this query into two statements because df.D is a Series and does not match the shape of the entire DataFrame. According to a question I asked previously, support for a broadcasting & operator will not be available until 0.14.
The problem I am having is that after I run the second statement, the shape of the resulting DataFrame has changed and rows have been removed. The number of columns stays the same. The first statement leaves the original number of rows.
Why would the second statement remove rows while the first does not? How could I achieve the same result while leaving the full number of rows intact?
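The difference described here can be reproduced with a small sketch. Indexing a DataFrame with a boolean DataFrame masks individual cells (it behaves like where()), whereas indexing with a boolean Series selects whole rows, so rows where the Series is False are dropped. The string index and the variable name filtered are assumptions made for illustration.

```python
import pandas as pd

# Example data from the question; the string index is an assumption.
df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

# Boolean DataFrame key: cell-wise mask, shape is preserved,
# cells where the mask is False become NaN.
matches = df[df > df.shift(1)]

# Boolean Series key: row selection, rows where the Series is False are removed.
filtered = matches[df["D"] > 1]

print(matches.shape)   # (7, 4)
print(filtered.shape)  # (2, 4)
```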
Edit:
The pandas documentation states that in order to guarantee that the shape is preserved, I should use the where() method rather than boolean indexing. However, where() does not seem to accept the condition from my second statement, so:
matches.where(df.D > 1)
gives me the following error:
ValueError: Array conditional must be same shape as self
Answered by Jeff
This is slightly more intuitive than @DSM's answer (but pandas is currently missing this type of auto-broadcasting on boolean ops)
In [58]: df.where((df>df.shift(1)).values & DataFrame(df.D==1).values)
Out[58]:
      A   B   C   D
1/1 NaN NaN NaN NaN
1/2   2 NaN   1 NaN
1/3 NaN NaN NaN NaN
1/4 NaN NaN NaN NaN
1/5 NaN NaN NaN NaN
1/6   2 NaN   2 NaN
1/7 NaN NaN NaN NaN
See the corresponding pandas issue, which is to be addressed in 0.14.
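Note that the answers in this thread filter on df.D == 1, while the question states the condition as D > 1. A label-preserving variant for the question's stated condition might look like the sketch below. It stays at the pandas level: it builds the cell-wise mask first and then clears every row in which D fails the test. The names mask and row_ok and the string index are assumptions for illustration.

```python
import pandas as pd

# Example data from the question; the string index is an assumption.
df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

# Cell-wise condition: value is greater than the value in the previous row.
mask = df > df.shift(1)

# Row-wise condition from the question text: D > 1.
row_ok = df["D"] > 1

# Switch off every cell in rows where the row-wise condition fails.
mask.loc[~row_ok, :] = False

# where() keeps the original shape and puts NaN where the mask is False.
result = df.where(mask)
print(result)
```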
Answered by DSM
If I understand what you're after, you can do the broadcasting manually by dropping down to the numpy level:
>>> (df > df.shift(1)).values & (df.D == 1).values[:, None]
array([[False, False, False, False],
       [ True, False,  True, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [ True, False,  True, False],
       [False, False, False, False]], dtype=bool)
after which you can use where():
>>> df.where((df > df.shift(1)).values & (df.D == 1).values[:, None], np.nan)
      A   B   C   D
1/1 NaN NaN NaN NaN
1/2   2 NaN   1 NaN
1/3 NaN NaN NaN NaN
1/4 NaN NaN NaN NaN
1/5 NaN NaN NaN NaN
1/6   2 NaN   2 NaN
1/7 NaN NaN NaN NaN
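The manual broadcasting above may be easier to follow with the array shapes spelled out. The sketch below assumes a reasonably recent pandas, where to_numpy() is available and indexing a Series with [:, None] is no longer supported, so the new axis is added on the NumPy array instead. The variable names and the string index are illustrative assumptions.

```python
import pandas as pd

# Example data from the question; the string index is an assumption.
df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

# Cell-wise condition as a plain boolean array.
cell_mask = (df > df.shift(1)).to_numpy()

# Row-wise condition reshaped into a column vector so NumPy can
# broadcast it across all columns.
row_mask = (df["D"] == 1).to_numpy()[:, None]

# Element-wise AND; the column vector is broadcast to the full shape.
combined = cell_mask & row_mask

print(cell_mask.shape, row_mask.shape, combined.shape)  # (7, 4) (7, 1) (7, 4)

# The combined mask has the same shape as df, so where() accepts it.
result = df.where(combined)
print(result)
```

Because the mask is a bare array, it is matched to df by position rather than by label, which is fine here since it was derived from df itself.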

