Original URL: http://stackoverflow.com/questions/48304854/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas .filter() method with lambda function
Asked by confused_pup
I'm trying to understand the .filter() method in Pandas. I'm not sure why the code below doesn't work:
# Load data
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Set arbitrary index (is this needed?) and try filtering:
indexed_df = df.copy().set_index('sepal width (cm)')
test = indexed_df.filter(lambda x: x['petal length (cm)'] > 1.4)
I get:
TypeError: 'function' object is not iterable
I appreciate there are simpler ways to do this (e.g. Boolean indexing), but for learning purposes I'm trying to understand why filter fails here when it works for a groupby, as shown below:
This works:
filtered_df = df.groupby('petal width (cm)').filter(lambda x: x['sepal width (cm)'].sum() > 50)
Answer by Willem Van Onsem
You can use the condition indexed_df['petal length (cm)'] > 1.4 (note that here we use indexed_df, not x) to filter the dataframe:
indexed_df[indexed_df['petal length (cm)'] > 1.4]
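Here is a minimal runnable sketch of that one-liner. To avoid depending on scikit-learn, it builds a small hypothetical frame that reuses the iris column names with made-up values; with the DataFrame from load_iris() the filtering line would be unchanged.

```python
import pandas as pd

# Hypothetical stand-in for the iris data: same column names, made-up values.
df = pd.DataFrame({
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.1],
    "petal length (cm)": [1.4, 1.3, 4.7, 1.5],
})
indexed_df = df.set_index("sepal width (cm)")

# Boolean indexing: keep only the rows whose petal length exceeds 1.4.
result = indexed_df[indexed_df["petal length (cm)"] > 1.4]
print(result)
```

Note that the row with a petal length of exactly 1.4 is dropped, because the comparison is a strict "greater than".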
How does this work?
If you evaluate indexed_df['petal length (cm)'], you obtain that column of the dataframe as a Series: a sequence that holds, for every index label, the value of that column. Evaluating column > 1.4 then gives a Series of booleans with the same index: True if the condition is met for that row, and False otherwise.
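A small sketch of the two intermediate objects described above, again on hypothetical toy values rather than the real iris data: the selected column is a Series sharing the frame's index, and the comparison produces a boolean Series aligned with it.

```python
import pandas as pd

# Hypothetical toy data with iris-style column names (values are made up).
indexed_df = pd.DataFrame(
    {"petal length (cm)": [1.4, 1.3, 4.7, 1.5]},
    index=pd.Index([3.5, 3.0, 3.2, 3.1], name="sepal width (cm)"),
)

column = indexed_df["petal length (cm)"]  # a Series indexed like the frame
mask = column > 1.4                       # element-wise comparison -> boolean Series

print(type(column).__name__)  # Series
print(mask.dtype)             # bool
print(mask.tolist())          # [False, False, True, True]
```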
We can then pass such a boolean column to the indexing operator, indexed_df[boolean_column], to obtain only the rows for which the corresponding entry of boolean_column is True.
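The answer above does not spell out why the original call fails. As far as I understand the pandas API, DataFrame.filter selects rows or columns by their labels (through its items, like or regex arguments) and never looks at the values, so a lambda passed positionally is taken as items, which pandas expects to be a list-like of labels; hence the TypeError (the exact message may differ between pandas versions). GroupBy.filter, by contrast, takes a function that returns one boolean per group. The sketch below, on hypothetical toy data, contrasts the two and also shows that .loc accepts a callable, which is a lambda-based way of writing the boolean indexing above.

```python
import pandas as pd

# Hypothetical toy data; column names borrowed from the iris example.
df = pd.DataFrame({
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.1],
    "petal length (cm)": [1.4, 1.3, 4.7, 1.5],
    "petal width (cm)": [0.2, 0.2, 1.4, 0.2],
})

# DataFrame.filter works on labels: keep the columns whose name contains "petal".
petal_cols = df.filter(like="petal")

# GroupBy.filter takes a callable returning one bool per group;
# here, keep the groups whose summed sepal width exceeds 5.
big_groups = df.groupby("petal width (cm)").filter(
    lambda g: g["sepal width (cm)"].sum() > 5
)

# .loc accepts a callable that receives the frame and returns a boolean mask.
long_petals = df.loc[lambda d: d["petal length (cm)"] > 1.4]

print(petal_cols.columns.tolist())
print(big_groups.index.tolist())
print(long_petals.index.tolist())
```

The lambda given to .loc receives the whole DataFrame, whereas the lambda given to GroupBy.filter receives one group at a time.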