Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/48304854/

Time: 2020-09-14 05:03:59  Source: igfitidea

Pandas .filter() method with lambda function

Tags: python, pandas

Asked by confused_pup

I'm trying to understand the .filter() method in Pandas. I'm not sure why the code below doesn't work:

# Load data
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Set arbitrary index (is this needed?) and try filtering:
indexed_df = df.copy().set_index('sepal width (cm)')
test = indexed_df.filter(lambda x: x['petal length (cm)'] > 1.4)

I get:

TypeError: 'function' object is not iterable

I appreciate there are simpler ways to do this (e.g. Boolean indexing), but for learning purposes I'm trying to understand why filter fails here when it works on a groupby, as shown below:

This works:

filtered_df = df.groupby('petal width (cm)').filter(lambda x: x['sepal width (cm)'].sum() > 50)

Answered by Willem Van Onsem

You can use the condition indexed_df['petal length (cm)'] > 1.4 (here we use indexed_df, not x) to filter the dataframe, so:

indexed_df[indexed_df['petal length (cm)'] > 1.4]

How does this work?

If you evaluate indexed_df['petal length (cm)'] you obtain the "column" of the dataframe: a sequence (a pandas Series) that holds, for every index, the value of that column. Comparing that column with > 1.4 yields a column of booleans: True if the condition is met for a given row, and False otherwise.
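
As a minimal sketch of that comparison, the snippet below builds the boolean column on a small hand-made frame instead of the iris data. The values and the names toy_df, column and mask are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for indexed_df: a few made-up rows, indexed by sepal width.
toy_df = pd.DataFrame(
    {
        "sepal width (cm)": [3.5, 3.0, 3.2, 2.8],
        "petal length (cm)": [1.4, 1.3, 4.7, 5.1],
    }
).set_index("sepal width (cm)")

# Selecting one column gives a Series that shares the frame's index.
column = toy_df["petal length (cm)"]

# The comparison is applied element-wise and yields a boolean Series,
# one True/False entry per row of the frame.
mask = column > 1.4
print(mask)
```

Note that the comparison is strict, so the row holding exactly 1.4 ends up as False.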

We can then use such a boolean column as the subscript of the dataframe, indexed_df[boolean_column], to obtain only the rows for which the corresponding entry of boolean_column is True.
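
A sketch of that last step, again on a made-up frame rather than the iris data (toy_df and its values are assumptions for illustration). It also shows that, if a lambda is wanted, .loc accepts a callable that receives the whole frame and returns a boolean column. DataFrame.filter, by contrast, selects by index or column labels through its items, like and regex parameters, so a function passed as its first argument is taken as items, which would explain the TypeError in the question. groupby(...).filter is a different method that calls the function once per group and keeps or drops whole groups.

```python
import pandas as pd

# Hypothetical stand-in for indexed_df with made-up values.
toy_df = pd.DataFrame(
    {
        "sepal width (cm)": [3.5, 3.0, 3.2, 2.8],
        "sepal length (cm)": [5.1, 4.9, 7.0, 6.4],
        "petal length (cm)": [1.4, 1.3, 4.7, 5.1],
        "petal width (cm)": [0.2, 0.2, 1.4, 1.4],
    }
).set_index("sepal width (cm)")

# Boolean indexing: keep only the rows where the mask is True.
mask = toy_df["petal length (cm)"] > 1.4
by_mask = toy_df[mask]

# Same selection written with a lambda; .loc calls it with the frame itself.
by_callable = toy_df.loc[lambda d: d["petal length (cm)"] > 1.4]

# DataFrame.filter looks at labels, not values: here it keeps the columns
# whose name contains "petal".
petal_columns = toy_df.filter(like="petal")

# GroupBy.filter takes a function that returns one boolean per group.
kept_groups = toy_df.groupby("petal width (cm)").filter(
    lambda g: g["petal length (cm)"].sum() > 5
)
```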
