Pandas .filter() method with a lambda function

Question

Asked by confused_pup

I'm trying to understand the .filter() method in Pandas. I'm not sure why the code below doesn't work:

# Load data
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Set arbitrary index (is this needed?) and try filtering:
indexed_df = df.copy().set_index('sepal width (cm)')
test = indexed_df.filter(lambda x: x['petal length (cm)'] > 1.4)

I get:

TypeError: 'function' object is not iterable

I appreciate there are simpler ways to do this (e.g. Boolean indexing), but for learning purposes I'm trying to understand why filter fails here when it works for a groupby, as shown below:

This works:

filtered_df = df.groupby('petal width (cm)').filter(lambda x: x['sepal width (cm)'].sum() > 50)

Answer 1

Answered by Willem Van Onsem

You can use the condition indexed_df['petal length (cm)'] > 1.4 (here we use indexed_df, not x) as a way to filter the dataframe, so:

indexed_df[indexed_df['petal length (cm)'] > 1.4]

How does this work?

If you evaluate indexed_df['petal length (cm)'], you obtain the "column" of the dataframe: a Series that holds, for every index label, the value of that column. Comparing that column with a number (column > 1.4) gives a Series of booleans: True if the condition is met for a given row, and False otherwise.
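To make that intermediate step concrete, here is a minimal sketch on a small hand-made frame. The name toy and its values are made up for illustration; they stand in for the iris data and are not taken from it.

```python
import pandas as pd

# Hypothetical toy data standing in for the iris frame
toy = pd.DataFrame(
    {
        "sepal width (cm)": [3.5, 3.0, 3.2, 2.8],
        "petal length (cm)": [1.4, 1.3, 4.7, 5.1],
    }
).set_index("sepal width (cm)")

# Comparing a column with a scalar yields a boolean Series
# that shares the index of the original column
mask = toy["petal length (cm)"] > 1.4
print(mask.tolist())  # [False, False, True, True]
```

Note that the first row is False because the comparison is strict: 1.4 is not greater than 1.4.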

We can then use such a boolean column as the indexer for the dataframe, indexed_df[boolean_column], to obtain only the rows for which the corresponding entry of boolean_column is True.
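As for why the original call raises the TypeError: as far as I can tell, DataFrame.filter selects rows or columns by their labels (through its items, like, or regex arguments) and never looks at the values, so a lambda passed as the first argument is presumably taken as the items list, which a function cannot be. The filter that follows a groupby is a different method that does accept a function returning one boolean per group. The sketch below contrasts the options on hypothetical toy data (the frame toy, its values, and the threshold 5 are assumptions for illustration). If you want to keep the lambda style for row filtering, .loc accepts a callable.

```python
import pandas as pd

# Hypothetical toy data standing in for the iris frame
toy = pd.DataFrame(
    {
        "sepal width (cm)": [3.5, 3.0, 3.2, 2.8],
        "petal length (cm)": [1.4, 1.3, 4.7, 5.1],
        "petal width (cm)": [0.2, 0.2, 1.4, 1.8],
    }
)

# Boolean indexing: keep rows where the mask is True
by_mask = toy[toy["petal length (cm)"] > 1.4]

# Same result with a lambda: .loc calls it with the frame itself
by_lambda = toy.loc[lambda x: x["petal length (cm)"] > 1.4]

# DataFrame.filter works on labels, here: columns whose name contains "petal"
petal_cols = toy.filter(like="petal")

# groupby(...).filter takes a function evaluated once per group
# and keeps all rows of the groups for which it returns True
big_groups = toy.groupby("petal width (cm)").filter(
    lambda g: g["sepal width (cm)"].sum() > 5
)
```

Here by_mask and by_lambda both keep the last two rows, petal_cols keeps the two petal columns, and big_groups keeps only the rows of the 0.2 group, whose sepal widths sum to 6.5.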
