Pandas filter function returned a Series, but expected a scalar bool

Question

Asked by lathomas64

I am trying to use filter on a pandas DataFrame to drop every row whose value is duplicated (I need to remove ALL the rows when there are duplicates, not just the first or last).

This is what I have, and it works in the editor:

df = df.groupby("student_id").filter(lambda x: x.count() == 1)

But when I run my script with this code in it I get the error:

TypeError: filter function returned a Series, but expected a scalar bool

I am creating the dataframe by concatenating two other frames immediately before trying to apply the filter.

Answer 1

Accepted answer by leo

It should be:

In [32]: grouped = df.groupby("student_id")

In [33]: grouped.filter(lambda x: x["student_id"].count()==1)
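Some context on why selecting the column helps: on a DataFrame, count() returns one count per column (a Series), so `x.count() == 1` is a Series rather than a single True/False. Selecting one column first makes count() return a scalar. A minimal sketch, using made-up sample data (the `student_id` values and the `score` column are assumptions for illustration, not from the question):

```python
import pandas as pd

# Hypothetical sample data: student 2 appears twice.
df = pd.DataFrame({
    "student_id": [1, 2, 2, 3],
    "score": [90, 80, 85, 70],
})

# count() on a whole DataFrame gives one value per column, i.e. a Series.
per_column_counts = df.count()

grouped = df.groupby("student_id")

# count() on a single column gives a scalar, so the lambda returns a bool.
unique_rows = grouped.filter(lambda x: x["student_id"].count() == 1)

# len(x) is another scalar test; unlike count(), it also counts missing values.
unique_rows_len = grouped.filter(lambda x: len(x) == 1)

print(unique_rows)
```

Here both variants keep only the rows for students 1 and 3, since student 2 occurs twice.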

Updates:

I'm not sure about the issue you mentioned regarding the interactive console. Technically, in this particular case the console (such as IPython) should behave the same as any other environment (the plain Python interpreter or one embedded in an IDE). There may be other situations, such as the intricacies of imports, in which different environments behave differently.

An intuitive way to understand pandas groupby is to treat the object returned by DataFrame.groupby() as a list of DataFrames. So when you use filter to apply the lambda function to x, x is actually one of those DataFrames:

In[25]: df = pd.DataFrame(data,columns=year)

In[26]: df

Out[26]: 
   2013  2014
0     0     1
1     2     3
2     4     5
3     6     7
4     0     1
5     2     3
6     4     5
7     6     7

In[27]: grouped = df.groupby(2013)

In[28]: grouped.count()

Out[28]: 
      2014
2013      
0        2
2        2
4        2
6        2

In this example, the first DataFrame in the grouped object would be:

In[33]: df1 = df.loc[[0,4]]

In[34]: df1

Out[34]: 
   2013  2014
0     0     1
4     0     1
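The "list of DataFrames" picture can be checked directly by iterating over the grouped object, which yields (key, sub-DataFrame) pairs. The sketch below rebuilds the frame printed above; the original `data` and `year` variables are not shown in the answer, so the values here are reconstructed from the printed output and should be read as an assumption:

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed Out[26] above.
year = [2013, 2014]
data = [[0, 1], [2, 3], [4, 5], [6, 7]] * 2

df = pd.DataFrame(data, columns=year)
grouped = df.groupby(2013)

# Each iteration yields the group key and the sub-DataFrame for that key;
# that sub-DataFrame is what filter passes to the lambda as x.
groups = {key: sub_df for key, sub_df in grouped}

first = groups[0]
print(type(first).__name__)  # the group is itself a DataFrame
print(list(first.index))     # original row labels of the group with key 0
```

The group for key 0 contains rows 0 and 4, the same rows as df1 above.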

Answer 2

Answer by JD Long

How about using the pd.DataFrame.drop_duplicates() method?

Documentation.

Are you sure you really want to remove ALL rows? Not n-1?
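To make the distinction concrete: drop_duplicates takes a `keep` parameter. With `keep=False` it drops every row whose key is duplicated, while the default `keep="first"` leaves one row per key (the n-1 case). A short sketch with hypothetical sample data (the `score` column and the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data: student 2 appears twice.
df = pd.DataFrame({
    "student_id": [1, 2, 2, 3],
    "score": [90, 80, 85, 70],
})

# Remove ALL rows whose student_id occurs more than once.
no_dupes = df.drop_duplicates(subset="student_id", keep=False)

# Default behavior: keep the first occurrence and drop the other n-1.
first_only = df.drop_duplicates(subset="student_id")

print(no_dupes)
print(first_only)
```

If removing all duplicated rows really is the goal, the `keep=False` form avoids the groupby entirely.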
