Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23745677/

Date: 2020-09-13 22:04:06  Source: igfitidea

Filtering pandas data frame by a list of id's

python pandas dataframe

Question by redrubia

I have a pandas dataframe with a column of user ids, 'subscriber_id', along with some other info.

I want to select only the subscribers that are not in a given list A.

So if my data frame contains info for subscribers [1,2,3,4,5] and my exclude list is [2,4,5], I should get a dataframe with information for [1,3] only.

I have tried using a mask as follows:

temp = df.mask(lambda x: x['subscriber_id'] not in subscribers)

but no luck!

I am sure 'not in' is valid Python syntax, as I tested it on a list as follows:

c = [1,2,3,4,5]
if 6 not in c:
    print('YAY')
# prints: YAY

Any suggestions, or an alternative way to filter the dataframe?

Answer by unutbu

You could use the isin method:

In [30]: df = pd.DataFrame({'subscriber_id':[1,2,3,4,5]})

In [31]: df
Out[31]: 
   subscriber_id
0              1
1              2
2              3
3              4
4              5

[5 rows x 1 columns]

In [32]: mask = df['subscriber_id'].isin([2,4,5])

In [33]: mask
Out[33]: 
0    False
1     True
2    False
3     True
4     True
Name: subscriber_id, dtype: bool

In [34]: df.loc[~mask]
Out[34]: 
   subscriber_id
0              1
2              3

[2 rows x 1 columns]
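The same isin approach as a self-contained script, on a frame that also carries other per-subscriber info. The 'plan' column and its values are hypothetical, standing in for the question's "other info":

```python
import pandas as pd

# Hypothetical subscriber data: 'plan' stands in for the "other info" columns.
df = pd.DataFrame({
    "subscriber_id": [1, 2, 3, 4, 5],
    "plan": ["basic", "pro", "basic", "pro", "free"],
})
exclude = [2, 4, 5]

# isin builds a boolean mask (True where the id IS in the list);
# ~ inverts it so .loc keeps only the rows whose id is NOT in the list.
kept = df.loc[~df["subscriber_id"].isin(exclude)]
print(kept)
```

The surviving rows keep their original index labels (0 and 2 here); chain .reset_index(drop=True) if you want a fresh 0..n-1 index.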


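If you use df.mask, then the condition must be a boolean NDFrame or an array. lambda x: x['subscriber_id'] not in subscribers is a function, which is why it raised an exception. (Newer pandas versions also accept a callable here, but the callable must itself return a boolean NDFrame or array; this lambda tries to reduce a whole column to a single 'not in' result, so it still fails.)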
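A minimal sketch of why the lambda cannot work as written, and of a lambda that does. It assumes a pandas version whose .loc accepts a callable (recent releases do); the key difference is that the working lambda returns one boolean per row, built with isin, instead of a single True/False:

```python
import pandas as pd

df = pd.DataFrame({"subscriber_id": [1, 2, 3, 4, 5]})
subscribers = [2, 4, 5]  # ids to exclude, as in the question

# 'not in' needs a single True/False, but df["subscriber_id"] is a whole
# column: Python compares each list item with the Series, gets a boolean
# Series back, and cannot reduce that Series to one bool.
try:
    df["subscriber_id"] not in subscribers
except ValueError as err:
    print("ValueError:", err)

# .loc calls the lambda with the DataFrame; returning a per-row boolean
# Series (via isin) makes the lambda style work.
kept = df.loc[lambda x: ~x["subscriber_id"].isin(subscribers)]
print(kept["subscriber_id"].tolist())  # [1, 3]
```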

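Here is one way you could use df.mask, again with isin to form the boolean condition. Note that mask does not remove rows: it replaces the values where the condition is True with NaN, which is why the result is float64: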

In [43]: df['subscriber_id'].mask(df['subscriber_id'].isin([2,4,5]).values)
Out[43]: 
0     1
1   NaN
2     3
3   NaN
4   NaN
Name: subscriber_id, dtype: float64
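Because mask keeps every row, it needs an extra step to actually filter. A small sketch contrasting mask followed by dropna with DataFrame.query, whose expression syntax accepts 'not in' directly (the @ prefix refers to a Python variable in the surrounding scope). The variable name exclude is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({"subscriber_id": [1, 2, 3, 4, 5]})
exclude = [2, 4, 5]

# mask() keeps all five rows and replaces the matching ids with NaN,
# which forces the column from int64 to float64.
masked = df["subscriber_id"].mask(df["subscriber_id"].isin(exclude))

# Dropping the NaNs recovers the remaining ids, but they stay floats.
remaining = masked.dropna()
print(remaining.tolist())  # [1.0, 3.0]

# query() understands 'not in' and keeps the original integer dtype.
kept = df.query("subscriber_id not in @exclude")
print(kept["subscriber_id"].tolist())  # [1, 3]
```

For plain row filtering, the isin plus .loc[~mask] form shown earlier in the answer is the more direct choice; mask is mainly useful when you want to keep the frame's shape and blank out values.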