Python: dropping rows from a DataFrame based on a "not in" condition

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/27965295/

Date: 2020-08-19 02:32:36  Source: igfitidea

dropping rows from dataframe based on a "not in" condition

Tags: python, pandas

Asked by gaurav gurnani

I want to drop rows from a pandas dataframe when the value of the date column is in a list of dates. The following code doesn't work:

a=['2015-01-01' , '2015-02-01']

df=df[df.datecolumn not in a]

I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Accepted answer by Ffisegydd

You can use pandas.DataFrame.isin.

pandas.DataFrame.isin will return boolean values depending on whether each element is inside the list a or not. You then invert this with the ~ operator to convert True to False and vice versa.

import pandas as pd

a = ['2015-01-01' , '2015-02-01']

df = pd.DataFrame(data={'date':['2015-01-01' , '2015-02-01', '2015-03-01' , '2015-04-01', '2015-05-01' , '2015-06-01']})

print(df)
#         date
#0  2015-01-01
#1  2015-02-01
#2  2015-03-01
#3  2015-04-01
#4  2015-05-01
#5  2015-06-01

df = df[~df['date'].isin(a)]

print(df)
#         date
#2  2015-03-01
#3  2015-04-01
#4  2015-05-01
#5  2015-06-01
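
To make the two steps easier to see, here is a minimal sketch that keeps the intermediate boolean mask in its own variable. The names mask and kept and the shortened three-row frame are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

a = ['2015-01-01', '2015-02-01']
df = pd.DataFrame({'date': ['2015-01-01', '2015-02-01', '2015-03-01']})

# Step 1: element-wise membership test, one boolean per row
mask = df['date'].isin(a)
print(mask.tolist())     # [True, True, False]

# Step 2: invert the mask so True marks the rows to keep
print((~mask).tolist())  # [False, False, True]

# Step 3: boolean indexing keeps only the rows where the inverted mask is True
kept = df[~mask]
print(kept['date'].tolist())  # ['2015-03-01']
```

Note that the surviving rows keep their original index labels (here, 2), as in the output above.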

Answer by YS-L

You can use Series.isin:

df = df[~df.datecolumn.isin(a)]

While the error message suggests that all() or any() can be used, they are useful only when you want to reduce the result to a single Boolean value. That is not what you are trying to do here, however: you want to test the membership of every value in the Series against the external list and keep the results intact (i.e., a Boolean Series, which is then used to slice the original DataFrame).
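
As a rough illustration of that difference, the sketch below contrasts the element-wise mask with its any() and all() reductions. The two-element Series s and the variable names are made up for the example.

```python
import pandas as pd

a = ['2015-01-01', '2015-02-01']
s = pd.Series(['2015-01-01', '2015-03-01'])

# Element-wise result: one boolean per value, suitable for filtering
mask = s.isin(a)
print(mask.tolist())  # [True, False]

# Reductions collapse the whole mask into a single boolean
print(mask.any())     # True  (at least one value is in the list)
print(mask.all())     # False (not every value is in the list)

# Only the element-wise mask can select individual rows
filtered = s[~mask]
print(filtered.tolist())  # ['2015-03-01']
```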

You can read more about this in the Gotchas section of the pandas documentation.
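
For context, here is a small sketch of how the error from the question arises, assuming a two-element Series s as a stand-in for the date column. Python's "not in" compares each list item with the whole Series, which yields a boolean Series, and then asks for a single truth value, which pandas refuses to give.

```python
import pandas as pd

a = ['2015-01-01', '2015-02-01']
s = pd.Series(['2015-01-01', '2015-03-01'])

message = None
try:
    # Each comparison returns a boolean Series; taking its truth value raises
    result = s not in a
except ValueError as exc:
    message = str(exc)
    print(message)

# The element-wise alternative never asks for a single truth value
not_in_a = ~s.isin(a)
print(not_in_a.tolist())  # [False, True]
```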
