KeyError when using boolean filter on pandas data frame

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/33817842/

Tags: python, pandas, boolean, dataframe, keyerror

Asked by Ben Price

Trying to combine two data frames when a datetime object from one dataframe is within a datetime object range in the other.

Keep getting: KeyError: 'cannot use a single bool to index into setitem' on this line of code in the second chunk I posted.

gametaxidf.loc[arrivemask, 'relevant'] = 1

I'm assuming it would happen on the following line with a similar command as well.

This is the part giving me trouble:

with open('/Users/benjaminprice/Desktop/TaxiCombined/Data/combinedtaxifiltered.csv', 'w') as csvfile: 
    fieldnames1 = ['index','pickup_datetime', 'dropoff_datetime', 'pickup_long', 'pickup_lat','dropoff_long','dropoff_lat','passenger_count','trip_distance','fare_amount','tip_amount','total_amount','stadium_code'] 
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames1) 
    writer.writeheader()

for index, row in baseballdf.iterrows(): 
    gametimestart = row['Start.Time'] 
    gametimeend = row['End.Time'] 
    arrivemin = gametimestart - datetime.timedelta(minutes=120) 
    arrivemax = gametimeend - datetime.timedelta(minutes = 30) 
    departmin = gametimeend - datetime.timedelta(minutes = 60) 
    departmax = gametimeend + datetime.timedelta(minutes = 90)

    gametaxidf = combineddf[combineddf.DATE==row.DATE]
    gametaxidf['relevant']=0

    for index, row in gametaxidf.iterrows():
        arrivemask = (arrivemin < row['dropoff_datetime']) and (row['dropoff_datetime'] < arrivemax)
        departmask = (departmin < row['pickup_datetime']) and (row['pickup_datetime'] < departmax) 
        gametaxidf.loc[arrivemask, 'relevant'] = 1
        gametaxidf.loc[departmask, 'relevant'] = 1

        with open('/Users/benjaminprice/Desktop/TaxiCombined/Data/combinedtaxifiltered.csv','a') as combinedtaxi:
            gametaxidf.to_csv(combinedtaxi,header=None)
    print(str(index) + "done")

Gametaxidf.head(5):

   index     pickup_datetime    dropoff_datetime  pickup_long  pickup_lat  \
0    195 2014-04-01 00:08:13 2014-04-01 00:15:32   -73.922218   40.827557   
1    344 2014-04-01 00:16:30 2014-04-01 00:20:38   -73.846046   40.754566   
2    558 2014-04-01 00:28:59 2014-04-01 00:36:36   -73.921692   40.831394   
3    744 2014-04-01 00:42:00 2014-04-01 00:49:46   -73.938080   40.804646   
4    776 2014-04-01 00:43:54 2014-04-01 00:53:22   -73.952652   40.810577   

   dropoff_long  dropoff_lat  passenger_count  trip_distance  fare_amount  \
0    -73.900620    40.856174                1           2.30          9.0   
1    -73.890259    40.753246                1           0.56          4.5   
2    -73.942719    40.823257                1           1.53          7.0   
3    -73.928490    40.830433                1           2.96         11.0   
4    -73.924332    40.827320                1           2.28         10.5   

   tip_amount  total_amount  stadium_code       DATE  relevant  
0           0          10.0           1.1 2014-04-01         0  
1           0           5.5           2.1 2014-04-01         0  
2           0           8.0           1.1 2014-04-01         0  
3           0          12.0           1.0 2014-04-01         0  
4           0          11.5           1.0 2014-04-01         0 

Also getting this warning: A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead

But it's letting me continue through that... any help would be great.

Answered by tworec

Here

gametaxidf.loc[arrivemask, 'relevant'] = 1

you're trying to set dataframe values with the .loc operator. The pandas docs on selecting rows say:

.loc is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found. Allowed inputs are:

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)
  • A list or array of labels ['a', 'b', 'c']
  • A slice object with labels 'a':'f', (note that contrary to usual python slices, both the start and the stop are included!)
  • A boolean array
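As a rough illustration of the four input kinds quoted above, here is a minimal sketch. The frame `df`, its `value` column, and its string labels are made up for this example and do not come from the question.

```python
import pandas as pd

# Hypothetical frame with string labels, used only for illustration.
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

single = df.loc["a", "value"]               # a single label
listed = df.loc[["a", "c"], "value"]        # a list of labels
sliced = df.loc["a":"c", "value"]           # a label slice, both ends included
masked = df.loc[df["value"] > 25, "value"]  # a boolean array, one flag per row
```

Note that the boolean case passes one True/False value per row, which is what the question's code fails to do.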

You're trying to use the last type of input, but this

arrivemask = ((arrivemin < row['dropoff_datetime']) and
    (row['dropoff_datetime'] < arrivemax))

is a scalar boolean, not an array.
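To make the difference concrete, here is a small sketch comparing the two forms. The `times` Series and the two bounds are invented values, not data from the question.

```python
import pandas as pd

# Hypothetical drop-off times and arrival window, for illustration only.
times = pd.Series(pd.to_datetime([
    "2014-04-01 00:15:32",
    "2014-04-01 18:20:38",
    "2014-04-01 22:36:36",
]))
arrivemin = pd.Timestamp("2014-04-01 17:00:00")
arrivemax = pd.Timestamp("2014-04-01 19:30:00")

# Comparing one value at a time with `and` gives a single True/False.
one_value = times.iloc[0]
scalar_mask = (arrivemin < one_value) and (one_value < arrivemax)

# Comparing the whole column with `&` gives one True/False per row,
# which is the kind of mask .loc accepts.
array_mask = (arrivemin < times) & (times < arrivemax)
```

Here `scalar_mask` is a single boolean, while `array_mask` is a boolean Series with the same length as `times`.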

You don't need to iterate through the dataframe; pandas does it for you. Just use:

gametaxidf.loc[
   (arrivemin < gametaxidf['dropoff_datetime'])
   &
   (gametaxidf['dropoff_datetime'] < arrivemax)
   , 'relevant'] = 1
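Putting the pieces together for a single game, a sketch along these lines may help. The game times and the small `combineddf` below are invented stand-ins for the asker's data. Taking an explicit `.copy()` of the filtered frame is an addition not in the original answer; it is one common way to avoid the "set on a copy of a slice" warning mentioned in the question.

```python
import datetime
import pandas as pd

# Hypothetical game window; the offsets mirror the ones in the question.
gamedate = pd.Timestamp("2014-04-01")
gametimestart = pd.Timestamp("2014-04-01 19:00:00")
gametimeend = pd.Timestamp("2014-04-01 22:00:00")
arrivemin = gametimestart - datetime.timedelta(minutes=120)
arrivemax = gametimeend - datetime.timedelta(minutes=30)
departmin = gametimeend - datetime.timedelta(minutes=60)
departmax = gametimeend + datetime.timedelta(minutes=90)

# Hypothetical taxi trips (the real data would come from the asker's CSV).
combineddf = pd.DataFrame({
    "pickup_datetime": pd.to_datetime([
        "2014-04-01 00:08:13", "2014-04-01 18:05:00",
        "2014-04-01 22:10:00", "2014-04-01 23:45:00",
        "2014-04-02 18:05:00",
    ]),
    "dropoff_datetime": pd.to_datetime([
        "2014-04-01 00:15:32", "2014-04-01 18:20:00",
        "2014-04-01 22:30:00", "2014-04-01 23:59:00",
        "2014-04-02 18:20:00",
    ]),
    "DATE": pd.to_datetime(["2014-04-01"] * 4 + ["2014-04-02"]),
})

# Explicit copy, so later assignments modify this frame and not a view.
gametaxidf = combineddf[combineddf["DATE"] == gamedate].copy()
gametaxidf["relevant"] = 0

# Column-wide comparisons combined with & and | instead of `and` / `or`.
dropoff = gametaxidf["dropoff_datetime"]
pickup = gametaxidf["pickup_datetime"]
arrivemask = (arrivemin < dropoff) & (dropoff < arrivemax)
departmask = (departmin < pickup) & (pickup < departmax)

gametaxidf.loc[arrivemask | departmask, "relevant"] = 1
```

With this shape, the inner `iterrows()` loop from the question is no longer needed; only the outer loop over games would remain.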