The truth value of a pandas Series is ambiguous - error when calling a function

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow original URL: http://stackoverflow.com/questions/45148183/

Time: 2020-09-14 04:01:28  Source: igfitidea

python pandas if-statement dataframe

Asked by i.n.n.m

I know the following error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

was asked about a long time ago.
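For reference, here is a minimal sketch that reproduces this error with made-up data: an if statement needs a single True or False, but a boolean Series holds one value per row, so pandas refuses to guess which one to use.

```python
import pandas as pd

# Hypothetical example data: a boolean Series, like the result of a comparison.
flags = pd.Series([True, False, True])

try:
    # `if` calls bool() on the Series; with several values there is no
    # single answer, so pandas raises ValueError instead of guessing.
    if flags:
        print("never reached")
except ValueError as exc:
    message = str(exc)
    print(message)
```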

However, I am trying to create a basic function that returns a new column, df['busy'], containing 1 or 0. My function looks like this:

def hour_bus(df):
    if df[(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
          (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')]:
        return df['busy'] == 1
    else:
        return df['busy'] == 0

I can define the function without errors, but when I call it with the DataFrame, I get the error mentioned above. I followed this thread and another thread to create that function. I used & instead of and in my if clause.

Anyhow, when I do the following, I get my desired output.

import numpy as np

df['busy'] = np.where((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
                      (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'), '1', '0')
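A self-contained version of this np.where approach, as a sketch: the sample rows are hypothetical, and it assumes the hour column holds zero-padded 'HH:MM:SS' strings, which is what makes plain string comparison order the times correctly. It writes the integers 1 and 0 rather than the strings '1' and '0' used above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; assumes 'hour' holds zero-padded "HH:MM:SS"
# strings, so lexicographic comparison matches time order.
df = pd.DataFrame({
    'hour': ['09:30:00', '15:00:00', '22:45:00', '16:00:00'],
    'week_day': ['Monday', 'Tuesday', 'Saturday', 'Friday'],
})

# Element-wise comparisons combined with & give one boolean per row.
mask = ((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') &
        (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'))

# np.where picks 1 where the mask is True and 0 elsewhere, row by row.
df['busy'] = np.where(mask, 1, 0)
print(df)
```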

Any ideas on what mistake I am making in my hour_bus function?

Accepted answer by MSeifert

The

(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')& (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')

gives a boolean Series (a mask), and when you index your df with it, you'll get a (probably) smaller part of your df.

Just to illustrate what I mean:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4]})
mask = df['a'] > 2
print(mask)
# 0    False
# 1    False
# 2     True
# 3     True
# Name: a, dtype: bool
indexed_df = df[mask]
print(indexed_df)
#    a
# 2  3
# 3  4

However, it's still a DataFrame, so it's ambiguous to use it as an expression that requires a truth value (in your case, an if).

bool(indexed_df)
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
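The methods the error message suggests each collapse the whole object to a single bool, which an if does accept. A short sketch on the same toy data follows; note that these answer one question about the entire frame (for example, "does any row match?"), so they do not by themselves produce the per-row 1/0 column the question is after.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
mask = df['a'] > 2
indexed_df = df[mask]

# Each of these reduces the whole object to one bool, which `if` accepts.
has_rows = not indexed_df.empty   # True if at least one row survived the mask
any_match = mask.any()            # True if any element of the mask is True
all_match = mask.all()            # True only if every element is True

if any_match:
    print("at least one row has a > 2")
```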

You could use the np.where you used, or equivalently:

def hour_bus(df):
    mask = (df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')& (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
    res = df['busy'] == 0                             
    res[mask] = (df['busy'] == 1)[mask]  # replace the values where the mask is True
    return res
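If you do want to keep an if statement, one alternative (a different technique from the answer, shown only as a sketch) is to apply a function row by row with DataFrame.apply(axis=1). Inside that function each value is a scalar, so the if has an unambiguous truth value. The function name hour_bus_row and the sample data are hypothetical. Row-wise apply makes one Python call per row, so on large frames it is typically much slower than the vectorized versions.

```python
import pandas as pd

# Hypothetical row-wise variant (not from the answer): `row` is one row as a
# Series, so row['hour'] and row['week_day'] are plain scalars here.
def hour_bus_row(row):
    if ('14:00:00' <= row['hour'] <= '23:00:00'
            and row['week_day'] not in ('Saturday', 'Sunday')):
        return 1
    return 0

# Hypothetical sample data with zero-padded "HH:MM:SS" strings.
df = pd.DataFrame({
    'hour': ['09:30:00', '15:00:00', '22:45:00', '16:00:00'],
    'week_day': ['Monday', 'Tuesday', 'Saturday', 'Friday'],
})

# axis=1 calls hour_bus_row once per row and collects the results.
df['busy'] = df.apply(hour_bus_row, axis=1)
print(df)
```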

However, np.where is the better solution (it's more readable and probably faster).
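For a 0/1 column specifically, a boolean mask can also be converted directly with astype(int), which yields the same values as np.where(mask, 1, 0). The sketch below also builds the mask with Series.between (inclusive on both ends by default) and isin, which some find easier to read; the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with zero-padded "HH:MM:SS" strings.
df = pd.DataFrame({
    'hour': ['09:30:00', '15:00:00', '22:45:00', '16:00:00'],
    'week_day': ['Monday', 'Tuesday', 'Saturday', 'Friday'],
})

# between() checks left <= value <= right; ~isin() excludes the weekend days.
mask = (df['hour'].between('14:00:00', '23:00:00')
        & ~df['week_day'].isin(['Saturday', 'Sunday']))

# True -> 1, False -> 0, matching np.where(mask, 1, 0) for a boolean mask.
via_astype = mask.astype(int)
via_where = np.where(mask, 1, 0)

print(via_astype.tolist())  # [0, 1, 0, 1]
```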