The truth value of a Series is ambiguous - Error when calling a function
Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/45148183/
Asked by i.n.n.m
I know that the following error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
was asked about a long time ago.
However, I am trying to write a basic function that returns a new column, df['busy'], containing 1 or 0. My function looks like this:
def hour_bus(df):
    if df[(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
          (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')]:
        return df['busy'] == 1
    else:
        return df['busy'] == 0
I can define the function, but when I call it with the DataFrame, I get the error mentioned above. I followed one thread and another thread to create that function, and I used & instead of and in my if clause.
Anyhow, when I do the following, I get my desired output.
df['busy'] = np.where((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
(df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'),'1','0')
Any ideas on what mistake I am making in my hour_bus function?
Accepted answer by MSeifert
The expression
(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
gives a boolean array (one True/False value per row), and when you index your df with it you get a (probably) smaller part of your df.
Just to illustrate what I mean:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4]})
mask = df['a'] > 2
print(mask)
# 0    False
# 1    False
# 2     True
# 3     True
# Name: a, dtype: bool
indexed_df = df[mask]
print(indexed_df)
#    a
# 2  3
# 3  4
However, the result is still a DataFrame, so it is ambiguous to use it as an expression that requires a single truth value (in your case, an if).
bool(indexed_df)
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
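To use such a condition in an if, the mask or the filtered frame has to be reduced to a single boolean explicitly, which is what the error message hints at. A minimal sketch on the same toy data (the names any_match, all_match and has_rows are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
mask = df['a'] > 2

# Each of these asks one explicit yes/no question about the whole column.
any_match = bool(mask.any())    # at least one row satisfies the condition
all_match = bool(mask.all())    # every row satisfies the condition
has_rows = not df[mask].empty   # the filtered DataFrame contains rows

if any_match:
    print("at least one row has a > 2")
```

Note that these reductions answer a question about the whole frame. They do not produce a per-row result, so they are not what is needed to fill a column such as df['busy'].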
You could use np.where as you already did, or equivalently:
def hour_bus(df):
    mask = ((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') &
            (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'))
    res = df['busy'] == 0
    res[mask] = (df['busy'] == 1)[mask]  # replace the values where the mask is True
    return res
However, np.where will be the better solution (it is more readable and probably faster).
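As a rough sketch of the row-wise approach, here is one way to fill the column without any if. The sample rows are invented for illustration, the helper name weekday_afternoon_mask is hypothetical, and the times are assumed to be zero-padded 'HH:MM:SS' strings so that string comparison agrees with chronological order:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    'hour': ['09:30:00', '15:00:00', '22:45:00', '16:00:00'],
    'week_day': ['Monday', 'Tuesday', 'Saturday', 'Sunday'],
})

def weekday_afternoon_mask(df):
    """Boolean Series: True for weekday rows between 14:00 and 23:00."""
    return (
        (df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')
        & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
    )

def hour_bus(df):
    """Return 1 where the mask is True and 0 elsewhere."""
    return weekday_afternoon_mask(df).astype(int)

df['busy'] = hour_bus(df)
# The same result via np.where, with integers rather than the strings '1'/'0'.
df['busy_np'] = np.where(weekday_afternoon_mask(df), 1, 0)
print(df)
```

Both columns are computed element-wise from the mask, so no single truth value is ever requested and the ValueError does not arise.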