Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36362888/

Time: 2020-09-14 00:59:03  Source: igfitidea

Python: use a function in pandas lambda expression

Tags: python, pandas, lambda, dataframe

Asked by Edamame

I have the following code, trying to find the hour of the 'Dates' column in a data frame:

print(df['Dates'].head(3))
df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)

def find_hour(self, input):
    return input[11:13].astype(float)

where the output of print(df['Dates'].head(3)) looks like:

0    2015-05-13 23:53:00
1    2015-05-13 23:53:00
2    2015-05-13 23:33:00

However, I got the following error:

    df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)
NameError: ("global name 'find_hour' is not defined", u'occurred at index 0')

Does anyone know what I missed? Thanks!

Note that if I put the function directly in the lambda line like below, everything works fine:

df['hour'] = df.apply(lambda x: x['Dates'][11:13], axis=1).astype(float)

Answered by zondo

You are trying to use find_hour before it has been defined. The lambda is called as soon as df.apply runs, so the function must already exist at that point. You just need to switch things around:

def find_hour(self, input):
    return input[11:13].astype(float)

print(df['Dates'].head(3))
df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)

Edit: Padraic has pointed out a very important point: find_hour() is defined as taking two arguments, self and input, but you are giving it only one. You should define find_hour() as def find_hour(input):, except that naming the argument input shadows the built-in function. You might consider renaming it to something a little more descriptive.
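Putting both points together, a minimal sketch might look like the following. It assumes the Dates column holds strings of the form 'YYYY-MM-DD HH:MM:SS'; the parameter name date_string and the sample frame are illustrative, not taken from the question. It also uses float() rather than .astype(float), since a slice of a plain Python string has no .astype method.

```python
import pandas as pd

# Illustrative sample data, assuming 'Dates' holds strings like those in the question.
df = pd.DataFrame({"Dates": ["2015-05-13 23:53:00",
                             "2015-05-13 23:53:00",
                             "2015-05-13 23:33:00"]})


def find_hour(date_string):
    # Characters 11-12 of 'YYYY-MM-DD HH:MM:SS' are the hour.
    return float(date_string[11:13])


# find_hour is defined before this line runs, so the lambda can look it up.
df["hour"] = df.apply(lambda row: find_hour(row["Dates"]), axis=1)
print(df)
```

If the column already has a datetime dtype, the string slice will not work as written, and the .dt accessor is likely the simpler route.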

Answered by MaxU

What is wrong with the good old .dt.hour?

In [202]: df
Out[202]:
                 Date
0 2015-05-13 23:53:00
1 2015-05-13 23:53:00
2 2015-05-13 23:33:00

In [217]: df['hour'] = df.Date.dt.hour

In [218]: df
Out[218]:
                 Date  hour
0 2015-05-13 23:53:00    23
1 2015-05-13 23:53:00    23
2 2015-05-13 23:33:00    23

and if your Date column is of string type, you may want to convert it to datetime first:

df.Date = pd.to_datetime(df.Date)

or just:

df['hour'] = df.Date.str[11:13].astype(int)
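For comparison, both vectorized routes can be sketched end to end as follows. The sample frame and the column name hour_from_str are assumptions made for illustration. The cast uses .astype(int), which converts the whole column at once; the built-in int() only accepts a single value, not a Series.

```python
import pandas as pd

# Illustrative sample data; the 'Date' column starts out as strings.
df = pd.DataFrame({"Date": ["2015-05-13 23:53:00",
                            "2015-05-13 23:53:00",
                            "2015-05-13 23:33:00"]})

# Option 1: slice the hour out of each string, then cast the column to int.
df["hour_from_str"] = df["Date"].str[11:13].astype(int)

# Option 2: parse the strings to datetime, then read the hour via .dt.
df["Date"] = pd.to_datetime(df["Date"])
df["hour"] = df["Date"].dt.hour

print(df)
```

The datetime route is usually the more robust of the two, since it does not depend on the hour sitting at a fixed character position.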