Python TypeError: got an unexpected keyword argument

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/37347296/

Date: 2020-08-19 19:15:59, Source: igfitidea


Tags: python, apache-spark, pyspark, apache-spark-sql, user-defined-functions

Asked by Nirmal

The seemingly simple code below throws the following error:


Traceback (most recent call last):
  File "/home/nirmal/process.py", line 165, in <module>
    'time_diff': f.last(adf['time_diff']).over(window_device_rows)
TypeError: __call__() got an unexpected keyword argument 'this_campaign'

Code:


import pyspark.sql.functions as f
from pyspark.sql.types import IntegerType

# Function to flag network timeouts
def flag_network_timeout(**kwargs):
    if kwargs['this_network'] != kwargs['last_network'] \
            or kwargs['this_campaign'] != kwargs['last_campaign'] \
            or kwargs['this_adgroup'] != kwargs['last_adgroup'] \
            or kwargs['this_creative'] != kwargs['last_creative'] \
            or kwargs['time_diff'] > network_timeout:
        return 1
    else:
        return 0
flag_network_timeout = f.udf(flag_network_timeout, IntegerType())

# Column spec to go over the device events and flag network resets
network_timeout_flag = flag_network_timeout(**{
    'last_network': f.first(adf['network']).over(window_device_rows),
    'last_campaign': f.first(adf['campaign']).over(window_device_rows),
    'last_adgroup': f.first(adf['adgroup']).over(window_device_rows),
    'last_creative': f.first(adf['creative']).over(window_device_rows),
    'this_network': f.last(adf['network']).over(window_device_rows),
    'this_campaign': f.last(adf['campaign']).over(window_device_rows),
    'this_adgroup': f.last(adf['adgroup']).over(window_device_rows),
    'this_creative': f.last(adf['creative']).over(window_device_rows),
    'time_diff': f.last(adf['time_diff']).over(window_device_rows)
})

# Update dataframe with the new columns
adf = adf.select('*', network_timeout_flag.alias('network_timeout'))

What am I doing wrong please? Thank you.


Answer by zero323

You get an exception because UserDefinedFunction.__call__ supports only varargs (*cols), not keyword arguments:


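# Excerpt of UserDefinedFunction.__call__ from the PySpark source: only positional *cols are accepted
def __call__(self, *cols):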
    sc = SparkContext._active_spark_context
    jc = self._judf.apply(_to_seq(sc, cols, _to_java_column))
    return Column(jc)
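The failure does not depend on Spark itself. Any callable whose __call__ accepts only *args rejects keyword arguments in the same way. The following is a minimal sketch using a hypothetical stand-in class, VarargsOnly (not part of PySpark), that mimics the *cols-only signature above. It reproduces the kind of message seen in the question's traceback:

```python
class VarargsOnly:
    """Hypothetical stand-in mimicking the *cols-only signature of UserDefinedFunction.__call__."""

    def __call__(self, *cols):
        # Only positional arguments are accepted; they arrive as a tuple.
        return cols


udf_like = VarargsOnly()

# A positional call works: the arguments are collected into `cols`.
print(udf_like("network", "campaign"))  # → ('network', 'campaign')

# A keyword call fails before the body runs, as in the question's traceback.
try:
    udf_like(this_campaign="campaign")
except TypeError as exc:
    print(exc)
```

The exact wording of the second message varies slightly between Python versions (newer versions prefix the class name), but it always names the rejected keyword.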

At a much more basic level, a UDF can receive only Column arguments, which are expanded to their corresponding values at runtime. It cannot receive standard Python objects.

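A UDF receives its inputs as positional Column arguments. One workaround that keeps a Python UDF is therefore to give the function positional parameters and pass the columns in the same order. This is a sketch under stated assumptions, not the answer's method:

- It reuses the question's names (adf, window_device_rows, network_timeout).
- The threshold value 1800 is a hypothetical placeholder.
- build_flag_column is a hypothetical helper.
- The pyspark imports are deferred into that helper only so the plain row-level function can be checked without a SparkSession.

```python
# Hypothetical threshold in the same units as time_diff; the question defines
# network_timeout elsewhere.
network_timeout = 1800


def flag_network_timeout(this_network, last_network,
                         this_campaign, last_campaign,
                         this_adgroup, last_adgroup,
                         this_creative, last_creative,
                         time_diff):
    """Row-level logic from the question, with positional parameters."""
    changed = (this_network != last_network
               or this_campaign != last_campaign
               or this_adgroup != last_adgroup
               or this_creative != last_creative)
    return 1 if changed or time_diff > network_timeout else 0


def build_flag_column(adf, window_device_rows):
    """Hypothetical helper: wraps the function as a UDF and passes Columns positionally."""
    # Imported here only so the plain function above can be tested without Spark.
    import pyspark.sql.functions as f
    from pyspark.sql.types import IntegerType

    flag_udf = f.udf(flag_network_timeout, IntegerType())

    def over_window(agg, name):
        return agg(adf[name]).over(window_device_rows)

    # Argument order must match the parameter order of flag_network_timeout:
    # a (this, last) pair for each field, then time_diff.
    cols = []
    for name in ('network', 'campaign', 'adgroup', 'creative'):
        cols += [over_window(f.last, name), over_window(f.first, name)]
    cols.append(over_window(f.last, 'time_diff'))
    return flag_udf(*cols)
```

With the question's DataFrame, the select would become adf.select('*', build_flag_column(adf, window_device_rows).alias('network_timeout')).

Nulls reach the Python function as None. Because None > network_timeout raises TypeError in Python 3, null time_diff values would need extra handling. The answer's expression-based approach avoids the Python round trip altogether.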

Personally, I wouldn't use **kwargs for this at all, but setting that aside, you can achieve what you want by composing SQL expressions:


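import pyspark.sql.functions as f

# Builds the flag as a Column expression instead of a Python UDF;
# network_timeout is assumed to be defined elsewhere, as in the question.
def flag_network_timeout_(**kwargs):

    # `|` (not `or`) combines Column conditions; each comparison needs its own
    # parentheses because `|` binds more tightly than `!=` and `>`.
    cond = (
        (kwargs['this_network'] != kwargs['last_network']) |
        (kwargs['this_campaign'] != kwargs['last_campaign']) |
        (kwargs['this_adgroup'] != kwargs['last_adgroup']) |
        (kwargs['this_creative'] != kwargs['last_creative']) |
        (kwargs['time_diff'] > network_timeout))

    # Spark evaluates when/otherwise per row, yielding 1 or 0.
    return f.when(cond, 1).otherwise(0)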