Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/39367662/

Time: 2020-08-19 22:11:04  Source: igfitidea

PySpark: TypeError: 'Column' object is not callable

python apache-spark pyspark spark-dataframe

Asked by Matthias

I'm loading data from HDFS, which I want to filter by specific variables. But somehow the Column.isin command does not work. It throws this error:

TypeError: 'Column' object is not callable

from pyspark.sql.functions import udf, col
variables = ('852-PI-769', '812-HC-037', '852-PC-571-OUT')
df = sqlContext.read.option("mergeSchema", "true").parquet("parameters.parquet")
same_var = col("Variable").isin(variables)
df2 = df.filter(same_var)

The schema looks like this:

df.printSchema()
root
 |-- Time: timestamp (nullable = true)
 |-- Value: float (nullable = true)
 |-- Variable: string (nullable = true)

Any idea what I am doing wrong? PS: It's Spark 1.4 with Jupyter Notebook.

Accepted answer by Shaido - Reinstate Monica

The problem is that isin was added to Spark in version 1.5.0 and is therefore not yet available in your version of Spark, as stated in the documentation of isin.
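
One plausible reading of why a missing method surfaces as a TypeError rather than an AttributeError: if a column class treats unknown attribute names as nested-field lookups, then accessing isin on an old version silently returns another column, and calling that column fails. The sketch below illustrates the mechanism in plain Python. FakeColumn is a hypothetical stand-in written for this example, not the real pyspark.sql.Column, and the assumption that Spark 1.4 behaves this way is not confirmed by the answer.

```python
class FakeColumn(object):
    """Hypothetical stand-in for a column type; NOT pyspark.sql.Column."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Python calls this only when normal attribute lookup fails.
        if item.startswith("__"):
            raise AttributeError(item)
        # Assumption: unknown names are treated as nested fields.
        return FakeColumn("%s.%s" % (self.name, item))


# Accessing a method that does not exist raises nothing here ...
attr = FakeColumn("Variable").isin

# ... the failure only appears when the result is called.
try:
    attr(("852-PI-769", "812-HC-037", "852-PC-571-OUT"))
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

If this is what happens, the error says nothing about the argument values; it only means the name before the parentheses did not resolve to a method.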

There is a function named in in the Scala API, introduced in 1.3.0, which offers similar functionality (the input differs somewhat, since in only accepts columns). In PySpark this function is called inSet instead. Usage examples from the documentation:

df[df.name.inSet("Bob", "Mike")]
df[df.age.inSet([1, 2, 3])]

Note: inSet is deprecated as of version 1.5.0; isin should be used in newer versions.
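
For code that has to run on both sides of the 1.5.0 boundary, the version note can be turned into a small guard. This is a minimal sketch: membership_method is a hypothetical helper name introduced for illustration, and it assumes the Spark version is available as a dotted string such as "1.4.1".

```python
def membership_method(spark_version):
    """Return 'isin' for Spark 1.5.0 and later, otherwise 'inSet'.

    Hypothetical helper; it only inspects the major and minor parts
    of a dotted version string such as "1.4.1".
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return "isin" if (major, minor) >= (1, 5) else "inSet"


print(membership_method("1.4.1"))
print(membership_method("1.5.0"))

# Possible usage with the names from the question (needs a running
# Spark session; sc.version is assumed to hold the version string):
#   method = getattr(col("Variable"), membership_method(sc.version))
#   df2 = df.filter(method(*variables))
```

Unpacking the values with * follows the first documented inSet example, which passes them as separate arguments.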

Answer by PSR

Please try the code below:

df.filter(df.Variable.isin(['852-PI-769', '812-HC-037', '852-PC-571-OUT']))