PySpark: TypeError: 'Column' object is not callable
Notice: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/39367662/
Asked by Matthias
I'm loading data from HDFS, which I want to filter by specific variables. But somehow the Column.isin command does not work. It throws this error:
TypeError: 'Column' object is not callable
from pyspark.sql.functions import udf, col
variables = ('852-PI-769', '812-HC-037', '852-PC-571-OUT')
df = sqlContext.read.option("mergeSchema", "true").parquet("parameters.parquet")
same_var = col("Variable").isin(variables)
df2 = df.filter(same_var)
The schema looks like this:
df.printSchema()
root
|-- Time: timestamp (nullable = true)
|-- Value: float (nullable = true)
|-- Variable: string (nullable = true)
Any idea what I am doing wrong? PS: This is Spark 1.4 with Jupyter Notebook.
Accepted answer by Shaido - Reinstate Monica
The problem is that isin was added to Spark in version 1.5.0 and is therefore not yet available in your version of Spark, as stated in the documentation of isin.
There is a similar function, in, in the Scala API; it was introduced in 1.3.0 and offers similar functionality (the input differs somewhat, since in only accepts columns). In PySpark this function is called inSet instead. Usage examples from the documentation:
df[df.name.inSet("Bob", "Mike")]
df[df.age.inSet([1, 2, 3])]
Note: inSet is deprecated in version 1.5.0 and later; isin should be used in newer versions.
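If the same notebook has to run on both older and newer Spark versions, the choice between inSet and isin can be made at run time. The following is only a sketch: membership_filter is a hypothetical helper, and LegacyColumn is a made-up stand-in that imitates what a pre-1.5 Column appears to do (it turns an unknown attribute into another column, which is presumably why calling isin fails with "not callable" rather than with an AttributeError). Neither name is part of PySpark.

```python
def membership_filter(column, values):
    """Return a condition that is true where `column` equals one of `values`."""
    values = list(values)
    # Check the class rather than the instance: an older Column may answer any
    # unknown attribute with another Column (assumption), which would make
    # hasattr(column, "isin") misleadingly true.
    if hasattr(type(column), "isin"):
        return column.isin(values)      # Spark 1.5.0 and later
    return column.inSet(*values)        # Spark 1.3.x / 1.4.x


class LegacyColumn:
    """Hypothetical stand-in for a pre-1.5 Column; not part of PySpark."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Unknown attributes become nested-field columns instead of failing.
        if item.startswith("__"):
            raise AttributeError(item)
        return LegacyColumn(self.name + "." + item)

    def inSet(self, *values):
        # A real Column would return a boolean Column; a tuple is enough here.
        return (self.name, "in", values)


variables = ("852-PI-769", "812-HC-037", "852-PC-571-OUT")
condition = membership_filter(LegacyColumn("Variable"), variables)
print(condition)
```

With a real DataFrame the call would look like df.filter(membership_filter(df.Variable, variables)), assuming df is loaded as in the question.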
Answer by PSR
Try the code below (note that isin requires Spark 1.5.0 or later):
df.filter(df.Variable.isin(['852-PI-769', '812-HC-037', '852-PC-571-OUT']))