Python PySpark: TypeError: 'Column' object is not callable

Question

Asked by Matthias

I'm loading data from HDFS, which I want to filter by specific variables. But somehow the Column.isin command does not work. It throws this error:

TypeError: 'Column' object is not callable

from pyspark.sql.functions import udf, col
variables = ('852-PI-769', '812-HC-037', '852-PC-571-OUT')
df = sqlContext.read.option("mergeSchema", "true").parquet("parameters.parquet")
same_var = col("Variable").isin(variables)
df2 = df.filter(same_var)

The schema looks like this:

df.printSchema()
root
 |-- Time: timestamp (nullable = true)
 |-- Value: float (nullable = true)
 |-- Variable: string (nullable = true)

Any idea what I am doing wrong? PS: It's Spark 1.4 with Jupyter Notebook.

Answer 1

Accepted answer by Shaido - Reinstate Monica

The problem is that isin was added to Spark in version 1.5.0 and is therefore not yet available in your version of Spark, as noted in the documentation of isin.

The Scala API has a similar function, in, which was introduced in 1.3.0 (the input differs somewhat, since in only accepts columns). In PySpark this function is called inSet instead. Usage examples from the documentation:

df[df.name.inSet("Bob", "Mike")]
df[df.age.inSet([1, 2, 3])]

Note: inSet is deprecated from version 1.5.0 onward; isin should be used in newer versions.
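If the same notebook has to run on both older and newer Spark versions, one option is to pick the method by version number. The following is only a minimal sketch: the helper names (parse_major_minor, membership_method_name, column_in) are made up for illustration and are not part of PySpark. It selects by version rather than with hasattr because, as far as I can tell, an unknown attribute on a Column in Spark 1.4 is treated as a field access and returns another Column, which would also explain the "not callable" error in the question.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.4.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def membership_method_name(spark_version):
    """Name of the Column membership method: isin needs Spark >= 1.5.0."""
    return "isin" if parse_major_minor(spark_version) >= (1, 5) else "inSet"


def column_in(column, values, spark_version):
    """Build a membership condition with whichever method this version has."""
    method = getattr(column, membership_method_name(spark_version))
    # Unpack the values so that tuples work as well as lists.
    return method(*values)
```

Assuming sc is the SparkContext provided by the notebook, the question's filter could then be written as df.filter(column_in(col("Variable"), variables, sc.version)).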

Answer 2

Answer by PSR

Please try the code below:

df.filter(df.Variable.isin(['852-PI-769', '812-HC-037', '852-PC-571-OUT']))
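Since isin itself requires Spark 1.5.0 or later, a version-independent alternative is to OR together one equality comparison per value. This is a rough sketch rather than anything from the answers above; equals_any is a hypothetical helper name, and it relies only on Column supporting == and |.

```python
from functools import reduce
from operator import or_


def equals_any(column, values):
    """Combine (column == v) for every v into a single OR-ed condition."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    # reduce folds the comparisons pairwise: (c == a) | (c == b) | ...
    return reduce(or_, [column == v for v in values])
```

With the question's data this would be used as df.filter(equals_any(df.Variable, ['852-PI-769', '812-HC-037', '852-PC-571-OUT'])). For long value lists the resulting expression grows with every value, so a dedicated membership method is probably preferable where one is available.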
