Python: How to print only a certain column of a DataFrame in PySpark?

Question

Asked by mar tin

Can one use the actions collect or take to print only a given column of a DataFrame?

This

df.col.collect()

gives the error

TypeError: 'Column' object is not callable

and this:

df[df.col].take(2)

gives

pyspark.sql.utils.AnalysisException: u"filter expression 'col' of type string is not a boolean.;"

Answer 1

Answered by zero323

select and show:

df.select("col").show()

or select, flatMap, collect:

df.select("col").rdd.flatMap(list).collect()

Bracket notation with a Column argument (df[df.col]) is used only for logical slicing, that is, filtering rows by a boolean expression. Columns by themselves (df.col) are not distributed data structures but SQL expressions, so they cannot be collected.
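The select-then-collect pattern above could be wrapped roughly as follows. This is only a sketch: column_values is a hypothetical helper name, not part of PySpark, and it assumes df is a DataFrame like the one in the question. It pulls the value out of each Row with a list comprehension instead of the rdd.flatMap(list) call shown earlier.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only needed for the type hints below
    from pyspark.sql import DataFrame


def column_values(df: DataFrame, name: str, n: int | None = None) -> list[Any]:
    """Return the values of one column as a plain Python list.

    Hypothetical helper, not part of PySpark. With n set, only the first
    n rows are fetched via take(n); otherwise collect() pulls every row
    to the driver, which is only sensible for small results.
    """
    # select() returns a DataFrame (unlike df.col, which is a Column
    # expression), so the actions take() and collect() are available.
    selected = df.select(name)
    rows = selected.take(n) if n is not None else selected.collect()
    # Each Row holds exactly one field here; index 0 extracts its value.
    return [row[0] for row in rows]
```

With the df from the question, print(column_values(df, "col", n=2)) would print the first two values of col as a list. For quick inspection on the console, df.select("col").show() remains the shorter option.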
