Scala: How can I display the schema (including types) of a Parquet file from the command line or the spark shell?

Question

Asked by samthebest

I have worked out how to use the spark-shell to show the field names, but the output is ugly and does not include the types:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

println(sqlContext.parquetFile(path))

prints:

ParquetTableScan [cust_id#114,blar_field#115,blar_field2#116], (ParquetRelation /blar/blar), None

Answer 1

Answered by BAR

You should be able to do this:

sqlContext.read.parquet(path).printSchema()

From the Spark docs:

// Print the schema in a tree format
df.printSchema()
// root
// |-- age: long (nullable = true)
// |-- name: string (nullable = true)
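
If the names and types are needed as values rather than printed text, the same DataFrame exposes them through `dtypes` and `schema`. The following is a minimal sketch, assuming a spark-shell session on Spark 1.4 or later where `sqlContext` is predefined; the `path` value is a hypothetical placeholder, not something taken from the answer.

```scala
// Assumes spark-shell (Spark 1.4+), where `sqlContext` already exists.
// `path` is a hypothetical placeholder; point it at your own Parquet data.
val path = "/path/to/parquet"

val df = sqlContext.read.parquet(path)

// dtypes returns (column name, type name) pairs as plain strings.
df.dtypes.foreach { case (name, tpe) => println(s"$name: $tpe") }

// schema is a StructType; each StructField carries name, dataType and nullable.
df.schema.fields.foreach { f =>
  println(s"${f.name}: ${f.dataType} (nullable = ${f.nullable})")
}
```

Both calls read only the schema, so no rows need to be fetched to list the columns.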

Answer 2

Answered by samthebest

OK, I think I have a workable way of doing it: just peek at the first row to infer the schema. (I'm not sure how elegant this is, though. What if the file happens to be empty? I'm sure there has to be a better solution.)

sqlContext.parquetFile(p).first()

At some point this prints:

{
  optional binary cust_id;
  optional binary blar;
  optional double foo;
}
 fileSchema: message schema {
  optional binary cust_id;
  optional binary blar;
  optional double foo;
}

Answer 3

Answered by sp00n3r

The result of parquetFile() is a SchemaRDD (Spark 1.2) or a DataFrame (Spark 1.3), both of which have the .printSchema() method.
