Scala: Create a new DataFrame with empty/null field values

Question

Asked by sshroff

I am creating a new DataFrame from an existing DataFrame, but I need to add a new column ("field1" in the code below) to the new DataFrame. How do I do this? A working code sample would be appreciated.

val edwDf = omniDataFrame 
  .withColumn("field1", callUDF((value: String) => None)) 
  .withColumn("field2",
    callUdf("devicetypeUDF", (omniDataFrame.col("some_field_in_old_df")))) 

edwDf
  .select("field1", "field2")
  .save("odsoutdatafldr", "com.databricks.spark.csv");

Answer 1

Answered by zero323

It is possible to use lit(null):

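import org.apache.spark.sql.functions.{lit, udf}
// In spark-shell the implicits needed for toDF are already in scope;
// in an application, import spark.implicits._ (sqlContext.implicits._ on Spark 1.x).

case class Record(foo: Int, bar: String)
val df = Seq(Record(1, "foo"), Record(2, "bar")).toDF

// A bare null literal: the new column gets the null type, not string
val dfWithFoobar = df.withColumn("foobar", lit(null: String))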

One problem here is that the type of the new column is null (NullType) rather than string:

scala> dfWithFoobar.printSchema
root
 |-- foo: integer (nullable = false)
 |-- bar: string (nullable = true)
 |-- foobar: null (nullable = true)

and it is not retained by the CSV writer. If a concrete type is a hard requirement, you can cast the column to a specific type (let's say String), using either a DataType:

import org.apache.spark.sql.types.StringType

df.withColumn("foobar", lit(null).cast(StringType))

or a string description of the type:

df.withColumn("foobar", lit(null).cast("string"))

or use a UDF like this:

val getNull = udf(() => None: Option[String]) // Or some other type

df.withColumn("foobar", getNull()).printSchema
root
 |-- foo: integer (nullable = false)
 |-- bar: string (nullable = true)
 |-- foobar: string (nullable = true)
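
As a quick check, the variants above can be compared side by side. This is only a minimal sketch, assuming Spark 2.x or later and a local SparkSession; the object name NullColumnCheck, the helper nullColumnTypes, and the column names untyped, castByType, castByName and fromUdf are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, udf}
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical helper object, not part of the original answer
object NullColumnCheck {
  // Same record type as in the answer
  case class Record(foo: Int, bar: String)

  /** Builds the example DataFrame, adds one all-null column per variant,
    * and returns each new column's name mapped to its data type. */
  def nullColumnTypes(spark: SparkSession): Map[String, DataType] = {
    import spark.implicits._
    val df = Seq(Record(1, "foo"), Record(2, "bar")).toDF()

    // UDF variant: the Option's type parameter determines the column type
    val getNull = udf(() => None: Option[String])

    val result = df
      .withColumn("untyped", lit(null))                     // bare literal, no cast
      .withColumn("castByType", lit(null).cast(StringType)) // cast with a DataType
      .withColumn("castByName", lit(null).cast("string"))   // cast with a type name
      .withColumn("fromUdf", getNull())

    Seq("untyped", "castByType", "castByName", "fromUdf")
      .map(name => name -> result.schema(name).dataType)
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this check
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-column-check")
      .getOrCreate()
    try nullColumnTypes(spark).foreach { case (name, dt) => println(s"$name: $dt") }
    finally spark.stop()
  }
}
```

Only the bare lit(null) column should keep the null type; the two casts and the UDF should all report a string column, which is what a typed writer such as CSV needs.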

A Python equivalent can be found here: Add an empty column to spark DataFrame

Answer 2

Answered by sanyi14ka

Just to extend the perfect answer provided by @zero323, here's a solution which can be used starting from Spark 2.2.0.

import org.apache.spark.sql.functions.typedLit

df.withColumn("foobar", typedLit[Option[String]](None)).printSchema
root
 |-- foo: integer (nullable = false)
 |-- bar: string (nullable = true)
 |-- foobar: string (nullable = true)

It's similar to the 3rd solution, but without using any UDF.
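
The same typedLit pattern should carry over to other column types by changing the Option's type parameter. The sketch below assumes Spark 2.2.0 or later; the object TypedNullColumns, the helper withTypedNulls, and the integer column "field3" are hypothetical names chosen for illustration ("field1" is borrowed from the question).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.typedLit

// Hypothetical helper object, not part of the original answer
object TypedNullColumns {
  case class Record(foo: Int, bar: String)

  /** Adds two all-null columns; each column's type is taken from the
    * type parameter of the Option passed to typedLit. */
  def withTypedNulls(df: DataFrame): DataFrame =
    df.withColumn("field1", typedLit[Option[String]](None)) // string column
      .withColumn("field3", typedLit[Option[Int]](None))    // integer column

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this check
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("typed-null-columns")
      .getOrCreate()
    import spark.implicits._
    try {
      val df = Seq(Record(1, "foo"), Record(2, "bar")).toDF()
      withTypedNulls(df).printSchema()
    } finally spark.stop()
  }
}
```

Because no UDF is involved, the null values stay plain literals in the query plan, which is the main practical difference from the UDF variant.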
