scala - Replace null values in Spark DataFrame

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/33376571/

Replace null values in Spark DataFrame

scala apache-spark dataframe

Asked by Gavin Niu

I saw a solution here, but when I tried it, it didn't work for me.

First I import a cars.csv file:

// Read cars.csv with the spark-csv package, treating the first row as the header
val df = sqlContext.read
              .format("com.databricks.spark.csv")
              .option("header", "true")
              .load("/usr/local/spark/cars.csv")
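
As a side note, on Spark 2.x and later the same file can be read with the built-in CSV source instead of the spark-csv package; a minimal sketch, assuming a SparkSession named spark:

// Built-in CSV reader (Spark 2.x+); spark is an assumed SparkSession
val df = spark.read
  .option("header", "true")
  .csv("/usr/local/spark/cars.csv")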

The resulting DataFrame looks like the following:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+

Then I do this:

df.na.fill("e",Seq("blank"))

But the null values didn't change.

Can anyone help me?

Answered by eliasah

This is basically very simple. You need to create a new DataFrame. I'm using the DataFrame df that you defined earlier.

// na.fill returns a new DataFrame; df itself is not modified
val newDf = df.na.fill("e", Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation that you need to keep, you have to assign the transformed DataFrame to a new value.
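
A short sketch of that point, reusing the df and the "blank" column from the question (the show() call is just for illustration):

df.na.fill("e", Seq("blank"))            // returned DataFrame is discarded; df still contains nulls
val newDf = df.na.fill("e", Seq("blank"))
newDf.show()                             // the nulls in "blank" now appear as "e"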

Answered by Bhagwati Malav

You can achieve the same thing in Java this way:

Dataset<Row> filteredData = dataset.na().fill(0);
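
Note that fill(0) only replaces nulls in numeric columns. If different columns need different replacement values, the na functions also accept a map of column name to value; a minimal sketch in Scala, reusing the df and column names from the question (the name filledDf is just illustrative):

// Replace nulls with a different value per column
val filledDf = df.na.fill(Map("blank" -> "e", "comment" -> "No comment"))
filledDf.show()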