scala - Replace null values in Spark DataFrame
Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/33376571/
Replace null values in Spark DataFrame
Asked by Gavin Niu
I saw a solution here, but when I tried it, it didn't work for me.
First I import a cars.csv file:
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/usr/local/spark/cars.csv")
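(For reference, on Spark 2.0+ the CSV reader is built in, so the external spark-csv package is not needed; a minimal sketch, assuming a SparkSession named spark:)

// Built-in CSV reader (Spark 2.0+); equivalent to the spark-csv load above
val df = spark.read
  .option("header", "true")
  .csv("/usr/local/spark/cars.csv")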
Which looks like the following:
+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+
Then I do this:
df.na.fill("e",Seq("blank"))
But the null values didn't change.
Can anyone help me?
Answered by eliasah
This is basically very simple. You'll need to create a new DataFrame. I'm using the DataFrame df that you defined earlier.
val newDf = df.na.fill("e",Seq("blank"))
DataFrames are immutable structures.
Each time you perform a transformation that you need to keep, you'll need to assign the transformed DataFrame to a new value.
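To make the immutability point concrete, a quick check (using the df and newDf defined above):

// df itself is never modified; na.fill returned a brand-new DataFrame
df.show()     // the "blank" column still contains null
newDf.show()  // nulls in the "blank" column are now "e"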
Answered by Bhagwati Malav
You can achieve the same in Java this way:
Dataset<Row> filteredData = dataset.na().fill(0);
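Note that fill(0) replaces nulls only in numeric columns; for string columns such as blank in the question above, a string fill value is needed, as in the Scala answer.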

