scala - Replace null values in Spark DataFrame

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/33376571/

Replace null values in Spark DataFrame

scala apache-spark dataframe

Asked by Gavin Niu

I saw a solution here, but when I tried it, it didn't work for me.

First I import a cars.csv file:

// Read cars.csv with the spark-csv package, treating the first row as the header
val df = sqlContext.read
              .format("com.databricks.spark.csv")
              .option("header", "true")
              .load("/usr/local/spark/cars.csv")
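
As a side note, on Spark 2.x and later the same file can be read with the built-in CSV source instead of the spark-csv package; a minimal sketch, assuming a SparkSession named spark:

// Built-in CSV reader (Spark 2.x+); spark is an assumed SparkSession
val df = spark.read
  .option("header", "true")
  .csv("/usr/local/spark/cars.csv")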

The resulting DataFrame looks like the following:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+

Then I do this:

df.na.fill("e",Seq("blank"))

But the null values didn't change.

Can anyone help me?

Answered by eliasah

This is basically very simple. You need to create a new DataFrame. I'm using the DataFrame df that you defined earlier.

// na.fill returns a new DataFrame; df itself is not modified
val newDf = df.na.fill("e", Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation that you need to keep, you have to assign the transformed DataFrame to a new value.
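
A short sketch of that point, reusing the df and the "blank" column from the question (the show() call is just for illustration):

df.na.fill("e", Seq("blank"))            // returned DataFrame is discarded; df still contains nulls
val newDf = df.na.fill("e", Seq("blank"))
newDf.show()                             // the nulls in "blank" now appear as "e"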

Answered by Bhagwati Malav

You can achieve the same thing in Java this way:

Dataset<Row> filteredData = dataset.na().fill(0);
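
Note that fill(0) only replaces nulls in numeric columns. If different columns need different replacement values, the na functions also accept a map of column name to value; a minimal sketch in Scala, reusing the df and column names from the question (the name filledDf is just illustrative):

// Replace nulls with a different value per column
val filledDf = df.na.fill(Map("blank" -> "e", "comment" -> "No comment"))
filledDf.show()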