Scala Apache Spark: add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same license and attribute the original authors (not me). Original: http://stackoverflow.com/questions/30783517/

Date: 2020-10-22 07:14:57  Source: igfitidea

Apache Spark, add an "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame

Tags: scala, apache-spark, dataframe, apache-spark-sql

Asked by Leonardo Biagioli

I'm trying to add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame, using the Scala API. Starting dataframe:


color
Red
Green
Blue

Desired dataframe (SQL syntax: CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS bool):


color bool
Red   0
Green 1
Blue  0

How should I implement this logic?


Answered by Herman

In the upcoming Spark 1.4.0 release (due out in the next couple of days), you can use the when/otherwise syntax:


// Imports needed outside spark-shell: the when function, plus implicits for $ and toDF
import org.apache.spark.sql.functions.when
import sqlContext.implicits._

// Create the dataframe
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")

// Use when/otherwise syntax
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))

If you are using Spark 1.3.0 you can choose to use a UDF:


// Define the UDF (requires the udf function from org.apache.spark.sql.functions)
import org.apache.spark.sql.functions.udf

val isGreen = udf((color: String) => {
  if (color == "Green") 1
  else 0
})
val df2 = df.withColumn("Green_Ind", isGreen($"color"))
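Note that when can also be chained for a multi-branch CASE WHEN. A sketch, assuming the df and $ syntax from the answer above (the dfChained name and the Blue branch are illustrative, not from the original question):

// CASE WHEN color = 'Green' THEN 1 WHEN color = 'Blue' THEN 2 ELSE 0 END
val dfChained = df.withColumn("color_code",
  when($"color" === "Green", 1)
    .when($"color" === "Blue", 2)
    .otherwise(0))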

Answered by Robert Chevallier

In Spark 1.5.0 you can also use the expr function with SQL syntax:


// String comparison is case-sensitive, so match the data's 'Green' exactly
import org.apache.spark.sql.functions.expr

val df3 = df.withColumn("Green_Ind", expr("case when color = 'Green' then 1 else 0 end"))

or plain spark-sql


df.registerTempTable("data")
val df4 = sqlContext.sql(""" select *, case when color = 'Green' then 1 else 0 end as Green_ind from data """)
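In Spark 2.0+, registerTempTable was deprecated in favor of createOrReplaceTempView, and queries go through the SparkSession. A rough equivalent sketch, assuming a session named spark:

// Register the view and run the same CASE WHEN query via the SparkSession
df.createOrReplaceTempView("data")
val df4 = spark.sql("select *, case when color = 'Green' then 1 else 0 end as Green_ind from data")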

Answered by ozma

I found this:


https://issues.apache.org/jira/browse/SPARK-3813


This worked for me on Spark 2.1.0:


// Record is a simple case class (not shown in the original answer)
case class Record(key: Int, value: String)

import spark.implicits._  // for toDF on the RDD; in Spark 2.x an RDD must become a DataFrame first
val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
rdd.toDF().createOrReplaceTempView("records")
println("Result of SELECT *:")
sql("SELECT case key when '93' then 'ravi' else key end FROM records").collect()

Answered by Ehud Lev

I was looking for this for a long time, so here is a Spark 2.1 Java example with groupBy, for other Java users.


import static org.apache.spark.sql.functions.*;
// ...
// Reusable boolean conditions (basicEventDataset and group_field are assumed
// to be defined elsewhere in the caller's code)
Column uniqTrue = col("uniq").equalTo(true);
Column uniqFalse = col("uniq").equalTo(false);

Column testModeFalse = col("testMode").equalTo(false);
Column testModeTrue = col("testMode").equalTo(true);

// Count rows per group for each (testMode, uniq) combination
Dataset<Row> x = basicEventDataset
        .groupBy(col(group_field))
        .agg(
                sum(when(testModeTrue.and(uniqTrue), 1).otherwise(0)).as("tt"),
                sum(when(testModeFalse.and(uniqTrue), 1).otherwise(0)).as("ft"),
                sum(when(testModeTrue.and(uniqFalse), 1).otherwise(0)).as("tf"),
                sum(when(testModeFalse.and(uniqFalse), 1).otherwise(0)).as("ff"));
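For Scala users, a rough equivalent of the grouped conditional counts might look like the sketch below (basicEventDataset and the group_field column name are assumptions carried over from the Java snippet):

import org.apache.spark.sql.functions.{col, sum, when}

// Count rows per group for each combination of testMode and uniq
val counts = basicEventDataset
  .groupBy(col("group_field"))
  .agg(
    sum(when(col("testMode") === true && col("uniq") === true, 1).otherwise(0)).as("tt"),
    sum(when(col("testMode") === false && col("uniq") === true, 1).otherwise(0)).as("ft"),
    sum(when(col("testMode") === true && col("uniq") === false, 1).otherwise(0)).as("tf"),
    sum(when(col("testMode") === false && col("uniq") === false, 1).otherwise(0)).as("ff"))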