Apache Spark, add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame
Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/30783517/
Asked by Leonardo Biagioli
I'm trying to add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame, using the Scala API. Starting dataframe:
color
Red
Green
Blue
Desired dataframe (SQL syntax: CASE WHEN color == Green THEN 1 ELSE 0 END AS bool):
color bool
Red 0
Green 1
Blue 0
How should I implement this logic?
Answered by Herman
In the upcoming Spark 1.4.0 release (it should be out in the next couple of days), you can use the when/otherwise syntax:
// Imports for toDF / $ syntax and when/otherwise (sqlContext is predefined in the 1.x shell)
import sqlContext.implicits._
import org.apache.spark.sql.functions.when
// Create the dataframe
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")
// Use when/otherwise syntax
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))
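For reference, df1.show() on the three-row frame above should print something like this:
df1.show()
// +-----+---------+
// |color|Green_Ind|
// +-----+---------+
// |  Red|        0|
// |Green|        1|
// | Blue|        0|
// +-----+---------+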
If you are using Spark 1.3.0 you can choose to use a UDF:
// Define the UDF (udf comes from org.apache.spark.sql.functions)
import org.apache.spark.sql.functions.udf
val isGreen = udf((color: String) => {
  if (color == "Green") 1
  else 0
})
val df2 = df.withColumn("Green_Ind", isGreen($"color"))
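A minimal sketch of the same idea exposed to SQL, assuming the sqlContext from above (the table name "data" and the val name df2b are illustrative):
// Register the function by name so it can be used inside SQL strings
sqlContext.udf.register("isGreen", (color: String) => if (color == "Green") 1 else 0)
df.registerTempTable("data")
val df2b = sqlContext.sql("SELECT color, isGreen(color) AS Green_Ind FROM data")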
Answered by Robert Chevallier
In Spark 1.5.0 you can also use the expr function with SQL syntax (note that the string comparison is case-sensitive, so the literal must match the data: 'Green'):
import org.apache.spark.sql.functions.expr
val df3 = df.withColumn("Green_Ind", expr("case when color = 'Green' then 1 else 0 end"))
or plain Spark SQL:
df.registerTempTable("data")
val df4 = sqlContext.sql(""" select *, case when color = 'Green' then 1 else 0 end as Green_ind from data """)
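On Spark 2.x, registerTempTable is deprecated in favor of createOrReplaceTempView, and the query goes through the SparkSession; a minimal sketch assuming a session named spark (df4b is an illustrative name):
df.createOrReplaceTempView("data")
val df4b = spark.sql("select *, case when color = 'Green' then 1 else 0 end as Green_ind from data")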
Answered by ozma
I found this:
https://issues.apache.org/jira/browse/SPARK-3813
Worked for me on Spark 2.1.0:
// Record is not defined in the original snippet; a matching case class:
case class Record(key: Int, value: String)
import spark.implicits._  // spark is the SparkSession predefined in the 2.x shell
val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
rdd.toDF().createOrReplaceTempView("records")
println("Result of the CASE expression:")
spark.sql("SELECT case key when '93' then 'ravi' else key end FROM records").collect()
Answered by Ehud Lev
I was looking for this for a long time, so here is a Spark 2.1 Java example with groupBy, for other Java users:
import static org.apache.spark.sql.functions.*;
//...
Column uniqTrue = col("uniq").equalTo(true);
Column uniqFalse = col("uniq").equalTo(false);
Column testModeFalse = col("testMode").equalTo(false);
Column testModeTrue = col("testMode").equalTo(true);

// group_field is a String naming the column to group by
Dataset<Row> x = basicEventDataset
    .groupBy(col(group_field))
    .agg(
        sum(when(testModeTrue.and(uniqTrue), 1).otherwise(0)).as("tt"),
        sum(when(testModeFalse.and(uniqTrue), 1).otherwise(0)).as("ft"),
        sum(when(testModeTrue.and(uniqFalse), 1).otherwise(0)).as("tf"),
        sum(when(testModeFalse.and(uniqFalse), 1).otherwise(0)).as("ff")
    );
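For comparison, the same conditional-count aggregation in Scala (a sketch; basicEventDataset, groupField, and the column names are carried over from the Java snippet and are assumptions about the caller's schema):
import org.apache.spark.sql.functions.{col, sum, when}

// One counter per (testMode, uniq) combination within each group
val counts = basicEventDataset
  .groupBy(col(groupField))
  .agg(
    sum(when(col("testMode") === true && col("uniq") === true, 1).otherwise(0)).as("tt"),
    sum(when(col("testMode") === false && col("uniq") === true, 1).otherwise(0)).as("ft"),
    sum(when(col("testMode") === true && col("uniq") === false, 1).otherwise(0)).as("tf"),
    sum(when(col("testMode") === false && col("uniq") === false, 1).otherwise(0)).as("ff")
  )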

