Python Spark Equivalent of IF Then ELSE
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/39048229/
Spark Equivalent of IF Then ELSE
Asked by Baktaawar
I have seen this question earlier here and took lessons from it. However, I am not sure why I am getting an error when I feel it should work.
I want to create a new column in an existing Spark DataFrame based on some rules. Here is what I wrote. iris_spark is the data frame with a categorical variable iris_class that has three distinct categories.
from pyspark.sql import functions as F
iris_spark_df = iris_spark.withColumn(
"Class",
F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))
Throws the following error.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-157-21818c7dc060> in <module>()
----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))
TypeError: when() takes exactly 2 arguments (3 given)
Any idea why?
Answered by zero323
The correct structure is either:
(when(col("iris_class") == 'Iris-setosa', 0)
.when(col("iris_class") == 'Iris-versicolor', 1)
.otherwise(2))
which is equivalent to
CASE
WHEN (iris_class = 'Iris-setosa') THEN 0
WHEN (iris_class = 'Iris-versicolor') THEN 1
ELSE 2
END
or:
(when(col("iris_class") == 'Iris-setosa', 0)
.otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
.otherwise(2)))
which is equivalent to:
CASE WHEN (iris_class = 'Iris-setosa') THEN 0
ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
ELSE 2
END
END
with general syntax:
when(condition, value).when(...)
or
when(condition, value).otherwise(...)
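Applied to the DataFrame from the question, a minimal sketch of the corrected statement (assuming iris_spark and its iris_class column are as described above) would be:

from pyspark.sql import functions as F

# Chain .when() calls instead of passing a third argument to when():
iris_spark_df = iris_spark.withColumn(
    "Class",
    F.when(F.col("iris_class") == "Iris-setosa", 0)
     .when(F.col("iris_class") == "Iris-versicolor", 1)
     .otherwise(2))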
You probably mixed things up with the Hive IF conditional:
IF(condition, if-true, if-false)
which can be used only in raw SQL with Hive support.
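For illustration only, a sketch of how that IF form might be used through raw SQL (this assumes a Hive-enabled SparkSession named spark; the temp view name iris is hypothetical):

# Hypothetical: register the DataFrame as a temp view and use IF in a SQL query.
iris_spark.createOrReplaceTempView("iris")

iris_sql_df = spark.sql("""
    SELECT *,
           IF(iris_class = 'Iris-setosa', 0,
              IF(iris_class = 'Iris-versicolor', 1, 2)) AS Class
    FROM iris
""")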
Answered by vj sreenivasan
Conditional statements in Spark
- Using “when otherwise” on DataFrame
- Using “case when” on DataFrame
- Using && and || operators
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{when, _}

val spark: SparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()
import spark.sqlContext.implicits._

val data = List(("James","","Smith","36636","M",60000),
  ("Michael","Rose","","40288","M",70000),
  ("Robert","","Williams","42114","",400000),
  ("Maria","Anne","Jones","39192","F",500000),
  ("Jen","Mary","Brown","","F",0))
val cols = Seq("first_name","middle_name","last_name","dob","gender","salary")
val df = spark.createDataFrame(data).toDF(cols:_*)
1. Using “when otherwise” on DataFrame
Replace the value of gender with a new value:
val df1 = df.withColumn("new_gender", when(col("gender") === "M","Male")
.when(col("gender") === "F","Female")
.otherwise("Unknown"))
val df2 = df.select(col("*"), when(col("gender") === "M","Male")
.when(col("gender") === "F","Female")
.otherwise("Unknown").alias("new_gender"))
2. Using “case when” on DataFrame
val df3 = df.withColumn("new_gender",
expr("case when gender = 'M' then 'Male' " +
"when gender = 'F' then 'Female' " +
"else 'Unknown' end"))
Alternatively,
val df4 = df.select(col("*"),
expr("case when gender = 'M' then 'Male' " +
"when gender = 'F' then 'Female' " +
"else 'Unknown' end").alias("new_gender"))
3. Using && and || operators
val dataDF = Seq(
  (66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")
).toDF("id", "code", "amt")
dataDF.withColumn("new_column",
when(col("code") === "a" || col("code") === "d", "A")
.when(col("code") === "b" && col("amt") === "4", "B")
.otherwise("A1"))
.show()
Output:
+---+----+---+----------+
| id|code|amt|new_column|
+---+----+---+----------+
| 66| a| 4| A|
| 67| a| 0| A|
| 70| b| 4| B|
| 71| d| 4| A|
+---+----+---+----------+
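For readers working in PySpark, as in the original question, roughly the same logic can be expressed with & and | instead of && and || (a sketch, assuming spark is an active SparkSession; note the parentheses around each comparison, which Python's operator precedence requires):

from pyspark.sql import functions as F

# Hypothetical PySpark version of the Scala example above.
data_df = spark.createDataFrame(
    [(66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")],
    ["id", "code", "amt"])

data_df.withColumn(
    "new_column",
    F.when((F.col("code") == "a") | (F.col("code") == "d"), "A")
     .when((F.col("code") == "b") & (F.col("amt") == "4"), "B")
     .otherwise("A1")
).show()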
Answered by neeraj bhadani
There are different ways you can achieve if-then-else:
- Using the when function in the DataFrame API. You can specify a list of conditions in when and the value you need with otherwise. You can also use this expression in nested form (see the sketch after the selectExpr example below).
- expr function. Using the "expr" function you can pass a SQL expression to expr. See the example below, where we create a new column "quarter" based on the month column.
cond = """case when month > 9 then 'Q4'
else case when month > 6 then 'Q3'
else case when month > 3 then 'Q2'
else case when month > 0 then 'Q1'
end
end
end
end as quarter"""
newdf = df.withColumn("quarter", expr(cond))
- selectExpr function. We can also use the variant of the select function that takes a SQL expression. See the example below.
cond = """case when month > 9 then 'Q4'
else case when month > 6 then 'Q3'
else case when month > 3 then 'Q2'
else case when month > 0 then 'Q1'
end
end
end
end as quarter"""
newdf = df.selectExpr("*", cond)
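For the first option above (the when function in the DataFrame API), a minimal sketch of the same quarter logic might look like this (assuming df has a numeric month column, as in the expr examples):

from pyspark.sql import functions as F

# Chained when calls mirroring the nested CASE WHEN expression above;
# rows with month <= 0 get NULL here, just as in the SQL version (no final otherwise).
newdf = df.withColumn(
    "quarter",
    F.when(F.col("month") > 9, "Q4")
     .when(F.col("month") > 6, "Q3")
     .when(F.col("month") > 3, "Q2")
     .when(F.col("month") > 0, "Q1"))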
Answered by vermaji
You can use this:
if(exp1, exp2, exp3)
inside spark.sql(), where exp1 is the condition: if it is true, the expression evaluates to exp2, otherwise to exp3.
Note that with nested if-else, you need to wrap every expression in parentheses ("()"), or it will raise an error. For example:
if((1>2), (if (2>3), True, False), (False))
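As a rough sketch of the same idea in practice (the view name people and its age column are hypothetical), nested IFs inside spark.sql() could look like:

# Hypothetical example: every expression is wrapped in parentheses, as noted above.
df.createOrReplaceTempView("people")

result = spark.sql("""
    SELECT *,
           IF((age > 18), 'adult', (IF((age > 12), 'teen', 'child'))) AS age_group
    FROM people
""")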