Spark Equivalent of IF Then ELSE

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it you must follow the same license and attribute it to the original authors (not me). Original source: http://stackoverflow.com/questions/39048229/

Tags: python, apache-spark, pyspark, apache-spark-sql

Asked by Baktaawar

I have seen this question earlier here and I have taken lessons from it. However, I am not sure why I am getting an error when I feel it should work.

I want to create a new column in an existing Spark DataFrame by applying some rules. Here is what I wrote. iris_spark is the data frame, with a categorical column iris_class that has three distinct categories.

from pyspark.sql import functions as F

iris_spark_df = iris_spark.withColumn(
    "Class", 
   F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))

This throws the following error.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-157-21818c7dc060> in <module>()
----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

TypeError: when() takes exactly 2 arguments (3 given)

Any idea why?

Answered by zero323

The correct structure is either:

(when(col("iris_class") == 'Iris-setosa', 0)
.when(col("iris_class") == 'Iris-versicolor', 1)
.otherwise(2))

which is equivalent to

CASE 
    WHEN (iris_class = 'Iris-setosa') THEN 0
    WHEN (iris_class = 'Iris-versicolor') THEN 1 
    ELSE 2
END

or:

(when(col("iris_class") == 'Iris-setosa', 0)
    .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
        .otherwise(2)))

which is equivalent to:

CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
     ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1 
               ELSE 2 
          END 
END

with general syntax:

when(condition, value).when(...)

or

when(condition, value).otherwise(...)

You probably mixed things up with the Hive IF conditional:

IF(condition, if-true, if-false)

which can be used only in raw SQL with Hive support.

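Translated back to the question's code, a minimal corrected sketch (assuming the iris_spark DataFrame and its iris_class column from the question) chains the when calls instead of passing a third argument to when:

from pyspark.sql import functions as F

# Each when() takes exactly (condition, value); additional branches are chained
iris_spark_df = iris_spark.withColumn(
    "Class",
    F.when(iris_spark.iris_class == 'Iris-setosa', 0)
     .when(iris_spark.iris_class == 'Iris-versicolor', 1)
     .otherwise(2))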

Answered by vj sreenivasan

Conditional statements in Spark

  • Using “when otherwise” on DataFrame
  • Using “case when” on DataFrame
  • Using the && and || operators


import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark: SparkSession = SparkSession.builder().master("local[1]").appName("SparkByExamples.com").getOrCreate()
import spark.implicits._

val data = List(("James","","Smith","36636","M",60000),
        ("Michael","Rose","","40288","M",70000),
        ("Robert","","Williams","42114","",400000),
        ("Maria","Anne","Jones","39192","F",500000),
        ("Jen","Mary","Brown","","F",0))

val cols = Seq("first_name","middle_name","last_name","dob","gender","salary")
val df = spark.createDataFrame(data).toDF(cols:_*)

1. Using “when otherwise” on DataFrame

Replace the value of gender with a new value:

val df1 = df.withColumn("new_gender", when(col("gender") === "M","Male")
      .when(col("gender") === "F","Female")
      .otherwise("Unknown"))

val df2 = df.select(col("*"), when(col("gender") === "M","Male")
      .when(col("gender") === "F","Female")
      .otherwise("Unknown").alias("new_gender"))

2. Using “case when” on DataFrame

val df3 = df.withColumn("new_gender",
  expr("case when gender = 'M' then 'Male' " +
                   "when gender = 'F' then 'Female' " +
                   "else 'Unknown' end"))

Alternatively,

val df4 = df.select(col("*"),
      expr("case when gender = 'M' then 'Male' " +
                       "when gender = 'F' then 'Female' " +
                       "else 'Unknown' end").alias("new_gender"))

3. Using the && and || operators

val dataDF = Seq(
      (66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")
    ).toDF("id", "code", "amt")

dataDF.withColumn("new_column",
       when(col("code") === "a" || col("code") === "d", "A")
      .when(col("code") === "b" && col("amt") === "4", "B")
      .otherwise("A1"))
      .show()

Output:

+---+----+---+----------+
| id|code|amt|new_column|
+---+----+---+----------+
| 66|   a|  4|         A|
| 67|   a|  0|         A|
| 70|   b|  4|         B|
| 71|   d|  4|         A|
+---+----+---+----------+
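
Since the question is tagged pyspark, here is a rough PySpark equivalent of the && / || example (a sketch, not part of the original answer; it assumes an active SparkSession named spark, and note that in Python the column operators are & and |, with each comparison wrapped in parentheses):

from pyspark.sql import functions as F

data_df = spark.createDataFrame(
    [(66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")],
    ["id", "code", "amt"])

data_df.withColumn(
    "new_column",
    F.when((F.col("code") == "a") | (F.col("code") == "d"), "A")
     .when((F.col("code") == "b") & (F.col("amt") == "4"), "B")
     .otherwise("A1")).show()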

Answered by neeraj bhadani

There are different ways you can achieve if-then-else.

  1. Using the when function in the DataFrame API. You can chain a list of conditions with when and specify the fallback value with otherwise; the expression can be nested as well. A minimal sketch is shown right after this list.

  2. The expr function. Using expr you can pass a SQL expression as a string; see the example below the sketch, which creates a new column "quarter" based on the month column.
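
For the first option, here is a minimal sketch (hypothetical, not part of the original answer; it assumes a DataFrame df with a numeric month column, mirroring the expr example below) using chained when calls:

from pyspark.sql.functions import when, col

# Chained when() calls; rows with no matching condition get null,
# just like a CASE expression without a final ELSE.
newdf_when = df.withColumn(
    "quarter",
    when(col("month") > 9, "Q4")
    .when(col("month") > 6, "Q3")
    .when(col("month") > 3, "Q2")
    .when(col("month") > 0, "Q1"))

The expr-based example referenced in point 2 follows: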

cond = """case when month > 9 then 'Q4'
            else case when month > 6 then 'Q3'
                else case when month > 3 then 'Q2'
                    else case when month > 0 then 'Q1'
                        end
                    end
                end
            end as quarter"""

newdf = df.withColumn("quarter", expr(cond))

  3. The selectExpr function. We can also use this variant of select, which accepts SQL expressions directly; see the example below.

cond = """case when month > 9 then 'Q4'
            else case when month > 6 then 'Q3'
                else case when month > 3 then 'Q2'
                    else case when month > 0 then 'Q1'
                        end
                    end
                end
            end as quarter"""

newdf = df.selectExpr("*", cond)

Answered by vermaji

You can use if(exp1, exp2, exp3) inside spark.sql(), where exp1 is the condition: if it is true you get exp2, otherwise you get exp3.

The tricky part with nested if-else is that you need to wrap every expression inside parentheses ("()"), otherwise it will raise an error.

Example:

if((1>2), (if((2>3), True, False)), (False))
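
As a concrete sketch (hypothetical, not part of the original answer; it assumes the question's iris_spark DataFrame registered as a temporary view named iris, an active SparkSession named spark, and a Spark build where if() is available in raw SQL):

iris_spark.createOrReplaceTempView("iris")

# Nested if() in raw SQL, with each expression wrapped in parentheses
iris_sql_df = spark.sql("""
    SELECT *,
           if((iris_class = 'Iris-setosa'), 0,
              (if((iris_class = 'Iris-versicolor'), 1, (2)))) AS Class
    FROM iris
""")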