Where do you need to use lit() in Pyspark SQL?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/37715060/
Asked by flybonzai
I'm trying to make sense of where you need to use a lit value, which is defined as a literal column in the documentation.
Take for example this udf, which returns the element at a given index of a SQL array column:
def find_index(column, index):
return column[index]
If I were to pass an integer into this I would get an error. I would need to pass a lit(n) value into the udf to get the correct element of the array.
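For context, here is a minimal sketch of the failure and the fix (the SparkSession setup, DataFrame, and schema are made up for illustration; find_index is the function defined above):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(["a", "b", "c"],)], ["letters"])

find_index_udf = udf(find_index, StringType())  # wrap the plain Python function as a UDF

# df.select(find_index_udf(col("letters"), 1))  # TypeError: not a string or column
df.select(find_index_udf(col("letters"), lit(1))).show()  # second element, "b"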
Is there a place I can better learn the hard and fast rules of when to use lit, and possibly col as well?
Answered by zero323
To keep it simple, you need a Column (which can be one created using lit, but it is not the only option) when the JVM counterpart expects a column and there is no internal conversion in the Python wrapper, or when you want to call a Column-specific method.
In the first case the only strict rule is the one that applies to UDFs: a UDF (Python or JVM) can be called only with arguments which are of Column type. This also typically applies to functions from pyspark.sql.functions. In other cases it is always best to check the documentation and docstrings first and, if that is not sufficient, the docs of the corresponding Scala counterpart.
In the second case the rules are simple. If you, for example, want to compare a column to a value, then the value has to be on the RHS:
col("foo") > 0 # OK
or the value has to be wrapped with a literal:
lit(0) < col("foo") # OK
In Python many operators (<, ==, <=, &, |, +, -, *, /) can use a non-column object on the LHS:
0 < col("foo")
but such applications are not supported in Scala.
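Putting those operator rules together (a minimal runnable sketch; the toy DataFrame is an assumption for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (-1,)], ["foo"])

df.filter(col("foo") > 0)       # value on the RHS: OK
df.filter(lit(0) < col("foo"))  # constant wrapped in lit on the LHS: OK
df.filter(0 < col("foo"))       # Python reflects this to col("foo") > 0, so it also works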
It goes without saying that you have to use lit if you want to access any of the pyspark.sql.Column methods while treating a standard Python scalar as a constant column. For example you'll need
c = lit(1)
not
c = 1
to
c.between(0, 3) # type: pyspark.sql.Column
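A quick way to see the difference (a sketch; the exact Column repr varies across Spark versions):

from pyspark.sql.functions import lit

c = lit(1)
c.between(0, 3)  # a Column expression, roughly Column<'((1 >= 0) AND (1 <= 3))'>

c = 1
# c.between(0, 3)  # AttributeError: 'int' object has no attribute 'between'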
Answered by Megha Jaiswal
A simple example could be:
df.withColumn("columnName", lit(column_value))
For example:
df = df.withColumn("Today's Date", lit(datetime.now()))
But first import the library:
from pyspark.sql.functions import lit
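As a complete runnable version of that answer (a sketch; the toy DataFrame and the todays_date column name are assumptions):

from datetime import datetime

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",), ("b",)], ["id"])

# lit() turns one Python datetime value into a constant column applied to every row
df = df.withColumn("todays_date", lit(datetime.now()))
df.show(truncate=False)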