Python PySpark: Creating a timestamp column

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms and attribute it to the original authors (not me), citing the original: http://stackoverflow.com/questions/45469438/

Pyspark Creating timestamp column

python, datetime, pyspark

Asked by Naveen Srikanth

I am using Spark 2.1.0. I am not able to create a timestamp column in PySpark using the code snippet below. Please help.

df=df.withColumn('Age',lit(datetime.now()))

I am getting:

AssertionError: col should be Column

Please help
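
For context, this assertion is raised inside DataFrame.withColumn, which requires its second argument to be a Column object. A minimal sketch of one workaround, assuming a driver-side timestamp string is acceptable (the str() round-trip and the column name are illustrative choices, not the only option):

from datetime import datetime
from pyspark.sql.functions import lit

# lit() on a plain string always returns a Column, which satisfies withColumn's check;
# Spark can cast a 'yyyy-MM-dd HH:mm:ss[.ffffff]' string to a timestamp
df = df.withColumn('Age', lit(str(datetime.now())).cast('timestamp'))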

Answered by balalaika

I am not sure about 2.1.0, but on 2.2.1 at least you can just do:

from pyspark.sql import functions as F
df.withColumn('Age', F.current_timestamp())

Hope it helps!
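
For reference, a complete minimal run of this approach could look like the sketch below; the SparkSession setup and the toy dataframe are illustrative assumptions, not part of the original answer.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("timestamp-demo").getOrCreate()

# toy dataframe, just to have something to stamp
df = spark.createDataFrame([('Alice', 1), ('Again', 2)], ['name', 'id'])

# current_timestamp() yields a TimestampType column, evaluated when the query runs
df = df.withColumn('Age', F.current_timestamp())
df.printSchema()
df.show(truncate=False)

Note that because current_timestamp() is evaluated at query time, re-running an action later can return a slightly different value than an earlier run.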

Answered by Ankush Singh

Assuming you have the dataframe from your code snippet and you want the same timestamp for all of your rows.

Let me create a dummy dataframe.

>>> data = [{'name': 'Alice', 'age': 1}, {'name': 'Again', 'age': 2}]
>>> df = spark.createDataFrame(data)

>>> import time
>>> import datetime
>>> timestamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
>>> type(timestamp)
<class 'str'>

>>> from pyspark.sql.functions import lit,unix_timestamp
>>> timestamp
'2017-08-02 16:16:14'
>>> new_df = df.withColumn('time',unix_timestamp(lit(timestamp),'yyyy-MM-dd HH:mm:ss').cast("timestamp"))
>>> new_df.show(truncate = False)
+---+-----+---------------------+
|age|name |time                 |
+---+-----+---------------------+
|1  |Alice|2017-08-02 16:16:14.0|
|2  |Again|2017-08-02 16:16:14.0|
+---+-----+---------------------+

>>> new_df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
 |-- time: timestamp (nullable = true)
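
The same idea can be wrapped in a small helper so that every row gets one fixed, driver-side timestamp; this is a sketch, and the helper name add_run_timestamp, the default column name, and the format string are illustrative.

import datetime

from pyspark.sql.functions import lit, unix_timestamp

def add_run_timestamp(df, col_name='time'):
    # the string is formatted once on the driver, so every row (and every
    # later action on the returned dataframe) sees exactly the same value
    ts = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    return df.withColumn(
        col_name,
        unix_timestamp(lit(ts), 'yyyy-MM-dd HH:mm:ss').cast('timestamp'),
    )

# usage: new_df = add_run_timestamp(df)

Unlike current_timestamp(), which is re-evaluated per query, the literal string here is frozen at the moment the column is defined.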

Answered by Nikhil Gupta

Adding on to balalaika's answer: if someone, like me, just wants to add the date but not the time, then they can use the code below.

from pyspark.sql import functions as F
df.withColumn('Age', F.current_date())

Hope this helps
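
As a quick check (a sketch, assuming the toy dataframe and the F import from the earlier examples), the resulting column comes out as a date rather than a timestamp:

df.withColumn('Age', F.current_date()).printSchema()
# the Age column should be reported with type date rather than timestamp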
