How do I convert a column of unix epoch to Date in an Apache Spark DataFrame using Java?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/34626371/


How do I convert column of unix epoch to Date in Apache spark DataFrame using Java?

java, apache-spark, spark-dataframe

Asked by ErhWen Kuo

I have a json data file which contains one property, [creationDate], which is a unix epoch in "long" number type. The Apache Spark DataFrame schema looks like below:


root 
 |-- creationDate: long (nullable = true) 
 |-- id: long (nullable = true) 
 |-- postTypeId: long (nullable = true)
 |-- tags: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- title: string (nullable = true)
 |-- viewCount: long (nullable = true)

I would like to do a groupBy on "creationData_Year", which needs to be derived from "creationDate".


What's the easiest way to do this kind of conversion in a DataFrame using Java?


Answered by ErhWen Kuo

After checking the Spark DataFrame API and SQL functions, I came up with the snippet below:


// Requires: import static org.apache.spark.sql.functions.*;
DataFrame df = sqlContext.read().json("MY_JSON_DATA_FILE");

// "creationDate" holds epoch milliseconds, so divide by 1000 before from_unixtime (which expects seconds)
DataFrame df_DateConverted = df.withColumn("creationDt", from_unixtime(df.col("creationDate").divide(1000)));

The "creationDate" column is divided by 1000 because the time units differ: the original "creationDate" is a unix epoch in milliseconds, whereas Spark SQL's "from_unixtime" expects a unix epoch in seconds.

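To get the "creationData_Year" grouping asked about in the question, the converted value can be cast to a timestamp and passed through year() before the groupBy. Here is a minimal sketch, assuming the same Spark 1.x DataFrame API and a static import of org.apache.spark.sql.functions.* as in the snippet above; the column name "creationData_Year" is simply the one mentioned in the question:

// Derive the year from the millisecond epoch and group by it
DataFrame byYear = df
        .withColumn("creationData_Year",
                year(from_unixtime(col("creationDate").divide(1000)).cast("timestamp")))
        .groupBy("creationData_Year")
        .count();
byYear.show();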

Answered by Ray Metz

In PySpark, converting from Unix epoch milliseconds to a DataFrame timestamp:


# Requires: from pyspark.sql.functions import from_unixtime
df.select(from_unixtime((df.my_date_column.cast('bigint') / 1000)).cast('timestamp').alias('my_date_column'))
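Since the original question asks for Java, an approximate Java equivalent of the PySpark line above might look like the following. This is only a sketch, assuming the same Spark 1.x DataFrame API and functions static import as in the first answer; "my_date_column" is just the hypothetical column name from the PySpark example:

// Cast the millisecond epoch to seconds, convert, then cast the result to a timestamp
DataFrame converted = df.select(
        from_unixtime(col("my_date_column").cast("bigint").divide(1000))
                .cast("timestamp")
                .alias("my_date_column"));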