Convert string to timestamp for Spark using Scala

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/37349473/


Convert string to timestamp for Spark using Scala

scala apache-spark apache-spark-sql timestamp

Asked by Aissa El Ouafi

I have a dataframe called train; it has the following schema:

root
|-- date_time: string (nullable = true)
|-- site_name: integer (nullable = true)
|-- posa_continent: integer (nullable = true)

I want to cast the date_time column to timestamp and create a new column with the year value extracted from the date_time column.

To be clear, I have the following dataframe:

+-------------------+---------+--------------+
|          date_time|site_name|posa_continent|
+-------------------+---------+--------------+
|2014-08-11 07:46:59|        2|             3|
|2014-08-11 08:22:12|        2|             3|
|2015-08-11 08:24:33|        2|             3|
|2016-08-09 18:05:16|        2|             3|
|2011-08-09 18:08:18|        2|             3|
|2009-08-09 18:13:12|        2|             3|
|2014-07-16 09:42:23|        2|             3|
+-------------------+---------+--------------+
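
For anyone who wants to reproduce the snippets below, here is a minimal sketch that recreates the first rows of this dataframe (assuming an active SparkSession named spark):

import spark.implicits._

val df = Seq(
  ("2014-08-11 07:46:59", 2, 3),
  ("2014-08-11 08:22:12", 2, 3),
  ("2015-08-11 08:24:33", 2, 3)
).toDF("date_time", "site_name", "posa_continent")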

I want to get the following dataframe:

+-------------------+---------+--------------+--------+
|          date_time|site_name|posa_continent|year    |
+-------------------+---------+--------------+--------+
|2014-08-11 07:46:59|        2|             3|2014    |
|2014-08-11 08:22:12|        2|             3|2014    |
|2015-08-11 08:24:33|        2|             3|2015    |
|2016-08-09 18:05:16|        2|             3|2016    |
|2011-08-09 18:08:18|        2|             3|2011    |
|2009-08-09 18:13:12|        2|             3|2009    |
|2014-07-16 09:42:23|        2|             3|2014    |
+-------------------+---------+--------------+--------+

Answered by zero323

Well, if you want to cast the date_time column to timestamp and create a new column with the year value, then do exactly that:

import org.apache.spark.sql.functions.year
import spark.implicits._  // for the $"..." column syntax; assumes the active SparkSession is named spark

df
  .withColumn("date_time", $"date_time".cast("timestamp"))  // cast the string column to timestamp
  .withColumn("year", year($"date_time"))                   // add a year column extracted from the timestamp

Answered by Carlos Vilchez

You could map the dataframe to add the year at the end of each row, parsing the date string with Joda-Time:

import org.apache.spark.sql.Row
import org.joda.time.format.DateTimeFormat  // Joda-Time

df.map {
  case Row(col1: String, col2: Int, col3: Int) =>
    (col1, col2, col3, DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").parseDateTime(col1).getYear)  // append the parsed year
}.toDF("date_time", "site_name", "posa_continent", "year").show()
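
Note that on Spark 2.x, mapping over a dataframe also needs the session implicits in scope to provide an encoder for the resulting tuples. And if only the year is wanted, a simpler sketch (assuming every date_time string starts with a four-digit year, as in the sample data) skips date parsing entirely:

import org.apache.spark.sql.functions.substring

// take the first four characters of the string and cast them to an integer year
df.withColumn("year", substring($"date_time", 1, 4).cast("int")).show()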