scala - How to create TimestampType column in spark from string

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/45148365/

Date: 2020-10-22 09:21:55  Source: igfitidea

Tags: scala, apache-spark

Question by a.moussa

I have some data contained in an array of strings, like the one below (just as an example):

val myArray = Array("1499955986039", "1499955986051", "1499955986122")

I want to map my array to an array of timestamps in order to create an RDD (myRdd), and then create a DataFrame like this:

val df = spark.createDataFrame(myRdd, StructType(Seq(StructField("myTymeStamp", TimestampType, true))))

My question is not how to create the RDD, but how to convert each string (milliseconds since the epoch) into a timestamp. Do you have any idea? Thanks

Answer by Psidom

Use java.sql.Timestamp:

import java.sql.Timestamp
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, TimestampType}

val myArray = Array("1499955986039", "1499955986051", "1499955986122")

// new Timestamp(ms) takes milliseconds since the epoch, which is what each string holds
val rdd = sc.parallelize(myArray).map(s => Row(new Timestamp(s.toLong)))

val schema = StructType(Array(StructField("myTymeStamp", TimestampType, true)))

spark.createDataFrame(rdd, schema)
// res25: org.apache.spark.sql.DataFrame = [myTymeStamp: timestamp]
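The `java.sql.Timestamp(long)` constructor takes milliseconds since the Unix epoch (1970-01-01T00:00:00Z), so each string only needs a `toLong` before wrapping. The plain-Scala sketch below needs no Spark and only checks that this conversion keeps the millisecond part; `toMillisTimestamp` is a hypothetical helper name used for illustration, not part of any library.

```scala
import java.sql.Timestamp

object TimestampFromMillis {
  // Hypothetical helper: parse an epoch-milliseconds string into a java.sql.Timestamp
  def toMillisTimestamp(s: String): Timestamp = new Timestamp(s.trim.toLong)

  def main(args: Array[String]): Unit = {
    val myArray = Array("1499955986039", "1499955986051", "1499955986122")
    val timestamps = myArray.map(toMillisTimestamp)

    timestamps.foreach { ts =>
      // toInstant prints in UTC; ts.toString would use the JVM's default time zone instead
      println(s"${ts.getTime} -> ${ts.toInstant}")
    }
  }
}
```

The first line printed should be `1499955986039 -> 2017-07-13T14:26:26.039Z`, showing that no precision is lost between the string and the timestamp.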

Answer by ktheitroadalo

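You don't need to build Timestamp objects yourself. Convert each string to a Long, load the column as LongType, and cast it to TimestampType right after creating the DataFrame. (Declaring TimestampType in the schema while the rows hold Long values does not work: createDataFrame expects java.sql.Timestamp values for that field and fails at runtime.)

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType, TimestampType}
import spark.implicits._ // for the $"col" syntax (already imported in spark-shell)

val myArray = Array("1499955986039", "1499955986051", "1499955986122")

val myrdd = spark.sparkContext.parallelize(myArray.map(a => Row(a.toLong)))

// Load the epoch milliseconds as LongType, then cast. Spark reads a numeric value
// cast to TimestampType as seconds, so divide by 1000 first.
val df = spark.createDataFrame(myrdd, StructType(Seq(StructField("myTymeStamp", LongType, true))))
  .withColumn("myTymeStamp", ($"myTymeStamp" / 1000).cast(TimestampType))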

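Otherwise, you can create the DataFrame from the strings and cast the column to a timestamp afterwards:

import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType, TimestampType}

val strRdd = spark.sparkContext.parallelize(myArray.map(a => Row(a)))
val strDf = spark.createDataFrame(strRdd, StructType(Seq(StructField("myTymeStamp", StringType, true))))

// cast myTymeStamp from String to Long (epoch ms), divide by 1000 to get seconds, then cast to timestamp
strDf.withColumn("myTymeStamp", ($"myTymeStamp".cast(LongType) / 1000).cast(TimestampType))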
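Spark reads a numeric value cast to `TimestampType` as seconds since the epoch, while these strings hold milliseconds, so the values must be divided by 1000 before the cast. The plain-Scala sketch below involves no Spark; it uses `java.time.Instant` only to mimic the two interpretations of the same raw number, and the helper names `asSeconds` and `asMillis` are hypothetical.

```scala
import java.time.{Instant, ZoneOffset}

object SecondsVsMillis {
  // Mimics a seconds-based cast applied directly to the raw millisecond value
  def asSeconds(raw: Long): Instant = Instant.ofEpochSecond(raw)

  // Treats the value as milliseconds, which is what dividing by 1000 before the cast achieves
  def asMillis(raw: Long): Instant = Instant.ofEpochMilli(raw)

  def main(args: Array[String]): Unit = {
    val raw = "1499955986039".toLong
    // Read as seconds, the value lands tens of thousands of years in the future
    println(s"read as seconds: year ${asSeconds(raw).atOffset(ZoneOffset.UTC).getYear}")
    // Read as milliseconds, it is the intended instant
    println(s"read as millis:  ${asMillis(raw)}")
  }
}
```

The second line prints `read as millis:  2017-07-13T14:26:26.039Z`, which is the timestamp the division by 1000 is meant to produce.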

Hope this helps!
