Scala: Spark SQL to_date(unix_timestamp) returning NULL
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/40433065/
Asked by Sai Wai Maung
Spark Version: spark-2.0.1-bin-hadoop2.7
Scala: 2.11.8
I am loading a raw CSV into a DataFrame. In the CSV, although the column is supposed to be in date format, the values are written as 20161025 instead of 2016-10-25. The parameter date_format contains a list of column names that need to be converted to yyyy-MM-dd format.
In the following code, I first load the CSV with the date column as StringType via the schema. Then I check whether date_format is non-empty, i.e. there are columns that need to be converted from String to Date, and cast each such column using unix_timestamp and to_date. However, in csv_df.show(), the returned rows are all null.
def read_csv(csv_source: String, delimiter: String, is_first_line_header: Boolean,
             schema: StructType, date_format: List[String]): DataFrame = {
  println("|||| Reading CSV Input ||||")
  var csv_df = sqlContext.read
    .format("com.databricks.spark.csv")
    .schema(schema)
    .option("header", is_first_line_header)
    .option("delimiter", delimiter)
    .load(csv_source)
  println("|||| Successfully read CSV. Number of rows -> " + csv_df.count() + " ||||")
  if (date_format.length > 0) {
    for (i <- 0 until date_format.length) {
      csv_df = csv_df.select(to_date(unix_timestamp(
        csv_df(date_format(i)), "yyyy--MM--dd").cast("timestamp")))
      csv_df.show()
    }
  }
  csv_df
}
Returned Top 20 rows:
+-------------------------------------------------------------------------+
|to_date(CAST(unix_timestamp(prom_price_date, YYYY--MM--DD) AS TIMESTAMP))|
+-------------------------------------------------------------------------+
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
+-------------------------------------------------------------------------+
Why am I getting all null?
Answered by
To convert yyyyMMdd to yyyy-MM-dd you can:
spark.sql("""SELECT DATE_FORMAT(
CAST(UNIX_TIMESTAMP('20161025', 'yyyyMMdd') AS TIMESTAMP), 'yyyy-MM-dd'
)""")
With functions:
date_format(unix_timestamp(col, "yyyyMMdd").cast("timestamp"), "yyyy-MM-dd")
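Applied back to the question's loop, two changes are needed: the pattern must match the raw data (yyyyMMdd, not yyyy--MM--dd), and withColumn should be used instead of select so the other columns survive. A minimal sketch under those assumptions (the parameter is renamed date_cols here so it does not shadow the date_format function):

import org.apache.spark.sql.functions.{col, unix_timestamp, date_format => fmt}

// Hypothetical rework of the conversion loop from the question.
// "yyyyMMdd" matches raw values such as 20161025; the original
// "yyyy--MM--dd" matches nothing, which is why every row came back null.
for (name <- date_cols) {
  csv_df = csv_df.withColumn(
    name,
    fmt(unix_timestamp(col(name), "yyyyMMdd").cast("timestamp"), "yyyy-MM-dd")
  )
}

If a DateType column is wanted rather than a formatted string, to_date(unix_timestamp(col(name), "yyyyMMdd").cast("timestamp")) works as well.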