Scala: Spark SQL to_date(unix_timestamp) returns NULL

Question

Asked by Sai Wai Maung

Spark Version: spark-2.0.1-bin-hadoop2.7 Scala: 2.11.8

I am loading a raw CSV into a DataFrame. Although the column is supposed to be in date format, the values in the CSV are written as 20161025 instead of 2016-10-25. The parameter date_format is a list of the names of the columns that need to be converted to yyyy-MM-dd format.

In the following code, I first load the Date column of the CSV as StringType via the schema. I then check whether date_format is non-empty, that is, whether there are columns that need to be converted from String to Date, and cast each such column using unix_timestamp and to_date. However, csv_df.show() returns rows that are all null.

def read_csv(csv_source:String, delimiter:String, is_first_line_header:Boolean, 
    schema:StructType, date_format:List[String]): DataFrame = {
    println("|||| Reading CSV Input ||||")

    var csv_df = sqlContext.read
        .format("com.databricks.spark.csv")
        .schema(schema)
        .option("header", is_first_line_header)
        .option("delimiter", delimiter)
        .load(csv_source)
    println("|||| Successfully read CSV. Number of rows -> " + csv_df.count() + " ||||")
    if(date_format.length > 0) {
        for (i <- 0 until date_format.length) {
            csv_df = csv_df.select(to_date(unix_timestamp(
                csv_df(date_format(i)), "yyyy--MM--dd").cast("timestamp")))
            csv_df.show()
        }
    }
    csv_df
}

Returned Top 20 rows:

+-------------------------------------------------------------------------+
|to_date(CAST(unix_timestamp(prom_price_date, YYYY--MM--DD) AS TIMESTAMP))|
+-------------------------------------------------------------------------+
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
|                                                                     null|
+-------------------------------------------------------------------------+

Why am I getting all null?

Answer 1

To convert yyyyMMdd to yyyy-MM-dd you can:

spark.sql("""SELECT DATE_FORMAT(
  CAST(UNIX_TIMESTAMP('20161025', 'yyyyMMdd') AS TIMESTAMP), 'yyyy-MM-dd'
)""")

Or, using the DataFrame functions:

date_format(unix_timestamp(col, "yyyyMMdd").cast("timestamp"), "yyyy-MM-dd")
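The expression above can be applied to the loop from the question. What follows is a minimal sketch, not the asker's final code: the object name DateColumnFix, the helper convertDateColumns, and the sample rows are hypothetical and exist only for illustration. It reflects two observations about the original code. First, the pattern given to unix_timestamp has to describe the input string (yyyyMMdd for 20161025); a pattern such as yyyy--MM--dd cannot be parsed against that input, so the result is null. Second, select keeps only the selected expression, so the sketch uses withColumn to replace each column in place and keep the others. It uses to_date so the result is a DateType column; date_format, as in the answer, would give a formatted string instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_date, unix_timestamp}

object DateColumnFix {
  // Hypothetical helper: converts each named column from a yyyyMMdd string
  // to a DateType column, leaving all other columns untouched.
  def convertDateColumns(df: DataFrame, dateColumns: Seq[String]): DataFrame =
    dateColumns.foldLeft(df) { (acc, name) =>
      // The pattern describes the input text (e.g. "20161025"), not the desired output.
      acc.withColumn(
        name,
        to_date(unix_timestamp(col(name), "yyyyMMdd").cast("timestamp")))
    }

  def main(args: Array[String]): Unit = {
    // Local session and sample data are assumptions made for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("date-column-fix")
      .getOrCreate()
    import spark.implicits._

    val raw = Seq(("a", "20161025"), ("b", "20161101")).toDF("id", "prom_price_date")
    val fixed = convertDateColumns(raw, Seq("prom_price_date"))

    fixed.printSchema()
    fixed.show()
    spark.stop()
  }
}
```

In read_csv, the for loop could then be replaced by a single call such as convertDateColumns(csv_df, date_format), which also avoids the mutable var.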
