Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/46429616/

Date: 2020-10-22 09:28:13 · Source: igfitidea

Spark 2.2 Illegal pattern component: XXX java.lang.IllegalArgumentException: Illegal pattern component: XXX

Tags: scala, apache-spark, spark-dataframe

Asked by Lee

I'm trying to upgrade from Spark 2.1 to 2.2. When I try to read or write a dataframe to a location (CSV or JSON) I am receiving this error:

Illegal pattern component: XXX
java.lang.IllegalArgumentException: Illegal pattern component: XXX
at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:384)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
at org.apache.commons.lang3.time.FastDateFormat.createInstance(FastDateFormat.java:91)
at org.apache.commons.lang3.time.FastDateFormat.createInstance(FastDateFormat.java:88)
at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:81)
at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:43)
at org.apache.spark.sql.execution.datasources.json.JsonFileFormat.inferSchema(JsonFileFormat.scala:53)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun.apply(DataSource.scala:177)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun.apply(DataSource.scala:177)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:176)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:333)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:279)

I am not setting a default value for dateFormat, so I'm not understanding where it is coming from.
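
For context on where the pattern comes from: the `XXX` component is a valid ISO-8601 zone-offset pattern for `java.text.SimpleDateFormat` (since Java 7), but the stack trace above shows Spark 2.2 handing its default pattern to commons-lang3's `FastDateFormat`, which is what throws. A minimal stdlib-only sketch showing the pattern itself is legal in `java.text`:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object XxxPatternDemo extends App {
  // Spark 2.2's default timestampFormat; the XXX suffix is an
  // ISO-8601 zone offset ("Z" for UTC, "+01:00" otherwise).
  val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
  fmt.setTimeZone(TimeZone.getTimeZone("UTC"))

  // java.text accepts the pattern fine; it is the commons-lang3
  // FastDatePrinter in the stack trace that rejects the same string.
  println(fmt.format(new Date(0L))) // prints 1970-01-01T00:00:00.000Z
}
```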

spark.createDataFrame(objects.map((o) => MyObject(t.source, t.table, o.partition, o.offset, d)))
    .coalesce(1)
    .write
    .mode(SaveMode.Append)
    .partitionBy("source", "table")
    .json(path)

I still get the error with this:

import org.apache.spark.sql.{SaveMode, SparkSession}
case class Person(name: String, age: Long) // definition omitted from the original snippet
val spark = SparkSession.builder.appName("Spark2.2Test").master("local").getOrCreate()
import spark.implicits._
val agesRows = List(Person("alice", 35), Person("bob", 10), Person("jill", 24))
val df = spark.createDataFrame(agesRows).toDF()

df.printSchema
df.show

df.write.mode(SaveMode.Overwrite).csv("my.csv")

Here is the schema:

root
 |-- name: string (nullable = true)
 |-- age: long (nullable = false)

Answered by Lee

I found the answer.

The default for timestampFormat is yyyy-MM-dd'T'HH:mm:ss.SSSXXX, which is the illegal pattern here. It needs to be set explicitly when you write the dataframe out.

The fix is to change it to ZZ, which still includes the timezone offset.

df.write
.option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
.mode(SaveMode.Overwrite)
.csv("my.csv")
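
The same option is also accepted on the read side, so it can help to set the format symmetrically. A sketch, assuming an existing SparkSession named spark and the df and path from above:

```scala
// Sketch only: assumes `spark` (SparkSession) and `df` already exist.
// Using the same timestampFormat on write and read keeps the two consistent.
df.write
  .option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
  .mode(SaveMode.Overwrite)
  .csv("my.csv")

val readBack = spark.read
  .option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
  .csv("my.csv")
```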

Answered by Mauro Pirrone

Ensure you are using the correct version of commons-lang3

<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
  <version>3.5</version>
</dependency>
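
If the build uses sbt rather than Maven, the equivalent pin (assuming the same 3.5 version) would be:

```scala
// build.sbt — pin commons-lang3 to the version Spark 2.2 ships with
libraryDependencies += "org.apache.commons" % "commons-lang3" % "3.5"
```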

Answered by danzhi

Using commons-lang3-3.5.jar fixed the original error. I didn't check the source code to find out why, but it is not surprising, since the original exception happens at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282). I also noticed the file /usr/lib/spark/jars/commons-lang3-3.5.jar (on an EMR cluster instance), which also suggests 3.5 is the consistent version to use.

Answered by Zhang Xujie

I also ran into this problem, and in my case the cause was a malformed JSON file I had put into HDFS. After I uploaded a correctly formatted text or JSON file, everything worked.