Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/46429616/

Date: 2020-10-22 09:28:13 · Source: igfitidea

Spark 2.2 Illegal pattern component: XXX java.lang.IllegalArgumentException: Illegal pattern component: XXX

Tags: scala, apache-spark, spark-dataframe

Asked by Lee

I'm trying to upgrade from Spark 2.1 to 2.2. When I try to read or write a dataframe to a location (CSV or JSON) I am receiving this error:

Illegal pattern component: XXX
java.lang.IllegalArgumentException: Illegal pattern component: XXX
at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:384)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
at org.apache.commons.lang3.time.FastDateFormat.createInstance(FastDateFormat.java:91)
at org.apache.commons.lang3.time.FastDateFormat.createInstance(FastDateFormat.java:88)
at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:81)
at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:43)
at org.apache.spark.sql.execution.datasources.json.JsonFileFormat.inferSchema(JsonFileFormat.scala:53)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun.apply(DataSource.scala:177)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun.apply(DataSource.scala:177)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:176)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:333)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:279)

I am not setting a default value for dateFormat, so I'm not understanding where it is coming from.
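
For context on where the pattern comes from: the `XXX` component is a valid ISO-8601 zone-offset pattern for `java.text.SimpleDateFormat` (since Java 7), but the stack trace above shows Spark 2.2 handing its default pattern to commons-lang3's `FastDateFormat`, which is what throws. A minimal stdlib-only sketch showing the pattern itself is legal in `java.text`:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object XxxPatternDemo extends App {
  // Spark 2.2's default timestampFormat; the XXX suffix is an
  // ISO-8601 zone offset ("Z" for UTC, "+01:00" otherwise).
  val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
  fmt.setTimeZone(TimeZone.getTimeZone("UTC"))

  // java.text accepts the pattern fine; it is the commons-lang3
  // FastDatePrinter in the stack trace that rejects the same string.
  println(fmt.format(new Date(0L))) // prints 1970-01-01T00:00:00.000Z
}
```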

spark.createDataFrame(objects.map((o) => MyObject(t.source, t.table, o.partition, o.offset, d)))
    .coalesce(1)
    .write
    .mode(SaveMode.Append)
    .partitionBy("source", "table")
    .json(path)

I still get the error with this:

import org.apache.spark.sql.{SaveMode, SparkSession}
case class Person(name: String, age: Long) // definition omitted from the original snippet
val spark = SparkSession.builder.appName("Spark2.2Test").master("local").getOrCreate()
import spark.implicits._
val agesRows = List(Person("alice", 35), Person("bob", 10), Person("jill", 24))
val df = spark.createDataFrame(agesRows).toDF()

df.printSchema
df.show

df.write.mode(SaveMode.Overwrite).csv("my.csv")

Here is the schema:

root
 |-- name: string (nullable = true)
 |-- age: long (nullable = false)

Answered by Lee

I found the answer.

The default for timestampFormat is yyyy-MM-dd'T'HH:mm:ss.SSSXXX, which is the illegal pattern here. It needs to be set explicitly when you write the dataframe out.

The fix is to change it to ZZ, which still includes the timezone offset.

df.write
.option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
.mode(SaveMode.Overwrite)
.csv("my.csv")
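
The same option is also accepted on the read side, so it can help to set the format symmetrically. A sketch, assuming an existing SparkSession named spark and the df and path from above:

```scala
// Sketch only: assumes `spark` (SparkSession) and `df` already exist.
// Using the same timestampFormat on write and read keeps the two consistent.
df.write
  .option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
  .mode(SaveMode.Overwrite)
  .csv("my.csv")

val readBack = spark.read
  .option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
  .csv("my.csv")
```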

Answered by Mauro Pirrone

Ensure you are using the correct version of commons-lang3

<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
  <version>3.5</version>
</dependency>
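
If the build uses sbt rather than Maven, the equivalent pin (assuming the same 3.5 version) would be:

```scala
// build.sbt — pin commons-lang3 to the version Spark 2.2 ships with
libraryDependencies += "org.apache.commons" % "commons-lang3" % "3.5"
```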

Answered by danzhi

Using commons-lang3-3.5.jar fixed the original error. I didn't check the source code to find out why, but it is not surprising, since the original exception happens at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282). I also noticed the file /usr/lib/spark/jars/commons-lang3-3.5.jar (on an EMR cluster instance), which also suggests 3.5 is the consistent version to use.

Answered by Zhang Xujie

I also ran into this problem, and in my case the cause was a malformed JSON file I had put into HDFS. After I uploaded a correctly formatted text or JSON file, everything worked.