Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/41970773/
java.lang.RuntimeException: java.lang.String is not a valid external type for schema of bigint or int
Asked by Naren
I am reading the schema of a data frame from a text file. The file looks like this:
id,1,bigint
price,2,bigint
sqft,3,bigint
zip_id,4,int
name,5,string
I am mapping the parsed data types to Spark SQL data types. The code for creating the data frame is:
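import scala.collection.mutable.ListBuffer
import scala.io.Source
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

var schemaSt = new ListBuffer[(String,String)]()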
// read schema from file
for (line <- Source.fromFile("meta.txt").getLines()) {
  val word = line.split(",")
  schemaSt += ((word(0),word(2)))
}
// map datatypes
val types = Map("int" -> IntegerType, "bigint" -> LongType)
      .withDefault(_ => StringType)
val schemaChanged = schemaSt.map(x => (x._1, types(x._2)))
// read data source
val lines = spark.sparkContext.textFile("data source path")
val fields = schemaChanged.map(x => StructField(x._1, x._2, nullable = true)).toList
val schema = StructType(fields)
val rowRDD = lines
  .map(_.split("\t"))
  .map(attributes => Row.fromSeq(attributes))
// Apply the schema to the RDD
val new_df = spark.createDataFrame(rowRDD, schema)
new_df.show(5)
new_df.printSchema()
However, the above works only for StringType. For IntegerType and LongType it throws these exceptions:
java.lang.RuntimeException: java.lang.String is not a valid external type for schema of int
and
java.lang.RuntimeException: java.lang.String is not a valid external type for schema of bigint.
Thanks in advance!
Answered by Vlad.Bachurin
I had the same problem, and its cause is the Row.fromSeq() call.
If it is called on an array of String, the resulting Row is a row of Strings, which does not match the types of the numeric columns in your schema (bigint or int).
To get a valid data frame out of Row.fromSeq(values: Seq[Any]), the elements of the values argument have to be of the types that correspond to your schema.
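As a rough illustration of that point, the sketch below converts each raw string to the JVM type Spark expects for the column (Int for int, Long for bigint) before the values reach Row.fromSeq. The names TypedFields, castField and castFields are hypothetical helpers, not part of Spark, and the type names are assumed to be the ones that appear in the question's meta.txt.

```scala
object TypedFields {
  // Convert one raw string to the JVM type that matches the type name
  // from the schema file: Int for "int", Long for "bigint".
  def castField(raw: String, typeName: String): Any = typeName match {
    case "int"    => raw.trim.toInt
    case "bigint" => raw.trim.toLong
    case _        => raw // anything else stays a String (StringType)
  }

  // Convert a whole split line; typeNames holds one entry per column.
  def castFields(raw: Seq[String], typeNames: Seq[String]): Seq[Any] =
    raw.zip(typeNames).map { case (value, t) => castField(value, t) }
}

// Possible use in the question's pipeline (assumes Spark on the classpath):
//   val typeNames = schemaSt.map(_._2).toList
//   val rowRDD = lines
//     .map(_.split("\t"))
//     .map(attributes => Row.fromSeq(TypedFields.castFields(attributes.toSeq, typeNames)))
```

Note that toInt and toLong throw a NumberFormatException on malformed input, so this version assumes every numeric field in the data source is well formed.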
Answered by ImDarrenG
You are trying to store strings in numerically typed columns.
You need to cast string-encoded numerical data to the appropriate numerical types while parsing.
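Real input files often contain blank or malformed numeric fields, so one possible way to do the cast while parsing is to fall back to null when a value cannot be parsed. This is only a sketch: SafeParse and parseOrNull are hypothetical names, and it relies on the columns being declared with nullable = true, as they are in the question's schema.

```scala
import scala.util.Try

object SafeParse {
  // Parse a raw field as the named type. When the text is not a valid
  // number, yield null so a nullable column gets a null instead of the
  // job failing with a NumberFormatException.
  def parseOrNull(raw: String, typeName: String): Any = typeName match {
    case "int"    => Try[Any](raw.trim.toInt).getOrElse(null)
    case "bigint" => Try[Any](raw.trim.toLong).getOrElse(null)
    case _        => raw // non-numeric columns are passed through unchanged
  }
}
```

Whether silently turning bad values into nulls is acceptable depends on the data; failing fast may be preferable when malformed rows indicate an upstream problem.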

