Convert Scala list to DataFrame or Dataset
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/39397652/
Asked by Leo
I am new to Scala. I am trying to convert a Scala list (which holds the results of some calculations on a source DataFrame) to a DataFrame or Dataset. I cannot find a direct method to do that. I have tried the following approaches to convert my list to a Dataset, but they do not seem to work. The three situations are shown below.
Can someone please give me a ray of hope on how to do this conversion? Thanks.
import org.apache.spark.sql.{DataFrame, Row, SQLContext, DataFrameReader}
import java.sql.{Connection, DriverManager, ResultSet, Timestamp}
import scala.collection._
case class TestPerson(name: String, age: Long, salary: Double)
var tom = new TestPerson("Tom Hanks",37,35.5)
var sam = new TestPerson("Sam Smith",40,40.5)
val PersonList = mutable.MutableList[TestPerson]()
//Adding data in list
PersonList += tom
PersonList += sam
//Situation 1: Trying to create dataset from List of objects:- Result:Error
//Throwing error
var personDS = Seq(PersonList).toDS()
/*
ERROR:
error: Unable to find encoder for type stored in a Dataset. Primitive types
(Int, String, etc) and Product types (case classes) are supported by
importing sqlContext.implicits._ Support for serializing other types will
be added in future releases.
var personDS = Seq(PersonList).toDS()
*/
//Situation 2: Trying to add data 1-by-1 :- Result: not working as desired.
//The last record overwrites any existing data in the DS
var personDS = Seq(tom).toDS()
personDS = Seq(sam).toDS()
personDS += sam //not working. throwing error
//Situation 3: Working. However, I have consolidated data in the list
//which I want to convert to a DS; if I loop over the list to build comma
//separated values and then pass them here, it will work but will create an
//extra loop in the code, which I want to avoid.
var personDS = Seq(tom,sam).toDS()
scala> personDS.show()
+---------+---+------+
|     name|age|salary|
+---------+---+------+
|Tom Hanks| 37|  35.5|
|Sam Smith| 40|  40.5|
+---------+---+------+
Answered by Ajeet Shah
Try without Seq:
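import scala.collection.mutable
// Brings toDS()/toDF() into scope (on Spark 2.x and later, use spark.implicits._ instead)
import sqlContext.implicits._
case class TestPerson(name: String, age: Long, salary: Double)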
val tom = TestPerson("Tom Hanks",37,35.5)
val sam = TestPerson("Sam Smith",40,40.5)
val PersonList = mutable.MutableList[TestPerson]()
PersonList += tom
PersonList += sam
val personDS = PersonList.toDS()
println(personDS.getClass)
personDS.show()
val personDF = PersonList.toDF()
println(personDF.getClass)
personDF.show()
personDF.select("name", "age").show()
Output:
class org.apache.spark.sql.Dataset
+---------+---+------+
|     name|age|salary|
+---------+---+------+
|Tom Hanks| 37|  35.5|
|Sam Smith| 40|  40.5|
+---------+---+------+
class org.apache.spark.sql.DataFrame
+---------+---+------+
|     name|age|salary|
+---------+---+------+
|Tom Hanks| 37|  35.5|
|Sam Smith| 40|  40.5|
+---------+---+------+
+---------+---+
|     name|age|
+---------+---+
|Tom Hanks| 37|
|Sam Smith| 40|
+---------+---+
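Why dropping Seq helps can be illustrated without Spark at all. The sketch below is plain Scala and only looks at element types and sizes. It uses ListBuffer in place of the question's MutableList (an assumption made so it also compiles on newer Scala versions), and the object name SeqWrappingDemo is hypothetical. Wrapping the list in Seq(...) produces a sequence with a single element, the list itself, and that element type is the one the question's Spark version reported having no encoder for.

```scala
import scala.collection.mutable.ListBuffer

case class TestPerson(name: String, age: Long, salary: Double)

object SeqWrappingDemo {
  // Stand-in for the question's PersonList (ListBuffer instead of MutableList).
  val personList: ListBuffer[TestPerson] = ListBuffer(
    TestPerson("Tom Hanks", 37, 35.5),
    TestPerson("Sam Smith", 40, 40.5)
  )

  // Seq(personList) wraps the whole list as ONE element:
  // the static type is Seq[ListBuffer[TestPerson]], not Seq[TestPerson].
  val wrapped: Seq[ListBuffer[TestPerson]] = Seq(personList)

  // The list itself is already a sequence of TestPerson values.
  val people: Seq[TestPerson] = personList.toList

  def main(args: Array[String]): Unit = {
    println(wrapped.size) // prints 1
    println(people.size)  // prints 2
  }
}
```

Calling toDS() on the second form asks Spark for an encoder of TestPerson, a case class, which is the supported case named in the error message.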
Also, make sure to move the declaration of the case class TestPerson outside the scope of your object.
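Put together as a standalone program, the advice above might look like the following sketch. It assumes Spark 2.x or later (SparkSession and spark.implicits._ rather than the sqlContext from the question), a local master chosen only for illustration, and ListBuffer instead of MutableList. The names ListToDataset and toPersonDS are hypothetical. The last step is one possible way to handle Situation 2: a Dataset is immutable, so rows are not appended in place; union returns a new Dataset instead.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import scala.collection.mutable.ListBuffer

// Declared at the top level, outside any object or method, as the answer advises.
case class TestPerson(name: String, age: Long, salary: Double)

object ListToDataset {
  // Hypothetical helper: converts an already collected list into a Dataset.
  def toPersonDS(spark: SparkSession, people: Seq[TestPerson]): Dataset[TestPerson] = {
    import spark.implicits._ // brings toDS()/toDF() and the case class encoder into scope
    people.toDS()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("list-to-dataset")
      .getOrCreate()
    import spark.implicits._

    val personList = ListBuffer[TestPerson]()
    personList += TestPerson("Tom Hanks", 37, 35.5)
    personList += TestPerson("Sam Smith", 40, 40.5)

    // Convert the whole list at once: no Seq(...) wrapper and no extra loop.
    val personDS = toPersonDS(spark, personList.toList)
    personDS.show()

    // The same list as an untyped DataFrame.
    val personDF = personList.toList.toDF()
    personDF.select("name", "age").show()

    // Situation 2: combine Datasets with union instead of +=.
    val tomDS = Seq(personList.head).toDS()
    val samDS = Seq(personList.last).toDS()
    tomDS.union(samDS).show()

    spark.stop()
  }
}
```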

