Scala: How to create a SparkSession from an existing SparkContext
Disclaimer: the content below is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/42935242/
How to create SparkSession from existing SparkContext
Asked by Stefan Repcek
I have a Spark application which uses the new Spark 2.0 API with SparkSession.
I am building this application on top of another application which uses SparkContext. I would like to pass the SparkContext to my application and initialize a SparkSession using the existing SparkContext.
However, I could not find a way to do that. I found that the SparkSession constructor that takes a SparkContext is private, so I can't initialize it that way, and the builder does not offer any setSparkContext method. Do you think there is some workaround?
Accepted answer by Stefan Repcek
Apparently there is no way to initialize a SparkSession from an existing SparkContext.
Answer by Partha Sarathy
As noted above, you cannot create one directly because SparkSession's constructor is private.
Instead you can create a SQLContext using the SparkContext, and later get the SparkSession from the SQLContext like this:
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sparkContext)   // assumes an existing SparkContext named sparkContext
val spark = sqlContext.sparkSession
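A quick sanity check (a small addition, assuming the sparkContext and spark values above are in scope): the session obtained this way is backed by the same SparkContext, so no new context is started:

println(spark.sparkContext eq sparkContext)   // true: same underlying SparkContext
spark.range(3).show()                         // the session is ready to use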
Hope this helps
Answer by Rishabh
Deriving the SparkSession object out of a SparkContext or even a SparkConf is easy. It's just that you might find the API to be slightly convoluted. Here's an example (I'm using Spark 2.4, but this should work in the older 2.x releases as well):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// If you already have SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

// Another example which builds a SparkConf, SparkContext and SparkSession
val conf = new SparkConf().setAppName("spark-test").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
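One point worth noting about this pattern (an aside, not from the original answer): getOrCreate registers the first session it builds as the default for the JVM, so later builder calls without any config return that same session rather than creating a new one:

val again = SparkSession.builder.getOrCreate()
println(again eq spark)   // true: the existing default session is reused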
Hope that helps!
Answer by Mostwanted Mani
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public JavaSparkContext getSparkContext()
{
    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    return jsc;
}

public SparkSession getSparkSession()
{
    // note: this constructor is not part of the public API (it is package-private on the Scala side)
    SparkSession sparkSession = new SparkSession(getSparkContext().sc());
    return sparkSession;
}
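Note that this first variant relies on a constructor that is not exposed as public API (as the question points out, it is private on the Scala side), so it may break between Spark versions; the builder-based variant below is the safer option.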
You can also try using the builder:
public SparkSession getSparkSession()
{
    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("local");
    SparkSession sparkSession = SparkSession
            .builder()
            .config(conf)
            .getOrCreate();
    return sparkSession;
}
Answer by lostsoul29
val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()
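(This one-liner assumes a SparkContext is already available as sc, for example in the spark-shell, and that SparkSession has been imported from org.apache.spark.sql.)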
Answer by Raider Yang
You would have noticed that we are using SparkSession and SparkContext, and this is not an error. Let's revisit the annals of Spark history for a perspective. It is important to understand where we came from, as you will hear about these connection objects for some time to come.
Prior to Spark 2.0.0, the three main connection objects were SparkContext, SQLContext, and HiveContext. The SparkContext object was the connection to a Spark execution environment and was used to create RDDs and other low-level objects; SQLContext worked with Spark SQL on top of SparkContext; and HiveContext interacted with Hive stores.
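To make the contrast concrete, here is a rough Scala sketch of the two styles (the app name and master values are illustrative, not from the answer):

// Spark 1.x style: separate connection objects wired up by hand
// val sc          = new SparkContext(conf)
// val sqlContext  = new SQLContext(sc)
// val hiveContext = new HiveContext(sc)

// Spark 2.x style: one entry point that exposes the rest
val spark = SparkSession.builder.appName("history-demo").master("local[*]").getOrCreate()
val sc  = spark.sparkContext   // the underlying SparkContext
val sql = spark.sqlContext     // SQLContext view over the same session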
Spark 2.0.0 introduced Datasets/DataFrames as the main distributed data abstraction interface and the SparkSession object as the entry point to a Spark execution environment. Appropriately, the SparkSession object is found in the namespace org.apache.spark.sql.SparkSession (Scala) or pyspark.sql.SparkSession (Python). A few points to note are as follows:
In Scala and Java, Datasets form the main data abstraction as typed data; however, for Python and R (which do not have compile time type checking), the data...
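To illustrate the typed-Dataset point in Scala (a minimal, illustrative sketch suitable for the spark-shell; the Person case class and the sample values are assumptions, not from the excerpt):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("ds-demo").master("local[*]").getOrCreate()
import spark.implicits._

case class Person(name: String, age: Int)

val people = Seq(Person("Ann", 30), Person("Bob", 25)).toDS()   // typed Dataset[Person]
val rows   = people.toDF()                                      // untyped DataFrame view of the same data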

