Scala: how to create a SparkSession from an existing SparkContext

Disclaimer: this page is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/42935242/

How to create SparkSession from existing SparkContext

Tags: scala, apache-spark, apache-spark-2.0

Asked by Stefan Repcek

I have a Spark application which uses the new Spark 2.0 API with SparkSession. I am building this application on top of another application which uses SparkContext. I would like to pass the SparkContext to my application and initialize the SparkSession using the existing SparkContext.

However, I could not find a way to do that. I found that the SparkSession constructor that takes a SparkContext is private, so I can't initialize it that way, and the builder does not offer any setSparkContext method. Do you think there is some workaround?

Accepted answer by Stefan Repcek

Apparently there is no way to initialize a SparkSession from an existing SparkContext.

Answered by Partha Sarathy

As noted above, you cannot create one directly because the SparkSession constructor is private. Instead, you can create a SQLContext using the SparkContext, and later get the SparkSession from the SQLContext like this:

import org.apache.spark.sql.SQLContext

// Note: the SQLContext(SparkContext) constructor is deprecated in Spark 2.x,
// but it still exposes the session it wraps
val sqlContext = new SQLContext(sparkContext)
val spark = sqlContext.sparkSession
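
For example, a quick sanity check with the derived session (an illustrative snippet, not from the original answer):

// The derived session behaves like any other SparkSession
spark.range(5).show()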

Hope this helps

Answered by Rishabh

Deriving the SparkSession object out of a SparkContext or even a SparkConf is easy, just that you might find the API to be slightly convoluted. Here's an example (I'm using Spark 2.4, but this should work in the older 2.x releases as well):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// If you already have a SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

// Another example which builds a SparkConf, SparkContext and SparkSession
val conf = new SparkConf().setAppName("spark-test").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
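
Note that this works even though the SparkSession constructor is private: builder.getOrCreate reuses an already-running SparkContext rather than constructing a new one, and simply applies the supplied configuration to the session it returns.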

Hope that helps!

Answered by Mostwanted Mani

// Java: build a JavaSparkContext from a SparkConf
public JavaSparkContext getSparkContext()
{
    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("local[*]");
    return new JavaSparkContext(conf);
}

// The SparkSession(SparkContext) constructor is private[sql] in Scala,
// which compiles to a public constructor in bytecode, so it is callable
// from Java even though Scala code outside org.apache.spark.sql cannot use it
public SparkSession getSparkSession()
{
    SparkSession sparkSession = new SparkSession(getSparkContext().sc());
    return sparkSession;
}


You can also try using the builder:

public SparkSession getSparkSession()
{
    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("local");

    SparkSession sparkSession = SparkSession
            .builder()
            .config(conf)
            .getOrCreate();
    return sparkSession;
}

Answered by lostsoul29

// Reuses the configuration of an already-running SparkContext `sc`
val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()

Answered by Raider Yang

You would have noticed that we are using both SparkSession and SparkContext, and this is not an error. Let's revisit the annals of Spark history for perspective. It is important to understand where we came from, as you will hear about these connection objects for some time to come.

Prior to Spark 2.0.0, the three main connection objects were SparkContext, SQLContext, and HiveContext. The SparkContext object was the connection to a Spark execution environment and created RDDs and other low-level resources; SQLContext worked with Spark SQL on top of a SparkContext; and HiveContext interacted with Hive stores.

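As a minimal sketch of that pre-2.0 style (the app name and master are illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Pre-2.0 pattern: a separate connection object per concern
val conf = new SparkConf().setAppName("pre-2.0-style").setMaster("local[*]")
val sc = new SparkContext(conf)     // entry point for RDDs
val sqlContext = new SQLContext(sc) // entry point for Spark SQL
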
Spark 2.0.0 introduced Datasets/DataFrames as the main distributed data abstraction interface, and the SparkSession object as the entry point to a Spark execution environment. Appropriately, the SparkSession object is found in the namespace org.apache.spark.sql.SparkSession (Scala) or pyspark.sql.SparkSession (Python). A few points to note are as follows (a sketch of the 2.x entry point follows the excerpt):

In Scala and Java, Datasets form the main data abstraction as typed data; however, for Python and R (which do not have compile-time type checking), the data...

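In code, the Spark 2.x entry point described above looks roughly like this (a minimal sketch; the app name and master are illustrative):

import org.apache.spark.sql.SparkSession

// Spark 2.x pattern: SparkSession is the single entry point;
// the underlying SparkContext is still reachable through it
val spark = SparkSession.builder.appName("spark2-style").master("local[*]").getOrCreate()
val sc = spark.sparkContext
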
https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781785889271/4/ch04lvl1sec31/sparksession-versus-sparkcontext
