Scala: what happens if SparkSession is not closed?
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/44058122/
What happens if SparkSession is not closed?
Asked by Marsellus Wallace
What's the difference between the following two?
import org.apache.spark.sql.SparkSession

object Example1 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    try {
      // spark code here
    } finally {
      spark.close()
    }
  }
}
object Example2 {
  val spark = SparkSession.builder.getOrCreate()

  def main(args: Array[String]): Unit = {
    // spark code here
  }
}
I know that SparkSession implements Closeable and it hints that it needs to be closed. However, I can't think of any issues if the SparkSession is just created as in Example2 and never closed directly.
Whether the Spark application succeeds or fails (and exits the main method), the JVM will terminate and the SparkSession will be gone with it. Is this correct?
IMO: The fact that the SparkSession is a singleton should not make a big difference either.
Answered by Jacek Laskowski
You should always close your SparkSession when you are done with its use (even if the final outcome were just to follow a good practice of giving back what you've been given).
Closing a SparkSession may trigger freeing cluster resources that could be given to some other application.
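As a quick illustration of that practice (a minimal sketch, assuming Scala 2.13+ for scala.util.Using and that the master is supplied by spark-submit; the object and app names are made up), SparkSession is Closeable, so Using.resource can close it for you, with the same effect as the try/finally in Example1:

import org.apache.spark.sql.SparkSession
import scala.util.Using

object AlwaysClose {
  def main(args: Array[String]): Unit = {
    // Using.resource closes the SparkSession (it is java.io.Closeable)
    // even if the body throws, just like the try/finally in Example1.
    Using.resource(SparkSession.builder.appName("always-close").getOrCreate()) { spark =>
      spark.range(10).count() // placeholder for the real Spark code
    }
  }
}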
SparkSession is a session and as such maintains some resources that consume JVM memory. You can have as many SparkSessions as you want (see SparkSession.newSession to create a session afresh), but you don't want them to hold memory they shouldn't once you no longer use them, hence close the one you no longer need.
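A minimal sketch of that (assuming the master comes from spark-submit; the object name and config value are only illustrative): every extra session gets its own SQL conf, temporary views and UDFs, but shares the one SparkContext, and stopping any of them stops that shared context.

import org.apache.spark.sql.SparkSession

object ManySessions {
  def main(args: Array[String]): Unit = {
    val base = SparkSession.builder.getOrCreate()

    // A second session: isolated SQL conf and temporary views,
    // but the same underlying SparkContext (and its JVM memory).
    val scratch = base.newSession()
    scratch.conf.set("spark.sql.shuffle.partitions", "4")
    scratch.range(100).createOrReplaceTempView("scratch_only") // not visible from `base`

    // close()/stop() on any session stops the shared SparkContext,
    // so call it only once you are done with all of them.
    base.close()
  }
}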
SparkSession is Spark SQL's wrapper around Spark Core's SparkContext, so under the covers (as in any Spark application) you have cluster resources, i.e. vcores and memory, assigned to your SparkSession (through the SparkContext). That means that as long as your SparkContext is in use (by the SparkSession), the cluster resources won't be assigned to other tasks (not necessarily Spark's, but also other non-Spark applications submitted to the cluster). Those cluster resources are yours until you say "I'm done", which translates to... close.
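For example, here is a sketch of saying "I'm done" as early as possible, so the executors and vcores go back to the cluster manager before any remaining driver-only work runs (writeReport is a hypothetical local step, not a Spark API):

import org.apache.spark.sql.SparkSession

object ReleaseEarly {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    val total =
      try spark.range(1000000L).count() // cluster work, holds executors via SparkContext
      finally spark.stop()              // "I'm done": resources go back to the cluster manager

    writeReport(total) // driver-only work; no cluster resources are held any more
  }

  // Hypothetical post-processing that does not need the cluster.
  private def writeReport(n: Long): Unit =
    println(s"row count: $n")
}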
If, however, you simply exit the Spark application without calling close, you don't have to think about executing close, since the resources will be released automatically anyway. The JVMs for the driver and executors terminate, and so does the (heartbeat) connection to the cluster, so eventually the resources are given back to the cluster manager, which can offer them to some other application.
Answered by yugandhar
Both are the same!
A SparkSession's stop/close eventually calls the SparkContext's stop:
def stop(): Unit = {
  sparkContext.stop()
}

override def close(): Unit = stop()
SparkContext registers a runtime shutdown hook that closes the Spark context before the JVM exits. Below is the Spark code that adds the shutdown hook while creating the context:
_shutdownHookRef = ShutdownHookManager.addShutdownHook(
  ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
  logInfo("Invoking stop() from shutdown hook")
  stop()
}
So this will be called irrespective of how the JVM exits. If you call stop() manually, this shutdown hook is removed to avoid a duplicate stop:
def stop(): Unit = {
  if (LiveListenerBus.withinListenerThread.value) {
    throw new SparkException(
      s"Cannot stop SparkContext within listener thread of ${LiveListenerBus.name}")
  }
  // Use the stopping variable to ensure no contention for the stop scenario.
  // Still track the stopped variable for use elsewhere in the code.
  if (!stopped.compareAndSet(false, true)) {
    logInfo("SparkContext already stopped.")
    return
  }
  if (_shutdownHookRef != null) {
    ShutdownHookManager.removeShutdownHook(_shutdownHookRef)
  }
  // ... (rest of stop() elided)
}
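In practice this means an explicit stop() plays nicely with the shutdown hook and is even safe to call more than once; a tiny sketch (object name made up, master assumed to come from spark-submit):

import org.apache.spark.sql.SparkSession

object StopTwice {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    spark.stop() // removes the shutdown hook and stops the SparkContext
    spark.stop() // no-op: stopped.compareAndSet(false, true) fails, it only logs a message
  }
}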

