Original source: http://stackoverflow.com/questions/24585705/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to remove / dispose a broadcast variable from heap in Spark?
Asked by samthebest
To broadcast a variable so that it occurs exactly once in memory per node on a cluster, one can do:
val myVarBroadcasted = sc.broadcast(myVar)
then retrieve it in RDD transformations like so:
myRdd.map(blar => {
  val myVarRetrieved = myVarBroadcasted.value
  // some code that uses it
})
.someAction
But suppose I now wish to perform some more actions with a new broadcast variable. What if I don't have enough heap space because of the old broadcast variables? I want a function like
myVarBroadcasted.remove()
Now I can't seem to find a way of doing this.
Also, a very related question: where do the broadcast variables go? Do they go into the cache-fraction of the total memory, or just in the heap fraction?
Answered by Gianmario Spacagna
If you want to remove the broadcast variable from both the executors and the driver, you have to use destroy; unpersist only removes it from the executors:
myVarBroadcasted.destroy()
This method is blocking.
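As a rough illustration of the destroy lifecycle described in this answer, here is a minimal sketch. The object name, the lookup map, the local[2] master and the app name are all assumptions made for the example, not part of the original answer. It broadcasts a small map, uses it in a transformation, destroys it, and then checks that reading the value afterwards fails.

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkException}

// Hypothetical helper object, for illustration only.
object DestroyBroadcastSketch {

  // Broadcasts a small lookup map, uses it, destroys it, and reports
  // whether reading the value after destroy() raised a SparkException.
  def useThenDestroy(sc: SparkContext): (Seq[String], Boolean) = {
    val lookupBroadcasted = sc.broadcast(Map(1 -> "one", 2 -> "two"))

    val names = sc.parallelize(Seq(1, 2, 3), 2)
      .map(i => lookupBroadcasted.value.getOrElse(i, "unknown"))
      .collect()
      .toSeq

    // Releases the data on the executors and on the driver.
    // The handle must not be used again after this call.
    lookupBroadcasted.destroy()

    val failedAfterDestroy =
      try { lookupBroadcasted.value; false }
      catch { case _: SparkException => true }

    (names, failedAfterDestroy)
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick local run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("destroy-broadcast-sketch")
    val sc = new SparkContext(conf)
    try println(useThenDestroy(sc))
    finally sc.stop()
  }
}
```

Because a destroyed broadcast cannot be used again, this fits the case in the question where the old variable is no longer needed at all before a new one is broadcast.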
Answered by Shyamendra Solanki
You are looking for unpersist, available from Spark 1.0.0:
myVarBroadcasted.unpersist(blocking = true)
Broadcast variables are stored as ArrayBuffers of deserialized Java objects or serialized ByteBuffers. (Storage-wise they are treated similarly to RDDs; confirmation needed.)
The unpersist method removes them from both memory and disk on each executor node. But the data stays on the driver node, so it can be re-broadcast.
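To make the re-broadcast behaviour concrete, here is a minimal sketch, assuming a hypothetical helper sumWithOffset and a local[2] master (neither comes from the original answer). It uses a broadcast value, calls unpersist(blocking = true), and then uses the same handle again, which should still work because the driver keeps its copy.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, for illustration only.
object UnpersistBroadcastSketch {

  // Adds a broadcast offset to the numbers 1 to 4 and sums them,
  // once before and once after unpersisting the broadcast variable.
  def sumWithOffset(sc: SparkContext, offset: Int): (Int, Int) = {
    val offsetBroadcasted = sc.broadcast(offset)
    val data = sc.parallelize(1 to 4, 2)

    val before = data.map(_ + offsetBroadcasted.value).reduce(_ + _)

    // Drops the cached copies on the executors and waits for that to finish.
    // The driver keeps the value, so the handle stays usable.
    offsetBroadcasted.unpersist(blocking = true)

    // Using the handle again makes Spark ship the value to the executors once more.
    val after = data.map(_ + offsetBroadcasted.value).reduce(_ + _)

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick local run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("unpersist-broadcast-sketch")
    val sc = new SparkContext(conf)
    try println(sumWithOffset(sc, 10))
    finally sc.stop()
  }
}
```

The trade-off compared with destroy is that executor memory is freed while the variable remains available, at the cost of sending the data again if it is used later.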

