Scala: How to uncache an RDD?

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/25938567/

Date: 2020-10-22 06:34:53  Source: igfitidea

How to uncache RDD?

scala apache-spark

Asked by Rubbic

I used cache() to cache the data in memory, but I realized that to measure the performance without cached data I need to uncache it and remove the data from memory:

rdd.cache();
//doing some computation
...
rdd.uncache()

but I got an error that said:

value uncache is not a member of org.apache.spark.rdd.RDD[(Int, Array[Float])]

So I don't know how to uncache it!

Answered by Josh Rosen

An RDD can be uncached using unpersist():

rdd.unpersist()

Answered by eliasah

The uncache function doesn't exist. I think you were looking for unpersist, which, according to the Spark ScalaDoc, marks the RDD as non-persistent and removes all blocks for it from memory and disk.
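
As a minimal sketch of that behaviour, the storage level can be inspected before and after the call. This assumes a Spark dependency on the classpath. The object name, the `storageLevels` helper, the `local[*]` master, and the sample data are all made up for illustration and are not part of the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  // Hypothetical helper: returns the storage level while cached and after unpersist().
  def storageLevels(sc: SparkContext): (StorageLevel, StorageLevel) = {
    // Sample data shaped like the RDD[(Int, Array[Float])] in the question.
    val rdd = sc.parallelize(1 to 1000).map(i => (i, Array.fill(4)(i.toFloat)))

    rdd.cache()                       // cache() on an RDD uses MEMORY_ONLY
    rdd.count()                       // an action materializes the cached blocks
    val whileCached = rdd.getStorageLevel

    rdd.unpersist(blocking = true)    // wait until the blocks have been removed
    val afterUnpersist = rdd.getStorageLevel

    (whileCached, afterUnpersist)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master and an arbitrary app name, for illustration only.
    val conf = new SparkConf().setMaster("local[*]").setAppName("unpersist-sketch")
    val sc = new SparkContext(conf)
    try {
      val (whileCached, afterUnpersist) = storageLevels(sc)
      println(s"while cached:    $whileCached")
      println(s"after unpersist: $afterUnpersist")
    } finally {
      sc.stop()
    }
  }
}
```

Passing `blocking = true` explicitly avoids relying on the default, which differs between Spark versions.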

Answered by Sankar

If you want to remove all the cached RDDs, use this:

for ((k,v) <- sc.getPersistentRDDs) {
  v.unpersist()
}

Answered by Anupam Mahapatra

If you cache the source data in an RDD using .cache(), or you have declared only a small amount of memory, or the default memory is used (about 500 MB for me), and you run the code again and again, then this error occurs. Try clearing all RDDs at the end of the code, so that each time the code runs, the RDD is created and then cleared from memory.

Do this by using: RDD_Name.unpersist()
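
One way to make sure the cleanup always runs, even when the computation throws, is to wrap the cached work in try/finally. The following is only a sketch under assumptions: `withCached` is a hypothetical helper rather than a Spark API, and the master, app name, and sample data are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CacheScope {
  // Hypothetical helper: caches `rdd`, runs `body`, and always unpersists afterwards.
  // Any actions that should benefit from the cache must run inside `body`.
  def withCached[T, A](rdd: RDD[T])(body: RDD[T] => A): A = {
    rdd.cache()
    try body(rdd)
    finally rdd.unpersist(blocking = true)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master and an arbitrary app name, for illustration only.
    val conf = new SparkConf().setMaster("local[*]").setAppName("cache-scope-sketch")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(1 to 100)
      val (n, total) = withCached(data) { cached =>
        (cached.count(), cached.reduce(_ + _)) // both actions reuse the cached blocks
      }
      println(s"count=$n sum=$total")
      println(s"RDDs still persisted: ${sc.getPersistentRDDs.size}")
    } finally {
      sc.stop()
    }
  }
}
```

Because the RDD is unpersisted in the `finally` clause, repeated runs in the same SparkContext should not accumulate cached blocks from earlier runs.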