Python spark 2.1.0 session config settings (pyspark)
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/41886346/
spark 2.1.0 session config settings (pyspark)
Asked by Harish
I am trying to overwrite the default configs of the Spark session/Spark context, but it is picking up the entire node/cluster resources.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("ip") \
    .enableHiveSupport() \
    .getOrCreate()
spark.conf.set("spark.executor.memory", '8g')
spark.conf.set('spark.executor.cores', '3')
spark.conf.set('spark.cores.max', '3')
spark.conf.set("spark.driver.memory",'8g')
sc = spark.sparkContext
It works fine when I put the configuration in spark-submit:
spark-submit --master ip --executor-cores=3 --driver-memory 10G code.py
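For reference, a spark-submit invocation covering all of the settings from the snippet above might look like the following (the master URL is a placeholder and the values are only illustrative):

spark-submit --master ip \
  --executor-memory 8G --executor-cores 3 \
  --driver-memory 8G \
  --conf spark.cores.max=3 \
  code.py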
Accepted answer by Grr
You aren't actually overwriting anything with this code. Just so you can see for yourself, try the following.
As soon as you start the pyspark shell, type:
sc.getConf().getAll()
This will show you all of the current config settings. Then try your code and do it again. Nothing changes. That is because settings such as spark.executor.memory are only read when the SparkContext is created, so setting them on an already-running session has no effect.
What you should do instead is create a new configuration and use that to create a SparkContext. Do it like this:
import pyspark

conf = pyspark.SparkConf().setAll([('spark.executor.memory', '8g'), ('spark.executor.cores', '3'), ('spark.cores.max', '3'), ('spark.driver.memory', '8g')])
sc.stop()
sc = pyspark.SparkContext(conf=conf)
Then you can check it yourself, just like above, with:
sc.getConf().getAll()
This should reflect the configuration you wanted.
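If you need a SparkSession rather than a bare SparkContext (as in the question), a minimal sketch along the same lines, assuming you are in the pyspark shell where a session called spark already exists and reusing the resource values from the question, would be:

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setAll([
    ('spark.executor.memory', '8g'),
    ('spark.executor.cores', '3'),
    ('spark.cores.max', '3'),
    ('spark.driver.memory', '8g'),
])

# These settings are only applied when a new context is created,
# so stop the running session first.
spark.stop()

spark = SparkSession.builder \
    .config(conf=conf) \
    .enableHiveSupport() \
    .getOrCreate()

# Read the settings back to confirm they took effect
print(spark.sparkContext.getConf().getAll())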
Answer by bob
Update configuration in Spark 2.3.1
To change the default Spark configurations you can follow these steps:
Import the required classes
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
Get the default configurations
spark.sparkContext._conf.getAll()
Update the default configurations
conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '4g'), ('spark.app.name', 'Spark Updated Conf'), ('spark.executor.cores', '4'), ('spark.cores.max', '4'), ('spark.driver.memory','4g')])
Stop the current Spark session
spark.sparkContext.stop()
Create a new Spark session
spark = SparkSession.builder.config(conf=conf).getOrCreate()
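As a quick check (not part of the original answer), you can read the settings back from the new session to confirm they were applied:

spark.sparkContext.getConf().getAll()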
Answer by Vivek
Setting 'spark.driver.host' to 'localhost' in the config works for me:
spark = SparkSession \
.builder \
.appName("MyApp") \
.config("spark.driver.host", "localhost") \
.getOrCreate()
Answer by user3282611
You could also set the configuration when you start pyspark, just as with spark-submit:
pyspark --conf property=value
Here is one example:
-bash-4.2$ pyspark
Python 3.6.8 (default, Apr 25 2019, 21:02:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.2.0
      /_/
Using Python version 3.6.8 (default, Apr 25 2019 21:02:35)
SparkSession available as 'spark'.
>>> spark.conf.get('spark.eventLog.enabled')
'true'
>>> exit()
-bash-4.2$ pyspark --conf spark.eventLog.enabled=false
Python 3.6.8 (default, Apr 25 2019, 21:02:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.2.0
      /_/
Using Python version 3.6.8 (default, Apr 25 2019 21:02:35)
SparkSession available as 'spark'.
>>> spark.conf.get('spark.eventLog.enabled')
'false'
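Along the same lines (the values below are illustrative, not from the original answer), several properties can be passed at once when launching the shell:

pyspark --conf spark.executor.memory=8g --conf spark.executor.cores=3 --conf spark.cores.max=3 --conf spark.driver.memory=8g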