scala - How to set hadoop configuration values from pyspark

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must license it the same way, attribute it to the original authors (not me), and link to the original: StackOverflow, http://stackoverflow.com/questions/28844631/


How to set hadoop configuration values from pyspark

scala, apache-spark, pyspark

Asked by javadba

The Scala version of SparkContext has the property


sc.hadoopConfiguration

I have successfully used that to set Hadoop properties (in Scala)


e.g.


sc.hadoopConfiguration.set("my.mapreduce.setting","someVal")

However, the Python version of SparkContext lacks that accessor. Is there any way to set Hadoop configuration values into the Hadoop Configuration used by the PySpark context?


Answered by Dmytro Popovych

sc._jsc.hadoopConfiguration().set('my.mapreduce.setting', 'someVal')

should work

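For context, a minimal sketch of how this looks in a standalone PySpark script (the property name is just the placeholder from the question, and _jsc is an internal attribute, so this relies on implementation details rather than a public API):

from pyspark import SparkContext

sc = SparkContext(appName="hadoop-conf-example")

# _jsc is the underlying JavaSparkContext; hadoopConfiguration() returns the
# org.apache.hadoop.conf.Configuration object that this context passes to
# Hadoop input/output formats.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("my.mapreduce.setting", "someVal")

# Read the value back to confirm it was applied.
print(hadoop_conf.get("my.mapreduce.setting"))  # someVal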

Answered by javadba

I looked into the PySpark source code (context.py) and there is no direct equivalent. Instead, some specific methods support passing in a map of (key, value) pairs:


fileLines = sc.newAPIHadoopFile(
    'dev/*',
    'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'org.apache.hadoop.io.Text',
    conf={'mapreduce.input.fileinputformat.input.dir.recursive': 'true'}
).count()
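
A related sketch, assuming the same input format classes: sc.newAPIHadoopRDD takes no path argument at all, so even the input directory is supplied through the conf dict (the path below is hypothetical):

rdd = sc.newAPIHadoopRDD(
    'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'org.apache.hadoop.io.Text',
    conf={
        # Hypothetical input directory, passed as a Hadoop property.
        'mapreduce.input.fileinputformat.inputdir': '/dev',
        'mapreduce.input.fileinputformat.input.dir.recursive': 'true',
    }
)
print(rdd.count())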

Answered by Harikrishnan Ck

You can set any Hadoop property using the --conf parameter while submitting the job.


--conf "spark.hadoop.fs.mapr.trace=debug"

Source: https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L105

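The same spark.hadoop.* prefix also works programmatically. A minimal sketch using SparkConf, again with the placeholder property name from the question:

from pyspark import SparkConf, SparkContext

# Any property prefixed with "spark.hadoop." is copied into the Hadoop
# Configuration when the SparkContext is created (see the SparkHadoopUtil
# source linked above).
conf = SparkConf().set("spark.hadoop.my.mapreduce.setting", "someVal")
sc = SparkContext(conf=conf)

print(sc._jsc.hadoopConfiguration().get("my.mapreduce.setting"))  # someVal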