scala - How to set hadoop configuration values from pyspark

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must license it the same way, attribute it to the original authors (not me), and link to the original: StackOverflow, http://stackoverflow.com/questions/28844631/


How to set hadoop configuration values from pyspark

scala, apache-spark, pyspark

Asked by javadba

The Scala version of SparkContext has the property


sc.hadoopConfiguration

I have successfully used that to set Hadoop properties (in Scala)


e.g.


sc.hadoopConfiguration.set("my.mapreduce.setting","someVal")

However, the Python version of SparkContext lacks that accessor. Is there any way to set Hadoop configuration values into the Hadoop Configuration used by the PySpark context?


Answered by Dmytro Popovych

sc._jsc.hadoopConfiguration().set('my.mapreduce.setting', 'someVal')

should work

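For context, a minimal sketch of how this looks in a standalone PySpark script (the property name is just the placeholder from the question, and _jsc is an internal attribute, so this relies on implementation details rather than a public API):

from pyspark import SparkContext

sc = SparkContext(appName="hadoop-conf-example")

# _jsc is the underlying JavaSparkContext; hadoopConfiguration() returns the
# org.apache.hadoop.conf.Configuration object that this context passes to
# Hadoop input/output formats.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("my.mapreduce.setting", "someVal")

# Read the value back to confirm it was applied.
print(hadoop_conf.get("my.mapreduce.setting"))  # someVal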

Answered by javadba

I looked into the PySpark source code (context.py) and there is no direct equivalent. Instead, some specific methods support passing in a map of (key, value) pairs:


fileLines = sc.newAPIHadoopFile(
    'dev/*',
    'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'org.apache.hadoop.io.Text',
    conf={'mapreduce.input.fileinputformat.input.dir.recursive': 'true'}
).count()
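
A related sketch, assuming the same input format classes: sc.newAPIHadoopRDD takes no path argument at all, so even the input directory is supplied through the conf dict (the path below is hypothetical):

rdd = sc.newAPIHadoopRDD(
    'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'org.apache.hadoop.io.Text',
    conf={
        # Hypothetical input directory, passed as a Hadoop property.
        'mapreduce.input.fileinputformat.inputdir': '/dev',
        'mapreduce.input.fileinputformat.input.dir.recursive': 'true',
    }
)
print(rdd.count())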

Answered by Harikrishnan Ck

You can set any Hadoop property using the --conf parameter while submitting the job.


--conf "spark.hadoop.fs.mapr.trace=debug"

Source: https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L105

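The same spark.hadoop.* prefix also works programmatically. A minimal sketch using SparkConf, again with the placeholder property name from the question:

from pyspark import SparkConf, SparkContext

# Any property prefixed with "spark.hadoop." is copied into the Hadoop
# Configuration when the SparkContext is created (see the SparkHadoopUtil
# source linked above).
conf = SparkConf().set("spark.hadoop.my.mapreduce.setting", "someVal")
sc = SparkContext(conf=conf)

print(sc._jsc.hadoopConfiguration().get("my.mapreduce.setting"))  # someVal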