Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/31115881/

How to load java properties file and use in Spark?

Tags: java, apache-spark, properties-file

Asked by diplomaticguru

I want to store the Spark arguments, such as the input file and output file, in a Java properties file and pass that file to the Spark driver. I'm using spark-submit to submit the job but couldn't find a parameter to pass the properties file. Have you got any suggestions?

Accepted answer by vijay kumar

Here is one solution I found:

Props file (mypropsfile.conf) // note: prefix your keys with "spark.", otherwise the properties will be ignored.

spark.myapp.input /input/path
spark.myapp.output /output/path

Launch:

$SPARK_HOME/bin/spark-submit --properties-file  mypropsfile.conf
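
For reference, a fuller invocation would also include the application jar and main class; the jar and class names below are placeholders:

$SPARK_HOME/bin/spark-submit --class com.example.MyApp --properties-file mypropsfile.conf myapp.jar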

How to read them in code (inside the driver):

sc.getConf.get("spark.driver.host")  // localhost
sc.getConf.get("spark.myapp.input")       // /input/path
sc.getConf.get("spark.myapp.output")      // /output/path

Answer by Rahul Sharma

The previous answer's approach has a restriction: every property in the properties file must start with the spark. prefix, e.g.

spark.myapp.input
spark.myapp.output

Suppose you have a property which doesn't start with spark.:

job.property:

app.name=xyz

$SPARK_HOME/bin/spark-submit --properties-file  job.property

Spark will ignore all properties that don't have the spark. prefix, with the message:

Warning: Ignoring non-spark config property: app.name=test

Here is how I manage the properties file in the application's driver and executors:

${SPARK_HOME}/bin/spark-submit --files job.properties

Java code to access the cached file (job.properties):

import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkFiles;
import java.io.InputStream;
import java.io.FileInputStream;

// Load the file into a Properties object using the HDFS FileSystem API.
// SparkFiles.get returns the absolute local path of a file shipped with --files.
String fileName = SparkFiles.get("job.properties");
Configuration hdfsConf = new Configuration();
FileSystem fs = FileSystem.get(hdfsConf);

// The file name contains the absolute path of the file
FSDataInputStream is = fs.open(new Path(fileName));

// Or use plain java IO instead:
// InputStream is = new FileInputStream(fileName);

Properties prop = new Properties();
// load properties
prop.load(is);
// retrieve a property
prop.getProperty("app.name");
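
For completeness, the --files submission above would normally also include the application jar and main class; a sketch with placeholder names:

${SPARK_HOME}/bin/spark-submit --files job.properties --class com.example.MyApp myapp.jar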

If you have environment-specific properties (dev/test/prod), supply a custom APP_ENV Java system property in spark-submit:

${SPARK_HOME}/bin/spark-submit \
  --conf "spark.driver.extraJavaOptions=-DAPP_ENV=dev" \
  --conf "spark.executor.extraJavaOptions=-DAPP_ENV=dev" \
  --files dev.properties

Then replace the file-name lookup in your driver or executor code:

// Load the environment-specific properties file shipped with --files
String fileName = SparkFiles.get(System.getProperty("APP_ENV") + ".properties");
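
Putting it together, a minimal end-to-end sketch of the environment-specific load, assuming dev.properties was shipped with --files and -DAPP_ENV=dev was set as above:

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;
import org.apache.spark.SparkFiles;

// APP_ENV is set via spark.driver.extraJavaOptions / spark.executor.extraJavaOptions
String env = System.getProperty("APP_ENV");              // e.g. "dev"
String propsPath = SparkFiles.get(env + ".properties");

Properties prop = new Properties();
try (InputStream in = new FileInputStream(propsPath)) {
    prop.load(in);
}
String appName = prop.getProperty("app.name");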