How to get output after running an Apache Spark job on the web (Java)
Disclaimer: this page is a Chinese–English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/26315118/
Asked by Likoed
I'm a student learning Hadoop and Apache Spark. I want to know how to get the output of an Apache Spark job on the web.
The following is a very simple PHP script that runs an Apache Spark job from the web, because I just want to test it.
<?php
echo shell_exec("spark-submit --class stu.ac.TestProject.App --master spark://localhost:7077 /TestProject-0.0.1-SNAPSHOT.jar");
?>
And the following is the example Java code for the Apache Spark job.
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public class App
{
    public static void main(String[] args)
    {
        SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
        sparkConf.setMaster("spark://localhost:7077");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }
        JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

        // Map each element to 1 if a random point lands inside the unit circle, else 0.
        JavaRDD<Integer> countRDD = dataSet.map(new Function<Integer, Integer>() {
            public Integer call(Integer arg0) throws Exception {
                double x = Math.random() * 2 - 1;
                double y = Math.random() * 2 - 1;
                return (x * x + y * y < 1) ? 1 : 0;
            }
        });

        // Sum the hits; the result comes back to the driver.
        int count = countRDD.reduce(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer arg0, Integer arg1) throws Exception {
                return arg0 + arg1;
            }
        });

        System.out.println("Pi is roughly " + 4.0 * count / n);
        jsc.stop();
    }
}
I only want the standard output, but after running the code through PHP I got an empty result. I built this Java code as a Maven project and also checked that it runs correctly from the command line.
How can I solve it?
Thanks in advance for your answer, and sorry for my poor English. If you don't understand my question, please leave a comment.
Answered by Marius Soutier
A job's output stays in the job, so to speak. Even though Spark is fast, it isn't so fast that it can generate the data instantly. A job runs on a distributed cluster, and that takes some time.
You'll have to write your output somewhere, typically into a database that your web application can then query. Don't start the job from your web application; rather, schedule it according to your application's needs.
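To illustrate the answer's point, the driver can persist its result somewhere the web application can reach. This is only a minimal sketch, not code from the answer: the file name and format are assumptions, and a database table would serve the same role as the file here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResultWriter {
    // Persist the driver-side result so another process (e.g. a web app) can read it later.
    static void writeResult(Path path, double pi) throws IOException {
        Files.write(path, ("Pi is roughly " + pi).getBytes("UTF-8"));
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("spark-job-result.txt"); // assumed location shared with the web app
        writeResult(out, 3.14159);
        // The web application would read this file instead of parsing spark-submit's console output:
        System.out.println(new String(Files.readAllBytes(out), "UTF-8"));
    }
}
```

The web page then simply reads `spark-job-result.txt` (or queries the table) whenever it needs the latest result, independently of when the job ran.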
If you start your job from within a Java, Scala, or Python application, you can retrieve its result directly. With PHP I'm not so sure.
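One likely reason the PHP call returned an empty string is that shell_exec captures only stdout, while spark-submit writes its log output, including most error messages, to stderr. If the job must be launched from another process after all, a sketch like the following merges both streams so nothing is lost. The helper is hypothetical and is shown with a placeholder echo command so it is self-contained; the real spark-submit arguments appear in the comment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class SubmitRunner {
    // Run a command and return its combined stdout + stderr as one string.
    static String runAndCapture(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so log and error lines are captured too
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // A real invocation would look like:
        // runAndCapture("spark-submit", "--class", "stu.ac.TestProject.App",
        //               "--master", "spark://localhost:7077", "/TestProject-0.0.1-SNAPSHOT.jar");
        System.out.print(runAndCapture("echo", "Pi is roughly 3.14"));
    }
}
```

In PHP, appending `2>&1` to the shell_exec command line has a similar effect of surfacing the stderr output.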
Answered by pckmn
You can use the JobServer API for Apache Spark.