eclipse - How to build and run Scala Spark locally
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/23857841/
How to build and run Scala Spark locally
Asked by blue-sky
I'm attempting to build Apache Spark locally. The reason for this is to debug Spark methods like reduce. In particular I'm interested in how Spark implements and distributes Map Reduce under the covers, as I'm experiencing performance issues and I think running these tasks from source is the best way of finding out what the issue is.
So I have cloned the latest from the Spark repo:
git clone https://github.com/apache/spark.git
Spark appears to be a Maven project, so when I create it in Eclipse this is the structure:
Some of the top-level folders also have pom files:
So should I just be building one of these subprojects? Are these the correct steps for running Spark against a local code base?
Answered by maasg
Building Spark locally, the short answer:
git clone [email protected]:apache/spark.git
cd spark
sbt/sbt compile
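If you also want to run the locally built Spark (for example the shell or the bundled examples) rather than just compile it, you will typically need the assembly jar as well. A sketch, assuming the sbt launcher and run scripts that shipped with the Spark source tree at the time:

sbt/sbt assembly     # build the assembly jar used by the run scripts
./bin/spark-shell    # start a shell backed by the freshly built code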
Going into your question in more detail, what you're actually asking is 'How do I debug a Spark application in Eclipse?'. To debug in Eclipse, you don't really need to build Spark in Eclipse. All you need is to create a job with Spark as a library dependency and ask Maven to 'download sources'. That way you can use the Eclipse debugger to step into the Spark code.
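For example, a minimal dependency entry for your job's pom.xml (the artifact and version here are only illustrative; use the Spark version you actually run against):

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.0.0</version>
</dependency>

With m2e in Eclipse you can enable 'Download Artifact Sources' in the Maven preferences, or fetch the source jars from the command line with mvn dependency:sources.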
Then, when creating the Spark Context, use local[1] as the master, like:
val conf = new SparkConf()
  .setMaster("local[1]")
  .setAppName("SparkDebugExample")
so that all Spark interactions are executed in local mode, in a single thread, and are therefore visible to your debugger.
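Putting it together, a minimal self-contained sketch of such a debug job (the object name and the toy reduce are just illustrative; set a breakpoint on the reduce call and step into it once the Spark sources have been downloaded):

import org.apache.spark.{SparkConf, SparkContext}

object SparkDebugExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[1]")          // everything runs in one local thread
      .setAppName("SparkDebugExample")
    val sc = new SparkContext(conf)

    // Breakpoint here: stepping into reduce drops you into Spark's own code,
    // which is what the question set out to observe.
    val sum = sc.parallelize(1 to 1000).reduce(_ + _)
    println(s"sum = $sum")

    sc.stop()
  }
}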
If you are investigating a performance issue, remember that Spark is a distributed system in which the network plays an important role. Debugging the system locally will only give you part of the answer. Monitoring the job on the actual cluster will be required in order to get a complete picture of the performance characteristics of your job.