Java - How to run a simple Spark app from the Eclipse/IntelliJ IDE?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/22639137/
How to run a simple Spark app from the Eclipse/IntelliJ IDE?
Asked by blue-sky
To ease the development of my map reduce tasks running on Hadoop, prior to actually deploying them to Hadoop, I test them using a simple map reducer I wrote:
object mapreduce {
  import scala.collection.JavaConversions._

  val intermediate = new java.util.HashMap[String, java.util.List[Int]]
                                                  //> intermediate : java.util.HashMap[String,java.util.List[Int]] = {}
  val result = new java.util.ArrayList[Int]       //> result : java.util.ArrayList[Int] = []

  def emitIntermediate(key: String, value: Int) {
    if (!intermediate.containsKey(key)) {
      intermediate.put(key, new java.util.ArrayList)
    }
    intermediate.get(key).add(value)
  }                                               //> emitIntermediate: (key: String, value: Int)Unit

  def emit(value: Int) {
    println("value is " + value)
    result.add(value)
  }                                               //> emit: (value: Int)Unit

  def execute(data: java.util.List[String], mapper: String => Unit, reducer: (String, java.util.List[Int]) => Unit) {
    for (line <- data) {
      mapper(line)
    }
    for (keyVal <- intermediate) {
      reducer(keyVal._1, intermediate.get(keyVal._1))
    }
    for (item <- result) {
      println(item)
    }
  }                                               //> execute: (data: java.util.List[String], mapper: String => Unit, reducer: (String, java.util.List[Int]) => Unit)Unit

  def mapper(record: String) {
    var jsonAttributes = com.nebhale.jsonpath.JsonPath.read("$", record, classOf[java.util.ArrayList[String]])
    println("jsonAttributes are " + jsonAttributes)
    var key = jsonAttributes.get(0)
    var value = jsonAttributes.get(1)
    println("key is " + key)
    var delims = "[ ]+";
    var words = value.split(delims);
    for (w <- words) {
      emitIntermediate(w, 1)
    }
  }                                               //> mapper: (record: String)Unit

  def reducer(key: String, listOfValues: java.util.List[Int]) = {
    var total = 0
    for (value <- listOfValues) {
      total += value;
    }
    emit(total)
  }                                               //> reducer: (key: String, listOfValues: java.util.List[Int])Unit

  var dataToProcess = new java.util.ArrayList[String]
                                                  //> dataToProcess : java.util.ArrayList[String] = []
  dataToProcess.add("[\"test1\" , \"test1 here is another test1 test1 \"]")
                                                  //> res0: Boolean = true
  dataToProcess.add("[\"test2\" , \"test2 here is another test2 test1 \"]")
                                                  //> res1: Boolean = true

  execute(dataToProcess, mapper, reducer)         //> jsonAttributes are [test1, test1 here is another test1 test1 ]
                                                  //| key is test1
                                                  //| jsonAttributes are [test2, test2 here is another test2 test1 ]
                                                  //| key is test2
                                                  //| value is 2
                                                  //| value is 2
                                                  //| value is 4
                                                  //| value is 2
                                                  //| value is 2
                                                  //| 2
                                                  //| 2
                                                  //| 4
                                                  //| 2
                                                  //| 2

  for (keyValue <- intermediate) {
    println(keyValue._1 + "->" + keyValue._2.size) //> another->2
                                                  //| is->2
                                                  //| test1->4
                                                  //| here->2
                                                  //| test2->2
  }
}
This allows me to run my mapreduce tasks within my Eclipse IDE on Windows before deploying to the actual Hadoop cluster. I would like to do something similar for Spark, i.e. write Spark code within Eclipse and test it prior to deploying it to a Spark cluster. Is this possible with Spark? Since Spark runs on top of Hadoop, does this mean I cannot run Spark without first having Hadoop installed? In other words, can I run the following code using just the Spark libraries?
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "$YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val sc = new SparkContext("local", "Simple App", "YOUR_SPARK_HOME",
      List("target/scala-2.10/simple-project_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
Taken from https://spark.apache.org/docs/0.9.0/quick-start.html#a-standalone-app-in-scala
If so, which Spark libraries do I need to include in my project?
Answered by Klugscheißer
Add the following to your build.sbt:

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"

and make sure your scalaVersion is set (e.g. scalaVersion := "2.10.3").
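For context, a complete minimal build.sbt might look like the sketch below. The project name and version are placeholders I've assumed; the Spark and Scala versions simply mirror the ones mentioned above:

// minimal sketch of a build.sbt; name and version are assumed placeholders
name := "simple-project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"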
Also, if you're just running the program locally, you can skip the last two arguments to SparkContext, as follows: val sc = new SparkContext("local", "Simple App")
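To make that concrete, here is a minimal sketch of the quick-start example trimmed down for a purely local run from the IDE; the object name and README path are placeholders you would replace with your own:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleAppLocal {
  def main(args: Array[String]) {
    // With the "local" master everything runs inside the IDE's JVM,
    // so no Spark home and no application jar list are needed.
    val logFile = "/path/to/README.md" // placeholder: any text file on your machine
    val sc = new SparkContext("local", "Simple App")
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}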
Finally, Spark can run on Hadoop but can also run in standalone mode. See: https://spark.apache.org/docs/0.9.1/spark-standalone.html
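To make the distinction concrete, the main thing that changes between a local IDE run and a run against a standalone cluster is the master URL passed to SparkContext; in the sketch below the host, port, Spark home, and jar path are placeholders:

// Local testing inside the IDE: a single JVM, no Hadoop or cluster required.
val localSc = new SparkContext("local", "Simple App")

// Against a standalone Spark cluster: point at the master's spark:// URL and
// list the application jar(s) so they can be shipped to the worker nodes.
val clusterSc = new SparkContext("spark://master-host:7077", "Simple App",
  "/path/to/spark", List("target/scala-2.10/simple-project_2.10-1.0.jar"))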