Scala vs. Java for Spark?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me) at StackOverflow.
Original question: http://stackoverflow.com/questions/34733072/
Scala vs Java for Spark?
Asked by Al Elizalde
Can someone help me understand why people use Scala over Java for Spark? I have been researching but haven't been able to find a solid answer. I know both work fine since they both run on the JVM, and I know Scala is a functional and OOP language.
Thanks
Answered by Joe Widen
Spark was written in Scala. Spark also came out before Java 8 was available, which made functional programming in Java more cumbersome. Scala is also closer to Python while still running on the JVM. Data scientists were the original target users for Spark, and they traditionally have more of a background in Python, so Scala made more sense for them to pick up than going straight to Java.
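To make the point about pre-Java-8 functional programming concrete, here is a minimal Scala sketch (my illustration, not part of the original answer) of the inline-function style Spark encourages; the app name and sample data are made up, and it assumes a local Spark dependency on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object InlineFunctionsSketch {
  def main(args: Array[String]): Unit = {
    // Local mode, purely to illustrate the API style.
    val sc = new SparkContext(new SparkConf().setAppName("inline-functions").setMaster("local[*]"))

    // Transformations take inline functions, so the pipeline reads like ordinary collection code.
    // Before Java 8, the equivalent Java code needed an anonymous inner class per map/filter call.
    val longWords = sc.parallelize(Seq("spark", "scala", "java", "python"))
      .map(word => word.toUpperCase)
      .filter(word => word.length > 4)
      .collect()

    longWords.foreach(println)
    sc.stop()
  }
}
```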
Here is a direct quote from one of the people who originally wrote Spark, taken from a Reddit AMA they did. The question was:
Q:
How important was it to create Spark in Scala? Would it have been feasible / realistic to write it in Java or was Scala fundamental to Spark?
A, from Matei Zaharia:
At the time we started, I really wanted a PL that supports a language-integrated interface (where people write functions inline, etc), because I thought that was the way people would want to program these applications after seeing research systems that had it (specifically Microsoft's DryadLINQ). However, I also wanted to be on the JVM in order to easily interact with the Hadoop filesystem and data formats for that. Scala was the only somewhat popular JVM language then that offered this kind of functional syntax and was also statically typed (letting us have some control over performance), so we chose that. Today there might be an argument to make the first version of the API in Java with Java 8, but we also benefitted from other aspects of Scala in Spark, like type inference, pattern matching, actor libraries, etc.
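As a small illustration of two of the features Matei mentions (type inference and pattern matching), here is a plain-Scala sketch; it is my own example, not from the AMA, and the Event/Click/Purchase types are hypothetical.

```scala
sealed trait Event
case class Click(userId: String, url: String) extends Event
case class Purchase(userId: String, amountCents: Long) extends Event

object EventStatsSketch {
  // The return type Map[String, Long] is inferred; pattern matching extracts the fields we care about.
  def revenueByUser(events: Seq[Event]) =
    events
      .collect { case Purchase(user, cents) => user -> cents } // keep only purchases
      .groupBy { case (user, _) => user }
      .map { case (user, pairs) => user -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val events = Seq(Click("a", "/home"), Purchase("a", 1200L), Purchase("b", 500L), Purchase("a", 300L))
    println(revenueByUser(events)) // e.g. Map(a -> 1500, b -> 500)
  }
}
```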
Edit
Here's the link in case folks are interested in more of what Matei had to say: https://www.reddit.com/r/IAmA/comments/31bkue/im_matei_zaharia_creator_of_spark_and_cto_at/

