Python: how to access SparkContext in a pyspark script
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms, cite the original source, and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/28999332/
Asked by javadba
The following SO question, How to run script in Pyspark and drop into IPython shell when done?, tells how to launch a pyspark script:
%run -d myscript.py
But how do we access the existing Spark context?
Just creating a new one does not work:
----> sc = SparkContext("local", 1)
ValueError: Cannot run multiple SparkContexts at once; existing
SparkContext(app=PySparkShell, master=local) created by <module> at
/Library/Python/2.7/site-packages/IPython/utils/py3compat.py:204
But trying to use an existing one... well, what existing one?
In [50]: for s in filter(lambda x: 'SparkContext' in repr(x[1]) and len(repr(x[1])) < 150, locals().iteritems()):
    print s
('SparkContext', <class 'pyspark.context.SparkContext'>)
i.e., there is no variable holding a SparkContext instance.
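For reference, one way to find the live context without scanning locals() is pyspark's own bookkeeping: the SparkContext class records the active instance in a private class attribute. A minimal sketch (this is a private, version-dependent field, not public API):

from pyspark import SparkContext

# Private and version-dependent: pyspark stores the active context here.
existing = SparkContext._active_spark_context
print(existing)  # the running SparkContext, or None if there is none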
Accepted answer by TechnoIndifferent
from pyspark.context import SparkContext

and then invoke a static method on SparkContext as:
sc = SparkContext.getOrCreate()
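Run inside the existing pyspark/IPython session, this returns the shell's context rather than raising the multiple-contexts error. A minimal sketch (the appName comment assumes the default pyspark shell):

from pyspark.context import SparkContext

# getOrCreate() hands back the already-running context instead of
# raising "Cannot run multiple SparkContexts at once".
sc = SparkContext.getOrCreate()
print(sc.appName)                       # e.g. "PySparkShell" inside the shell
print(sc.parallelize(range(10)).sum())  # 45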
Answered by mnm
When you type pyspark at the terminal, the shell automatically creates the Spark context sc.
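In that case nothing needs to be created or imported at all; a minimal sketch of using the pre-created sc directly in the shell:

# Inside an interactive pyspark session, sc already exists,
# so it can be used with no imports and no constructor call:
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.count())  # 4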
Answered by vijay kumar
Standalone Python script for word count: write a reusable Spark context using contextmanager
"""SimpleApp.py"""
from contextlib import contextmanager
from pyspark import SparkContext
from pyspark import SparkConf
SPARK_MASTER='local'
SPARK_APP_NAME='Word Count'
SPARK_EXECUTOR_MEMORY='200m'
@contextmanager
def spark_manager():
conf = SparkConf().setMaster(SPARK_MASTER) \
.setAppName(SPARK_APP_NAME) \
.set("spark.executor.memory", SPARK_EXECUTOR_MEMORY)
spark_context = SparkContext(conf=conf)
try:
yield spark_context
finally:
spark_context.stop()
with spark_manager() as context:
File = "/home/ramisetty/sparkex/README.md" # Should be some file on your system
textFileRDD = context.textFile(File)
wordCounts = textFileRDD.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
wordCounts.saveAsTextFile("output")
print "WordCount - Done"
To launch:
/bin/spark-submit SimpleApp.py
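As a side note, the spark_manager defined above can be reused for further jobs in the same script; each with block creates a fresh context and stops it on exit, so two contexts never run at once. A sketch under that assumption:

# Reusing spark_manager for a second, independent job:
with spark_manager() as context:
    total = context.parallelize(range(100)).sum()
    print("sum of 0..99 = %d" % total)  # 4950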

