Python: How to access SparkContext in a pyspark script

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same license and attribute it to the original authors (not this site). Original: http://stackoverflow.com/questions/28999332/

Tags: python, apache-spark, pyspark

Asked by javadba

The following SO question, How to run script in Pyspark and drop into IPython shell when done?, explains how to launch a pyspark script:

 %run -d myscript.py

But how do we access the existing Spark context?

Just creating a new one does not work:

 ---->  sc = SparkContext("local", 1)

 ValueError: Cannot run multiple SparkContexts at once; existing 
 SparkContext(app=PySparkShell, master=local) created by <module> at 
 /Library/Python/2.7/site-packages/IPython/utils/py3compat.py:204

But trying to use an existing one... well, what existing one?

In [50]: for s in filter(lambda x: 'SparkContext' in repr(x[1]) and len(repr(x[1])) < 150, locals().iteritems()):
    print s
('SparkContext', <class 'pyspark.context.SparkContext'>)

i.e., there is no variable holding a SparkContext instance, only the class itself.
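
As an aside, that inspection is Python 2 code; on Python 3, a rough equivalent would be the following sketch:

# Python 3 sketch of the same namespace scan: iteritems() no longer
# exists, and list() snapshots the dict so assigning the loop variables
# cannot invalidate the iteration.
for name, value in list(locals().items()):
    if 'SparkContext' in repr(value) and len(repr(value)) < 150:
        print((name, value))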

Accepted answer by TechnoIndifferent

from pyspark.context import SparkContext

and then invoke a static method on SparkContext:

sc = SparkContext.getOrCreate()
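
Run via %run inside the pyspark shell, getOrCreate() returns the shell's already-running context instead of raising the ValueError above; run standalone, it creates a fresh one. A minimal sketch (the parallelize example is illustrative, not part of the original answer):

# myscript.py -- attach to the existing SparkContext if one is running
# (e.g. the pyspark shell's), otherwise create a new one.
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
print(sc.parallelize(range(10)).sum())  # 45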

Answered by mnm

When you launch pyspark at the terminal, the shell automatically creates the Spark context sc for you.
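
For illustration (the banner and repr vary by Spark version), a session looks roughly like:

$ pyspark
...
>>> sc          # predefined by the shell; no constructor call needed
<pyspark.context.SparkContext ...>
>>> sc.parallelize([1, 2, 3]).count()
3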

Answered by vijay kumar

Standalone Python script for word count: build a reusable Spark context with contextmanager, so the context is always stopped and the one-SparkContext-at-a-time limit above is respected:

"""SimpleApp.py"""
from contextlib import contextmanager
from pyspark import SparkContext
from pyspark import SparkConf


SPARK_MASTER='local'
SPARK_APP_NAME='Word Count'
SPARK_EXECUTOR_MEMORY='200m'

@contextmanager
def spark_manager():
    conf = SparkConf().setMaster(SPARK_MASTER) \
                      .setAppName(SPARK_APP_NAME) \
                      .set("spark.executor.memory", SPARK_EXECUTOR_MEMORY)
    spark_context = SparkContext(conf=conf)

    try:
        yield spark_context
    finally:
        spark_context.stop()

with spark_manager() as context:
    File = "/home/ramisetty/sparkex/README.md"  # Should be some file on your system
    textFileRDD = context.textFile(File)
    wordCounts = textFileRDD.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
    wordCounts.saveAsTextFile("output")

print "WordCount - Done"

To launch:

/bin/spark-submit SimpleApp.py