java - how to set and get static variables from spark?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/29685330/


how to set and get static variables from spark?

Tags: java, apache-spark, spark-streaming

Asked by diplomaticguru

I have a class like this:

public class Test {
    private static String name;

    public static String getName() {
        return name;
    }

    public static void setName(String name) {
        Test.name = name;
    }

    public static void print() {
        System.out.println(name);
    }

}

In my Spark driver, I'm setting the name like this and calling the print() command:

public final class TestDriver{

    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("TestApp");
        // ...
        // ...
        Test.setName("TestName");
        Test.print();
        // ...
    }
}

However, I'm getting a NullPointerException. How do I pass a value to the global variable and use it?

Answered by Daniel Langdon

OK, there are basically two ways to get a value known to the master out to the executors:

  1. Put the value inside a closure that is serialized to the executors to perform a task. This is the most common approach and very simple/elegant. Sample and doc here.
  2. Create a broadcast variable with the data. This is good for large immutable data, since it guarantees the data is sent only once. It is also good if the same data is used over and over. Sample and doc here. (A short Java sketch of both options follows this list.)
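A minimal Java sketch of both options, assuming a local list as input; the class and variable names (PassValueExample, lines, broadcastName) are illustrative:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class PassValueExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("PassValueExample"));
        JavaRDD<String> lines = sc.parallelize(Arrays.asList("a", "b", "c"));

        // Option 1: capture the value in the closure; Spark serializes it with each task.
        final String name = "TestName";
        JavaRDD<String> viaClosure = lines.map(line -> name + ":" + line);

        // Option 2: broadcast the value; it is shipped to each executor only once.
        Broadcast<String> broadcastName = sc.broadcast("TestName");
        JavaRDD<String> viaBroadcast = lines.map(line -> broadcastName.value() + ":" + line);

        System.out.println(viaClosure.collect());
        System.out.println(viaBroadcast.collect());
        sc.stop();
    }
}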

No need to use static variables in either case. But, if you DO want to have static values available on your executor VMs, you need to do one of these:

  1. If the values are fixed or the configuration is available on the executor nodes (lives inside the jar, etc.), then you can have a lazy val, guaranteeing initialization only once.
  2. You can call mapPartitions() with code that uses one of the two options above, then store the values in your static variable/object; see the sketch after this list. mapPartitions is guaranteed to run only once per partition (much better than once per line) and is good for this kind of thing (initializing DB connections, etc.).
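As a rough illustration of option 2, the sketch below (assuming Spark 2.x, where the mapPartitions lambda returns an Iterator; the Holder class is hypothetical) initializes a static value at most once per executor JVM from inside mapPartitions:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PartitionInitExample {

    // One copy of this static exists per executor JVM.
    static class Holder {
        private static volatile String name;

        static String getOrInit(String value) {
            if (name == null) {
                synchronized (Holder.class) {
                    if (name == null) {
                        // expensive one-time setup (DB connection, config load, ...) goes here
                        name = value;
                    }
                }
            }
            return name;
        }
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("PartitionInitExample"));
        JavaRDD<String> lines = sc.parallelize(Arrays.asList("a", "b", "c"), 2);

        // mapPartitions runs once per partition, so the static is initialized
        // at most once per partition (and only once per executor JVM overall).
        JavaRDD<String> result = lines.mapPartitions(it -> {
            String name = Holder.getOrInit("TestName");
            List<String> out = new ArrayList<>();
            while (it.hasNext()) {
                out.add(name + ":" + it.next());
            }
            return out.iterator();
        });

        System.out.println(result.collect());
        sc.stop();
    }
}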

Hope this helps!

P.S.: As for your exception: I just don't see it in that code sample; my bet is that it is occurring elsewhere.



Edit for extra clarification: The lazy val solution is simply Scala, no Spark involved...

object MyStaticObject
{
  lazy val MyStaticValue = {
     // Call a database, read a file included in the Jar, do expensive initialization computation, etc
     4
  }
} 

Since each executor corresponds to a JVM, once the classes are loaded, MyStaticObject will be initialized. The lazy keyword guarantees that the MyStaticValue variable will only be initialized the first time it is actually requested, and will hold its value from then on.
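For the Java side of the question, a rough equivalent of the Scala lazy val is the initialization-on-demand holder idiom (the class names here are illustrative); the nested holder class, and therefore the value, is initialized once per JVM the first time it is requested:

public class MyStaticValues {
    private static class LazyHolder {
        // Runs once per JVM (i.e. once per executor), the first time getValue() is called.
        static final int VALUE = computeValue();

        private static int computeValue() {
            // call a database, read a file included in the jar, do expensive initialization, etc.
            return 4;
        }
    }

    public static int getValue() {
        return LazyHolder.VALUE;
    }
}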

Answered by Sean Owen

The copy of your class in your driver process isn't the copy in your executors. They aren't in the same ClassLoader, or even the same JVM, or even on the same machine. Setting a static variable on the driver does nothing to the other copies, hence you find it null remotely.
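To make this concrete, here is a small sketch reusing the Test class from the question (the RDD contents and class name are illustrative):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class WhereItBreaks {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("WhereItBreaks"));

        Test.setName("TestName");   // updates the static only in the driver JVM
        Test.print();               // works: prints "TestName" on the driver

        JavaRDD<String> lines = sc.parallelize(Arrays.asList("a", "b"));
        // Each task uses the executor's own copy of the Test class, which was never set,
        // so on a real cluster this prints "null:a" and "null:b".
        lines.map(line -> Test.getName() + ":" + line)
             .collect()
             .forEach(System.out::println);

        sc.stop();
    }
}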

Answered by kavetiraviteja

I would like to add one more approach; it makes sense only when you have a few variables that are passed at runtime as arguments.

Spark configuration: --conf "spark.executor.extraJavaOptions=-DcutomField=${value}", and when you need the data in transformations you can call System.getProperty("cutomField");
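A hedged sketch of this approach (the submit command and class name are illustrative; the property name is kept exactly as written in the answer):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Submitted with the executor JVMs configured to carry the property, e.g.:
//   spark-submit --conf "spark.executor.extraJavaOptions=-DcutomField=TestName" app.jar
public class SystemPropertyExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("SystemPropertyExample"));
        JavaRDD<String> lines = sc.parallelize(Arrays.asList("a", "b", "c"));

        // System.getProperty runs inside the task, i.e. in an executor JVM
        // that was started with -DcutomField from extraJavaOptions.
        JavaRDD<String> tagged = lines.map(line -> System.getProperty("cutomField") + ":" + line);

        System.out.println(tagged.collect());
        sc.stop();
    }
}

Note that spark.executor.extraJavaOptions only affects executor JVMs; in local mode everything runs in the driver JVM, so the property would have to be set on the driver instead.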

You can find more details here.

Note: the above does not make sense when we have a significant number of variables. In those cases, I would prefer @Daniel Langdon's approaches.