Why is Java faster when using a JIT than when compiled to machine code?

Question

Asked by Adam Goode

I have heard that Java must use a JIT to be fast. This makes perfect sense when comparing to interpretation, but why can't someone make an ahead-of-time compiler that generates fast Java code? I know about gcj, but I don't think its output is typically faster than HotSpot, for example.

Are there things about the language that make this difficult? I think it comes down to just these things:

Reflection
Classloading

What am I missing? If I avoid these features, would it be possible to compile Java code once to native machine code and be done?

Answer 1

Accepted answer by Thorbjørn Ravn Andersen

The real killer for any AOT compiler is:

Class.forName(...)

This means that you cannot write an AOT compiler which covers ALL Java programs, as some information about the characteristics of the program is available only at runtime. You can, however, do it for a subset of Java, which is what I believe gcj does.
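
To make the Class.forName(...) point concrete, here is a minimal sketch. The names DynamicLoadingSketch and newList are made up for illustration and are not from the answer. The implementation class is chosen by a string that only exists at run time, so a compiler that must see all code ahead of time cannot know which class will end up being loaded.

```java
import java.util.List;

class DynamicLoadingSketch {
    // Hypothetical helper: instantiates a List implementation whose class
    // name is only known at run time (e.g. read from config or user input).
    static List<String> newList(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            @SuppressWarnings("unchecked")
            List<String> list = (List<String>) instance;
            return list;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name comes from the command line, not from the source code.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<String> list = newList(name);
        list.add("hello");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```

Running it with java.util.LinkedList as the argument loads a different class without recompiling anything, which is exactly the case a closed-world AOT compiler has to exclude or handle through extra configuration.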

Another typical example is the ability of a JIT to inline methods like getX() directly into the calling methods when it finds that it is safe to do so, and to undo the inlining if appropriate, even when the programmer has not explicitly helped by declaring the method final. The JIT can see that in the running program a given method is not overridden and can therefore, in this instance, be treated as final. This might be different in the next invocation.
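
A small sketch of the situation described above, with hypothetical classes Point and ShiftedPoint. While only Point instances have been seen, a JIT is free to treat getX() as effectively final and inline it into the loop. Once the overriding subclass shows up, that assumption no longer holds and the compiled code has to be revised. The program cannot observe this directly; the results stay correct either way.

```java
class Point {
    private final int x;
    Point(int x) { this.x = x; }
    int getX() { return x; }            // not final, so it may be overridden later
}

class ShiftedPoint extends Point {       // hypothetical subclass, first used late
    ShiftedPoint(int x) { super(x); }
    @Override int getX() { return super.getX() + 1; }
}

class SpeculativeInliningSketch {
    // Hot loop: every iteration makes a virtual call to getX().
    static long sumX(Point[] points, int rounds) {
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (Point p : points) {
                sum += p.getX();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Phase 1: only Point exists at this call site.
        Point[] plain = { new Point(1), new Point(2) };
        System.out.println(sumX(plain, 100_000));   // 300000

        // Phase 2: an overriding subclass appears; results must still be right.
        Point[] mixed = { new Point(1), new ShiftedPoint(2) };
        System.out.println(sumX(mixed, 100_000));   // 400000
    }
}
```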

Edit 2019: Oracle has introduced GraalVM which allows AOT compilation on a subset of Java (a quite large one, but still a subset) with the primary requirement that all code is available at compile time. This allows for millisecond startup time of web containers.

Answer 2

Answered by Andrew Hare

A JIT compiler can be faster because the machine code is being generated on the exact machine that it will also execute on. This means that the JIT has the best possible information available to it to emit optimized code.

If you pre-compile bytecode into machine code, the compiler cannot optimize for the target machine(s), only the build machine.
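
As an illustration of the kind of code where this can matter, here is a plain element-wise loop (VectorAddSketch is a made-up name). A JIT such as HotSpot may compile a loop like this with whatever SIMD instructions the CPU it is running on supports, whereas a binary built ahead of time typically has to settle on a baseline instruction set or ship several code paths. Whether vectorization actually happens depends on the JVM version and flags, so treat this as a sketch rather than a benchmark.

```java
class VectorAddSketch {
    // Element-wise addition: a simple counted loop over arrays, the shape
    // of code that optimizing compilers commonly try to vectorize.
    static float[] add(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = new float[1024];
        float[] b = new float[1024];
        java.util.Arrays.fill(a, 1.5f);
        java.util.Arrays.fill(b, 2.0f);
        float[] sum = null;
        // Call the method often enough that the JIT considers it hot.
        for (int round = 0; round < 50_000; round++) {
            sum = add(a, b);
        }
        System.out.println(sum[0]);   // 3.5
    }
}
```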

Answer 3

Answered by Sam Harwell

Java's ability to inline across virtual method boundaries and perform efficient interface dispatch requires runtime analysis before compiling - in other words it requires a JIT. Since all methods are virtual and interfaces are used "everywhere", it makes a big difference.

Answer 4

Answered by Tal Pressman

In the end it boils down to the fact that having more information enables better optimizations. In this case, the JIT has more information about the actual machine the code is running on (as Andrew mentioned) and it also has a lot of runtime information that is not available during compilation.

Answer 5

Answered by Luke Quinane

Java's JIT compiler is also lazy and adaptive.

Lazy

Being lazy, it compiles a method only when execution gets to it, instead of compiling the whole program up front (very useful if you don't use part of a program). Class loading actually helps make the JIT faster by allowing it to ignore classes it hasn't come across yet.

Adaptive

Being adaptive, it emits a quick and dirty version of the machine code first, and only goes back and does a thorough job if that method is used frequently.
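
The lazy and adaptive behaviour can be observed with a small program like the following sketch (class and method names are made up). On HotSpot, running it with the -XX:+PrintCompilation flag prints a line each time a method is compiled; one would expect the frequently called sumOfSquares to show up, possibly more than once as it is recompiled at a higher optimization level, while rarelyUsed is unlikely to be compiled at all. The exact output depends on the JVM version and settings.

```java
class JitWarmupSketch {
    // Called tens of thousands of times: a candidate for compilation and,
    // later, recompilation with heavier optimization.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Called once: a lazy JIT has little reason to spend time on it.
    static void rarelyUsed() {
        System.out.println("called once");
    }

    public static void main(String[] args) {
        rarelyUsed();
        long checksum = 0;
        for (int call = 0; call < 50_000; call++) {
            checksum += sumOfSquares(100);
        }
        // Print the result so the work cannot be optimized away entirely.
        System.out.println(checksum);
    }
}
```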

Answer 6

Answered by Dmitry Leskov

In theory, a JIT compiler has an advantage over AOT if it has enough time and computational resources available. For instance, if you have an enterprise app running for days and months on a multiprocessor server with plenty of RAM, the JIT compiler can produce better code than any AOT compiler.

Now, if you have a desktop app, things like fast startup and initial response time (where AOT shines) become more important, plus the computer may not have sufficient resources for the most advanced optimizations.

And if you have an embedded system with scarce resources, JIT has no chance against AOT.

However, the above was all theory. In practice, creating such an advanced JIT compiler is way more complicated than a decent AOT one. How about some practical evidence?

Answer 7

Answered by Brendan Long

I think the fact that the official Java compiler is a JIT compiler is a large part of this. How much time has been spent optimizing the JVM vs. a machine code compiler for Java?

Answer 8

Answered by gustafc

JITs can identify and eliminate some conditions which can only be known at runtime. A prime example is the elimination of virtual calls that modern VMs perform: e.g., when the JVM finds an invokevirtual or invokeinterface instruction, if only one class overriding the invoked method has been loaded, the VM can actually make that virtual call static and is thus able to inline it. To a C program, on the other hand, a function pointer is always a function pointer, and a call to it can't be inlined (in the general case, anyway).

Here's a situation where the JVM is able to inline a virtual call:

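interface I {
    // Chosen once, at class initialization, from the system property "someCondition".
    I INSTANCE = Boolean.getBoolean("someCondition") ? new A() : new B();
    void doIt();
}
class A implements I {
    public void doIt() { /* ... */ }  // interface methods are implicitly public
}
class B implements I {
    public void doIt() { /* ... */ }
}
// later...
I.INSTANCE.doIt();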

Assuming we don't go around creating A or B instances elsewhere and that someCondition is set to true, the JVM knows that the call to doIt() always means A.doIt, and can therefore avoid the method table lookup and then inline the call. A similar construct in a non-JITted environment would not be inlinable.

Answer 9

Answered by Edwin Dalorzo

I will paste an interesting answer given by James Gosling in the book Masterminds of Programming.

Well, I've heard it said that effectively you have two compilers in the Java world. You have the compiler to Java bytecode, and then you have your JIT, which basically recompiles everything specifically again. All of your scary optimizations are in the JIT.
James: Exactly. These days we're beating the really good C and C++ compilers pretty much always. When you go to the dynamic compiler, you get two advantages when the compiler's running right at the last moment. One is you know exactly what chipset you're running on. So many times when people are compiling a piece of C code, they have to compile it to run on kind of the generic x86 architecture. Almost none of the binaries you get are particularly well tuned for any of them. You download the latest copy of Mozilla, and it'll run on pretty much any Intel architecture CPU. There's pretty much one Linux binary. It's pretty generic, and it's compiled with GCC, which is not a very good C compiler.
When HotSpot runs, it knows exactly what chipset you're running on. It knows exactly how the cache works. It knows exactly how the memory hierarchy works. It knows exactly how all the pipeline interlocks work in the CPU. It knows what instruction set extensions this chip has got. It optimizes for precisely what machine you're on. Then the other half of it is that it actually sees the application as it's running. It's able to have statistics that know which things are important. It's able to inline things that a C compiler could never do. The kind of stuff that gets inlined in the Java world is pretty amazing. Then you tack onto that the way the storage management works with the modern garbage collectors. With a modern garbage collector, storage allocation is extremely fast.

Masterminds of Programming

Answer 10

Answered by Christopher Bekesi

Dmitry Leskov is absolutely right here.

All of the above is just theory about what could make a JIT faster; implementing every scenario is almost impossible. Besides, since we only have a handful of different instruction sets on x86_64 CPUs, there is very little to gain by targeting every instruction set on the current CPU. I always go by the rule of targeting x86_64 and SSE4.2 when building performance-critical applications in native code.

Java's fundamental structure causes a ton of limitations. JNI can help show just how inefficient it is; the JIT only sugarcoats this by making things faster overall. Besides the fact that every function is virtual by default, Java also uses class types at runtime, unlike, for example, C++. C++ has a great advantage here when it comes to performance, because no class object has to be loaded at runtime: it is all blocks of data that get allocated in memory and are only initialized when requested. In other words, C++ doesn't have class types at runtime. Java classes are actual objects, not just templates. I'm not going to go into GC because that's irrelevant. Java strings are also slower because they use dynamic string pooling, which would require the runtime to do string searches in the pool table each time.

Many of those things are due to the fact that Java wasn't originally built to be fast, so its foundation will always be slow. Most native languages (primarily C/C++) were specifically built to be lean and mean, with no waste of memory or resources. The first few versions of Java were in fact terribly slow and wasteful of memory, with lots of unnecessary metadata for variables and whatnot. As it is today, JIT being capable of producing faster code than AOT languages will remain a theory.

Think about all the work the JIT needs to keep track of to do the lazy JIT: increment a counter each time a function is called, check how many times it's been called, and so on and so forth. Running the JIT takes a lot of time. The trade-off, in my eyes, is not worth it. And this is just on a PC.

Ever tried to run Java on a Raspberry Pi or other embedded devices? Absolutely terrible performance. JavaFX on a Raspberry Pi? Not even functional... Java and its JIT are very far from delivering all of what is advertised and the theory people blindly spew out about them.
