java - How does NUMA architecture affect the performance of ActivePivot?

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/13160456/

Time: 2020-10-31 11:42:20  Source: igfitidea

How does NUMA architecture affect the performance of ActivePivot?

Tags: java, olap, numa, activepivot

Asked by Hyman

We are migrating an ActivePivot application to a new server (4-socket Intel Xeon, 512GB of memory). After deploying, we ran our application benchmark (a mix of large OLAP queries running concurrently with real-time transactions). The measured performance is almost twice as slow as on our previous server, which has similar processors but half as many cores and half the memory.

We have investigated the differences between the two servers, and it appears the big one has a NUMA architecture (non-uniform memory access). Each CPU socket is physically close to 1/4 of the memory, but further away from the rest of it. The JVM that runs our application allocates one large global heap, and a random fraction of that heap ends up on each NUMA node. Our analysis is that the memory access pattern is essentially random, so CPU cores frequently waste time accessing remote memory.

We are looking for more feedback about running ActivePivot on NUMA servers. Can we configure ActivePivot cubes or thread pools, change our queries, or configure the operating system?

Answered by Antoine CHAMBILLE

Peter described the general JVM options available today to reduce the performance impact of NUMA architectures. In short, a NUMA-aware JVM partitions the heap with respect to the NUMA nodes, and when a thread creates a new object, the object is allocated in the NUMA node of the core that runs that thread (so if the same thread later uses it, the object will be in local memory). Also, when compacting the heap, the NUMA-aware JVM avoids moving large data chunks between nodes (which reduces the length of stop-the-world events).

So on any NUMA hardware, and for any Java application, the -XX:+UseNUMA option should probably be enabled.

But for ActivePivot that does not help much. ActivePivot is an in-memory database: there are real-time updates, but the bulk of the data resides in main memory for the life of the application. Whatever the JVM options, the data will be split among NUMA nodes, and the threads that execute queries will access memory randomly. Since most sections of the ActivePivot query engine run as fast as memory can be fetched, the NUMA impact is particularly visible.

So how can you get the most from your ActivePivot solution on NUMA hardware?

There is an easy solution when the ActivePivot application only uses a fraction of the server's resources (we find this is often the case when several ActivePivot solutions run on the same server). For instance, an ActivePivot solution might use only 16 cores out of 64, and 256GB out of a terabyte. In that case you can restrict the JVM process itself to a single NUMA node.

On Linux you prefix the JVM launch command with the following (http://linux.die.net/man/8/numactl):

numactl --cpunodebind=xxx
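
As a rough illustration of the prefix above, the sketch below builds a complete launch line for one NUMA node. The --membind flag (a numactl option that restricts memory allocation to the given node), the node number, the heap size and the jar name activepivot-app.jar are assumptions added for illustration; they are not part of the original answer. The function only prints the command, so it is safe to run on any machine.

```shell
#!/bin/sh
# Sketch: build a numactl launch line that keeps a JVM on one NUMA node.
# Node number, heap size and jar name are hypothetical placeholders.

numa_launch_cmd() {
  node="$1"   # NUMA node to bind to, e.g. 0
  heap="$2"   # heap size passed to -Xms/-Xmx, e.g. 200g
  jar="$3"    # application jar (hypothetical name)
  # --cpunodebind pins the process to the node's cores;
  # --membind (an addition to the original command) pins allocations to its memory.
  echo "numactl --cpunodebind=$node --membind=$node java -Xms$heap -Xmx$heap -jar $jar"
}

# Dry run: print the command instead of executing it.
numa_launch_cmd 0 200g activepivot-app.jar
```

The heap has to fit in the memory attached to the chosen node; numactl --hardware lists the nodes and how much memory each one has.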

If the entire server is dedicated to one ActivePivot solution, you can leverage the ActivePivot Distributed Architecture to partition the data. If there are 4 NUMA nodes, you start 4 JVMs hosting 4 ActivePivot nodes, each one bound to its own NUMA node. With this deployment, queries are distributed among the ActivePivot nodes, and each one performs its share of the work at maximum performance, within the right NUMA node.
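
The one-JVM-per-node deployment can be sketched as a loop that emits one launch line per NUMA node. This is only a sketch under stated assumptions: the jar name activepivot-node.jar and the -Dnode.id system property are hypothetical, the --membind flag is an addition to the command quoted in this answer, and the ActivePivot distribution settings themselves are not shown. The commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: one JVM per NUMA node, each bound to its own node.
# NODE_COUNT, the jar name and the -Dnode.id property are hypothetical.

NODE_COUNT=4

per_node_cmds() {
  count="$1"
  node=0
  while [ "$node" -lt "$count" ]; do
    # Bind both CPU and memory of this JVM to the same NUMA node.
    echo "numactl --cpunodebind=$node --membind=$node java -Dnode.id=$node -jar activepivot-node.jar"
    node=$((node + 1))
  done
}

# Dry run: print the four launch lines.
per_node_cmds "$NODE_COUNT"
```

Each JVM's heap should be sized to fit within the memory of its node, otherwise allocations bound to that node will fail or the process will swap.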

Answered by Peter Lawrey

You can try using -XX:+UseNUMA

http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html

If this doesn't yield the result you expect, you might have to use taskset to lock a JVM to a specific socket, effectively breaking the server into four machines with one JVM each.
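
A minimal sketch of the taskset approach follows. It assumes each socket's cores are numbered contiguously (socket 0 owns CPUs 0-15, socket 1 owns 16-31, and so on), which is not true on every machine; lscpu shows the real topology. The core count and jar name are hypothetical. Note that taskset only pins CPUs; unlike numactl it has no option to bind memory. The command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: pin a JVM to the cores of one socket with taskset.
# Assumes contiguous core numbering per socket (check with lscpu).

socket_cpu_list() {
  socket="$1"   # socket index, starting at 0
  cores="$2"    # logical CPUs per socket (hypothetical value)
  first=$((socket * cores))
  last=$((first + cores - 1))
  echo "$first-$last"
}

taskset_cmd() {
  # $1 = socket index, $2 = CPUs per socket, $3 = jar (hypothetical name)
  echo "taskset -c $(socket_cpu_list "$1" "$2") java -jar $3"
}

# Dry run for socket 1 with 16 CPUs per socket.
taskset_cmd 1 16 app.jar   # prints: taskset -c 16-31 java -jar app.jar
```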

I have observed that machines with more sockets have slower access to their memory (even their local memory) and, as a result, do not always give you the performance gains you want.
