fastest (low latency) method for Inter Process Communication between Java and C/C++

Note: this page is an English rendering of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license, link back to the original, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/2635272/
Asked by Bastien
I have a Java app connecting through a TCP socket to a "server" developed in C/C++.
Both the app and the server are running on the same machine, a Solaris box (but we're considering migrating to Linux eventually). The data exchanged consists of simple messages (login, login ACK, then the client asks for something and the server replies). Each message is around 300 bytes long.
Currently we're using sockets, and all is OK; however, I'm looking for a faster way to exchange data (lower latency) using IPC methods.
I've been researching the net and came up with references to the following technologies:
- shared memory
- pipes
- queues
- as well as what's referred to as DMA (Direct Memory Access)
However, I couldn't find a proper analysis of their respective performance, nor how to implement them in both Java and C/C++ (so that they can talk to each other), except maybe pipes, which I can imagine how to do.
Can anyone comment on the performance and feasibility of each method in this context? Any pointers/links to useful implementation information?
EDIT / UPDATE
Following the comments and answers I got here, I found info about Unix domain sockets, which seem to be built just over pipes and would save me the whole TCP stack. It's platform-specific, so I plan on testing it with JNI or with either juds or junixsocket.
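As a point of reference, JDK 16 and later support Unix domain sockets natively through java.nio, with no JNI wrapper needed. A minimal client sketch (the /tmp/ipc.sock path and the payload are made up for illustration):

import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;

public class UdsClient {
    public static void main(String[] args) throws Exception {
        // Address of the Unix domain socket the C/C++ server listens on
        UnixDomainSocketAddress addr =
                UnixDomainSocketAddress.of(Path.of("/tmp/ipc.sock"));
        try (SocketChannel ch = SocketChannel.open(StandardProtocolFamily.UNIX)) {
            ch.connect(addr);
            ch.write(ByteBuffer.wrap("login".getBytes())); // send a request
            ByteBuffer reply = ByteBuffer.allocate(300);   // ~300-byte messages
            ch.read(reply);                                // block for the reply
        }
    }
}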
Next possible steps would be a direct implementation of pipes, then shared memory, although I've been warned about the extra level of complexity...
Thanks for your help.
Accepted answer by Andriy
Just tested latency from Java on my Core i5 2.8 GHz, only a single byte sent/received, 2 Java processes just spawned, without assigning specific CPU cores with taskset:
TCP - 25 microseconds
Named pipes - 15 microseconds
Now explicitly specifying core masks, like taskset 1 java Srv or taskset 2 java Cli:
TCP, same cores: 30 microseconds
TCP, explicit different cores: 22 microseconds
Named pipes, same core: 4-5 microseconds !!!!
Named pipes, taskset different cores: 7-8 microseconds !!!!
So:
- TCP overhead is visible
- scheduling overhead (or core caches?) is also the culprit
At the same time, Thread.sleep(0) (which, as strace shows, causes a single sched_yield() Linux kernel call to be executed) takes 0.3 microseconds - so named pipes scheduled to a single core still have much overhead.
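A minimal sketch of a named-pipe ping-pong of this kind, assuming FIFOs created beforehand with mkfifo /tmp/c2s /tmp/s2c (the paths are made up; Srv echoes the client's byte straight back):

import java.io.FileInputStream;
import java.io.FileOutputStream;

public class Srv {
    public static void main(String[] args) throws Exception {
        // FIFOs created beforehand with: mkfifo /tmp/c2s /tmp/s2c
        try (FileInputStream in = new FileInputStream("/tmp/c2s");
             FileOutputStream out = new FileOutputStream("/tmp/s2c")) {
            int b;
            while ((b = in.read()) != -1) { // block for the client's byte
                out.write(b);               // echo it straight back
                out.flush();
            }
        }
    }
}

The client side writes one byte to /tmp/c2s, reads it back from /tmp/s2c, and measures the round trip with System.nanoTime().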
Some shared memory measurements: September 14, 2009 – Solace Systems announced today that its Unified Messaging Platform API can achieve an average latency of less than 700 nanoseconds using a shared memory transport. http://solacesystems.com/news/fastest-ipc-messaging/
P.S. - I tried shared memory the next day in the form of memory-mapped files. If busy waiting is acceptable, we can reduce the latency to 0.3 microseconds for passing a single byte with code like this:
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Server side: map a one-byte file that both processes share
MappedByteBuffer mem = new RandomAccessFile("/tmp/mapped.txt", "rw")
        .getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1);
while (true) {
    while (mem.get(0) != 5) Thread.sleep(0); // busy-wait for the client request
    mem.put(0, (byte) 10);                   // send the reply
}
Notes: Thread.sleep(0) is needed so the 2 processes can see each other's changes (I don't know of another way yet). If the 2 processes are forced onto the same core with taskset, the latency becomes 1.5 microseconds - that's a context-switch delay.
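For completeness, the matching client side of this ping-pong could look like the following (a sketch under the same assumptions, with nanoTime-based timing added for illustration):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class Cli {
    public static void main(String[] args) throws Exception {
        MappedByteBuffer mem = new RandomAccessFile("/tmp/mapped.txt", "rw")
                .getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1);
        long t0 = System.nanoTime();
        mem.put(0, (byte) 5);                     // post the request
        while (mem.get(0) != 10) Thread.sleep(0); // busy-wait for the reply
        System.out.println("RTT ns: " + (System.nanoTime() - t0));
    }
}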
P.P.S. - and 0.3 microseconds is a good number! The following code takes exactly 0.1 microseconds, while doing only a primitive string concatenation:
int j=123456789;
String ret = "my-record-key-" + j + "-in-db";
P.P.P.S. - I hope this is not too much off-topic, but finally I tried replacing Thread.sleep(0) with incrementing a static volatile int variable (the JVM happens to flush CPU caches when doing so) and obtained - a record! - 72 nanoseconds latency for Java-to-Java process communication!
When forced onto the same CPU core, however, the volatile-incrementing JVMs never yield control to each other, thus producing exactly 10 milliseconds of latency - the Linux time quantum seems to be 5 ms... So this should be used only if there is a spare core - otherwise sleep(0) is safer.
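The change amounts to swapping the yield for a volatile increment in the wait loop - roughly like this (the class and field names are made up):

public class SpinSrv {
    static volatile int spin; // dummy field; the volatile write is the point

    static void await(java.nio.MappedByteBuffer mem, byte want) {
        // Busy-spin instead of Thread.sleep(0): the volatile increment keeps
        // the loop from being hoisted and forces the buffer value to be re-read
        while (mem.get(0) != want) spin++;
    }
}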
Answered by Seffi
In my former company we used to work with this project, http://remotetea.sourceforge.net/ - very easy to understand and integrate.
Answered by fish
I don't know much about native inter-process communication, but I would guess that you need to communicate using native code, which you can access via JNI mechanisms. So, from Java you would call a native function that talks to the other process.
Answered by bakkal
Answered by MSalters
DMA is a method by which hardware devices can access physical RAM without interrupting the CPU. E.g. a common example is a hard disk controller, which can copy bytes straight from disk to RAM. As such, it's not applicable to IPC.
Shared memory and pipes are both supported directly by modern OSes. As such, they're quite fast. Queues are typically abstractions, e.g. implemented on top of sockets, pipes and/or shared memory. This may look like a slower mechanism, but the alternative is that you create such an abstraction.
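To illustrate that layering, here is a toy length-prefixed "queue" built on top of any byte stream (pipe, socket, ...); the class name and framing are made up for illustration:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamQueue {
    private final DataInputStream in;
    private final DataOutputStream out;

    StreamQueue(InputStream is, OutputStream os) {
        this.in = new DataInputStream(is);
        this.out = new DataOutputStream(os);
    }

    void put(byte[] msg) throws IOException {
        out.writeInt(msg.length); // frame header: payload size
        out.write(msg);
        out.flush();
    }

    byte[] take() throws IOException {
        byte[] msg = new byte[in.readInt()]; // read header, then payload
        in.readFully(msg);
        return msg;
    }
}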
Answered by sustrik
Here's a project containing performance tests for various IPC transports:
Answered by Thorbjørn Ravn Andersen
Have you considered keeping the sockets open, so the connections can be reused?
Answered by Steve-o
Oracle bug report on JNI performance: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4096069
JNI is a slow interface, so Java TCP sockets are the fastest method for notification between applications; however, that doesn't mean you have to send the payload over a socket. Use LDMA to transfer the payload, but as previous questions have pointed out, Java support for memory mapping is not ideal, so you will want to implement a JNI library to run mmap.
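As a rough sketch of that split (the class, library name, and signatures below are hypothetical), the Java side might expose the mapping like this, with the C side calling mmap(2) and handing the region back via JNI's NewDirectByteBuffer:

public class NativeMmap {
    static { System.loadLibrary("nativemmap"); } // hypothetical JNI library

    // Hypothetical native wrappers, implemented in C: mmap(2) the shared
    // file and return it as a direct ByteBuffer via NewDirectByteBuffer.
    public static native java.nio.ByteBuffer map(String path, long length);
    public static native void unmap(java.nio.ByteBuffer buffer);
}

Notification would still travel over the existing TCP socket; only the ~300-byte payload would live in the shared mapping.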
Answered by Peter Lawrey
The question was asked some time ago, but you might be interested in https://github.com/peter-lawrey/Java-Chronicle, which supports typical latencies of 200 ns and throughputs of 20 M messages/second. It uses memory-mapped files shared between processes (it also persists the data, which makes it the fastest way to persist data).
Answered by Nitsan Wakart
A late arrival, but I wanted to point out an open source project dedicated to measuring ping latency using Java NIO.
Further explored/explained in this blog post. The results are (RTT in nanos):
Implementation, Min, 50%, 90%, 99%, 99.9%, 99.99%, Max
IPC busy-spin, 89, 127, 168, 3326, 6501, 11555, 25131
UDP busy-spin, 4597, 5224, 5391, 5958, 8466, 10918, 18396
TCP busy-spin, 6244, 6784, 7475, 8697, 11070, 16791, 27265
TCP select-now, 8858, 9617, 9845, 12173, 13845, 19417, 26171
TCP block, 10696, 13103, 13299, 14428, 15629, 20373, 32149
TCP select, 13425, 15426, 15743, 18035, 20719, 24793, 37877
This is along the lines of the accepted answer. The System.nanoTime() error (estimated by measuring nothing) is around 40 nanos, so for the IPC the actual result might be lower. Enjoy.