Linux "zero copy networking" vs "kernel bypass"?
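Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow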
Original URL: http://stackoverflow.com/questions/18343365/
Question by user997112
What is the difference between "zero-copy networking" and "kernel bypass"? Are they two phrases meaning the same thing, or different? Is kernel bypass a technique used within "zero copy networking" and this is the relationship?
Answer by nouney
Zero-copy networking
You're doing zero-copy networking when you never copy the data between user space and kernel space (I mean memory space). For example:
C language
recv(fd, buffer, BUFFER_SIZE, 0);
By default the data are copied:
- The kernel gets the data from the network stack.
- The kernel copies this data into buffer, which is in user space.
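For contrast with zero copy, here is a minimal sketch of the conventional copying path. It assumes a connected stream socket. Each successful recv() call has the kernel copy bytes from its socket receive buffer into the caller's user-space buffer. The helper name recv_exact is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read exactly `len` bytes from a stream socket.
 * Every successful recv() makes the kernel copy data from its socket
 * receive buffer into `buf`, which lives in user space.
 * Returns len on success, fewer bytes if the peer closed early,
 * or -1 on error. */
static ssize_t recv_exact(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                 /* orderly shutdown by the peer */
        got += (size_t)n;          /* kernel -> user copy happened here */
    }
    return (ssize_t)got;
}
```

A zero-copy scheme aims to remove exactly the copy marked in the loop, so that the application reads the data where the network stack placed it.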
With a zero-copy method, the data are not copied; they reach user space directly from the network stack.
Kernel Bypass
Kernel bypass is when you manage the network stack and the hardware yourself, in user space. It is hard, but you can gain a lot of performance (there is zero copy, since all the data are in user space). This link could be interesting if you want more information.
Answer by artless noise
What is the difference between "zero-copy networking" and "kernel bypass"? Are they two phrases meaning the same thing, or different? Is kernel bypass a technique used within "zero copy networking" and this is the relationship?
TL;DR - They are different concepts, but it is quite likely that zero copy is supported within a kernel-bypass API/framework.
User Bypass
This mode of communicating should also be considered. DMA-to-DMA transactions that do not involve the CPU at all may be possible. The idea is to use splice() or similar functions to avoid user space entirely. Note that with splice(), the entire data stream does not need to bypass user space: headers can be read in user space and the data streamed directly to disk. The most common downfall of this is that splice() doesn't do checksum offloading.
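Here is a minimal sketch of the "headers in user space, payload via splice()" pattern described above. It assumes a simple framing with a fixed 8-byte header. HDR_LEN, splice_payload and recv_header_then_splice are hypothetical names, not part of any library. splice() requires one side to be a pipe, so the payload travels in_fd → pipe → out_fd without passing through a user-space buffer. In practice in_fd would typically be a connected socket and out_fd a file opened for writing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define HDR_LEN 8   /* hypothetical fixed header size */

/* Move `len` bytes from in_fd to out_fd through a pipe using splice(),
 * so the payload is never copied into a user-space buffer.
 * Returns the number of bytes moved (less than len on EOF), or -1 on error. */
static ssize_t splice_payload(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    ssize_t result = -1;
    size_t total = 0;
    while (total < len) {
        /* in_fd -> pipe: the data stays inside the kernel */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len - total,
                           SPLICE_F_MOVE | SPLICE_F_MORE);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto out;
        }
        if (n == 0)
            break;                      /* EOF on in_fd */
        /* pipe -> out_fd: drain everything just queued in the pipe */
        for (ssize_t left = n; left > 0; ) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)left,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (m < 0 && errno == EINTR)
                continue;
            if (m <= 0)
                goto out;
            left -= m;
        }
        total += (size_t)n;
    }
    result = (ssize_t)total;
out:
    close(p[0]);
    close(p[1]);
    return result;
}

/* Read the fixed-size header into `hdr` (user space, so the application
 * can parse it), then splice the next payload_len bytes straight to out_fd. */
static ssize_t recv_header_then_splice(int in_fd, int out_fd,
                                       char hdr[HDR_LEN], size_t payload_len)
{
    size_t got = 0;
    while (got < HDR_LEN) {
        ssize_t n = read(in_fd, hdr + got, HDR_LEN - got);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return splice_payload(in_fd, out_fd, payload_len);
}
```

Only the header ever lands in a user-space buffer. As the answer notes, this path does not give you checksum offloading.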
Zero copy
The zero copy concept is only that the network buffers are fixed in place and are not moved around. In many cases, this is not really beneficial. Most modern network hardware supports scatter-gather, also known as buffer descriptors, etc. The idea is that the network hardware understands physical pointers. A buffer descriptor typically consists of:
- Data pointer
- Length
- Next buffer descriptor
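The three descriptor fields listed above can be modeled as a linked chain. The struct below is a hypothetical, simplified software model. Real controllers define their own descriptor layouts and use physical/bus addresses rather than C pointers. gather_chain stands in for the DMA engine walking the chain. It shows how a header and a payload that live in separate buffers can still go out as one contiguous frame.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer descriptor: real NICs use device-specific layouts
 * and bus addresses, not C pointers. */
struct buf_desc {
    const void *data;              /* data pointer */
    size_t len;                    /* length of this fragment */
    const struct buf_desc *next;   /* next buffer descriptor, NULL ends the chain */
};

/* Total frame length described by a chain. */
static size_t chain_len(const struct buf_desc *d)
{
    size_t total = 0;
    for (; d != NULL; d = d->next)
        total += d->len;
    return total;
}

/* Software stand-in for the DMA engine: walk the chain and gather each
 * fragment, in order, into `wire`. The memcpy() here only simulates the
 * device reading memory. Returns bytes written, or 0 if it does not fit. */
static size_t gather_chain(const struct buf_desc *d,
                           unsigned char *wire, size_t cap)
{
    if (chain_len(d) > cap)
        return 0;
    size_t off = 0;
    for (; d != NULL; d = d->next) {
        memcpy(wire + off, d->data, d->len);
        off += d->len;
    }
    return off;
}
```

For example, a descriptor for the protocol headers can point at a small header buffer and link to a second descriptor that points at the application's data, without either buffer being moved.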
The benefit is that the network headers do not need to exist side-by-side, and the IP, TCP, and application headers can reside physically separate from the application data.
If a controller doesn't support this, then the TCP/IP headers must precede the user data so that they can be filled in before the frame is sent to the network controller.
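Without scatter-gather, a common way to make the headers "precede the user data" without moving the data is to reserve headroom in front of the payload and prepend headers into it. This is similar in spirit to the Linux sk_buff helpers skb_reserve()/skb_push(). The sketch below is a hypothetical user-space model only: struct frame, frame_init, frame_put and frame_push are illustrative names, and the 256-byte buffer size is a placeholder.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical contiguous frame buffer with headroom reserved up front,
 * loosely modeled on the sk_buff headroom idea. */
struct frame {
    unsigned char buf[256];
    size_t head;   /* offset of the first valid byte */
    size_t tail;   /* offset one past the last valid byte */
};

/* Start empty, with `headroom` bytes reserved for headers. */
static int frame_init(struct frame *f, size_t headroom)
{
    if (headroom > sizeof f->buf)
        return -1;
    f->head = f->tail = headroom;
    return 0;
}

/* Append payload at the tail. Returns 0 on success, -1 if it doesn't fit. */
static int frame_put(struct frame *f, const void *data, size_t len)
{
    if (len > sizeof f->buf - f->tail)
        return -1;
    memcpy(f->buf + f->tail, data, len);
    f->tail += len;
    return 0;
}

/* Prepend a header into the headroom; the payload bytes never move. */
static int frame_push(struct frame *f, const void *hdr, size_t len)
{
    if (len > f->head)
        return -1;
    f->head -= len;
    memcpy(f->buf + f->head, hdr, len);
    return 0;
}
```

Headers are pushed innermost-first (TCP, then IP), so the finished frame reads IP, TCP, payload in one contiguous run that a controller without scatter-gather can send.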
Zero copy also implies some kernel-user MMU setup so that pages are shared.
Kernel Bypass
Of course, you can bypass the kernel. This is what pcap and other sniffer software have been doing for some time. However, it is difficult to see a case where user space will have a definite win unless it is tied to particular hardware. Some network controllers may support scatter-gather and others may not.
There are various incarnations of kernel interfaces that accomplish kernel bypass.
To put this together...
Are they two phrases meaning the same thing, or different?
They are different, as the above hopefully explains.
Is kernel bypass a technique used within "zero copy networking" and this is the relationship?
It is the opposite. Kernel bypass can use zero copy, and most likely will support it, because the buffers are completely under the control of the application. Also, there is no memory sharing between the kernel and user space (meaning no need for MMU-shared pages and whatever cache/TLB effects they may cause). So if you are using kernel bypass, it will often be advantageous to support zero copy, which is why the two may seem the same at first.
If scatter-gather DMA is available (as on most modern controllers), either user space or the kernel can use it. Zero copy is not as useful in this case.
References:
- Technical reference on OnLoad, a high-bandwidth kernel-bypass system.
- PF Ring as of 2.6.32, if configured.
- Linux kernel network buffer management by David Miller. This gives an idea of how protocol headers/trailers are managed in the kernel.
Answer by Lee Ballard
Other examples of kernel bypass and zero copy are DPDK and RDMA. When an application uses DPDK, it bypasses the kernel TCP/IP stack. The application creates the Ethernet frames, and the NIC grabs those frames with DMA directly from user-space memory. That is zero copy, because there is no copy from user space to kernel space. Applications can do similar things with RDMA: the application writes to queue pairs that the NIC directly accesses and transmits from. The RDMA verbs interface is used inside the kernel as well, so when iSER uses RDMA it is not kernel bypass, but it is zero copy.
https://www.openfabrics.org/index.php/openfabrics-software.html
Answer by Tony Tannous
ZERO-COPY:
When transmitting and receiving packets, all packet data must normally be copied from user-space buffers to kernel-space buffers for transmitting, and vice versa for receiving. A zero-copy driver avoids this by having user space and the driver share packet buffer memory directly.
Instead of having the transmit and receive descriptors point to buffers in kernel space, which would later require a copy, a region of memory in user space is allocated and mapped to a given region of physical memory. This region serves as shared memory between the kernel buffers and the user-space buffers. Each descriptor buffer is then pointed at its corresponding place in the newly allocated memory.
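The layout described in this answer can be modeled in user space. This is only a model under stated assumptions: an anonymous MAP_SHARED mmap() stands in for the region a real zero-copy driver would provide (typically via the driver's own mmap handler backed by pinned, DMA-able pages). struct rx_desc, ring_setup, RING_SIZE and SLOT_SIZE are hypothetical names and sizes. The point shown is that each descriptor simply points at its own fixed-size slot inside one shared region.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#define RING_SIZE 8      /* hypothetical number of descriptors */
#define SLOT_SIZE 2048   /* hypothetical per-packet buffer size */

/* Hypothetical receive descriptor: where the NIC may place a packet. */
struct rx_desc {
    unsigned char *addr;   /* slot inside the shared region */
    size_t len;            /* capacity of the slot */
};

/* Map one region and point every descriptor at its own slot in it.
 * Returns the region base, or NULL on failure. */
static unsigned char *ring_setup(struct rx_desc ring[RING_SIZE])
{
    void *base = mmap(NULL, (size_t)RING_SIZE * SLOT_SIZE,
                      PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    for (size_t i = 0; i < RING_SIZE; i++) {
        ring[i].addr = (unsigned char *)base + i * SLOT_SIZE;
        ring[i].len = SLOT_SIZE;
    }
    return base;
}

/* Release the shared region set up by ring_setup(). */
static void ring_teardown(unsigned char *base)
{
    munmap(base, (size_t)RING_SIZE * SLOT_SIZE);
}
```

Because user space reads packets straight out of the slots the descriptors point to, no per-packet copy between kernel and user buffers is needed. In a real driver, the hard parts (pinning pages, translating to bus addresses, recycling slots) live outside this sketch.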