How do I monitor the free space in a Linux UDP buffer?

Question

Asked by Yoni Roit

I have a Java app on Linux that opens a UDP socket and waits for messages.

After a couple of hours under heavy load, there is packet loss, i.e. the packets are received by the kernel but not by my app (we see the lost packets in the sniffer, we see UDP packets lost in netstat, and we don't see those packets in our app logs).
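
The netstat loss figures come from the kernel's system-wide counters in /proc/net/snmp (the file netstat -su summarizes). A minimal sketch, assuming the usual layout of a "Udp:" header line followed by a "Udp:" value line, that pairs the two so the RcvbufErrors counter (datagrams dropped because a socket's receive buffer was full) can be logged next to the app's own message counts. The class name UdpSnmpCounters is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Reads the system-wide UDP counters from /proc/net/snmp. */
final class UdpSnmpCounters {

    /** Maps each name on the "Udp:" header line to the number on the "Udp:" line below it. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (int i = 0; i + 1 < lines.size(); i++) {
            String header = lines.get(i);
            String values = lines.get(i + 1);
            // "UdpLite:" lines do not match because the prefix includes the colon.
            if (header.startsWith("Udp: ") && values.startsWith("Udp: ")) {
                String[] names = header.substring(5).trim().split("\\s+");
                String[] numbers = values.substring(5).trim().split("\\s+");
                for (int j = 0; j < Math.min(names.length, numbers.length); j++) {
                    counters.put(names[j], Long.parseLong(numbers[j]));
                }
                break;
            }
        }
        return counters;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> udp = parse(Files.readAllLines(Path.of("/proc/net/snmp")));
        // RcvbufErrors: datagrams dropped because a socket's receive buffer was full.
        System.out.println("InDatagrams=" + udp.get("InDatagrams")
                + " RcvbufErrors=" + udp.get("RcvbufErrors"));
    }
}
```

These counters are system-wide, not per socket; sampling RcvbufErrors periodically and comparing the delta with the app's own count shows when the loss starts.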

We tried enlarging the socket buffers, but this didn't help: we started losing packets later than before, but that's it.
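
When enlarging the buffer, it helps to check what the kernel actually granted. According to socket(7), Linux caps an SO_RCVBUF request at net.core.rmem_max and then doubles it for bookkeeping overhead, reporting the doubled value back, so a large request can be silently reduced. A sketch that requests a size and reads back the effective value (the class name RcvBufCheck and the 8 MiB figure are hypothetical):

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.DatagramChannel;

/** Shows how much receive buffer the kernel actually granted for a requested size. */
final class RcvBufCheck {

    /** Sets SO_RCVBUF and reads back the effective value the kernel reports. */
    static int requestReceiveBuffer(DatagramChannel channel, int requestedBytes) throws IOException {
        channel.setOption(StandardSocketOptions.SO_RCVBUF, requestedBytes);
        return channel.getOption(StandardSocketOptions.SO_RCVBUF);
    }

    public static void main(String[] args) throws IOException {
        try (DatagramChannel channel = DatagramChannel.open()) {
            int requested = 8 * 1024 * 1024; // hypothetical 8 MiB request
            // In real code, set the option before bind() so it applies from the first datagram.
            int granted = requestReceiveBuffer(channel, requested);
            System.out.printf("requested=%d granted=%d%n", requested, granted);
        }
    }
}
```

On Linux the granted value is typically 2 * min(requested, net.core.rmem_max); if it is far below twice the request, raising net.core.rmem_max (for example with sysctl) is the usual next step.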

For debugging, I want to know how full the OS UDP buffer is at any given moment. I googled but didn't find anything. Can you help me?

P.S. Guys, I'm aware that UDP is unreliable. However, my computer receives all the UDP messages, while my app is unable to consume some of them. I want to optimize my app to the max; that's the reason for the question. Thanks.

Answer 1

Accepted answer by Juliano

Linux provides the files /proc/net/udp and /proc/net/udp6, which list all open UDP sockets (for IPv4 and IPv6, respectively). In both of them, the columns tx_queue and rx_queue show the outgoing and incoming queues in bytes.

If everything is working as expected, you usually will not see any nonzero value in those two columns: as soon as your application generates packets they are sent through the network, and as soon as those packets arrive from the network your application will wake up and receive them (the recv call returns immediately). You may see rx_queue go up if your application has the socket open but is not invoking recv to receive the data, or if it is not processing that data fast enough.
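
A minimal Java sketch of reading those columns for one port (the class name ProcNetUdp is hypothetical). It assumes the whitespace-separated layout shown in the sample output later on this page, on a kernel that prints the drops column: field 1 is the local address with the port in hex after the last colon, field 4 is tx_queue:rx_queue in hex, and field 12 is drops in decimal. The queue figures are the kernel memory charged to the socket, so they include per-packet overhead rather than payload bytes alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Reads per-socket queue sizes from /proc/net/udp and /proc/net/udp6. */
final class ProcNetUdp {

    /** One socket row: local port, tx_queue and rx_queue in bytes, and the drops counter. */
    record Row(int localPort, long txQueue, long rxQueue, long drops) {}

    /** Parses one data row; returns null for the header line or an unexpected layout. */
    static Row parseLine(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 13 || !f[0].endsWith(":")) {
            return null;
        }
        String local = f[1]; // hex address, ':', hex port
        int port = Integer.parseInt(local.substring(local.lastIndexOf(':') + 1), 16);
        String[] queues = f[4].split(":"); // "tx_queue:rx_queue", both hexadecimal
        long tx = Long.parseLong(queues[0], 16);
        long rx = Long.parseLong(queues[1], 16);
        long drops = Long.parseLong(f[12]); // decimal, unlike the fields above
        return new Row(port, tx, rx, drops);
    }

    /** Returns every IPv4 and IPv6 socket bound to the given local port. */
    static List<Row> rowsForPort(int port) throws IOException {
        List<Row> result = new ArrayList<>();
        for (String file : List.of("/proc/net/udp", "/proc/net/udp6")) {
            Path path = Path.of(file);
            if (!Files.exists(path)) {
                continue; // e.g. IPv6 disabled
            }
            for (String line : Files.readAllLines(path)) {
                Row row = parseLine(line);
                if (row != null && row.localPort() == port) {
                    result.add(row);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8989; // 8989: example port
        for (Row row : rowsForPort(port)) {
            System.out.printf("port=%d tx_queue=%d rx_queue=%d drops=%d%n",
                    row.localPort(), row.txQueue(), row.rxQueue(), row.drops());
        }
    }
}
```

Calling rowsForPort from a background thread every few hundred milliseconds gives a sampled view of rx_queue; it can still miss short peaks between samples.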

Answer 2

Answer by David Schwartz

The process is simple:

1. If desired, pause the application process.
2. Open the UDP socket. You can snag it from the running process using /proc/<PID>/fd if necessary. Or you can add this code to the application itself and send it a signal -- it will already have the socket open, of course.
3. Call recvmsg in a tight loop as quickly as possible.
4. Count how many packets/bytes you got.

This will discard any datagrams currently buffered, but if that breaks your application, your application was already broken.
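
A Java analogue of steps 3 and 4, assuming the code is added to the application itself and that the application's own receive loop has already been paused, because a DatagramChannel lets only one thread read at a time. Java has no direct recvmsg call, so this sketch switches the channel to non-blocking mode and calls receive until it returns null, i.e. until the kernel queue is empty. QueueDrainer and Drained are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

/** Empties a UDP socket's receive queue and reports what was in it. */
final class QueueDrainer {

    /** How many datagrams and payload bytes were waiting in the queue. */
    record Drained(long datagrams, long bytes) {}

    /** Reads everything currently queued without blocking; every datagram read here is consumed. */
    static Drained drain(DatagramChannel channel) throws IOException {
        boolean wasBlocking = channel.isBlocking();
        channel.configureBlocking(false);
        ByteBuffer buf = ByteBuffer.allocate(65535); // larger than any UDP payload
        long datagrams = 0;
        long bytes = 0;
        try {
            while (channel.receive(buf) != null) { // null means the queue is empty
                buf.flip();
                datagrams++;
                bytes += buf.remaining();
                buf.clear();
            }
        } finally {
            channel.configureBlocking(wasBlocking); // restore the application's mode
        }
        return new Drained(datagrams, bytes);
    }
}
```

The byte count here is payload only, so it will be lower than the rx_queue figure the kernel reports for the same backlog.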

Answer 3

Answer by Anne

rx_queue will tell you the queue length at any given instant, but it will not tell you how full the queue has been, i.e. the highwater mark. There is no way to constantly monitor this value, and no way to get it programmatically (see "How do I get amount of queued data for UDP socket?").

The only way I can imagine monitoring the queue length is to move the queue into your own program. In other words, start two threads: one reads the socket as fast as it can and dumps the datagrams into your queue, and the other is your program pulling from this queue and processing the packets. This of course assumes that you can ensure each thread runs on a separate CPU. Now you can monitor the length of your own queue and keep track of the highwater mark.
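
A minimal sketch of that design in Java, assuming a blocking DatagramChannel and a bounded in-process queue (the class name HighWaterQueue and the choice of LinkedBlockingQueue are illustrative, not from the answer). The reader thread copies each datagram into the queue without ever blocking on it; the queue records its peak depth and counts the datagrams it had to reject, which plays the role of the kernel's drops counter.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hand-off queue between the socket-reader thread and the processing thread; remembers its peak depth. */
final class HighWaterQueue<T> {
    private final BlockingQueue<T> queue;
    private final AtomicInteger highWater = new AtomicInteger();
    private final AtomicLong rejected = new AtomicLong();

    HighWaterQueue(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Reader side: never blocks; returns false (and counts a drop) when the queue is full. */
    boolean offer(T item) {
        if (!queue.offer(item)) {
            rejected.incrementAndGet();
            return false;
        }
        // Sampled just after the insert, so a concurrent take() can make this slightly low.
        highWater.accumulateAndGet(queue.size(), Math::max);
        return true;
    }

    /** Processing side: blocks until a datagram is available. */
    T take() throws InterruptedException {
        return queue.take();
    }

    int depth() { return queue.size(); }

    int highWaterMark() { return highWater.get(); }

    long rejectedCount() { return rejected.get(); }

    /** Starts a thread that copies each datagram off a blocking channel as fast as it can. */
    static Thread startReader(DatagramChannel channel, HighWaterQueue<byte[]> queue) {
        Thread reader = new Thread(() -> {
            ByteBuffer buf = ByteBuffer.allocate(65535);
            try {
                while (true) {
                    buf.clear();
                    channel.receive(buf);
                    buf.flip();
                    byte[] copy = new byte[buf.remaining()];
                    buf.get(copy);
                    queue.offer(copy);
                }
            } catch (IOException e) {
                // Channel closed; closing it or interrupting this thread ends the loop.
            }
        }, "udp-reader");
        reader.start();
        return reader;
    }
}
```

The processing thread loops on take(), and a periodic log line with depth(), highWaterMark() and rejectedCount() gives the highwater mark the kernel does not expose.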

Answer 4

Answer by RickS

UDP is a perfectly viable protocol. It is the same old case of the right tool for the right job!

If you have a program that waits for a UDP datagram and then goes off to process it before returning to wait for another, then your elapsed processing time per datagram must always be shorter than the worst-case interval between arriving datagrams. If it is not, the UDP socket receive queue will begin to fill.
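
The same condition written as rates. Let λ be the worst-case arrival rate, t_p the processing time per datagram, μ the service rate, Q the queue length and B the number of datagrams the socket buffer can hold. These symbols are introduced here for illustration, and the model assumes steady rates; the real limit is bytes of kernel memory including per-packet overhead, not a datagram count. The numbers in the last line are hypothetical.

```latex
\begin{aligned}
\mu &= \frac{1}{t_p} \\
\text{backlog grows} &\iff \lambda > \mu \iff \lambda\, t_p > 1 \\
\frac{dQ}{dt} &= \lambda - \mu \qquad (\lambda > \mu) \\
T_{\text{full}} &= \frac{B}{\lambda - \mu} \\
\lambda = 12\,000\ \text{s}^{-1},\ t_p = 100\ \mu\text{s},\ B = 1000
  &\;\Rightarrow\; \mu = 10\,000\ \text{s}^{-1},\quad
  T_{\text{full}} = \frac{1000}{12\,000 - 10\,000}\ \text{s} = 0.5\ \text{s}
\end{aligned}
```

This matches what the question reports: enlarging the buffer raises B and therefore delays the loss in proportion, but it cannot change the sign of λ − μ.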

This can be tolerated for short bursts. The queue does exactly what it is supposed to do – queue datagrams until you are ready. But if the average arrival rate regularly causes a backlog in the queue, it is time to redesign your program. There are two main choices here: reduce the elapsed processing time via crafty programming techniques, and/or multi-thread your program. Load balancing across multiple instances of your program may also be employed.
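
One way to apply the multi-threading option in Java, as a minimal sketch: a single thread keeps calling receive and hands each payload to an ExecutorService, so the slow processing runs on worker threads while the socket stays drained. The class name PooledUdpServer and the handler parameter are hypothetical; the channel is assumed to be in blocking mode.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.DatagramChannel;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

/** One thread keeps the socket drained; a pool of workers does the slow per-datagram processing. */
final class PooledUdpServer {

    /**
     * Receives on a blocking channel until another thread closes it, handing each payload to the pool.
     * With more than one worker, datagrams may be processed out of order.
     */
    static void serve(DatagramChannel channel, ExecutorService pool, Consumer<byte[]> handler)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(65535);
        try {
            while (true) {
                buf.clear();
                channel.receive(buf); // blocks until a datagram arrives
                buf.flip();
                byte[] payload = new byte[buf.remaining()];
                buf.get(payload);
                pool.execute(() -> handler.accept(payload)); // go straight back to receive()
            }
        } catch (ClosedChannelException e) {
            // Channel was closed (AsynchronousCloseException is a subclass): normal shutdown.
        }
    }
}
```

Executors.newFixedThreadPool(n) queues tasks in an unbounded LinkedBlockingQueue, so if the workers still cannot keep up, the backlog simply moves from the kernel buffer to the Java heap; a ThreadPoolExecutor with a bounded queue makes that limit explicit.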

As mentioned, on Linux you can examine the proc filesystem to get status about what UDP is up to. For example, if I cat the /proc/net/udp node, I get something like this:

$ cat /proc/net/udp   
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops             
  40: 00000000:0202 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 3466 2 ffff88013abc8340 0           
  67: 00000000:231D 00000000:0000 07 00000000:0001E4C8 00:00000000 00000000  1006        0 16940862 2 ffff88013abc9040 2237    
 122: 00000000:30D4 00000000:0000 07 00000000:00000000 00:00000000 00000000  1006        0 912865 2 ffff88013abc8d00 0

From this, I can see that a socket owned by user id 1006 is listening on port 0x231D (8989) and that the receive queue is at about 128KB. As 128KB is the max size on my system, this tells me my program is woefully weak at keeping up with the arriving datagrams. There have been 2237 drops so far, meaning the UDP layer could not put any more datagrams into the socket queue and had to drop them.
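
The hexadecimal fields in that row convert as follows (the port is the part of local_address after the colon; the queue size is the rx_queue half of the tx_queue:rx_queue field):

```latex
\begin{aligned}
\texttt{0x231D} &= 2\cdot 16^3 + 3\cdot 16^2 + 1\cdot 16 + 13 = 8192 + 768 + 16 + 13 = 8989 \\
\texttt{0x1E4C8} &= 1\cdot 16^4 + 14\cdot 16^3 + 4\cdot 16^2 + 12\cdot 16 + 8 \\
&= 65536 + 57344 + 1024 + 192 + 8 = 124104\ \text{bytes} \approx 121.2\ \text{KiB}
\end{aligned}
```

So the queue held roughly 121 KiB, just under the 128KB limit.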

You could watch your program's behaviour over time e.g. using:

watch -d 'cat /proc/net/udp|grep 00000000:231D'

Note also that the netstat command does about the same thing: netstat -c --udp -an

My solution for my weenie program will be to multi-thread.

Cheers!