C# Begin/EndReceive - How do I read large data?

Question

Asked by ryeguy

When reading data in chunks of say, 1024, how do I continue to read from a socket that receives a message bigger than 1024 bytes until there is no data left? Should I just use BeginReceive to read a packet's length prefix only, and then once that is retrieved, use Receive() (in the async thread) to read the rest of the packet? Or is there another way?

edit:

I thought Jon Skeet's link had the solution, but there is a bit of a speedbump with that code. The code I used is:

public class StateObject
{
    public Socket workSocket = null;
    public const int BUFFER_SIZE = 1024;
    public byte[] buffer = new byte[BUFFER_SIZE];
    public StringBuilder sb = new StringBuilder();
}

public static void Read_Callback(IAsyncResult ar)
{
    StateObject so = (StateObject) ar.AsyncState;
    Socket s = so.workSocket;

    int read = s.EndReceive(ar);

    if (read > 0) 
    {
        so.sb.Append(Encoding.ASCII.GetString(so.buffer, 0, read));

        if (read == StateObject.BUFFER_SIZE)
        {
            s.BeginReceive(so.buffer, 0, StateObject.BUFFER_SIZE, 0, 
                    new AsyncCallback(Async_Send_Receive.Read_Callback), so);
            return;
        }
    }

    if (so.sb.Length > 0)
    {
        // All of the data has been read, so display it on the console
        string strContent;
        strContent = so.sb.ToString();
        Console.WriteLine(String.Format("Read {0} bytes from socket, " + 
        "data = {1}", strContent.Length, strContent));
    }
    s.Close();
}

Now this corrected version works fine most of the time, but it fails when the packet's size is a multiple of the buffer size. The reason is that if the buffer gets filled on a read, it is assumed there is more data; but the same problem happens as before. A 2 byte buffer, for example, gets filled twice on a 4 byte packet, and the code assumes there is more data. It then blocks because there is nothing left to read. The problem is that the receive function doesn't know where the end of the packet is.

This got me thinking of two possible solutions: I could either have an end-of-packet delimiter or I could read the packet header to find the length and then receive exactly that amount (as I originally suggested).

There are problems with each of these, though. I don't like the idea of using a delimiter, as a user could somehow work it into a packet through an input string from the app and screw it up. It also just seems kinda sloppy to me.

The length header sounds OK, but I'm planning on using protocol buffers, and I don't know the format of the data. Is there a length header? How many bytes is it? Would this be something I implement myself?

What should I do?

Answer 1

Accepted answer by Jon Skeet

No - call BeginReceive again from the callback handler, until EndReceive returns 0. Basically, you should keep on receiving asynchronously, assuming you want the fullest benefit of asynchronous IO.

If you look at the MSDN page for Socket.BeginReceive you'll see an example of this. (Admittedly it's not as easy to follow as it might be.)
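
The loop described above can be sketched roughly as follows. This is only a minimal illustration, not the MSDN sample: the names ReceiveState, ReceiveLoop and ReceiveAll are made up for this example, and it assumes the sender closes its side of the connection once it has sent everything, which is what makes EndReceive return 0.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical state object: one buffer that is reused for every receive,
// plus a stream that accumulates everything read so far.
public sealed class ReceiveState
{
    public readonly Socket Socket;
    public readonly byte[] Buffer = new byte[1024];
    public readonly MemoryStream Received = new MemoryStream();
    public readonly TaskCompletionSource<byte[]> Done = new TaskCompletionSource<byte[]>();

    public ReceiveState(Socket socket) { Socket = socket; }
}

public static class ReceiveLoop
{
    // Starts the first asynchronous receive; the task completes once the
    // remote side has closed the connection.
    public static Task<byte[]> ReceiveAll(Socket socket)
    {
        var state = new ReceiveState(socket);
        socket.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                            SocketFlags.None, OnReceive, state);
        return state.Done.Task;
    }

    private static void OnReceive(IAsyncResult ar)
    {
        var state = (ReceiveState)ar.AsyncState;
        try
        {
            int read = state.Socket.EndReceive(ar);
            if (read == 0)
            {
                // Zero bytes: the peer shut down its side, nothing more will arrive.
                state.Done.TrySetResult(state.Received.ToArray());
                return;
            }

            // Keep what arrived, then queue the next receive from inside the callback.
            state.Received.Write(state.Buffer, 0, read);
            state.Socket.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                                      SocketFlags.None, OnReceive, state);
        }
        catch (Exception e)
        {
            state.Done.TrySetException(e);
        }
    }
}
```

A caller could then wait on ReceiveLoop.ReceiveAll(socket) for the complete payload. On a connection that stays open between messages, EndReceive never returns 0 at a message boundary, so this pattern alone does not tell you where one message ends.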

Answer 2

Answered by casperOne

You would read the length prefix first. Once you have that, you would just keep reading the bytes in blocks (and you can do this async, as you surmised) until you have exhausted the number of bytes you know are coming in off the wire.

Note that when reading the last block you usually won't want to read the full 1024 bytes: request only the bytes still outstanding, that is, the total given by the length prefix minus the number of bytes you have already read.
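
As a rough sketch of that arithmetic, assuming a 4-byte big-endian length prefix (the question does not define a wire format, so that part is an assumption) and using blocking Stream.Read for brevity, for example on a NetworkStream. LengthPrefixedReader and its members are hypothetical names; the same "remaining bytes" calculation applies to the size you pass to BeginReceive.

```csharp
using System;
using System.IO;

public static class LengthPrefixedReader
{
    // Reads one message: a 4-byte big-endian length (assumed format),
    // followed by exactly that many payload bytes.
    public static byte[] ReadMessage(Stream stream, int blockSize = 1024)
    {
        byte[] header = new byte[4];
        ReadExactly(stream, header, header.Length, header.Length);

        int length = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (length < 0) throw new InvalidDataException("Negative length prefix.");

        byte[] body = new byte[length];
        ReadExactly(stream, body, length, blockSize);
        return body;
    }

    // Reads 'count' bytes into the start of 'buffer', at most 'blockSize' per call.
    private static void ReadExactly(Stream stream, byte[] buffer, int count, int blockSize)
    {
        int total = 0;
        while (total < count)
        {
            // Never ask for more than is still outstanding, so the last
            // block is shorter than blockSize when the sizes do not line up.
            int want = Math.Min(blockSize, count - total);
            int read = stream.Read(buffer, total, want);
            if (read == 0) throw new EndOfStreamException("Connection closed mid-message.");
            total += read;
        }
    }
}
```

Because each read is capped at the outstanding byte count, the reader never consumes bytes that belong to the next message on the same connection.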

Answer 3

Answered by Marc Gravell

For info (general Begin/End usage), you might want to see this blog post; this approach is working OK for me, and saving much pain...

Answer 4

Answered by Matt Davis

Dang. I'm hesitant to even reply to this given the dignitaries that have already weighed in, but here goes. Be gentle, O Great Ones!

Without having the benefit of reading Marc's blog (it's blocked here due to the corporate internet policy), I'm going to offer "another way."

The trick, in my mind, is to separate the receipt of the data from the processing of that data.

I use a StateObject class defined like this. It differs from the MSDN StateObject implementation in that it does not include the StringBuilder object, the BUFFER_SIZE constant is private, and it includes a constructor for convenience.

public class StateObject
{
    private const int BUFFER_SIZE = 65535;
    public byte[] Buffer = new byte[BUFFER_SIZE];
    public readonly Socket WorkSocket = null;

    public StateObject(Socket workSocket)
    {
        WorkSocket = workSocket;
    }
}

I also have a Packet class that is simply a wrapper around a buffer and a timestamp.

public class Packet
{
    public readonly byte[] Buffer;
    public readonly DateTime Timestamp;

    public Packet(DateTime timestamp, byte[] buffer, int size)
    {
        Timestamp = timestamp;
        Buffer = new byte[size];
        System.Buffer.BlockCopy(buffer, 0, Buffer, 0, size);
    }
}

My ReceiveCallback() function looks like this.

public static ManualResetEvent PacketReceived = new ManualResetEvent(false);
public static List<Packet> PacketList = new List<Packet>();
public static object SyncRoot = new object();
public static void ReceiveCallback(IAsyncResult ar)
{
    try {
        StateObject so = (StateObject)ar.AsyncState;
        int read = so.WorkSocket.EndReceive(ar);

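        if (read == 0) {
            // On a TCP socket, zero bytes means the remote side closed the
            // connection. Handle the disconnect here; queuing another receive
            // would just complete immediately with 0 again.
            return;
        }

        Packet packet = new Packet(DateTime.Now, so.Buffer, read);
        lock (SyncRoot) {
            PacketList.Add(packet);
        }
        PacketReceived.Set();

        so.WorkSocket.BeginReceive(so.Buffer, 0, so.Buffer.Length, 0, ReceiveCallback, so);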
    } catch (ObjectDisposedException) {
        // Handle the socket being closed with an async receive pending
    } catch (Exception e) {
        // Handle all other exceptions
    }
}

Notice that this implementation does absolutely no processing of the received data, nor does it have any expectations as to how many bytes are supposed to have been received. It simply receives whatever data happens to be on the socket (up to 65535 bytes) and stores that data in the packet list, and then it immediately queues up another asynchronous receive.

Since processing no longer occurs in the thread that handles each asynchronous receive, the data will obviously be processed by a different thread, which is why the Add() operation is synchronized via the lock statement. In addition, the processing thread (whether it is the main thread or some other dedicated thread) needs to know when there is data to process. To do this, I usually use a ManualResetEvent, which is what I've shown above.

Here is how the processing works.

static void Main(string[] args)
{
    Thread t = new Thread(
        delegate() {
            List<Packet> packets;
            while (true) {
                PacketReceived.WaitOne();
                PacketReceived.Reset();
                lock (SyncRoot) {
                    packets = PacketList;
                    PacketList = new List<Packet>();
                }

                foreach (Packet packet in packets) {
                    // Process the packet
                }
            }
        }
    );
    t.IsBackground = true;
    t.Name = "Data Processing Thread";
    t.Start();
}

That's the basic infrastructure I use for all of my socket communication. It provides a nice separation between the receipt of the data and the processing of that data.

As to the other question you had, it is important to remember with this approach that each Packet instance does not necessarily represent a complete message within the context of your application. A Packet instance might contain a partial message, a single message, or multiple messages, and your messages might span multiple Packet instances. I've addressed how to know when you've received a full message in the related question you posted here.
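
To illustrate how the processing side might turn such chunks back into whole messages, here is a minimal sketch. It is not the approach from the linked question: MessageAssembler is a hypothetical class, and it assumes every message is preceded by a 4-byte big-endian length, which is only one of several possible conventions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reassembler. Assumes each message on the wire is preceded
// by a 4-byte big-endian length (an assumption, not part of the answer).
public sealed class MessageAssembler
{
    // Bytes received so far that do not yet form a complete message.
    private readonly List<byte> pending = new List<byte>();

    // Feed the first 'count' bytes of one received chunk (for example
    // Packet.Buffer); returns every message that the chunk completes.
    public List<byte[]> Append(byte[] chunk, int count)
    {
        for (int i = 0; i < count; i++) pending.Add(chunk[i]);

        var messages = new List<byte[]>();
        while (pending.Count >= 4)
        {
            int length = (pending[0] << 24) | (pending[1] << 16) | (pending[2] << 8) | pending[3];
            if (length < 0) throw new InvalidOperationException("Negative length prefix.");
            if (pending.Count - 4 < length) break; // message still incomplete

            messages.Add(pending.GetRange(4, length).ToArray());
            pending.RemoveRange(0, 4 + length);
        }
        return messages;
    }
}
```

In the processing loop above, each Packet would be passed to Append(packet.Buffer, packet.Buffer.Length). A chunk can yield zero messages (a partial message), one message, or several, and any leftover bytes wait for the next chunk.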

Answer 5

Answered by RepDbg

There seems to be a lot of confusion surrounding this. The examples on MSDN's site for async socket communication using TCP are misleading and not well explained. The receive will indeed stall if the message size is an exact multiple of the receive buffer: the code assumes more data is coming and waits for a completion that never arrives. As a result you never get your message and the application hangs.

Just to clear things up - You MUST provide your own delimiter for data if you are using TCP. Read the following (this is from a VERY reliable source).

The Need For Application Data Delimiting
The other impact of TCP treating incoming data as a stream is that data received by an application using TCP is unstructured. For transmission, a stream of data goes into TCP on one device, and on reception, a stream of data goes back to the application on the receiving device. Even though the stream is broken into segments for transmission by TCP, these segments are TCP-level details that are hidden from the application. So, when a device wants to send multiple pieces of data, TCP provides no mechanism for indicating where the “dividing line” is between the pieces, since TCP doesn't examine the meaning of the data at all. The application must provide a means for doing this.
Consider for example an application that is sending database records. It needs to transmit record #579 from the Employees database table, followed by record #581 and record #611. It sends these records to TCP, which treats them all collectively as a stream of bytes. TCP will package these bytes into segments, but in a manner the application cannot predict. It is possible that each will end up in a different segment, but more likely they will all be in one segment, or part of each will end up in different segments, depending on their length. The records themselves must have some sort of explicit markers so the receiving device can tell where one record ends and the next starts.
Source: http://www.tcpipguide.com/free/t_TCPDataHandlingandProcessingStreamsSegmentsandSequ-3.htm
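
As one possible illustration of such explicit markers, here is a minimal sketch that splits a byte stream into records on a newline delimiter. DelimitedAssembler is a hypothetical name, and the sketch assumes UTF-8 text records that never contain a newline themselves; real payloads would need escaping or a length prefix instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical record splitter. Assumes UTF-8 text records terminated
// by '\n' and no newline inside a record (an assumption of this sketch).
public sealed class DelimitedAssembler
{
    private const byte Delimiter = (byte)'\n';

    // Bytes of the record currently being assembled.
    private readonly List<byte> pending = new List<byte>();

    // Feed the first 'count' bytes of one received chunk; returns every
    // record completed by it, without the delimiter.
    public List<string> Append(byte[] chunk, int count)
    {
        var records = new List<string>();
        for (int i = 0; i < count; i++)
        {
            if (chunk[i] == Delimiter)
            {
                records.Add(Encoding.UTF8.GetString(pending.ToArray()));
                pending.Clear();
            }
            else
            {
                pending.Add(chunk[i]);
            }
        }
        return records;
    }
}
```

However TCP happens to segment the three records from the quoted example, feeding the received chunks to Append in order yields them one by one, because the boundaries come from the marker rather than from the size of each read.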

Most examples I see online for using EndReceive are wrong or misleading. It usually causes no problems in the examples because only one predefined message is sent and then the connection is closed.

Answer 6

Answered by soyoja

I struggled with the same problem.

After testing several times, I found that chaining multiple BeginReceive/EndReceive calls sometimes lost data. (The loop ended improperly.)

In my case, I used two solutions.

First, I made the buffer large enough that a single BeginReceive()/EndReceive() pair receives the whole message.

Second, when receiving large amounts of data, I used NetworkStream.Read() instead of BeginReceive()/EndReceive().

Asynchronous sockets are not easy to use; they require a solid understanding of how sockets work.

Answer 7

Answered by Sérgio Sousa

This is a very old topic, but I got here looking for something else and found this:

Now this corrected version works fine most of the time, but it fails when the packet's size is a multiple of the buffer size. The reason is that if the buffer gets filled on a read, it is assumed there is more data; but the same problem happens as before. A 2 byte buffer, for example, gets filled twice on a 4 byte packet, and the code assumes there is more data. It then blocks because there is nothing left to read. The problem is that the receive function doesn't know where the end of the packet is.

I had this same problem, and since none of the replies seemed to solve it, the way I did it was by using Socket.Available:

public static void Read_Callback(IAsyncResult ar)
{
    StateObject so = (StateObject) ar.AsyncState;
    Socket s = so.workSocket;

    int read = s.EndReceive(ar);    
    if (read > 0) 
    {
        so.sb.Append(Encoding.ASCII.GetString(so.buffer, 0, read));

        if (s.Available == 0)
        {
            // All data received, process it as you wish
        }
    }
    // Listen for more data
    s.BeginReceive(so.buffer, 0, StateObject.BUFFER_SIZE, 0, 
                new AsyncCallback(Async_Send_Receive.Read_Callback), so);
}

Hope this helps others. SO has helped me many times, thank you all!
