
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow, original: http://stackoverflow.com/questions/666601/

Time: 2020-08-27 16:35:39, source: igfitidea

What is the correct way of reading from a TCP socket in C/C++?

Tags: c++, c, tcp

Asked by Nick Bolton

Here's my code:


// Not all headers are relevant to the code snippet.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

char *buffer;
stringstream readStream;
bool readData = true;

while (readData)
{
    cout << "Receiving chunk... ";

    // Read a bit at a time, eventually "end" string will be received.
    bzero(buffer, BUFFER_SIZE);
    int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);
    if (readResult < 0)
    {
        THROW_VIMRID_EX("Could not read from socket.");
    }

    // Concatenate the received data to the existing data.
    readStream << buffer;

    // Continue reading while end is not found.
    readData = readStream.str().find("end;") == string::npos;

    cout << "Done (length: " << readStream.str().length() << ")" << endl;
}

It's a little bit of C and C++ as you can tell. The BUFFER_SIZE is 256 - should I just increase the size? If so, what to? Does it matter?


I know that if "end" is not received for whatever reason, this will be an endless loop, which is bad, so if you could suggest a better way, please do so.


Answer by grieve

Without knowing your full application it is hard to say what the best way to approach the problem is, but a common technique is to use a header which starts with a fixed length field, which denotes the length of the rest of your message.


Assume that your header consists only of a 4-byte integer which denotes the length of the rest of your message. Then simply do the following.


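#include <unistd.h>  // read(), ssize_t

// This assumes buffer is at least x bytes long,
// and that the socket is blocking.
void ReadXBytes(int socket, unsigned int x, void* buffer)
{
    unsigned int bytesRead = 0;
    ssize_t result;  // read() returns ssize_t, which may be -1
    while (bytesRead < x)
    {
        // Pointer arithmetic on void* is not valid C++, so step through it as char*.
        result = read(socket, (char*)buffer + bytesRead, x - bytesRead);
        if (result < 1)
        {
            // Throw your error. (0 means the peer closed the connection,
            // -1 means read() failed; check errno.)
        }

        bytesRead += result;
    }
}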

Then later in the code


unsigned int length = 0;
char* buffer = 0;
// we assume that sizeof(length) will return 4 here.
ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
buffer = new char[length];
ReadXBytes(socketFileDescriptor, length, (void*)buffer);

// Then process the data as needed.

delete [] buffer;

This makes a few assumptions:


  • ints are the same size on the sender and receiver.
  • Endianness is the same on both the sender and receiver.
  • You have control of the protocol on both sides.
  • When you send a message you can calculate the length up front.
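The sending side isn't shown above. A minimal sketch of it in C, assuming a blocking socket and the same 4-byte length header; unlike the code above, it writes the header in network byte order with htonl(), which removes the same-endianness assumption as long as the receiver converts back with ntohl(). WriteAll and SendMessage are hypothetical helper names, not part of any library.

```c
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>   /* htonl */

/* Hypothetical helper: keep calling write() until all n bytes are sent,
 * because write() on a stream socket may accept fewer bytes than asked.
 * Returns 0 on success, -1 on error (errno says why). */
int WriteAll(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0)
            return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: send a 4-byte big-endian (network order) length,
 * then the payload itself. */
int SendMessage(int fd, const void *payload, uint32_t length)
{
    uint32_t wireLength = htonl(length);
    if (WriteAll(fd, &wireLength, sizeof(wireLength)) != 0)
        return -1;
    return WriteAll(fd, payload, length);
}
```

A production sender would probably also retry when write() fails with EINTR.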

Since you usually want to know the exact size of the integers you send across the network, it is common to define fixed-size types in a header file and use them explicitly, for example:


// These typedefs will vary across different platforms
// such as Linux, Win32, OS X etc., but the idea
// is that an Int8 is always 8 bits, and a UInt32 is always
// 32 bits regardless of the platform you are on.
// These vary from compiler to compiler, so you have to
// look them up in the compiler documentation.
// (C99 <stdint.h> and C++11 <cstdint> already provide
// int8_t ... uint32_t with exactly these guarantees.)
typedef signed char Int8;   // plain char may be signed or unsigned
typedef short int Int16;
typedef int Int32;

typedef unsigned char UInt8;
typedef unsigned short int UInt16;
typedef unsigned int UInt32;

This would change the above to:


UInt32 length = 0;
char* buffer = 0;

ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
buffer = new char[length];
ReadXBytes(socketFileDescriptor, length, (void*)buffer);

// process

delete [] buffer;
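A possible receiving counterpart, sketched under these assumptions: the sender wrote the length with htonl(), and C99's <stdint.h> uint32_t stands in for the hand-written UInt32. Because the length arrives from the network, it is worth capping it before allocating; MAX_MESSAGE_LENGTH (1 MiB here) is an arbitrary example value, and ReadExact and ReceiveMessage are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>   /* ntohl */

#define MAX_MESSAGE_LENGTH (1024u * 1024u)  /* arbitrary example cap: 1 MiB */

/* Read exactly n bytes; returns 0 on success, -1 on error or early close. */
static int ReadExact(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0)
            return -1;          /* 0: peer closed; -1: error (see errno) */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Returns a malloc'd, '\0'-terminated payload (caller frees) and stores its
 * size in *outLength, or returns NULL on error or an oversized message. */
char *ReceiveMessage(int fd, uint32_t *outLength)
{
    uint32_t wireLength;
    if (ReadExact(fd, &wireLength, sizeof(wireLength)) != 0)
        return NULL;
    uint32_t length = ntohl(wireLength);   /* network -> host byte order */
    if (length > MAX_MESSAGE_LENGTH)
        return NULL;            /* refuse absurd sizes instead of allocating them */
    char *payload = malloc((size_t)length + 1);
    if (payload == NULL)
        return NULL;
    if (ReadExact(fd, payload, length) != 0) {
        free(payload);
        return NULL;
    }
    payload[length] = '\0';     /* convenient when the payload is text */
    *outLength = length;
    return payload;
}
```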

I hope this helps.


Answer by Ori Pessach

Several pointers:


You need to handle a return value of 0, which tells you that the remote host closed the socket.


For nonblocking sockets, you also need to check for an error return value (-1) and treat errno values of EAGAIN or EWOULDBLOCK as expected (no data is available yet) rather than as failures.

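To make these return-value cases concrete, here is a sketch of one read() classified into its outcomes, assuming a hypothetical ReadStatus enum and ReadOnce helper (neither comes from a library). On a nonblocking socket, the "no data yet" case shows up as -1 with errno set to EAGAIN or EWOULDBLOCK.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical result codes for one read() on a socket. */
enum ReadStatus { READ_GOT_DATA, READ_PEER_CLOSED, READ_TRY_LATER, READ_FAILED };

/* Perform one read() and classify the outcome. *outCount receives the
 * number of bytes read when the status is READ_GOT_DATA. */
enum ReadStatus ReadOnce(int fd, void *buf, size_t size, size_t *outCount)
{
    ssize_t r = read(fd, buf, size);
    if (r > 0) {
        *outCount = (size_t)r;
        return READ_GOT_DATA;
    }
    if (r == 0)
        return READ_PEER_CLOSED;    /* the remote host closed its end */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_TRY_LATER;      /* nonblocking socket with no data yet */
    return READ_FAILED;             /* a real error: errno says which */
}
```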

You definitely need better error handling: when the exception is thrown, you're potentially leaking the memory pointed to by 'buffer' (which, I noticed, you don't allocate anywhere in this code snippet).


Someone else made a good point about how your buffer isn't a null-terminated C string if your read() fills the entire buffer. That is indeed a problem, and a serious one.


Your buffer size is a bit small, but should work as long as you don't try to read more than 256 bytes, or whatever you allocate for it.


If you're worried about getting into an infinite loop when the remote host sends you a malformed message (a potential denial of service attack) then you should use select() with a timeout on the socket to check for readability, and only read if data is available, and bail out if select() times out.


Something like this might work for you:


fd_set read_set;
struct timeval timeout;

timeout.tv_sec = 60; // Time out after a minute
timeout.tv_usec = 0;

FD_ZERO(&read_set);
FD_SET(socketFileDescriptor, &read_set);

int r=select(socketFileDescriptor+1, &read_set, NULL, NULL, &timeout);

if( r<0 ) {
    // Handle the error
}

if( r==0 ) {
    // Timeout - handle that. You could try waiting again, close the socket...
}

if( r>0 ) {
    // The socket is ready for reading - call read() on it.
}
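The same select() call can be packaged as a reusable helper. This is a sketch with a hypothetical name and return convention (-2 for a timeout), not a standard API. Keep in mind that select() only works with descriptors below FD_SETSIZE; poll() avoids that limit.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `seconds` for fd to become readable,
 * then read once. Returns the bytes read (> 0), 0 if the peer closed,
 * -1 on error, or -2 on timeout. fd must be below FD_SETSIZE. */
ssize_t ReadWithTimeout(int fd, void *buf, size_t size, long seconds)
{
    fd_set readSet;
    struct timeval timeout;

    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    timeout.tv_sec = seconds;   /* select() may modify this, so set it on every call */
    timeout.tv_usec = 0;

    int r = select(fd + 1, &readSet, NULL, NULL, &timeout);
    if (r < 0)
        return -1;              /* select() failed; check errno (EINTR is common) */
    if (r == 0)
        return -2;              /* timed out with no data */
    return read(fd, buf, size); /* fd is readable, so this should not block */
}
```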

Depending on the volume of data you expect to receive, the way you scan the entire message repeatedly for the "end;" token (rebuilding readStream.str() and searching it from the start after every chunk) is very inefficient. This is better done with a state machine (the states being 'e'->'n'->'d'->';') so that you only look at each incoming character once.

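A sketch of that state machine, assuming the terminator is exactly "end;" and that the caller keeps the state variable between reads, so a token split across two chunks is still detected. FeedEndMatcher is a hypothetical name.

```c
#include <stddef.h>

static const char kToken[] = "end;";

/* Hypothetical incremental matcher. *state is how many characters of
 * "end;" have matched so far (0..3) and must persist between calls.
 * Feeds n new bytes and returns the offset just past the token if it
 * completes inside this chunk, or -1 if it has not been seen yet.
 * Reset *state to 0 before looking for the next token. */
long FeedEndMatcher(int *state, const char *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (data[i] == kToken[*state])
            (*state)++;
        else
            /* All four characters of "end;" differ, so after a mismatch
             * the only possible restart is this byte being an 'e'. */
            *state = (data[i] == kToken[0]) ? 1 : 0;
        if (*state == 4)
            return (long)(i + 1);
    }
    return -1;
}
```

Each byte is examined exactly once, no matter how large the accumulated message grows.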

And seriously, you should consider finding a library to do all this for you. It's not easy getting it right.


Answer by Dan Breslau

1) Others (especially dirkgently) have noted that buffer needs to be allocated some memory. For smallish values of BUFFER_SIZE (say, 4096 or less), you can also allocate it on the stack:


#define BUFFER_SIZE 4096
char buffer[BUFFER_SIZE];

This saves you the worry of ensuring that you delete[] the buffer should an exception be thrown.


But remember that stacks are finite in size (so are heaps, but stacks are "finiter"), so you don't want to put too much there.


2) On a -1 return code, you should not simply return immediately (throwing an exception immediately is even more sketchy). There are certain normal conditions that you need to handle if your code is to be anything more than a short homework assignment. For example, errno may be set to EAGAIN if no data is currently available on a non-blocking socket. Have a look at the man page for read(2).

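Another normal condition documented in read(2) is EINTR: a blocking read() interrupted by a signal before any data arrived. A minimal sketch of retrying it, using a hypothetical wrapper name; other errors, including EAGAIN, are still handed back to the caller.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: EINTR means a signal interrupted read() before
 * any data arrived, which is not a real error, so simply retry. Any other
 * result (data, 0 for a closed peer, or -1 with another errno) is returned
 * to the caller unchanged. */
ssize_t ReadRetryingEintr(int fd, void *buf, size_t size)
{
    ssize_t r;
    do {
        r = read(fd, buf, size);
    } while (r < 0 && errno == EINTR);
    return r;
}
```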

Answer by Dan Breslau

If you actually create the buffer as per dirkgently's suggestion, then:


  int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);

may completely fill the buffer, possibly overwriting the terminating zero character which you depend on when extracting to a stringstream. You need:


  int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE - 1 );
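Note that read() never writes a terminating zero itself, so this works only because the question's loop calls bzero() before every read. A sturdier approach is to append exactly readResult bytes instead of relying on a terminator (in the question's C++ that would be readStream.write(buffer, readResult)), which also copes with '\0' bytes inside the data. A C sketch with a hypothetical growable buffer type and helper names:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_SIZE 256

/* Hypothetical growable byte buffer, appended to by explicit length. */
struct ByteBuf { char *data; size_t len; size_t cap; };

/* Append n bytes, keeping a '\0' after them for convenience.
 * Returns 0 on success, -1 if memory runs out. */
int ByteBufAppend(struct ByteBuf *b, const char *src, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t newCap = b->cap ? b->cap : 64;
        while (newCap < b->len + n + 1)
            newCap *= 2;
        char *p = realloc(b->data, newCap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newCap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* One iteration of the question's loop: read a chunk and append exactly
 * the bytes read. Returns the read() result (> 0 data, 0 closed, -1 error). */
ssize_t ReadChunkInto(int fd, struct ByteBuf *b)
{
    char chunk[BUFFER_SIZE];
    ssize_t r = read(fd, chunk, sizeof chunk);  /* the whole chunk is usable */
    if (r > 0 && ByteBufAppend(b, chunk, (size_t)r) != 0)
        return -1;
    return r;
}
```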

Answer by Arnold Spence

This is an article that I always refer to when working with sockets.


THE WORLD OF SELECT()


It will show you how to reliably use 'select()' and contains some other useful links at the bottom for further info on sockets.


Answer by dirkgently

Where are you allocating memory for your buffer? The line where you invoke bzero invokes undefined behavior, since buffer does not point to any valid region of memory.


char *buffer = new char[ BUFFER_SIZE ];
// do processing

// don't forget to release
delete[] buffer;

Answer by Joseph Larson

Just to add to things from several of the posts above:


read() -- at least on my system -- returns ssize_t. This is like size_t, except is signed. On my system, it's a long, not an int. You might get compiler warnings if you use int, depending on your system, your compiler, and what warnings you have turned on.

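A short sketch of why the type matters, using a hypothetical ReadCount helper: if the result were stored in an unsigned size_t, a -1 error would turn into SIZE_MAX and a check such as result < 1 would never fire, whereas ssize_t keeps it negative.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stores the byte count (0 means the peer closed)
 * in *outCount and returns 0, or returns -1 if read() failed. */
int ReadCount(int fd, void *buf, size_t size, size_t *outCount)
{
    ssize_t n = read(fd, buf, size);   /* ssize_t is signed, so -1 stays -1 */
    if (n < 0)
        return -1;                     /* errno says why */
    *outCount = (size_t)n;             /* n >= 0 here, so the conversion is safe */
    return 0;
}
```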

Answer by Marcus Harrison

For any non-trivial application (i.e. one that must receive and handle different kinds of messages with different lengths), the solution to your particular problem isn't necessarily just a programming solution; it's a convention, i.e. a protocol.


In order to determine how many bytes you should pass to your read call, you should establish a common prefix, or header, that your application receives. That way, when a socket first has data available to read, you can decide what to expect.


A binary example might look like this:



#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>   /* ntohl */

enum MessageType {
    MESSAGE_FOO,
    MESSAGE_BAR,
};

struct MessageHeader {
    uint32_t type;      /* an enum MessageType value; fixed width so both ends agree on its size */
    uint32_t length;
};

/**
 * Attempts to continue reading a `socket` until `bytes` number
 * of bytes are read. Returns truthy on success, falsy on failure.
 *
 * Similar to @Duncan_Jones's ReadXBytes.
 */
int readExpected(int socket, void *destination, size_t bytes)
{
    char *out = destination;    /* advance past bytes already read */
    while (bytes) {
        ssize_t readBytes = read(socket, out, bytes);  /* signed: -1 on error */
        if (readBytes < 1)
            return 0;           /* 0 = peer closed, -1 = error */
        out += readBytes;
        bytes -= (size_t)readBytes;
    }
    return 1;
}

int main(int argc, char **argv)
{
    // use `select` or `poll` to wait on sockets
    // received a message on `selectedFd`, start reading
    int selectedFd = -1;  /* placeholder: the descriptor select/poll reported as readable */

    struct MessageHeader received;
    if (!readExpected(selectedFd, &received, sizeof(received))) {
        // handle error
        return 1;
    }
    // convert from network to host byte order
    received.type = ntohl(received.type);
    received.length = ntohl(received.length);

    switch (received.type) {
        case MESSAGE_FOO: {
            // "foo" sends an ASCII string or something
            char *fooMessage = calloc((size_t)received.length + 1, 1);
            if (fooMessage && readExpected(selectedFd, fooMessage, received.length))
                puts(fooMessage);
            free(fooMessage);
            break;
        }
        case MESSAGE_BAR: {
            // "bar" sends a message of a fixed size
            struct {
                int32_t a;
                int32_t b;
            } barMessage;
            if (readExpected(selectedFd, &barMessage, sizeof(barMessage))) {
                barMessage.a = (int32_t)ntohl((uint32_t)barMessage.a);
                barMessage.b = (int32_t)ntohl((uint32_t)barMessage.b);
                printf("a + b = %d\n", (int)(barMessage.a + barMessage.b));
            }
            break;
        }
        default:
            puts("Malformed type received");
            // kick the client out probably
    }
    return 0;
}

You can likely already see one disadvantage of using a binary format: for each attribute larger than a char that you read, you will have to ensure its byte order is correct using the ntohl or ntohs functions.

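A sketch of that byte-order step, assuming a hypothetical 6-byte header made of a 16-bit type followed by a 32-bit length, both big-endian. Copying the fields out of the raw bytes with memcpy() sidesteps struct padding and unaligned access, and ntohs()/ntohl() do the conversion. ParsedHeader and ParseHeader are illustrative names only.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs, ntohl */

/* Hypothetical wire header: 2-byte type, then 4-byte length, big-endian. */
struct ParsedHeader { uint16_t type; uint32_t length; };

struct ParsedHeader ParseHeader(const unsigned char raw[6])
{
    uint16_t type;
    uint32_t length;
    memcpy(&type, raw, sizeof type);          /* bytes 0-1 */
    memcpy(&length, raw + 2, sizeof length);  /* bytes 2-5, possibly unaligned in raw */
    struct ParsedHeader h = { ntohs(type), ntohl(length) };
    return h;
}
```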

An alternative is to use byte-encoded messages, such as simple ASCII or UTF-8 strings, which avoid byte-order issues entirely but require extra effort to parse and validate.

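For a text protocol, one common framing convention is one message per line. A sketch, assuming messages end with '\n' and that the caller appends each read() result to a buffer; TakeLine is a hypothetical helper that extracts one complete line and keeps any bytes that already belong to the next message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. `buf` holds `*len` bytes received so far. If a
 * complete line is present, copy it (without the '\n', '\0'-terminated)
 * into `out`, remove it from `buf`, and return its length. Returns -1 if
 * no complete line has arrived yet, or -2 if the line does not fit. */
long TakeLine(char *buf, size_t *len, char *out, size_t outSize)
{
    char *nl = memchr(buf, '\n', *len);
    if (nl == NULL)
        return -1;                            /* keep reading */
    size_t lineLen = (size_t)(nl - buf);
    if (lineLen + 1 > outSize)
        return -2;                            /* too long: treat as malformed */
    memcpy(out, buf, lineLen);
    out[lineLen] = '\0';
    size_t consumed = lineLen + 1;            /* include the '\n' */
    memmove(buf, buf + consumed, *len - consumed);  /* keep the next message's bytes */
    *len -= consumed;
    return (long)lineLen;
}
```

Validating each extracted line (allowed characters, expected fields) is then a separate parsing step, which is the extra effort mentioned above.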