How to deserialize data sent over TCP using Python and Google's protocol buffers

Question

Asked by Hyman Edmonds

I'm trying to write an application which uses Google's protocol buffers to deserialize data (sent from another application using protocol buffers) over a TCP connection. The problem is that it looks as if protocol buffers in Python can only deserialize data from a string. Since TCP doesn't have well-defined message boundaries and one of the messages I'm trying to receive has a repeated field, I won't know how much data to try and receive before finally passing the string to be deserialized.

Are there any good practices for doing this in Python?

Answer 1

Answered by J.J.

Don't just write the serialized data to the socket. First send a fixed-size field containing the length of the serialized object.

The sending side is roughly:

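socket.write(struct.pack("H", len(data)))    # send a two-byte size field (native byte order; "!H" on both ends gives network order)
socket.write(data)                            # then send the serialized message itself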

And the recv'ing side becomes something like:

dataToRead = struct.unpack("H", socket.read(2))[0]    # read the two-byte size field
data = socket.read(dataToRead)                        # read exactly that many bytes
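
The snippets above are pseudocode: a real Python socket object has no read or write method, and recv() may return fewer bytes than requested. A minimal runnable sketch of the same length-prefix idea follows. The helper names (send_frame, recv_exact, recv_frame) are hypothetical, and the choice of network byte order ("!H") is an assumption of this sketch rather than something the answer specifies; both ends must agree on it.

```python
import socket
import struct

# Two-byte unsigned length in network byte order (an assumption of this sketch).
HEADER = struct.Struct("!H")


def send_frame(sock, payload):
    """Send one message preceded by its two-byte length."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a two-byte length field")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before a full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one length-prefixed message and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


# Small local demonstration over a connected pair of sockets.
left, right = socket.socketpair()
send_frame(left, b"hello")
received = recv_frame(right)
left.close()
right.close()
```

With protocol buffers, the payload would typically be message.SerializeToString() on the sending side, and the bytes returned by recv_frame would be passed to ParseFromString on the receiving side.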

This is a common design pattern for socket programming. Most designs extend the over-the-wire structure to include a type field as well, so your receiving side becomes something like:

msgType = socket.read(1)                              # get the type of msg (renamed so it doesn't shadow the built-in type)
dataToRead = struct.unpack("H", socket.read(2))[0]    # get the len of the msg
data = socket.read(dataToRead)                        # read the msg

if msgType == TYPE_FOO:
    handleFoo(data)
elif msgType == TYPE_BAR:
    handleBar(data)
else:
    raise UnknownTypeException(msgType)

You end up with an over-the-wire message format that looks like:

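struct {
     unsigned char type;       /* message type id */
     unsigned short length;    /* number of bytes in data */
     unsigned char data[];     /* the serialized message bytes, sent inline */
};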

This does a reasonable job of future-proofing the wire protocol against unforeseen requirements. It's a Type-Length-Value protocol, which you'll find again and again and again in network protocols.
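
As a rough illustration of the Type-Length-Value layout, here is a small sketch that packs and reads frames with a one-byte type and a two-byte length. The names pack_tlv and read_tlv, the type ids, and the use of network byte order ("!BH") are assumptions made for this example. read_tlv expects a binary file-like object, such as the one returned by sock.makefile("rb"); an in-memory stream stands in for it here.

```python
import io
import struct

# One-byte type plus two-byte length, network byte order, no padding (3 bytes).
TLV_HEADER = struct.Struct("!BH")

TYPE_FOO = 0  # hypothetical type ids for this sketch
TYPE_BAR = 1


def pack_tlv(msg_type, payload):
    """Build one frame: type byte, length field, then the payload bytes."""
    return TLV_HEADER.pack(msg_type, len(payload)) + payload


def read_tlv(stream):
    """Read one frame from a binary file-like object and return (type, payload)."""
    header = stream.read(TLV_HEADER.size)
    if len(header) < TLV_HEADER.size:
        raise EOFError("stream ended inside a frame header")
    msg_type, length = TLV_HEADER.unpack(header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream ended inside a frame payload")
    return msg_type, payload


# Two frames back to back, read from an in-memory stream.
demo = io.BytesIO(pack_tlv(TYPE_FOO, b"abc") + pack_tlv(TYPE_BAR, b""))
first = read_tlv(demo)
second = read_tlv(demo)
```

The returned type can then be used to pick a handler or a protobuf message class, as in the dispatch code above.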

Answer 2

Answered by frymaster

To expand on J.J.'s (entirely correct) answer, the protobuf library has no way to work out how long messages are on their own, or to work out what type of protobuf object is being sent*. So the other application that's sending you data must already be doing something like this.

When I had to do this, I implemented a lookup table:

messageLookup = {0: foobar_pb2.MessageFoo, 1: foobar_pb2.MessageBar, 2: foobar_pb2.MessageBaz}

...and did essentially what J.J. did, but I also had a helper function:

    def parseMessage(self, msgType, stringMessage):
        msgClass = messageLookup[msgType]         # pick the generated class for this type id
        message = msgClass()
        message.ParseFromString(stringMessage)    # fill it in from the received bytes
        return message

...which I called to turn the string into a protobuf object.

(*) I think it's possible to get around this by encapsulating specific messages inside a container message.

Answer 3

Answered by davidA

Another aspect to consider (albeit for a simpler case) is where you use a single TCP connection for a single message. In this case, as long as you know what the expected message is (or use Union Types to determine the message type at run-time), you can use the TCP connection open as the 'start' delimiter, and the connection close event as the final delimiter. This has the advantage that you'll receive the entire message quickly (whereas in other cases the TCP stream can be held for a time, delaying the receipt of your entire message). If you do this, you don't need any explicit in-band framing as the lifetime of the TCP connection acts as a frame itself.
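
A minimal sketch of this one-message-per-connection approach, assuming blocking sockets: the sender writes the serialized bytes and closes its side, and the receiver keeps calling recv() until it returns an empty bytes object, which signals that the peer has closed. The function names send_and_close and recv_until_close are hypothetical.

```python
import socket


def send_and_close(sock, payload):
    """Send the whole message, then close so the peer sees end-of-stream."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)  # no more data will be sent on this connection
    sock.close()


def recv_until_close(sock, bufsize=4096):
    """Collect bytes until the peer closes; recv() then returns b""."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

The bytes returned by recv_until_close can be handed straight to ParseFromString on a message of the expected type. Note that a connection dropped partway through looks the same as a normal close to the receiver, so a truncated message only shows up, if at all, as a parse error.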
