Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/441203/

"Proper" way to store binary data with C++/STL

c++ stl binary-data

Asked by Sean Edwards

In general, what is the best way of storing binary data in C++? The options, as far as I can tell, pretty much boil down to using strings or vector<char>s. (I'll omit the possibility of char*s and malloc()s since I'm referring specifically to C++).

Usually I just use a string; however, I'm not sure whether there are overheads I'm missing, or conversions that STL does internally that could mess with the sanity of binary data. Does anyone have any pointers (har) on this? Suggestions or preferences one way or another?

Accepted answer by Doug T.

A vector of char is nice because the memory is contiguous. Therefore you can use it with a lot of C APIs, such as Berkeley sockets or file APIs. You can do the following, for example:

  std::vector<char> vect;
  ...
  send(sock, vect.data(), vect.size(), 0);  // send() also takes a flags argument; 0 means no flags

and it will work fine.
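
The same pattern applies to the C file APIs mentioned above. Here is a minimal sketch, assuming C++11 (for vector::data()) and a FILE* that the caller has already opened; write_buffer and read_buffer are hypothetical helper names, not part of any library:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: writes the whole buffer to an already-open FILE*.
// Returns true only if every byte was written.
bool write_buffer(std::FILE* file, const std::vector<char>& buffer) {
    if (buffer.empty()) return true;  // nothing to write
    return std::fwrite(buffer.data(), 1, buffer.size(), file) == buffer.size();
}

// Hypothetical helper: tries to read `count` bytes into `out`.
// `out` is shrunk to the number of bytes actually read.
bool read_buffer(std::FILE* file, std::size_t count, std::vector<char>& out) {
    out.resize(count);
    if (count == 0) return true;
    const std::size_t got = std::fread(out.data(), 1, count, file);
    out.resize(got);
    return got == count;
}
```

Because the vector hands out a plain pointer and a length, zero bytes and other non-printable values pass through untouched.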

You can essentially treat it just like any other dynamically allocated char buffer. You can scan up and down looking for magic numbers or patterns. You can parse it partially in place. For receiving from a socket you can very easily resize it to append more data.
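
As a rough illustration of appending received data and scanning for a magic number, here is a sketch in which append_bytes and find_magic are hypothetical helpers, and the chunk stands in for whatever a recv() call would have filled:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Appends `len` bytes from a C-style chunk to the end of `buffer`.
void append_bytes(std::vector<char>& buffer, const char* chunk, std::size_t len) {
    buffer.insert(buffer.end(), chunk, chunk + len);
}

// Returns the offset of the first occurrence of `magic` in `buffer`,
// or buffer.size() if the sequence is not present.
std::size_t find_magic(const std::vector<char>& buffer, const std::vector<char>& magic) {
    const auto it = std::search(buffer.begin(), buffer.end(), magic.begin(), magic.end());
    return static_cast<std::size_t>(it - buffer.begin());
}
```

The search works on raw bytes, so a magic number that spans two received chunks is still found once both chunks have been appended.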

The downside is that resizing is not terribly efficient (resize or preallocate prudently), and deletion from the front of the array is also very inefficient, since every remaining element has to shift down. If you need to, say, pop just one or two chars at a time off the front of the data structure very frequently, copying to a deque before this processing may be an option. This costs you a copy, and deque memory isn't contiguous, so you can't just pass a pointer to a C API.
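
A small sketch of both suggestions, preallocating and copying into a deque for front-heavy processing. The helper names make_buffer and to_deque are made up for illustration:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Creates an empty buffer with room for `expected_size` bytes, so that
// appending up to that many bytes does not trigger a reallocation.
std::vector<char> make_buffer(std::size_t expected_size) {
    std::vector<char> buffer;
    buffer.reserve(expected_size);
    return buffer;
}

// Copies the bytes into a deque, where pop_front() is cheap.
// The copy costs O(n), and the deque's storage is not contiguous.
std::deque<char> to_deque(const std::vector<char>& buffer) {
    return std::deque<char>(buffer.begin(), buffer.end());
}
```

Whether the extra copy pays off depends on how many front removals follow it; for a handful of removals, erasing from the vector directly may be simpler.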

Bottom line: learn about the data structures and their tradeoffs before diving in. That said, a vector of char is typically what I see used in general practice.

Answer by Doug T.

The biggest problem with std::string is that the current standard doesn't guarantee that its underlying storage is contiguous. However, there are no known STL implementations where string is not contiguous, so in practice it probably won't fail. In fact, the new C++0x standard (since published as C++11) fixes this problem by mandating that std::string use a contiguous buffer, as std::vector does.
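
Where contiguity is guaranteed (C++11 and later, which is an assumption about your toolchain), a std::string holding binary data can be handed to a pointer-plus-length C function just like a vector. A rough sketch, in which copy_out is a hypothetical helper and std::memcpy stands in for whatever C API you actually call:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies up to `capacity` raw bytes of `bytes` into `out` and returns the
// number of bytes copied. Embedded zero bytes are copied like any other byte.
std::size_t copy_out(const std::string& bytes, char* out, std::size_t capacity) {
    const std::size_t n = bytes.size() < capacity ? bytes.size() : capacity;
    if (n != 0) {
        std::memcpy(out, bytes.data(), n);  // relies on contiguous storage
    }
    return n;
}
```

Note that size(), not strlen(), determines how many bytes are copied.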

Another argument against string is that its name suggests that it contains a character string, not a binary buffer, which may cause confusion to those who read the code.

That said, I recommend vector as well.

Answer by Head Geek

I use std::string for this too, and have never had a problem with it.

One "pointer," which I just received a sharp reminder of in a piece of code yesterday: when creating a string from a block of binary data, use the std::string(startIter, endIter) constructor form, not the std::string(ptr, offset, length) form -- the latter makes the assumption that the pointer points to a C-style string, and ignores anything after the first zero character (it copies "up to" the specified length, not length characters).
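
To make the difference concrete, here is a small sketch. The three function names are hypothetical wrappers around the standard constructors; the two-argument std::string(ptr, length) form is included for comparison because it also copies exactly length bytes:

```cpp
#include <cstddef>
#include <string>

// Iterator-range form: copies exactly `len` bytes, embedded zeros included.
std::string from_range(const char* data, std::size_t len) {
    return std::string(data, data + len);
}

// Pointer-plus-length form: also copies exactly `len` bytes.
std::string from_ptr_length(const char* data, std::size_t len) {
    return std::string(data, len);
}

// The form the answer warns about: `data` is first treated as a
// null-terminated C string, and only then is the substring taken,
// so nothing after the first zero byte survives.
std::string from_ptr_offset_length(const char* data, std::size_t offset, std::size_t len) {
    return std::string(data, offset, len);
}
```

For a five-byte block whose third byte is zero, the first two helpers return a five-byte string, while the last one returns only the two bytes before the zero.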

Answer by Todd Gardner

You should certainly be using some container of char, but the container you want to use depends on your application.

Chars have several properties that make them useful for holding binary data: the standard disallows any "padding" for a char datatype, which is important since it means that you won't get garbage in your binary layout. Each char is also guaranteed to be exactly one byte, making it the only plain old datatype (POD) with set width (all others are specified in terms of upper and/or lower bounds).
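
These guarantees can be checked at compile time. A minimal sketch, assuming C++11 for static_assert; payload_bytes is a hypothetical helper added only to show that element count and byte count coincide for char:

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// sizeof(char) is 1 by definition, and an array of char has no padding
// between its elements.
static_assert(sizeof(char) == 1, "char is exactly one byte");
static_assert(sizeof(char[16]) == 16, "no padding between chars");
// A byte is at least 8 bits wide; CHAR_BIT gives the exact width.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Number of bytes occupied by the payload of a char buffer.
std::size_t payload_bytes(const std::vector<char>& buffer) {
    return buffer.size() * sizeof(char);  // always equal to buffer.size()
}
```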

The discussion of the appropriate STL container in which to store the chars is handled well by Doug above. Which one you need depends entirely on your use case. If you are just holding a block of data you iterate through, without any special lookup, append/remove, or splice needs, I would prefer vector, which makes your intentions clearer than std::string, which many libraries and functions will assume holds a null-terminated C-style string.
