C++: Should a buffer of bytes be a signed or unsigned char buffer?
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/653336/
Should a buffer of bytes be signed or unsigned char buffer?
Asked by Hymanhab
Should a buffer of bytes be signed char or unsigned char or simply a char buffer? Any differences between C and C++?
Thanks.
Accepted answer by dan04
Should a buffer of bytes be signed char or unsigned char or simply a char buffer? Any differences between C and C++?
A minor difference in how the language treats it. A huge difference in how convention treats it.
char = ASCII (or UTF-8, but the signedness gets in the way there) textual data
unsigned char = byte
signed char = rarely used
And there is code that relies on such a distinction. Just a week or two ago I encountered a bug where JPEG data was getting corrupted because it was being passed to the char* version of our Base64 encode function, which "helpfully" replaced all the invalid UTF-8 in the "string". Changing to BYTE (aka unsigned char) was all it took to fix it.
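The convention described above can be made visible in an API by overloading on the pointer type. The following is a minimal sketch, not the code from the anecdote: `describe` and both of its overloads are hypothetical names invented for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical overload for text: expects a NUL-terminated string.
std::string describe(const char* text) {
    return "text of length " + std::to_string(std::strlen(text));
}

// Hypothetical overload for raw bytes: the length is explicit, so zero
// bytes and values above 127 are ordinary data.
std::string describe(const unsigned char* bytes, std::size_t size) {
    (void)bytes;  // contents are opaque here; only the size is reported
    return "binary blob of " + std::to_string(size) + " bytes";
}
```

Calling `describe` with an `unsigned char` array and its size selects the byte overload, so binary data is never run through string handling by accident.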
Answer by Johannes Schaub - litb
If you intend to store arbitrary binary data, you should use unsigned char. It is the only data type that the C Standard guarantees to have no padding bits. Every other data type may contain padding bits in its object representation (that is, the representation that contains all bits of an object, instead of only those that determine a value). The state of the padding bits is unspecified, and they are not used to store values. So if you read some binary data using char, things would be cut down to the value range of a char (by interpreting only the value bits), but there may still be bits that are ignored yet are still there and read by memcpy, much like padding bits in real struct objects. Type unsigned char is guaranteed not to contain those. That follows from 5.2.4.2.1/2 (C99 TC2, n1124):
If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT - 1
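The quoted requirements can be checked at compile time. This is a small sketch using only the macros from `<climits>`; the helper `plain_char_is_signed` is a hypothetical name, and the shift assumes CHAR_BIT is smaller than the width of `unsigned long long`.

```cpp
#include <climits>

// UCHAR_MAX must use every bit of the byte, so no bit is left for padding.
// Assumption: CHAR_BIT is smaller than the width of unsigned long long.
static_assert(UCHAR_MAX == (1ULL << CHAR_BIT) - 1,
              "unsigned char uses all CHAR_BIT bits as value bits");

// Plain char has the range of either signed char or unsigned char.
static_assert((CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX) ||
              (CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX),
              "char matches signed char or unsigned char");

// Hypothetical helper: reports which of the two cases this implementation picked.
constexpr bool plain_char_is_signed() { return CHAR_MIN < 0; }
```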
From the last sentence it follows that there is no space left for any padding bits. If you use char as the type of your buffer, you also have the problem of overflows: if you explicitly assign to such an element a value that fits in 8 bits (so you may expect the assignment to be OK) but is not within the range of a char, which is CHAR_MIN..CHAR_MAX, the conversion overflows and causes implementation-defined results, including the raising of a signal.
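To make the range problem concrete, here is a hedged sketch of two range checks. `fits_in_char` and `fits_in_uchar` are hypothetical helpers, and the result of `fits_in_char(200)` depends on whether plain char is signed on the implementation at hand.

```cpp
#include <climits>

// True when `value` can be stored in a plain char without an
// out-of-range conversion.
constexpr bool fits_in_char(int value) {
    return value >= CHAR_MIN && value <= CHAR_MAX;
}

// True when `value` can be stored in an unsigned char unchanged.
constexpr bool fits_in_uchar(int value) {
    return value >= 0 && value <= UCHAR_MAX;
}

// 200 fits in 8 bits, and it always fits in an unsigned char.
static_assert(fits_in_uchar(200), "every 8-bit value fits in unsigned char");
```

Where char is signed and 8 bits wide, `fits_in_char(200)` is false even though 200 is a perfectly ordinary byte value, which is the situation the paragraph above warns about.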
Even if problems like the above would probably not show up in real implementations (it would be a very poor quality of implementation), you are best off using the right type from the beginning, which is unsigned char.
For strings, however, the data type of choice is char, which will be understood by string and print functions. Using signed char for these purposes looks like a wrong decision to me.
For further information, read this proposal, which contains a fix for a next version of the C Standard that will eventually require signed char not to have any padding bits either. It has already been incorporated into the working paper.
Answer by RBerteig
It depends.
If the buffer is intended to hold text, then it probably makes sense to declare it as an array of char and let the platform decide for you whether that is signed or unsigned by default. That will give you the least trouble passing the data in and out of the implementation's runtime library, for example.
If the buffer is intended to hold binary data, then it depends on how you intend to use it. For example, if the binary data is really a packed array of data samples that are signed 8-bit fixed point ADC measurements, then signed char would be best.
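As an illustration of the ADC case, the sketch below treats the buffer as signed 8-bit samples. `sum_samples` is a hypothetical helper, not something from the answer.

```cpp
#include <cstddef>

// Hypothetical helper: sums signed 8-bit samples. Each sample is widened
// to int before adding, so negative readings stay negative.
int sum_samples(const signed char* samples, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += samples[i];
    }
    return total;
}
```

With unsigned char the same bytes would all read as non-negative values, and the sign of each measurement would have to be reconstructed by hand.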
In most real-world cases, the buffer is just that, a buffer, and you don't really care about the types of the individual bytes because you filled the buffer in a bulk operation, and you are about to pass it off to a parser to interpret the complex data structure and do something useful. In that case, declare it in the simplest way.
Answer by Pete Kirkham
If it actually is a buffer of 8-bit bytes, rather than a string in the machine's default locale, then I'd use uint8_t. Not that there are many machines around where a char is not a byte (or a byte an octet), but making the statement 'this is a buffer of octets' rather than 'this is a string' is often useful documentation.
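A short sketch of how that documentation value might look in code. `OctetBuffer` and `checksum` are hypothetical names, and the `static_assert` spells out the 8-bit assumption.

```cpp
#include <climits>
#include <cstdint>
#include <vector>

static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// Hypothetical alias: the name tells the reader this holds octets, not text.
using OctetBuffer = std::vector<std::uint8_t>;

// Simple additive checksum modulo 256 over the buffer.
std::uint8_t checksum(const OctetBuffer& data) {
    unsigned sum = 0;
    for (std::uint8_t octet : data) {
        sum += octet;
    }
    return static_cast<std::uint8_t>(sum & 0xFFu);
}
```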
Answer by Richard Corden
You should use either char or unsigned char but never signed char. The standard has the following in 3.9/2:
For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
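The guarantee quoted above can be exercised with a round trip through an unsigned char array. This is only a sketch; `Point` and `round_trip` are hypothetical names chosen for the example.

```cpp
#include <cstring>

// Hypothetical POD type used only for this illustration.
struct Point {
    int x;
    int y;
};

// Copies the object's bytes into an unsigned char array and back again.
Point round_trip(const Point& original) {
    unsigned char bytes[sizeof(Point)];
    std::memcpy(bytes, &original, sizeof(Point));

    Point restored;
    std::memcpy(&restored, bytes, sizeof(Point));
    return restored;
}
```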
Answer by Naveen
It is better to define it as unsigned char. In fact, the Win32 type BYTE is defined as unsigned char. There is no difference between C and C++ in this respect.
Answer by MrEvil
For maximum portability, always use unsigned char. There are a couple of instances where this could come into play. Serialized data shared across systems with different endianness immediately comes to mind. Performing shifts or bit masking on the values is another.
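Both cases mentioned above show up when a multi-byte integer is serialized by hand. The following is a minimal sketch; `store_be32` and `load_be32` are hypothetical helpers that fix the byte order to big-endian regardless of the host.

```cpp
#include <cstdint>

// Writes `value` into out[0..3], most significant byte first.
void store_be32(std::uint32_t value, unsigned char* out) {
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(value & 0xFFu);
}

// Reads four bytes, most significant first. Each byte is in 0..255, so
// widening it never sign-extends into the higher bits.
std::uint32_t load_be32(const unsigned char* in) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

Had the buffer been signed, a byte such as 0xEF would widen to a negative int, and the OR in `load_be32` would need an extra mask on every byte to avoid corrupting the upper bits.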
Answer by Trevor Boyd Smith
The choice of int8_t vs uint8_t is similar to the choice you make when comparing a pointer against NULL.
From a functionality point of view, comparing to NULL is the same as comparing to 0 because NULL is a #define for 0.
But personally, from a coding style point of view, I choose to compare my pointers to NULL because the NULL #define connotes to the person maintaining the code that you are checking for a bad pointer...
VS
when someone sees a comparison to 0 it connotes that you are checking for a specific value.
For the above reason, I would use uint8_t.
Answer by Gorpik
Do you really care? If you don't, just use the default (char) and don't clutter your code with unimportant matters. Otherwise, future maintainers will be left wondering why you used signed (or unsigned). Make their life simpler.
Answer by schnaader
Several years ago I had a problem with a C++ console application that printed colored chars for ASCII values above 128. This was solved by switching from char to unsigned char, but I think it could have been solved while keeping the char type, too.
For now, most C/C++ functions use char and I understand both languages much better now, so I use char in most cases.