Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/15078638/

Date: 2020-08-27 18:59:59, source: igfitidea

Can I turn unsigned char into char and vice versa?

Tags: c++, c

Asked by user2015453

I want to use a function that expects data like this:

void process(char *data_in, int data_len);

So it's just processing some bytes really.

But I'm more comfortable working with "unsigned char" when it comes to raw bytes (it somehow "feels" more right to deal with positive 0 to 255 values only), so my question is:

Can I always safely pass an unsigned char * into this function?

In other words:

  • Is it guaranteed that I can safely convert (cast) between char and unsigned char at will, without any loss of information?
  • Can I safely convert (cast) between pointers to char and unsigned char at will, without any loss of information?

Bonus: Is the answer same in C and C++?

Answer by jogojapan

The short answer is yes if you use an explicit cast, but to explain it in detail, there are three aspects to look at:

1) Legality of the conversion
Converting between signed T* and unsigned T* (for some type T) in either direction is generally possible because the source type can first be converted to void * (this is a standard conversion, §4.10), and the void * can be converted to the destination type using an explicit static_cast (§5.2.9/13):

static_cast<unsigned char*>(static_cast<void *>(data_in))

This can be abbreviated (§5.2.10/7) as

reinterpret_cast<unsigned char *>(data_in)

because char is a standard-layout type (§3.9.1/7,8 and §3.9/9) and signedness does not change alignment (§3.9.1/1). It can also be written as a C-style cast:

(unsigned char *)(data_in)

Again, this works both ways, from unsigned* to signed* and back. There is also a guarantee that if you apply this procedure one way and then back, the pointer value (i.e. the address it's pointing to) won't have changed (§5.2.10/7).

All of this applies not only to conversions between signed char * and unsigned char *, but also to char */unsigned char * and char */signed char *, respectively. (char, signed char and unsigned char are formally three distinct types, §3.9.1/1.)

To be clear, it doesn't matter which of the three cast methods you use, but you must use one. Merely passing the pointer will not work: the conversion, while legal, is not a standard conversion, so it won't be performed implicitly (a C++ compiler will reject the call; a C compiler must at least issue a diagnostic).

2) Well-definedness of the access to the values
What happens if, inside the function, you dereference the pointer, i.e. you perform *data_in to retrieve a glvalue for the underlying character; is this well-defined and legal? The relevant rule here is the strict-aliasing rule (§3.10/10):

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • [...]
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • [...]
  • a char or unsigned char type.

Therefore, this rule does not disallow accessing a signed char (or char) through an unsigned char* (or char*), or vice versa; you should be able to do this without problems.

3) Resulting values
After dereferencing the type-converted pointer, will you be able to work with the value you get? It's important to bear in mind that the conversion and dereferencing of the pointer described above amounts to reinterpreting (not changing!) the bit pattern stored at the address of the character. So what happens when a bit pattern for a signed character is interpreted as that of an unsigned character (or vice versa)?

When going from unsigned to signed, the typical effect (with 8-bit characters and two's complement) will be that values from 0 to 127 are unchanged, and values of 128 and above become negative. The reverse is similar: when going from signed to unsigned, negative values will appear as values of 128 or greater.

But this behaviour isn't actually guaranteed by the Standard. The only thing the Standard guarantees is that for all three types, char, unsigned char and signed char, all bits (not necessarily 8, btw) are used for the value representation. So if you interpret one as the other, make a few copies and then store it back to the original location, you can be sure that there will be no information loss (as you required), but you won't necessarily know what the values actually mean (at least not in a fully portable way).

Answer by Mitch Wheat

unsigned char or signed char is just interpretation: there is no conversion happening.

Since you are processing bytes, to show intent, it would be better to declare as

void process(unsigned char *data_in, int data_len);

[As noted by an editor: A plain char may be either a signed or an unsigned type. The C and C++ standards explicitly allow either (it is always a separate type from either unsigned char or signed char, but has the same range as one of them)]

Answer by sissi_luaty

Yes, you can always convert from char to unsigned char and vice versa without problems. If you run the following code and compare its output with an ASCII table (ref. http://www.asciitable.com/), you can see the proof for yourself, and how C and C++ handle the conversions: they handle them in exactly the same way:

#include <stdio.h>


int main(void) {
    //converting from char to unsigned char
    char c = 0;
    // sizeof(char) is always 1; with the usual 8-bit char that is 2^8 = 256 values.
    printf("%zu byte(s)\n", sizeof(char));  // %zu is the format for size_t
    for (int i=0; i<256; i++){
        printf("int value: %d - from: %c\tto: %c\n", c,  c, (unsigned char) c);
        c++;  // past CHAR_MAX, the wrap-around is implementation-defined for a signed char
    }

    //converting from unsigned char to char
    unsigned char uc = 0;
    printf("\n%zu byte(s)\n", sizeof(unsigned char));
    for (int i=0; i<256; i++){
        printf("int value: %d - from: %c\tto: %c\n", uc, uc, (char) uc);
        uc++;
    }
    return 0;
}

I will not post the output because it has too many lines! In the output you can see that in the first half of each section, i.e. for i = 0..127, the conversion from char to unsigned char and vice versa works well, without any modification or loss.

However, for i = 128..255 the chars and the unsigned chars cannot be cast without the values changing, so you get different outputs, because unsigned char holds values in [0, 255] while a signed 8-bit char holds values in [-128, 127]. Nevertheless, the behaviour in this second half is irrelevant, because in C/C++ you generally only deal with chars/unsigned chars as ASCII characters, which can take only 128 different values; the other 128 values (negative for a signed char, 128 to 255 for an unsigned char) are never used.

If you never put a value in a char that doesn't represent a character, and you never put a value in an unsigned char that doesn't represent a character, everything will be OK!

Extra: even if you use UTF-8 or other encodings (for special characters) in your C/C++ strings, this kind of cast would still be OK. For instance, using UTF-8 encoding (ref. http://lwp.interglacial.com/appf_01.htm):

// UTF-8 encodings of U+2665, U+2666, U+2663 and U+2660. String literals avoid the
// narrowing error that {0xe2, ...} initializers cause in C++ when char is signed.
char hearts[]   = "\xe2\x99\xa5";
char diamonds[] = "\xe2\x99\xa6";
char clubs[]    = "\xe2\x99\xa3";
char spades[]   = "\xe2\x99\xa0";
printf("hearts (%s)\ndiamonds (%s)\nclubs (%s)\nspades (%s)\n\n", hearts, diamonds, clubs, spades);

The output of that code (on a UTF-8 terminal) will be:
hearts (♥)
diamonds (♦)
clubs (♣)
spades (♠)

The output is the same even if you cast each of its chars to unsigned char.

So:

  • "can I always safely pass an unsigned char * into this function?" yes!

  • "is it guaranteed that I can safely convert (cast) between char and unsigned char at will, without any loss of information?" yes!

  • "can I safely convert (cast) between pointers to char and unsigned char at will, without any loss of information?" yes!

  • "is the answer same in C and C++?" yes!

Answer by Ken Kin

Semantically, passing between unsigned char * and char * is safe, and so is casting between them; the same holds in C++.

However, consider the following sample code:

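#include <stdio.h>

// The output shown below assumes a little-endian machine and a signed plain char.
void process_unsigned(unsigned char *data_in, int data_len) {
    int i=data_len;
    unsigned short product=1;

    // multiply all bytes together (the loop body is intentionally empty)
    for(; i--; product*=data_in[i])
        ;

    // copy product's own bytes back into the buffer and print them
    for(i=sizeof(product); i--; ) {
        data_in[i]=((unsigned char *)&product)[i];
        printf("%d\r\n", data_in[i]);
    }
}

void process(char *data_in, int data_len) {
    int i=data_len;
    unsigned short product=1;

    for(; i--; product*=data_in[i])
        ;

    for(i=sizeof(product); i--; ) {
        data_in[i]=((unsigned char *)&product)[i];
        printf("%d\r\n", data_in[i]);
    }
}

int main(void) {
    unsigned char
        a[]={1, -1},  // -1 is converted to 255
        b[]={1, -1};

    process_unsigned(a, sizeof(a));
    process((char *)b, sizeof(b));  // explicit cast: no implicit unsigned char* -> char*
    getchar();  // portable replacement for the non-standard getch()
    return 0;
}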

Output (on a little-endian machine where plain char is signed):

0
255
-1
-1

The code inside process_unsigned and process is identical; the only difference is unsigned versus signed. This sample shows that the code in the black box is affected by the sign, and nothing is guaranteed between the callee and the caller.

Thus I would say that only passing the pointer is safe; nothing beyond that is guaranteed.

Answer by Alexey Frunze

You can pass a pointer to a different kind of char, but you may need to explicitly cast it. The pointers are guaranteed to be the same size and the same values. There isn't going to be any information loss during the conversion.

If you want to convert char to unsigned char inside the function, you just assign a char value to an unsigned char variable or cast the char value to unsigned char.

If you need to convert unsigned char to char without data loss, it's a bit harder, but still possible:

#include <limits.h>

char uc2c(unsigned char c)
{
#if CHAR_MIN == 0
  // char is unsigned
  return c;
#else
  // char is signed
  if (c <= CHAR_MAX)
    return c;
  else
    // ASSUMPTION 1: int is larger than char
    // ASSUMPTION 2: integers are 2's complement
    return c - CHAR_MAX - 1 - CHAR_MAX - 1;
#endif
}

This function will convert unsigned char to char in such a way that the returned value can be converted back to the same unsigned char value as the parameter.

Answer by Sean Conner

You really need to view the code of process() to know if you can safely pass in unsigned characters. If the function uses the characters as an index into an array, then no, you can't use unsigned data: when plain char is signed, byte values above CHAR_MAX are read as negative indices.
