C++: Convert wchar_t to char
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/3019977/
Asked by Cheok Yan Cheng
I was wondering, is it safe to do this?
wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256 &&);
char myChar = static_cast<char>(wide);
This assumes I am pretty sure the wide char will fall within the ASCII range.
Accepted answer by Mark Ransom
assert is for ensuring that something is true in debug mode, without it having any effect in a release build. It is better to use an if statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.
Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char versions.
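The "if statement with an alternate plan" idea can be sketched roughly as follows. This is only an illustration: the helper name narrow_or and the '?' fallback are assumptions rather than part of the answer, and the check is limited to 7-bit ASCII because, as noted above, 0x80 through 0xff may not map one-to-one.

```cpp
#include <cassert>

// Hypothetical helper (not from the answer): narrow a wide character only
// when it is 7-bit ASCII, otherwise return a caller-chosen fallback.
char narrow_or(wchar_t wide, char fallback = '?') {
    // Widen first so the comparison behaves the same whether wchar_t is
    // signed or unsigned on this platform.
    const long long code = static_cast<long long>(wide);
    if (code >= 0 && code < 128) {
        return static_cast<char>(code);
    }
    return fallback;  // the "alternate plan" for out-of-range input
}
```

Unlike assert, this check stays active in release builds, so out-of-range input gets a defined result instead of a silent truncation.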
Answer by geocar
You are looking for wctomb(): it's in the ANSI standard, so you can count on it. It works even when the wchar_t uses a code above 255. You almost certainly do not want to use it.
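For reference, a call to wctomb() might look roughly like the sketch below. The wrapper name wide_char_to_multibyte is an assumption made for illustration, and the result depends on the current C locale (the default "C" locale unless the program calls setlocale).

```cpp
#include <cassert>
#include <climits>   // MB_LEN_MAX
#include <cstddef>
#include <cstdlib>   // std::wctomb
#include <string>

// Hypothetical wrapper: convert one wide character to its multibyte form in
// the current C locale. Returns an empty string if it cannot be represented.
std::string wide_char_to_multibyte(wchar_t wc) {
    char buf[MB_LEN_MAX];
    std::wctomb(nullptr, L'\0');           // reset any shift state
    const int n = std::wctomb(buf, wc);    // bytes written, or -1 on failure
    if (n < 0) {
        return std::string();
    }
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Note that the output can be several bytes long for a single wide character, which is one reason a plain char is not enough to hold the result.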
wchar_t is an integral type, so your compiler won't complain if you actually do:
char x = (char)wc;
but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int or better. That means you should be writing this:
int x = getchar();
and not this:
char x = getchar(); /* <- WRONG! */
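The usual reason for the int version is that getchar() returns an int so that EOF can be told apart from every valid byte value. A small sketch of that point, using fgetc() on an arbitrary stream (getchar() is the same call on stdin); the helper name count_bytes is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: count the bytes remaining in a stream.
std::size_t count_bytes(std::FILE* in) {
    std::size_t n = 0;
    int c;  // int, not char: it must hold every unsigned char value plus EOF
    while ((c = std::fgetc(in)) != EOF) {
        ++n;
    }
    return n;
}
```

If c were declared as char on a platform where char is signed, a 0xFF byte in the input would typically compare equal to EOF and end the loop early.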
As far as integral types go, char is worthless. You shouldn't make functions that take parameters of type char, and you should not create temporary variables of type char, and the same advice goes for wchar_t as well.
char* may be a convenient typedef for a character string, but it is a novice mistake to think of this as an "array of characters" or a "pointer to an array of characters", despite what the cdecl tool says. Treating it as an actual array of characters with nonsense like this:
for(int i = 0; s[i]; ++i) {
    wchar_t wc = s[i];
    char c = doit(wc);
    out[i] = c;
}
is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wcstombs(), which is part of ANSI C already, but it's still wrong.
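For comparison, letting the standard C library's wcstombs() convert the whole string at once might look like the sketch below. The wrapper name wide_to_multibyte and the bool-plus-output-parameter shape are assumptions, and the conversion still depends on the current C locale, which is part of why the answer goes on to recommend something else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>   // std::wcstombs, MB_CUR_MAX
#include <cwchar>    // std::wcslen
#include <string>
#include <vector>

// Hypothetical wrapper: convert a whole wide string using the current C locale.
// Returns false if some character cannot be represented in that locale.
bool wide_to_multibyte(const wchar_t* src, std::string& out) {
    const std::size_t len = std::wcslen(src);
    // Worst case: every wide character needs MB_CUR_MAX bytes, plus a terminator.
    std::vector<char> buf(len * MB_CUR_MAX + 1);
    const std::size_t written = std::wcstombs(buf.data(), src, buf.size());
    if (written == static_cast<std::size_t>(-1)) {
        return false;
    }
    out.assign(buf.data(), written);
    return true;
}
```

The output length is not tied to the input length, which a per-index loop like the one above cannot express.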
You're really looking for iconv(), which converts a character string from one encoding (even if it's packed into a wchar_t array) into a character string of another encoding.
Now go read this, to learn what's wrong with iconv.
Answer by cvanbrederode
A short function I wrote a while back to pack a wchar_t array into a char array. Characters that aren't on the ANSI code page (0-127) are replaced by '?' characters, and it handles surrogate pairs correctly.
size_t to_narrow(const wchar_t * src, char * dest, size_t dest_len){
  size_t i = 0;  // index into src
  size_t j = 0;  // index into dest
  wchar_t code;

  if (dest_len == 0)
    return 0;

  while (src[i] != '\0' && j < (dest_len - 1)){
    code = src[i];
    if (code >= 0 && code < 128)
      dest[j] = char(code);
    else{
      dest[j] = '?';
      // lead surrogates are 0xD800-0xDBFF; skip the next code unit, which is the trail
      if (code >= 0xD800 && code <= 0xDBFF && src[i + 1] != '\0')
        i++;
    }
    i++;
    j++;
  }

  dest[j] = '\0';

  return j;  // number of chars written, not counting the terminator
}
Answer by Richard Bamford
Here's another way of doing it; remember to use free() on the result.
char* wchar_to_char(const wchar_t* pwchar)
{
    // get the number of characters in the string.
    int currentCharIndex = 0;
    wchar_t currentChar = pwchar[currentCharIndex];

    while (currentChar != L'\0')
    {
        currentCharIndex++;
        currentChar = pwchar[currentCharIndex];
    }

    // include room for the terminating '\0'
    const int charCount = currentCharIndex + 1;

    // allocate one char (1 byte) per wide char (2 or 4 bytes, depending on the platform)
    char* filePathC = (char*)malloc(sizeof(char) * charCount);
    if (filePathC == NULL)
        return NULL;

    for (int i = 0; i < charCount; i++)
    {
        // convert to char (1 byte); values that do not fit are truncated
        filePathC[i] = (char)pwchar[i];
    }

    return filePathC;
}
Answer by Jonathan Leffler
Technically, 'char' could have the same range as either 'signed char' or 'unsigned char'. For the unsigned characters, your range is correct; theoretically, for signed characters, your condition is wrong. In practice, very few compilers will object - and the result will be the same.
Nitpick: the last && in the assert is a syntax error.
Whether the assertion is appropriate depends on whether you can afford to crash when the code gets to the customer, and what you could or should do if the assertion condition is violated but the assertion is not compiled into the code. For debug work, it seems fine, but you might want an active test after it for run-time checking too.
Answer by J.Mcgill
An easy way is:
wstring your_wchar_in_ws(<your wchar>);
string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
const char* your_wchar_in_char = your_wchar_in_str.c_str();
I've been using this method for years :)
Answer by Mr Bat Lee
One could also convert wchar_t --> wstring --> string --> char:
wchar_t wide = /* something */;
wstring wstrValue(1, wide);  // a one-character wstring holding wide

string strValue;
strValue.assign(wstrValue.begin(), wstrValue.end());  // convert wstring to string

char char_value = strValue[0];
Answer by MSalters
In general, no. int(wchar_t(255)) == int(char(255)) of course (at least where char is unsigned), but that just means they have the same int value. They may not represent the same characters.
You would see such a discrepancy in the majority of Windows PCs, even. For instance, on Windows code page 1250, char(0xFF) is the same character as wchar_t(0x02D9) (dot above), not wchar_t(0x00FF) (small y with diaeresis).
Note that it does not even hold for the ASCII range, as C++ doesn't even require ASCII. On IBM systems in particular you may see that 'A' != 65.
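To make the code page point concrete, here is a small sketch built on a deliberately partial excerpt of Windows code page 1250. The table kCp1250Sample and the function cp1250_byte_for are hypothetical names; a real conversion would use the full table or a conversion library.

```cpp
#include <cassert>

// Partial, illustrative excerpt of Windows code page 1250 (not the full table).
struct Cp1250Entry {
    unsigned char byte;
    char32_t code_point;
};

constexpr Cp1250Entry kCp1250Sample[] = {
    {0xE9, 0x00E9},  // e with acute: same value in both, by coincidence
    {0xFF, 0x02D9},  // dot above: byte 0xFF is NOT U+00FF here
};

// Hypothetical lookup: find the CP1250 byte for a Unicode code point.
// Returns false if the code point is not in this sample table.
bool cp1250_byte_for(char32_t cp, unsigned char& out) {
    if (cp < 0x80) {  // CP1250 agrees with ASCII below 0x80
        out = static_cast<unsigned char>(cp);
        return true;
    }
    for (const Cp1250Entry& e : kCp1250Sample) {
        if (e.code_point == cp) {
            out = e.byte;
            return true;
        }
    }
    return false;
}
```

A plain static_cast<char>(wide) amounts to assuming the byte value equals the code point, which holds for the first table row but not the second.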