Why was wchar_t invented?
Original question: http://stackoverflow.com/questions/1613494/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by CannibalSmith
Why is wchar_t needed? How is it superior to short (or __int16 or whatever)?
(If it matters: I live in the Windows world. I don't know what Linux does to support Unicode.)
Accepted answer by sbi
Why is wchar_t needed? How is it superior to short (or __int16 or whatever)?
In the C++ world, wchar_t is its own type (I think it's a typedef in C), so you can overload functions on it. For example, this makes it possible to output wide characters as characters rather than as their numerical values. In VC6, where wchar_t was just a typedef for unsigned short, this code
wchar_t wch = L'A';
std::wcout << wch;
would output 65 because the member function
std::basic_ostream<wchar_t>::operator<<(unsigned short)
was invoked. In newer VC versions wchar_t is a distinct type, so the character overload
operator<<(std::basic_ostream<wchar_t>&, wchar_t)
is called, and that outputs A.
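The overloading point can be checked directly. The following is a minimal sketch, assuming a C++11 compiler in which wchar_t is a distinct built-in type; the names kind_of and check_overloads are made up for illustration and do not come from the answer.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helpers: two overloads that differ only in wchar_t vs. unsigned short.
// Both can coexist only because wchar_t is a distinct type in standard C++.
std::string kind_of(wchar_t)        { return "character"; }
std::string kind_of(unsigned short) { return "number"; }

// If wchar_t were a typedef for unsigned short (as in VC6), this would not hold.
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t must be a distinct type for the overloads above");

// Small self-check: each argument type selects its own overload.
void check_overloads() {
    unsigned short n = 65;
    assert(kind_of(L'A') == "character");
    assert(kind_of(n) == "number");
}
```

With a typedef-based wchar_t the two kind_of definitions would collide, which is the same reason the stream in VC6 could only pick the numeric inserter.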
Answer by Aaron Digulla
See Wikipedia.
Basically, it's a portable type for "text" in the current locale (with umlauts). It predates Unicode and doesn't solve many problems, so today it mostly exists for backward compatibility. Don't use it unless you have to.
Answer by Michael Burr
The reason there's a wchar_t is pretty much the same reason there's a size_t or a time_t - it's an abstraction that indicates what a type is intended to represent and allows implementations to choose an underlying type that can represent it properly on a particular platform.
Note that wchar_t doesn't need to be a 16-bit type - there are platforms where it's a 32-bit type.
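To see which choice a given implementation made, the width can be queried at compile time. This is a small sketch, assuming C++11; wchar_bits and wchar_holds_any_code_point are hypothetical helper names introduced here for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: width of wchar_t in bits on the current platform.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Hypothetical helper: true if one wchar_t can hold any Unicode code point
// (the highest code point is U+10FFFF).
constexpr bool wchar_holds_any_code_point() {
    return WCHAR_MAX >= 0x10FFFF;
}
```

Code that depends on a particular width can guard itself with something like static_assert(wchar_bits() == 32, "...") instead of silently assuming it.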
Answer by Thomas Padron-McCarthy
It is usually considered a good thing to give things such as data types meaningful names.
Which is better, char or int8? I think this:
char name[] = "Bob";
is much easier to understand than this:
int8 name[] = "Bob";
It's the same thing with wchar_t and int16.
Answer by iain
wchar_t is the primitive for storing and processing the platform's Unicode characters. Its size is not always 16 bits. On Unix systems wchar_t is 32 bits (maybe Unix users are more likely to use the Klingon characters that the extra bits are used for :-).
This can pose problems for porting projects, especially if you interchange wchar_t and short, or if you interchange wchar_t and Xerces' XMLCh.
Therefore having wchar_t as a different type from short is very important for writing cross-platform code. Cleaning this up was one of the hardest parts of porting our application to Unix and then from VC6 to VC2005.
Answer by gnud
As I read the relevant standards, it seems like Microsoft messed this one up badly.
My manpage for the POSIX <stddef.h> says that:
- wchar_t: Integer type whose range of values can represent distinct wide-character codes for all members of the largest character set specified among the locales supported by the compilation environment: the null character has the code value 0 and each member of the portable character set has a code value equal to its value when used as the lone character in an integer character constant.
So a 16-bit wchar_t is not enough if your platform supports Unicode, which has code points up to U+10FFFF. Each wchar_t is supposed to be a distinct value for a character. Therefore, wchar_t goes from being a useful way to work at the character level of texts (after decoding from the locale's multibyte encoding, of course) to being completely useless on Windows platforms.
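The arithmetic behind that complaint can be sketched as follows: a code point above U+FFFF does not fit in one 16-bit unit and has to be split into a UTF-16 surrogate pair. The function name to_utf16 is hypothetical, and the sketch assumes the input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (at most 0x10FFFF and not a surrogate).
std::vector<std::uint16_t> to_utf16(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        return { static_cast<std::uint16_t>(cp) };
    }
    cp -= 0x10000;  // 20 significant bits remain
    return {
        static_cast<std::uint16_t>(0xD800 + (cp >> 10)),   // high surrogate: top 10 bits
        static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))  // low surrogate: bottom 10 bits
    };
}
```

For example, to_utf16(0x41) yields one unit, while to_utf16(0x1F600) yields the two units 0xD83D and 0xDE00, so on a platform with a 16-bit wchar_t one wchar_t no longer corresponds to one character.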
Answer by Nemanja Trifunovic
To add to Aaron's comment - in C++0x we are finally getting real Unicode char types, char16_t and char32_t, and also Unicode string literals.
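As a brief illustration of those types, assuming a compiler that implements the final C++11 standard, the u and U literal prefixes produce UTF-16 and UTF-32 strings regardless of how wide wchar_t is. The variable and function names below are made up for this sketch.

```cpp
#include <cassert>
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane.
const std::u16string smiley16 = u"\U0001F600";  // UTF-16: stored as a surrogate pair
const std::u32string smiley32 = U"\U0001F600";  // UTF-32: stored as one code unit

// Small self-check of the encodings chosen by the literal prefixes.
void check_literals() {
    assert(smiley16.size() == 2);
    assert(smiley16[0] == 0xD83D && smiley16[1] == 0xDE00);
    assert(smiley32.size() == 1);
    assert(smiley32[0] == 0x1F600);
}
```

Unlike an L"..." literal, the encoding of these literals does not change between platforms.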
Answer by Robert Tuck
wchar_t is a bit of a hangover from before Unicode standardisation. Unfortunately it's not very helpful because the encoding is platform-specific (and on Solaris, locale-specific!), and the width is not specified. In addition there are no guarantees that UTF-8/16/32 codecvt facets will be available, or indeed how you would access them. In general it's a bit of a nightmare for portable usage.
Apparently C++0x will have support for Unicode, but at the current rate of progress that may never happen...
Answer by AnT
It is "superior" in the sense that it allows you to separate contexts: you use wchar_t in character contexts (like strings), and you use short in numerical contexts (numbers). Now the compiler can perform type checking to help you catch situations where you mistakenly mix one with the other, like passing an abstract non-string array of shorts to a string processing function.
As a side note (since this was a C question), in C++ wchar_t allows you to overload functions independently from short, i.e. again provide independent overloads that work with strings and numbers (for example).
Answer by MarcH
Except for a small ISO 2022 Japanese minority, wchar_t is always going to be Unicode. If you are really anxious, you can make sure of that at compile time:
#ifndef __STDC_ISO_10646__
#error "non-unicode wchar_t, unsupported system"
#endif
Sometimes wchar_t is 16-bit UCS-2, sometimes 32-bit UCS-4, so what? Just use sizeof(wchar_t). wchar_t is NOT meant to be sent to disk or over the network; it is only meant to be used in memory.
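One way to follow that advice is to convert wide strings to a fixed byte encoding at the I/O boundary. The sketch below is not from the answer: it picks UTF-8 as the external format, assumes wchar_t holds UTF-16 when it is 2 bytes wide and UTF-32 otherwise, assumes well-formed input, and uses the hypothetical names append_utf8 and to_utf8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one code point to out as UTF-8.
// Assumes cp is a valid Unicode scalar value.
void append_utf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: serialize an in-memory wide string as UTF-8 bytes.
// Assumption: 2-byte wchar_t means UTF-16, anything wider means UTF-32.
std::string to_utf8(const std::wstring& ws) {
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);
        // Combine a surrogate pair into one code point when wchar_t is 16 bits wide.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(ws[++i]);
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The bytes returned by to_utf8 are the same whatever sizeof(wchar_t) is, so they can be written to a file or socket while wchar_t stays an in-memory detail.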
See also "Should UTF-16 be considered harmful?" on this site.