Original question: http://stackoverflow.com/questions/12015571/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
How to print Unicode character in C++?
Asked by James Raitsev
I am trying to print a Russian "ф" (U+0444 CYRILLIC SMALL LETTER EF) character, which is given a code of decimal 1092. Using C++, how can I print out this character? I would have thought something along the lines of the following would work, yet...
int main (){
wchar_t f = '1060';
cout << f << endl;
}
Answered by bames53
To represent the character you can use Universal Character Names (UCNs). The character 'ф' has the Unicode value U+0444 and so in C++ you could write it '\u0444' or '\U00000444'. Also if the source code encoding supports this character then you can just write it literally in your source code.
// both of these assume that the character can be represented with
// a single char in the execution encoding
char b = '\u0444';
char a = 'ф'; // this line additionally assumes that the source character encoding supports this character
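The comments above point out that a plain char literal only works when the character fits in one char of the execution encoding. A minimal sketch of the alternative, assuming a C++11 compiler: UCNs can also be written in char16_t and char32_t literals, which are wide enough for this code point. The names ef and fits_in_one_utf8_byte are illustrative and do not come from the answer.

```cpp
// Sketch, assuming C++11 or later. The names below are illustrative only.

// A char32_t literal holds the code point itself.
constexpr char32_t ef = U'\u0444';

// The same UCN in a char16_t literal is one UTF-16 code unit,
// because U+0444 lies in the Basic Multilingual Plane.
static_assert(u'\u0444' == 0x0444, "U+0444 fits in one UTF-16 code unit");
static_assert(U'\U00000444' == 0x0444, "the 8-digit UCN form names the same character");

// Only code points up to U+007F fit in a single byte of UTF-8,
// so 'ф' cannot be stored in one char when the execution encoding is UTF-8.
constexpr bool fits_in_one_utf8_byte(char32_t cp) {
    return cp <= 0x7F;
}
```

With a UTF-8 execution encoding, fits_in_one_utf8_byte(ef) is false, which is why the string-based examples are the safer route on Linux and OS X.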
Printing such characters out depends on what you're printing to. If you're printing to a Unix terminal emulator, the terminal emulator is using an encoding that supports this character, and that encoding matches the compiler's execution encoding, then you can do the following:
#include <iostream>
int main() {
std::cout << "Hello, ф or \u0444!\n";
}
This program does not require that 'ф' can be represented in a single char. On OS X and almost any modern Linux install this will work just fine, because the source, execution, and console encodings will all be UTF-8 (which supports all Unicode characters).
Things are harder with Windows and there are different possibilities with different tradeoffs.
Probably the best, if you don't need portable code (you'll be using wchar_t, which should really be avoided on every other platform), is to set the mode of the output file handle to take only UTF-16 data.
#include <iostream>
#include <io.h>
#include <fcntl.h>
int main() {
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L"Hello, \u0444!\n";
}
Portable code is more difficult.
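One possible way to combine the two approaches above is to hide the platform difference behind a small helper. This is only a sketch under stated assumptions: use_wide_stdout and print_greeting are hypothetical names, the non-Windows branch assumes a UTF-8 terminal, and the Windows branch relies on the same _setmode call shown earlier.

```cpp
#include <cstdio>
#include <iostream>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper: on Windows, switch stdout to UTF-16 text mode.
// Returns true when output should go through std::wcout afterwards.
bool use_wide_stdout() {
#ifdef _WIN32
    return _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    return false;  // assumption: the terminal already expects UTF-8
#endif
}

// Hypothetical helper: print the greeting through whichever stream fits.
void print_greeting() {
    if (use_wide_stdout()) {
        std::wcout << L"Hello, \u0444!\n";
    } else {
        std::cout << "Hello, \xD1\x84!\n";  // 0xD1 0x84 is U+0444 in UTF-8
    }
}
```

Once stdout is in UTF-16 mode on Windows, narrow output to the same handle should be avoided, so a real program would pick one path at startup and keep it.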
Answered by James Raitsev
When compiling with -std=c++11, one can simply write:
const char *s = u8"\u0444";
cout << s << endl;
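One caveat worth sketching: since C++20, u8 string literals have type const char8_t[], so the assignment to const char* above no longer compiles in that mode. A minimal workaround, assuming the standard feature-test macro __cpp_char8_t is available, could look like this; ef_utf8 is an illustrative name.

```cpp
#include <string>

// Illustrative helper: returns the UTF-8 bytes of U+0444 as plain chars,
// whether or not the compiler treats u8 literals as char8_t.
std::string ef_utf8() {
#if defined(__cpp_char8_t)
    // C++20: u8"" is an array of char8_t, so convert explicitly.
    const char8_t* bytes = u8"\u0444";
    return std::string(reinterpret_cast<const char*>(bytes));
#else
    // C++11 to C++17: u8"" is already an array of char.
    return std::string(u8"\u0444");
#endif
}
```

The returned string can be sent to std::cout exactly as in the snippet above.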
Answered by Puppy
Ultimately, this is completely platform-dependent. Unicode support is, unfortunately, very poor in Standard C++. For GCC, you will have to make it a narrow string, because GCC uses UTF-8; Windows wants a wide string, which you must output to wcout.
// GCC
std::cout << "ф";
// Windoze
std::wcout << L"ф";
Answered by vladasimovic
If you use Windows (note, we are using printf(), not cout):
//Save As UTF8 without signature
#include <stdio.h>
#include<windows.h>
int main (){
SetConsoleOutputCP(65001);
printf("ф\n");
}
Not Unicode, but it works: code page 1251 instead of UTF-8:
//Save As Windows 1251
#include <iostream>
#include<windows.h>
using namespace std;
int main (){
SetConsoleOutputCP(1251);
cout << "ф" << endl;
}
Answered by Mike DeSimone
'1060' is a multicharacter literal made of four characters, not one character. Compilers that accept it give it an implementation-defined value, which is not the code point 1060. You should just treat the character as a number, if your wide characters match 1:1 with Unicode (check your locale settings).
int main (){
wchar_t f = 1060;
wcout << f << endl;
}
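A small check can make the "treat the character as a number" idea concrete. Note that decimal 1060 is U+0424 (the capital 'Ф'), while the lowercase 'ф' from the question is decimal 1092 (U+0444). The sketch below assumes the wide execution encoding is Unicode-based (UTF-32 on typical Linux toolchains, UTF-16 on Windows); from_bmp_code_point is an illustrative name.

```cpp
#include <string>

// Assumption: wchar_t values are Unicode code points for BMP characters.
static_assert(L'\u0444' == 1092, "U+0444 (small ef) is decimal 1092");
static_assert(L'\u0424' == 1060, "U+0424 (capital ef) is decimal 1060");

// Illustrative helper: builds a one-character wide string from a
// code point in the Basic Multilingual Plane (up to U+FFFF).
std::wstring from_bmp_code_point(unsigned cp) {
    return std::wstring(1, static_cast<wchar_t>(cp));
}
```

Whether std::wcout then displays the character still depends on the locale and the terminal.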
Answered by Iro
This code works in Linux (C++11, geany, g++ 7.4.0):
#include <iostream>
#include <string>
using namespace std;
int utf8_to_unicode(string utf8_code);
string unicode_to_utf8(int unicode);
int main()
{
cout << unicode_to_utf8(36) << '\t';
cout << unicode_to_utf8(162) << '\t';
cout << unicode_to_utf8(8364) << '\t';
cout << unicode_to_utf8(128578) << endl;
cout << unicode_to_utf8(0x24) << '\t';
cout << unicode_to_utf8(0xa2) << '\t';
cout << unicode_to_utf8(0x20ac) << '\t';
cout << unicode_to_utf8(0x1f642) << endl;
cout << utf8_to_unicode("$") << '\t';
cout << utf8_to_unicode("¢") << '\t';
cout << utf8_to_unicode("€") << '\t';
cout << utf8_to_unicode("🙂") << endl;
cout << utf8_to_unicode("\x24") << '\t';
cout << utf8_to_unicode("\xc2\xa2") << '\t';
cout << utf8_to_unicode("\xe2\x82\xac") << '\t';
cout << utf8_to_unicode("\xf0\x9f\x99\x82") << endl;
return 0;
}
int utf8_to_unicode(string utf8_code)
{
unsigned utf8_size = utf8_code.length();
int unicode = 0;
for (unsigned p=0; p<utf8_size; ++p)
{
int bit_count = (p? 6: 8 - utf8_size - (utf8_size == 1? 0: 1)),
shift = (p < utf8_size - 1? (6*(utf8_size - p - 1)): 0);
for (int k=0; k<bit_count; ++k)
unicode += ((utf8_code[p] & (1 << k)) << shift);
}
return unicode;
}
string unicode_to_utf8(int unicode)
{
string s;
if (unicode>=0 and unicode <= 0x7f) // 7F(16) = 127(10)
{
s = static_cast<char>(unicode);
return s;
}
else if (unicode <= 0x7ff) // 7FF(16) = 2047(10)
{
unsigned char c1 = 192, c2 = 128;
for (int k=0; k<11; ++k)
{
if (k < 6) c2 |= (unicode % 64) & (1 << k);
else c1 |= (unicode >> 6) & (1 << (k - 6));
}
s = c1; s += c2;
return s;
}
else if (unicode <= 0xffff) // FFFF(16) = 65535(10)
{
unsigned char c1 = 224, c2 = 128, c3 = 128;
for (int k=0; k<16; ++k)
{
if (k < 6) c3 |= (unicode % 64) & (1 << k);
else if (k < 12) c2 |= (unicode >> 6) & (1 << (k - 6));
else c1 |= (unicode >> 12) & (1 << (k - 12));
}
s = c1; s += c2; s += c3;
return s;
}
else if (unicode <= 0x1fffff) // 1FFFFF(16) = 2097151(10)
{
unsigned char c1 = 240, c2 = 128, c3 = 128, c4 = 128;
for (int k=0; k<21; ++k)
{
if (k < 6) c4 |= (unicode % 64) & (1 << k);
else if (k < 12) c3 |= (unicode >> 6) & (1 << (k - 6));
else if (k < 18) c2 |= (unicode >> 12) & (1 << (k - 12));
else c1 |= (unicode >> 18) & (1 << (k - 18));
}
s = c1; s += c2; s += c3; s += c4;
return s;
}
else if (unicode <= 0x3ffffff) // 3FFFFFF(16) = 67108863(10)
{
; // actually, there are no 5-bytes unicodes
}
else if (unicode <= 0x7fffffff) // 7FFFFFFF(16) = 2147483647(10)
{
; // actually, there are no 6-bytes unicodes
}
else ; // incorrect unicode (< 0 or > 2147483647)
return "";
}
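The encoder above sets each output bit in a loop. The same UTF-8 layout can be written with shifts and masks, as in the sketch below. It is not the answer's code: encode_utf8 is an illustrative name, and unlike the original it rejects surrogates (U+D800 to U+DFFF) and values above U+10FFFF by returning an empty string.

```cpp
#include <cstdint>
#include <string>

// Illustrative sketch: encodes one Unicode scalar value as UTF-8.
// Returns an empty string for surrogates and for values above U+10FFFF.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate, not a character
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, std::cout << encode_utf8(0x444) writes the two bytes 0xD1 0x84, which a UTF-8 terminal shows as 'ф'.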
Answered by quanta
Answered by VoyciecH
Another solution in Linux:
string a = "Ф";
cout << "Ф = \xd0\xa4 = " << hex
<< int(static_cast<unsigned char>(a[0]))
<< int(static_cast<unsigned char>(a[1])) << " (" << a.length() << "B)" << endl;
string b = "√";
cout << "√ = \xe2\x88\x9a = " << hex
<< int(static_cast<unsigned char>(b[0]))
<< int(static_cast<unsigned char>(b[1]))
<< int(static_cast<unsigned char>(b[2])) << " (" << b.length() << "B)" << endl;
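The output above shows that length() reports bytes, not characters. A minimal sketch of counting code points instead, assuming the string holds valid UTF-8: continuation bytes always have the bit pattern 10xxxxxx, so every other byte starts a new code point. count_code_points is an illustrative name.

```cpp
#include <cstddef>
#include <string>

// Illustrative sketch: counts code points in a string assumed to be valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Bytes of the form 10xxxxxx continue a previous code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "Ф" this gives 1, while length() gives 2.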
Answered by MGR
I needed to show the string in the UI as well as save it to an XML configuration file. The format specified above works for strings in C++. I would add that we can get the XML-compatible string for the special character by replacing "\u" with "&#x" and adding a ";" at the end.
For example:
C++ : "\u0444" --> XML : "&#x0444;"
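The replacement described above can also be generated from the numeric code point. The following is a sketch rather than part of the answer: xml_char_ref is an illustrative name, and it pads to at least four uppercase hex digits to mirror the "\u0444" form.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Illustrative sketch: formats a code point as an XML numeric character
// reference such as "&#x0444;".
std::string xml_char_ref(std::uint32_t cp) {
    std::ostringstream os;
    os << "&#x" << std::hex << std::uppercase
       << std::setw(4) << std::setfill('0') << cp << ';';
    return os.str();
}
```

Code points above U+FFFF simply use more digits, for example "&#x1F642;".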