Original URL: http://stackoverflow.com/questions/10007261/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Why doesn't printf format unicode parameters?
Question by Scott Langham
When using printf to format a double-byte string into a single-byte string:
```c
printf("%ls\n", L"s:\\яшертыHello"); // %ls for a wide string (%s varies in meaning depending on the project's Unicode settings).
```
Clearly, some characters can't be represented as ASCII characters, so I have sometimes seen double-byte characters turned into a '?' character. But this seems to depend on the particular characters. For the printf above, the output is:
s:\
I was hoping I might get something like:
s:\??????Hello
I'm afraid I've lost the example, but I think that for one string, when printf encountered Unicode characters, it replaced the first one with a '?' and then gave up on the rest.
So, my question is: what's supposed to happen when you format a wide string into a single-byte string? The documentation at http://msdn.microsoft.com/en-us/library/hf4y5e3w.aspx says "Characters are displayed up to the first null character", but that's not what I'm seeing. Is this a bug in printf, or is the behaviour I'm seeing documented somewhere? If so, where?
Thanks for your help.
UPDATE
Thanks to the people who answered with alternatives to printf. I am going to switch to one of them, but out of curiosity I'd really like to know why printf does not have reliable, documented behaviour here. It almost looks as if its implementer went out of their way to make this not work.
Answer by AProgrammer
I expect your code to work (and it does work here on Linux), but its behaviour is locale-dependent. That means you have to set up the locale, and the locale must support the character set used. Here is my test program:
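```c
#include <locale.h>
#include <stdio.h>

int main()
{
    /* Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG). */
    char* l = setlocale(LC_ALL, "");
    if (l == NULL) {
        printf("Locale not set\n");
    } else {
        printf("Locale set to %s\n", l);
    }
    /* %ls converts the wide string using the current locale's encoding. */
    printf("%ls\n", L"s:\\яшертыHello");
    return 0;
}
```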
And here is an execution trace:
```
$ env LC_ALL=en_US.utf8 ./a.out
Locale set to en_US.utf8
s:\яшертыHello
```
If the program reports that the locale isn't set, or that it is set to "C", it is normal not to get the result you expect.
Edit: see the answers to this question for the equivalent of en_US.utf8 on Windows.
Answer by Naszta
In C++ I usually use std::stringstream to create formatted text. I also implemented my own operator that uses a Windows function to do the encoding:
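```cpp
#include <windows.h>
#include <ostream>
#include <vector>

// Writes a wide string to a narrow stream as UTF-8 (Windows only).
std::ostream & operator << ( std::ostream &os, const wchar_t * str )
{
    if ( ( str == nullptr ) || ( str[0] == L'\0' ) )
        return os;
    // First call: query the buffer size; with length -1 it includes the terminating null.
    int new_size = WideCharToMultiByte( CP_UTF8, 0, str, -1, nullptr, 0, nullptr, nullptr );
    if ( new_size <= 0 )
        return os;
    std::vector<char> buffer(new_size);
    // Second call: convert into the buffer (the last two arguments must be null for CP_UTF8).
    if ( WideCharToMultiByte( CP_UTF8, 0, str, -1, buffer.data(), new_size, nullptr, nullptr ) > 0 )
        os << buffer.data();
    return os;
}
```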
This code converts to UTF-8. For other target encodings, see the documentation for WideCharToMultiByte.