C++: Converting a "normal" std::string to UTF-8
Original question: http://stackoverflow.com/questions/21575310/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Converting "normal" std::string to utf-8
Asked by DaedalusAlpha
Let's see if I can explain this without too many factual errors...
I'm writing a string class and I want it to use UTF-8 (stored in a std::string) as its internal storage. I want it to be able to take both "normal" std::string and std::wstring as input and output.
Working with std::wstring is not a problem: I can use std::codecvt_utf8<wchar_t> to convert both from and to std::wstring.
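For readers who have not used that facet, here is a minimal sketch of the wide-string round trip the question refers to, assuming std::wstring_convert is available (it is deprecated since C++17 but still shipped by the major standard libraries). The helper names wide_to_utf8 and utf8_to_wide are hypothetical and not from the original post.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper names, chosen for illustration only.

// std::wstring -> UTF-8 bytes stored in a std::string.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// UTF-8 bytes stored in a std::string -> std::wstring.
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

Note that std::codecvt_utf8<wchar_t> treats each wchar_t as one code point (UCS-2 on Windows, UCS-4 on most Unix-like systems), so on Windows characters outside the Basic Multilingual Plane would need std::codecvt_utf8_utf16<wchar_t> instead. Both functions throw std::range_error on invalid input.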
However, after extensive googling and searching on SO, I have yet to find a way to convert between a "normal/default" C++ std::string (which I assume on Windows uses the local system localization?) and a UTF-8 std::string.
I guess one option would be to first convert the std::string to a std::wstring using std::codecvt<wchar_t, char, std::mbstate_t> and then convert it to UTF-8 as above, but this seems quite inefficient, given that at least the first 128 values of a char should translate straight over to UTF-8 without conversion regardless of localization, if I understand correctly.
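The two ideas in that paragraph can be sketched together: skip the conversion entirely when the input is pure ASCII, and otherwise go through std::wstring. This is only a sketch under assumptions. It uses std::mbsrtowcs (the C counterpart of the locale's codecvt facet) for the narrow-to-wide step, it assumes the locale's encoding is ASCII-compatible, which holds for the common Windows code pages but not for every encoding, and it inherits the one-code-point-per-wchar_t limitation of std::codecvt_utf8<wchar_t>. All function names are hypothetical.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// All names below are hypothetical helpers, not part of the original post.

// True when every byte is below 0x80, i.e. the string is plain ASCII
// and therefore already valid UTF-8.
bool is_ascii_only(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}

// Narrow string in the current C locale's encoding -> std::wstring.
// Assumes no embedded '\0' characters, since mbsrtowcs stops at the first one.
std::wstring locale_to_wide(const std::string& narrow) {
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow.c_str();
    // With a null destination, mbsrtowcs only reports the required length.
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for this locale");
    }
    std::wstring wide(len, L'\0');
    state = std::mbstate_t();
    src = narrow.c_str();
    std::mbsrtowcs(&wide[0], &src, len, &state);
    return wide;
}

// Fast path for ASCII, two-step conversion for everything else.
std::string locale_to_utf8(const std::string& narrow) {
    if (is_ascii_only(narrow)) return narrow;
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(locale_to_wide(narrow));
}
```

The non-ASCII path depends on the active C locale, so a program would typically call std::setlocale(LC_ALL, "") once at startup to pick up the user's environment; in the default "C" locale only the ASCII path is reliable.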
I found this similar question: "C++: how to convert ASCII or ANSI to UTF8 and stores in std::string". However, I'm a bit skeptical of that answer, as it's hard-coded to Latin-1, and I want this to work with all types of localization to be on the safe side.
No answers involving Boost, thanks; I don't want the headache of getting my codebase to work with it.
Answered by Simple
If your "normal string" is encoded using the system's code page and you want to convert it to UTF-8 then this should work:
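// Needs #include <windows.h> and #include <string>; the statements below belong inside a function.
std::string codepage_str;  // input, encoded in the system (ANSI) code page

// Step 1: system code page -> UTF-16. The first call only measures the output size.
int size = MultiByteToWideChar(CP_ACP, MB_COMPOSITE, codepage_str.c_str(),
                               static_cast<int>(codepage_str.length()), nullptr, 0);
std::wstring utf16_str(size, L'\0');
MultiByteToWideChar(CP_ACP, MB_COMPOSITE, codepage_str.c_str(),
                    static_cast<int>(codepage_str.length()), &utf16_str[0], size);

// Step 2: UTF-16 -> UTF-8, again measuring first and then converting.
int utf8_size = WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(),
                                    static_cast<int>(utf16_str.length()), nullptr, 0,
                                    nullptr, nullptr);
std::string utf8_str(utf8_size, '\0');
WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(),
                    static_cast<int>(utf16_str.length()), &utf8_str[0], utf8_size,
                    nullptr, nullptr);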