Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/11086183/

Time: 2020-08-27 14:50:23  Source: igfitidea

Encode/Decode std::string to UTF-16

c++ utf-16 stdstring

Asked by Peter

I have to handle a file format (both reading and writing) in which strings are encoded in UTF-16 (2 bytes per code unit). Since characters outside the ASCII table are rarely used in the application domain, all of the strings in my C++ model classes are stored in instances of std::string (UTF-8 encoded).

I'm looking for a library (I searched the STL and Boost with no luck) or a set of C/C++ functions to handle this std::string <-> UTF-16 conversion when loading from or saving to the file format (actually modeled as a bytestream), including the generation/recognition of surrogate pairs and all that Unicode stuff (which I'm admittedly no expert in)...

Any suggestions? Thanks!

EDIT: forgot to mention it should be cross-platform (Win / Mac) and cannot use C++11.

Answered by bames53

C++11 has this functionality:

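#include <codecvt>
#include <locale>   // std::codecvt, std::wstring_convert
#include <string>

std::string s = u8"Hello, World!";  // UTF-8 input (before C++20, where u8 literals become char8_t)

std::wstring_convert<std::codecvt<char16_t,char,std::mbstate_t>,char16_t> convert;

std::u16string u16 = convert.from_bytes(s);   // UTF-8 -> UTF-16
std::string utf8 = convert.to_bytes(u16);     // UTF-16 -> UTF-8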

However, to my knowledge the only implementation that has this so far is libc++. C++11 also has std::codecvt_utf8_utf16<char16_t>, which some other implementations have. Specifically, codecvt_utf8_utf16 works in VS 2010 and above, and since Windows uses wchar_t to represent UTF-16, you can use it to convert between UTF-8 and Windows' native encoding.
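As a rough illustration of the std::codecvt_utf8_utf16 route mentioned above, here is a minimal sketch. The helper names utf8_to_utf16 and utf16_to_utf8 are made up for this example, and it assumes a C++11 standard library that ships <codecvt> (note that std::wstring_convert and <codecvt> were deprecated in C++17, so newer compilers may warn).

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return convert.from_bytes(utf8);
}

// Hypothetical helper: UTF-16 code units -> UTF-8 bytes.
// to_bytes throws std::range_error on malformed UTF-16 (e.g. a lone surrogate).
std::string utf16_to_utf8(const std::u16string& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return convert.to_bytes(utf16);
}
```

For example, utf8_to_utf16("a\xF0\x9F\x98\x80") should yield three code units: one for 'a' and a surrogate pair for U+1F600.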



"The specialization codecvt<char16_t, char, mbstate_t> converts between the UTF-16 and UTF-8 encoding schemes, and the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes."

([locale.codecvt] 22.4.1.4/3)



Oh, and std::codecvt specializations have protected destructors, and wstring_convert requires access to the destructor, so you really need an adapter:

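#include <locale>   // std::codecvt, std::wstring_convert
#include <utility>  // std::forward (only needed for the workaround below)

// Adapter that gives a facet a public destructor so wstring_convert can own it.
template <class Facet>
class usable_facet : public Facet {
public:
    using Facet::Facet; // inherit constructors
    ~usable_facet() {}

    // workaround for compilers without inheriting constructors:
    // template <class ...Args> usable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
};

template<typename internT, typename externT, typename stateT>
using codecvt = usable_facet<std::codecvt<internT, externT, stateT>>;

// The element type must be given explicitly; wstring_convert defaults it to wchar_t.
std::wstring_convert<codecvt<char16_t,char,std::mbstate_t>, char16_t> convert;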
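Since the question rules out C++11, it may help to see what the "generation/recognition of surrogate pairs" amounts to when done by hand. The following is only a sketch that avoids C++11 features: the names utf16_unit, code_point, append_utf16 and next_code_point are invented for illustration, and it assumes unsigned short is 16 bits wide. It covers just the code point <-> UTF-16 step, not UTF-8 decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned short utf16_unit;  // assumption: 16-bit unsigned short
typedef unsigned long  code_point;  // wide enough for U+0000..U+10FFFF

// Append one Unicode code point to `out` as one or two UTF-16 code units.
void append_utf16(code_point cp, std::vector<utf16_unit>& out) {
    // Precondition: a valid scalar value (in range, not itself a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out.push_back(static_cast<utf16_unit>(cp));  // BMP: a single unit
    } else {
        cp -= 0x10000;                               // 20 bits remain
        out.push_back(static_cast<utf16_unit>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<utf16_unit>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
}

// Read one code point starting at in[i] and advance i past it.
// An unpaired surrogate is reported as U+FFFD (replacement character).
code_point next_code_point(const std::vector<utf16_unit>& in, std::size_t& i) {
    assert(i < in.size());
    code_point unit = in[i++];
    if (unit >= 0xD800 && unit <= 0xDBFF && i < in.size() &&
        in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
        code_point low = in[i++];
        return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
    }
    if (unit >= 0xD800 && unit <= 0xDFFF) {
        return 0xFFFD;  // lone high or low surrogate
    }
    return unit;
}
```

Replacing unpaired surrogates with U+FFFD is one possible policy; a file format with strict validation might prefer to reject such input instead.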

Answered by thehouse

Did you look at Boost.Locale? This page, in particular, describes how to do UTF to UTF conversions and how to integrate it with IOStreams.

Answered by JYG

I would suggest having a look at:

Convert C++ std::string to UTF-16-LE encoded string

And check out the iconv function. It's a C library, no requirements for C++11.

There's also a Win32 specific iconv library at https://github.com/win-iconv/win-iconv.
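To give an idea of how the iconv suggestion might look in practice, here is a small sketch that wraps the POSIX iconv_open/iconv/iconv_close calls. The wrapper name iconv_convert is invented for this example, and it assumes the installed iconv recognizes the encoding names "UTF-8" and "UTF-16LE" (glibc and GNU libiconv do). The second parameter of iconv() is declared differently on some platforms, and on macOS you may need to link with -liconv.

```cpp
#include <iconv.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: convert `input` from encoding `from` to encoding `to`.
// The result is a raw byte string, ready to be written to a byte stream.
std::string iconv_convert(const std::string& input, const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the argument order: target first
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed: unsupported conversion");
    }

    std::string output;
    char* in = const_cast<char*>(input.data());  // iconv does not modify the input
    std::size_t in_left = input.size();
    char buffer[256];

    while (in_left > 0) {
        char* out = buffer;
        std::size_t out_left = sizeof buffer;
        std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        output.append(buffer, sizeof buffer - out_left);  // keep what was produced
        if (rc == (std::size_t)-1 && errno != E2BIG) {
            // EILSEQ or EINVAL: invalid or truncated input sequence.
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or truncated input");
        }
        // E2BIG only means the buffer filled up; loop to convert the rest.
    }

    iconv_close(cd);
    return output;
}
```

Asking for "UTF-16LE" rather than plain "UTF-16" fixes the byte order and avoids a byte order mark in the output; whether the file format expects a BOM is something the format's specification would have to answer.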
