C++: Convert wstring to a string encoded in UTF-8

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me).
StackOverflow original: http://stackoverflow.com/questions/4358870/

Date: 2020-08-28 15:13:47  Source: igfitidea

Convert wstring to string encoded in UTF-8

c++, string, utf-8, wstring

Asked by Trakhan

I need to convert between wstring and string. I figured out that using a codecvt facet should do the trick, but it doesn't seem to work for a UTF-8 locale.

My idea is that when I read a UTF-8 encoded file into chars, a single non-ASCII character is read into two or more chars (which is how UTF-8 works). I'd like to create this UTF-8 string from the wstring representation, for a library I use in my code.

Does anybody know how to do it?

I already tried this:

  locale mylocale("cs_CZ.utf-8");
  mbstate_t mystate;

  wstring mywstring = L"???yáí";

  const codecvt<wchar_t,char,mbstate_t>& myfacet =
    use_facet<codecvt<wchar_t,char,mbstate_t> >(mylocale);

  codecvt<wchar_t,char,mbstate_t>::result myresult;  

  size_t length = mywstring.length();
  char* pstr= new char [length+1];

  const wchar_t* pwc;
  char* pc;

  // translate characters:
  myresult = myfacet.out (mystate,
      mywstring.c_str(), mywstring.c_str()+length+1, pwc,
      pstr, pstr+length+1, pc);

  if ( myresult == codecvt<wchar_t,char,mbstate_t>::ok )
   cout << "Translation successful: " << pstr << endl;
  else cout << "failed" << endl;
  return 0;

This prints 'failed' for the cs_CZ.utf-8 locale but works correctly for the cs_CZ.iso8859-2 locale.
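
One likely reason for the failure with the UTF-8 locale is the output buffer: the code reserves only length+1 bytes, while each non-ASCII character needs two or more bytes in UTF-8, so codecvt::out would stop with `partial` rather than `ok`. The following is a minimal sketch of the same approach with the buffer sized via max_length(), not a definitive fix. The helper name narrow_with_locale is hypothetical, and the locale passed in must be installed on the system.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert a wide string to the narrow encoding
// of the given locale using its codecvt facet.
std::string narrow_with_locale(const std::wstring& ws, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_t;
    const facet_t& facet = std::use_facet<facet_t>(loc);

    // The conversion state must start out zero-initialized.
    std::mbstate_t state = std::mbstate_t();

    // Reserve the worst case: max_length() bytes per wide character.
    std::vector<char> buf(ws.size() * static_cast<std::size_t>(facet.max_length()) + 1);

    const wchar_t* from_next = 0;
    char* to_next = 0;
    facet_t::result r = facet.out(state,
        ws.data(), ws.data() + ws.size(), from_next,
        buf.data(), buf.data() + buf.size(), to_next);

    if (r != facet_t::ok)
        throw std::runtime_error("codecvt::out did not return ok");

    // Everything written so far lies in [buf.data(), to_next).
    return std::string(buf.data(), to_next);
}
```

A call might look like `narrow_with_locale(mywstring, std::locale("cs_CZ.utf-8"))`. Note that the std::locale constructor throws std::runtime_error when the named locale is not available.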

Accepted answer by Philipp

C++ has no built-in notion of Unicode. Use an external library such as ICU (UnicodeString class) or Qt (QString class); both support Unicode, including UTF-8.

Answer by skyde

The code below might help you :)

#include <codecvt>
#include <locale>   // std::wstring_convert
#include <string>

// convert UTF-8 string to wstring
std::wstring utf8_to_wstring (const std::string& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.from_bytes(str);
}

// convert wstring to UTF-8 string
std::string wstring_to_utf8 (const std::wstring& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.to_bytes(str);
}
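
Two caveats are worth knowing about this approach. std::wstring_convert and the <codecvt> facets are deprecated since C++17, and where wchar_t is 16 bits wide (as on Windows), std::codecvt_utf8<wchar_t> only handles UCS-2, so std::codecvt_utf8_utf16<wchar_t> is the better fit there. The sketch below picks the facet by the size of wchar_t; the names wide_utf8_facet, to_utf8 and from_utf8 are illustrative and not part of the answer above.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Choose the facet that matches the platform's wchar_t:
// 16-bit wchar_t holds UTF-16, wider wchar_t holds UCS-4/UTF-32.
typedef std::conditional<sizeof(wchar_t) == 2,
                         std::codecvt_utf8_utf16<wchar_t>,
                         std::codecvt_utf8<wchar_t> >::type wide_utf8_facet;

// Wide string -> UTF-8 bytes (throws std::range_error on invalid input).
std::string to_utf8(const std::wstring& ws)
{
    std::wstring_convert<wide_utf8_facet> conv;
    return conv.to_bytes(ws);
}

// UTF-8 bytes -> wide string (throws std::range_error on invalid input).
std::wstring from_utf8(const std::string& s)
{
    std::wstring_convert<wide_utf8_facet> conv;
    return conv.from_bytes(s);
}
```

Used as `std::string bytes = to_utf8(L"y\u00e1\u00ed");`, the result can be handed to any API that expects UTF-8 in a std::string.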

Answer by hillel

What's your platform? Note that Windows does not support UTF-8 locales, so this may explain why it's failing.

To get this done in a platform-dependent way, you can use MultiByteToWideChar/WideCharToMultiByte on Windows and iconv on Linux. You may be able to use some Boost magic to get this done in a platform-independent way, but I haven't tried it myself, so I can't say more about that option.
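
On Linux, the iconv route could look roughly like the sketch below. It assumes glibc's iconv (or GNU libiconv), where the encoding name "WCHAR_T" stands for the platform's wchar_t representation. The function name wstring_to_utf8_iconv is made up for illustration. On some platforms iconv() takes a `const char**` input argument instead, in which case the cast needs adjusting.

```cpp
#include <iconv.h>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert a wide string to UTF-8 via iconv.
std::string wstring_to_utf8_iconv(const std::wstring& ws)
{
    // Target encoding first, source encoding second.
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t)(-1))
        throw std::runtime_error("iconv_open failed");

    // UTF-8 needs at most 4 bytes per code point.
    std::vector<char> out(ws.size() * 4 + 1);

    // iconv works on byte pointers and byte counts.
    char* in_ptr = reinterpret_cast<char*>(const_cast<wchar_t*>(ws.data()));
    std::size_t in_left = ws.size() * sizeof(wchar_t);
    char* out_ptr = out.data();
    std::size_t out_left = out.size();

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (std::size_t)(-1))
        throw std::runtime_error("iconv conversion failed");

    // out_ptr now points just past the last byte written.
    return std::string(out.data(), out_ptr);
}
```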

Answer by Avinash

You can use Boost's utf_to_utf converter to get a char-based UTF-8 result that can be stored in a std::string.

#include <boost/locale/encoding_utf.hpp>

std::string myresult = boost::locale::conv::utf_to_utf<char>(my_wstring);

Answer by Šimon Tóth

A locale gives the program information about the external encoding, on the assumption that the internal encoding stays the same. If you want to output UTF-8, you need to do it from wchar_t, not from char*.

What you could do is output it as raw data (not as a string); it should then be interpreted correctly if the system's locale is UTF-8.

Also, when using (w)cout/(w)cerr/(w)cin, you need to imbue the locale on the stream.
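
As an illustration of imbuing a stream, here is a small sketch that writes a wide string to a file as UTF-8. It swaps in a file stream and the std::codecvt_utf8 facet (deprecated since C++17) in place of a named system locale, so it does not depend on which locales are installed. The function name write_utf8_file is hypothetical.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write wide text to a file, encoded as UTF-8.
void write_utf8_file(const std::string& path, const std::wstring& text)
{
    std::wofstream out;

    // Imbue before opening, so the facet is in place before any I/O.
    // The locale takes ownership of the facet object.
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));

    out.open(path.c_str(), std::ios::binary);
    out << text;
}   // the stream is flushed and closed when `out` goes out of scope
```

For std::wcout the idea is the same: call `std::wcout.imbue(...)` with a suitable locale before the first output operation.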

Answer by Frank

The Lexertl library has an iterator that lets you do this:

std::string str;
str.assign(
  lexertl::basic_utf8_out_iterator<std::wstring::const_iterator>(wstr.begin()),
  lexertl::basic_utf8_out_iterator<std::wstring::const_iterator>(wstr.end()));
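
For readers who want to see what such a conversion involves without any library, here is a hand-written sketch of the UTF-8 encoding step. It is not Lexertl's implementation; the names append_utf8 and encode_utf8 are made up for illustration, and the code does no validation of its input.

```cpp
#include <cstddef>
#include <string>

// Append one Unicode code point to `out`, encoded as 1 to 4 UTF-8 bytes.
void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encode a wide string as UTF-8.
std::string encode_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(ws[i]);

        // Where wchar_t is 16 bits, a character outside the BMP arrives
        // as a UTF-16 surrogate pair; combine it into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            unsigned long lo = static_cast<unsigned long>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

This also shows why the byte count varies: ASCII characters take one byte, characters such as á take two, and others take three or four.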