C++: How to convert wstring into string?

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4804298/

Time: 2020-08-28 16:34:16  Source: igfitidea

Tags: c++, unicode, stl, wstring

Asked by BЈовић

The question is how to convert wstring to string?

I have the following example:

#include <string>
#include <iostream>

int main()
{
    std::wstring ws = L"Hello";
    std::string s( ws.begin(), ws.end() );

  //std::cout <<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;
    std::cout <<"std::string =     "<<s<<std::endl;
}

With the commented-out line enabled, the output is:

std::string =     Hello
std::wstring =    Hello
std::string =     Hello

Without it, the output is only:

std::wstring =    Hello

Is anything wrong in the example? Can I do the conversion like above?

EDIT

The new example (taking some answers into account) is:

#include <string>
#include <iostream>
#include <sstream>
#include <locale>

int main()
{
    setlocale(LC_CTYPE, "");

    const std::wstring ws = L"Hello";
    const std::string s( ws.begin(), ws.end() );

    std::cout<<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;

    std::stringstream ss;
    ss << ws.c_str();
    std::cout<<"std::stringstream =     "<<ss.str()<<std::endl;
}

The output is:

std::string =     Hello
std::wstring =    Hello
std::stringstream =     0x860283c

Therefore, std::stringstream cannot be used to convert wstring into string.

Accepted answer by Philipp

Here is a worked-out solution based on the other suggestions:

#include <string>
#include <iostream>
#include <clocale>
#include <locale>
#include <vector>

int main() {
  std::setlocale(LC_ALL, "");
  const std::wstring ws = L"ħëłlö";
  const std::locale locale("");
  typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;
  const converter_type& converter = std::use_facet<converter_type>(locale);
  std::vector<char> to(ws.length() * converter.max_length());
  std::mbstate_t state = std::mbstate_t();  // value-initialize the conversion state before use
  const wchar_t* from_next;
  char* to_next;
  const converter_type::result result = converter.out(state, ws.data(), ws.data() + ws.length(), from_next, &to[0], &to[0] + to.size(), to_next);
  if (result == converter_type::ok or result == converter_type::noconv) {
    const std::string s(&to[0], to_next);
    std::cout <<"std::string =     "<<s<<std::endl;
  }
}

This will usually work for Linux, but will create problems on Windows.
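
As a hedged sketch, the same codecvt idea can be packaged as a reusable function. The name narrow_with_locale and the choice to throw std::runtime_error on failure are assumptions made for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): converts a wide string
// to the narrow encoding of the given locale via its codecvt facet.
std::string narrow_with_locale(const std::wstring& ws, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;
    if (ws.empty())
        return std::string();

    const converter_type& converter = std::use_facet<converter_type>(loc);
    // Worst case: every wide character needs max_length() bytes.
    std::vector<char> to(ws.size() * converter.max_length());
    std::mbstate_t state = std::mbstate_t();  // start in the initial state
    const wchar_t* from_next = 0;
    char* to_next = 0;
    const converter_type::result result = converter.out(
        state, ws.data(), ws.data() + ws.size(), from_next,
        &to[0], &to[0] + to.size(), to_next);
    if (result != converter_type::ok)
        throw std::runtime_error("wide string is not representable in this locale");
    assert(from_next == ws.data() + ws.size());  // ok means all input was consumed
    return std::string(&to[0], to_next);
}
```

A call such as narrow_with_locale(ws, std::locale("")) uses the user's environment locale; note that std::locale("") can itself throw std::runtime_error if the environment names a locale that is not available.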

Answer by dk123

As Cubbi pointed out in one of the comments, std::wstring_convert (C++11) provides a neat, simple solution (you need to #include <locale> and <codecvt>):

std::wstring string_to_convert;

//setup converter
using convert_type = std::codecvt_utf8<wchar_t>;
std::wstring_convert<convert_type, wchar_t> converter;

//use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
std::string converted_str = converter.to_bytes( string_to_convert );

I was using a combination of wcstombs and tedious allocation/deallocation of memory before I came across this.

http://en.cppreference.com/w/cpp/locale/wstring_convert
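
For comparison, here is a minimal sketch of the wcstombs route mentioned above, with a std::vector managing the buffer instead of manual allocation. The helper name narrow_with_wcstombs is hypothetical; the result depends on the current C locale (set with std::setlocale), and conversion stops at the first embedded null character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts using the current C locale via std::wcstombs.
std::string narrow_with_wcstombs(const std::wstring& ws)
{
    // MB_CUR_MAX is the maximum number of bytes per character in the
    // current C locale; the extra byte leaves room for the terminator.
    std::vector<char> buffer(ws.size() * MB_CUR_MAX + 1);
    const std::size_t written =
        std::wcstombs(&buffer[0], ws.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("wide character not representable in the C locale");
    assert(written < buffer.size());  // the terminator always fits
    return std::string(&buffer[0], written);
}
```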

Update (2013.11.28)

One-liners can be written like this (thank you Guss for your comment):

std::wstring str = std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes("some string");

Wrapper functions can be written like this (thank you ArmanSchwarz for your comment):

std::wstring s2ws(const std::string& str)
{
    using convert_typeX = std::codecvt_utf8<wchar_t>;
    std::wstring_convert<convert_typeX, wchar_t> converterX;

    return converterX.from_bytes(str);
}

std::string ws2s(const std::wstring& wstr)
{
    using convert_typeX = std::codecvt_utf8<wchar_t>;
    std::wstring_convert<convert_typeX, wchar_t> converterX;

    return converterX.to_bytes(wstr);
}

Note: there's some controversy about whether string/wstring should be passed to functions by reference or by value (due to C++11 and compiler updates). I'll leave the decision to the person implementing, but it's worth knowing.

Note: I'm using std::codecvt_utf8 in the above code, but if you're not using UTF-8 you'll need to change that to the appropriate encoding you're using:

http://en.cppreference.com/w/cpp/header/codecvt
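
A small usage sketch of the UTF-8 wrappers follows. The function names are illustrative, and std::wstring_convert and <codecvt> were deprecated in C++17, so compilers may warn. On invalid input, to_bytes and from_bytes throw std::range_error unless an error string was passed to the converter's constructor.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Illustrative wrappers around std::wstring_convert (deprecated since C++17).
std::string wide_to_utf8(const std::wstring& wstr)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
    return converter.to_bytes(wstr);
}

std::wstring utf8_to_wide(const std::string& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
    return converter.from_bytes(str);
}

// Returns false instead of throwing when the bytes are not valid UTF-8.
bool try_utf8_to_wide(const std::string& str, std::wstring& out)
{
    try {
        out = utf8_to_wide(str);
        return true;
    } catch (const std::range_error&) {
        return false;
    }
}

// Self-check: UTF-8 -> wide -> UTF-8 should give back the original bytes.
void check_round_trip(const std::string& utf8)
{
    assert(wide_to_utf8(utf8_to_wide(utf8)) == utf8);
}
```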

Answer by namar0x0309

Solution from: http://forums.devshed.com/c-programming-42/wstring-to-string-444006.html

std::wstring wide( L"Wide" ); 
std::string str( wide.begin(), wide.end() );

// Will print no problemo!
std::cout << str << std::endl;


Beware that there is no character set conversion going on here at all. What this does is simply assign each iterated wchar_t to a char, which is a truncating conversion. It uses the std::string constructor:

template< class InputIt >
basic_string( InputIt first, InputIt last,
              const Allocator& alloc = Allocator() );

As stated in comments:

Values 0-127 are identical in virtually every encoding, so truncating values that are all less than 128 results in the same text. Put in a Chinese character and you'll see the failure.

-

The values 128-255 of Windows codepage 1252 (the Windows English default) and the values 128-255 of Unicode are mostly the same, so if that's the codepage you're using, most of those characters should be truncated to the correct values. (I totally expected á and ? to work; I know our code at work relies on this for é, which I will soon fix.)

And note that code points in the range 0x80 - 0x9F in Win1252 will not work. This includes €, œ, ž, Ÿ, ...
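
To make the truncation described in this answer concrete, here is a small sketch (the helper name truncating_copy is made up for illustration). Converting an out-of-range wchar_t to char is implementation-defined before C++20 and modular from C++20 on; typical compilers simply keep the low byte, so the result has one char per wide character and is not a valid encoding of non-ASCII text.

```cpp
#include <cassert>
#include <string>

// Same iterator-range constructor as in the answer: each wchar_t is converted
// to a single char, with no character set conversion.
std::string truncating_copy(const std::wstring& wide)
{
    std::string narrow(wide.begin(), wide.end());
    assert(narrow.size() == wide.size());  // always one char per wchar_t
    return narrow;
}
```

truncating_copy(L"Hello") yields "Hello", whereas truncating_copy(L"\u4F60") yields a single char rather than the three-byte UTF-8 sequence E4 BD A0 for that character.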

Answer by Philipp

Instead of including locale and all that fancy stuff, if you know for a FACT that your string is convertible, just do this:

#include <iostream>
#include <string>

using namespace std;

int main()
{
  wstring w(L"bla");
  string result;
  for(char x : w)
    result += x;

  cout << result << '\n';
}

Answer by Christopher Creutzig

I believe the official way is still to go through codecvt facets (you need some sort of locale-aware translation), as in

resultCode = use_facet<codecvt<char, wchar_t, ConversionState> >(locale).
  in(stateVar, scratchbuffer, scratchbufferEnd, from, to, toLimit, curPtr);

or something like that, I don't have working code lying around. But I'm not sure how many people these days use that machinery and how many simply ask for pointers to memory and let ICU or some other library handle the gory details.
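
As a hedged sketch of what such codecvt code might look like for the string-to-wstring direction: the standard facet is std::codecvt<wchar_t, char, std::mbstate_t>, and its in() member converts narrow to wide. The helper name widen_with_locale and the exception thrown on failure are assumptions for illustration, not the answer's own code.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decodes a narrow string using the locale's codecvt facet.
std::wstring widen_with_locale(const std::string& s, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;
    if (s.empty())
        return std::wstring();

    const converter_type& converter = std::use_facet<converter_type>(loc);
    // A decoded string never has more wide characters than input bytes.
    std::vector<wchar_t> to(s.size());
    std::mbstate_t state = std::mbstate_t();  // start in the initial state
    const char* from_next = 0;
    wchar_t* to_next = 0;
    const converter_type::result result = converter.in(
        state, s.data(), s.data() + s.size(), from_next,
        &to[0], &to[0] + to.size(), to_next);
    if (result != converter_type::ok)
        throw std::runtime_error("input is not valid in this locale's encoding");
    assert(from_next == s.data() + s.size());  // ok means all input was consumed
    return std::wstring(&to[0], to_next);
}
```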

Answer by Bart van Ingen Schenau

There are two issues with the code:

  1. The conversion in const std::string s( ws.begin(), ws.end() ); is not required to correctly map the wide characters to their narrow counterpart. Most likely, each wide character will just be typecast to char.
    The resolution to this problem is already given in the answer by kem and involves the narrow function of the locale's ctype facet.

  2. You are writing output to both std::cout and std::wcout in the same program. Both cout and wcout are associated with the same stream (stdout) and the results of using the same stream both as a byte-oriented stream (as cout does) and a wide-oriented stream (as wcout does) are not defined.
    The best option is to avoid mixing narrow and wide output to the same (underlying) stream. For stdout/cout/wcout, you can try switching the orientation of stdout when switching between wide and narrow output (or vice versa):

    #include <iostream>
    #include <stdio.h>
    #include <wchar.h>
    
    int main() {
        std::cout << "narrow" << std::endl;
        fwide(stdout, 1); // switch to wide
        std::wcout << L"wide" << std::endl;
        fwide(stdout, -1); // switch to narrow
        std::cout << "narrow" << std::endl;
        fwide(stdout, 1); // switch to wide
        std::wcout << L"wide" << std::endl;
    }
    

Answer by legalize

You might as well just use the ctype facet's narrow method directly:

#include <clocale>
#include <locale>
#include <string>
#include <vector>

inline std::string narrow(std::wstring const& text)
{
    std::locale const loc("");
    wchar_t const* from = text.c_str();
    std::size_t const len = text.size();
    std::vector<char> buffer(len + 1);
    std::use_facet<std::ctype<wchar_t> >(loc).narrow(from, from + len, '_', &buffer[0]);
    return std::string(&buffer[0], &buffer[len]);
}
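
A brief sketch of how the placeholder argument behaves (the helper narrow_or_placeholder is hypothetical and takes the locale as a parameter): ctype::narrow writes the default character for every wide character that has no single-char equivalent in the locale. Unlike a codecvt conversion, it never produces multibyte sequences.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <vector>

// Hypothetical variant of narrow() with an explicit locale and placeholder.
std::string narrow_or_placeholder(const std::wstring& text,
                                  const std::locale& loc,
                                  char placeholder = '_')
{
    if (text.empty())
        return std::string();
    std::vector<char> buffer(text.size());  // exactly one char per wchar_t
    const wchar_t* from = text.data();
    const wchar_t* end = std::use_facet<std::ctype<wchar_t> >(loc).narrow(
        from, from + text.size(), placeholder, &buffer[0]);
    assert(end == from + text.size());  // narrow() returns the end of the input
    return std::string(buffer.begin(), buffer.end());
}
```

With the classic "C" locale, plain ASCII text passes through unchanged, while a character such as L'\u4F60' is replaced by the placeholder.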

Answer by Mark Lakata

At the time of writing this answer, the number one Google search result for "convert string wstring" would land you on this page. My answer shows how to convert string to wstring, although this is NOT the actual question, and I should probably delete this answer, but that is considered bad form. You may want to jump to this StackOverflow answer, which is now higher ranked than this page.



Here's a way to combine string, wstring and mixed string constants into a wstring. Use the wstringstream class.

#include <sstream>

std::string narrow = "narrow";
std::wstring wide = L"wide";  // a wide literal needs the L prefix

std::wstringstream cls;
cls << " abc " << narrow.c_str() << L" def " << wide.c_str();
std::wstring total= cls.str();
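
A self-contained variant of the snippet above, as a sketch: inserting a narrow const char* into a wide stream widens each char using the stream's locale, which is only reliable for plain ASCII text. The function name join_as_wide is illustrative.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Illustrative helper: mixes narrow and wide pieces into one std::wstring.
std::wstring join_as_wide(const std::string& narrow, const std::wstring& wide)
{
    std::wstringstream stream;
    // Narrow literals and const char* are widened char by char on insertion.
    stream << " abc " << narrow.c_str() << L" def " << wide;
    assert(!stream.fail());
    return stream.str();
}
```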

Answer by Joma

Default encoding on:

  • Windows: UTF-16.
  • Linux: UTF-8.
  • MacOS: UTF-8.

This code has two forms to convert std::string to std::wstring and std::wstring to std::string. If you negate #if defined WIN32, you get the same result.

1. std::string to std::wstring

  • MultiByteToWideChar (WinAPI)
  • _mbstowcs_s_l

#if defined WIN32
#include <windows.h>
#endif

std::wstring StringToWideString(std::string str)
{
    if (str.empty())
    {
        return std::wstring();
    }
    size_t len = str.length() + 1;
    std::wstring ret = std::wstring(len, 0);
#if defined WIN32
    int size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, &str[0], str.size(), &ret[0], len);
    ret.resize(size);
#else
    size_t size = 0;
    _locale_t lc = _create_locale(LC_ALL, "en_US.UTF-8");
    errno_t retval = _mbstowcs_s_l(&size, &ret[0], len, &str[0], _TRUNCATE, lc);
    _free_locale(lc);
    ret.resize(size - 1);
#endif
    return ret;
}

2. std::wstring to std::string

  • WideCharToMultiByte (WinAPI)
  • _wcstombs_s_l

std::string WidestringToString(std::wstring wstr)
{
    if (wstr.empty())
    {
        return std::string();
    }
#if defined WIN32
    int size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &wstr[0], wstr.size(), NULL, 0, NULL, NULL);
    std::string ret = std::string(size, 0);
    WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &wstr[0], wstr.size(), &ret[0], size, NULL, NULL);
#else
    size_t size = 0;
    _locale_t lc = _create_locale(LC_ALL, "en_US.UTF-8");
    errno_t err = _wcstombs_s_l(&size, NULL, 0, &wstr[0], _TRUNCATE, lc);
    std::string ret = std::string(size, 0);
    err = _wcstombs_s_l(&size, &ret[0], size, &wstr[0], _TRUNCATE, lc);
    _free_locale(lc);
    ret.resize(size - 1);
#endif
    return ret;
}

3. On Windows, you need to print Unicode using the WinAPI.

  • WriteConsole

#if defined _WIN32
    void WriteLineUnicode(std::string s)
    {
        std::wstring unicode = StringToWideString(s);
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), unicode.length(), NULL, NULL);
        std::cout << std::endl;
    }

    void WriteUnicode(std::string s)
    {
        std::wstring unicode = StringToWideString(s);
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), unicode.length(), NULL, NULL);
    }

    void WriteLineUnicode(std::wstring ws)
    {
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), ws.length(), NULL, NULL);
        std::cout << std::endl;
    }

    void WriteUnicode(std::wstring ws)
    {
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), ws.length(), NULL, NULL);
    }
#endif

4. In the main program.

#if defined _WIN32
int wmain(int argc, WCHAR ** args)
#else
int main(int argc, char ** args)
#endif
{
    std::string source = u8"üüΩωЙ你月曜日\naèé?T?л?Σ??a";
    std::wstring wsource = L"üüΩωЙ你月曜日\naèé?T?л?Σ??a";

    WriteLineUnicode(L"@" + StringToWideString(source) + L"@");
    WriteLineUnicode("@" + WidestringToString(wsource) + "@");
    return EXIT_SUCCESS;
}

5. Finally, you need powerful and complete support for Unicode chars in the console. I recommend ConEmu, set as the default terminal on Windows. You need to hook Visual Studio to ConEmu. Remember that Visual Studio's exe file is devenv.exe

Tested on Visual Studio 2017 with VC++; std=c++17.

Answer by Vizor

This solution is inspired by dk123's solution, but uses a locale-dependent codecvt facet. The result is a locale-encoded string instead of UTF-8 (unless UTF-8 is the locale's encoding):

std::string w2s(const std::wstring &var)
{
   static std::locale loc("");
   auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
   return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).to_bytes(var);
}

std::wstring s2w(const std::string &var)
{
   static std::locale loc("");
   auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
   return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).from_bytes(var);
}

I was searching for it, but I couldn't find it. Finally I found that I can get the right facet from std::locale by using the std::use_facet() function with the right typename. Hope this helps.
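
One caveat, offered as a hedged note: std::wstring_convert takes ownership of the facet pointer passed to its constructor and deletes it in its destructor, so handing it a facet that belongs to a std::locale is risky, and implementations may reject the code because std::codecvt has a protected destructor. A commonly used workaround is a small wrapper with a public destructor around std::codecvt_byname. In the sketch below, deletable_facet, w2s_by_name and s2w_by_name are illustrative names, and the locale name passed in is an assumption that must exist on the target system. std::wstring_convert itself is deprecated since C++17.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <string>
#include <utility>

// Wrapper that gives a facet a public destructor so wstring_convert can own it.
template <class Facet>
struct deletable_facet : Facet
{
    template <class... Args>
    explicit deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() {}
};

typedef deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t> > named_facet;

std::string w2s_by_name(const std::wstring& var, const char* locale_name)
{
    assert(locale_name != 0);
    // The converter owns (and later deletes) the freshly allocated facet.
    std::wstring_convert<named_facet, wchar_t> converter(new named_facet(locale_name));
    return converter.to_bytes(var);
}

std::wstring s2w_by_name(const std::string& var, const char* locale_name)
{
    assert(locale_name != 0);
    std::wstring_convert<named_facet, wchar_t> converter(new named_facet(locale_name));
    return converter.from_bytes(var);
}
```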