C++ Convert string (or char*) to wstring (or wchar_t*)

Question

Asked by Samir

string s = "おはよう";
wstring ws = FUNCTION(s, ws);

How would I assign the contents of s to ws?

I searched Google and tried several techniques, but none of them assigns the exact content. The content comes out distorted.

Answer 1

Answered by Johann Gerell

Assuming that the input string in your example (おはよう) is a UTF-8 encoded representation of the Unicode string you are interested in (which it isn't, by the looks of it, but let's assume it is for the sake of this explanation :-)), your problem can be fully solved with the standard library alone (C++11 and newer).

The TL;DR version:

#include <locale>
#include <codecvt>
#include <string>

std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::string narrow = converter.to_bytes(wide_utf16_source_string);
std::wstring wide = converter.from_bytes(narrow_utf8_source_string);

Longer online compilable and runnable example:

(They all show the same example. There are just many for redundancy...)
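
As a self-contained sketch of the TL;DR snippet above, the two directions can be wrapped in small helpers. The names utf8_to_wide and wide_to_utf8 are made up for this illustration, and the sketch assumes the input really is UTF-8. Note that <codecvt> and std::wstring_convert are deprecated since C++17, so a compiler may emit deprecation warnings.

```cpp
#include <codecvt>   // deprecated since C++17
#include <locale>    // std::wstring_convert
#include <string>

// Hypothetical helper: UTF-8 bytes -> wide string.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wide(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}

// Hypothetical helper: wide string -> UTF-8 bytes.
std::string wide_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

To keep the result independent of the source file's encoding, the question's string can be spelled out as UTF-8 bytes, for example "\xE3\x81\x8A\xE3\x81\xAF\xE3\x82\x88\xE3\x81\x86" for おはよう; passing that to utf8_to_wide should yield a wstring of four characters.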

Note (old):

As pointed out in the comments and explained in https://stackoverflow.com/a/17106065/6345, there are cases when using the standard library to convert between UTF-8 and UTF-16 might give unexpected differences in the results on different platforms. For a better conversion, consider std::codecvt_utf8 as described on http://en.cppreference.com/w/cpp/locale/codecvt_utf8

Note (new):

Since the codecvt header is deprecated in C++17, some worries about the solution presented in this answer were raised. However, the C++ standards committee added an important statement in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0618r0.html saying

this library component should be retired to Annex D, along side <codecvt>, until a suitable replacement is standardized.

So in the foreseeable future, the codecvt solution in this answer is safe and portable.

Answer 2

Answered by Pietro M

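#include <string>

// Copies each char of s into one wchar_t of ws without any decoding,
// so the result is only faithful for plain ASCII input.
int StringToWString(std::wstring &ws, const std::string &s)
{
    std::wstring wsTmp(s.begin(), s.end());

    ws = wsTmp;

    return 0;
}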

Answer 3

Answered by Potatoswatter

Your question is underspecified. Strictly, that example is a syntax error. However, std::mbstowcs is probably what you're looking for.

It is a C-library function and operates on buffers, but here's an easy-to-use idiom, courtesy of TBohne (formerly Mooing Duck):

std::wstring ws(s.size(), L' '); // Overestimate number of code points.
ws.resize(std::mbstowcs(&ws[0], s.c_str(), s.size())); // Shrink to fit.
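
One caveat with the idiom above: std::mbstowcs returns (size_t)-1 when it meets a byte sequence that is invalid in the current C locale, and passing that value to resize would fail. A minimal sketch of a wrapper that checks for this case might look as follows; the name mb_to_wide is hypothetical, and reporting the error with an exception is just one possible choice.

```cpp
#include <cstddef>
#include <cstdlib>    // std::mbstowcs
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a multibyte string using the current C locale.
std::wstring mb_to_wide(const std::string& s)
{
    std::wstring ws(s.size(), L'\0');   // upper bound: at most one wchar_t per byte
    const std::size_t n = std::mbstowcs(&ws[0], s.c_str(), ws.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    ws.resize(n);                       // shrink to the number of characters written
    return ws;
}
```

The conversion depends on the active C locale, so a caller would typically run std::setlocale(LC_ALL, "") from <clocale> once at startup; which encodings that selects is platform-specific.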

Answer 4

Answered by Alex Che

Windows API only, pre C++11 implementation, in case someone needs it:

#include <stdexcept>
#include <string>
#include <vector>
#include <windows.h>

using std::runtime_error;
using std::string;
using std::vector;
using std::wstring;

wstring utf8toUtf16(const string & str)
{
   if (str.empty())
      return wstring();

   // First call: ask how many wide characters the result needs.
   int charsNeeded = ::MultiByteToWideChar(CP_UTF8, 0,
      str.data(), (int)str.size(), NULL, 0);
   if (charsNeeded == 0)
      throw runtime_error("Failed converting UTF-8 string to UTF-16");

   // Second call: perform the conversion into the buffer.
   vector<wchar_t> buffer(charsNeeded);
   int charsConverted = ::MultiByteToWideChar(CP_UTF8, 0,
      str.data(), (int)str.size(), &buffer[0], (int)buffer.size());
   if (charsConverted == 0)
      throw runtime_error("Failed converting UTF-8 string to UTF-16");

   return wstring(&buffer[0], charsConverted);
}

Answer 5

Answered by lmiguelmh

If you are using Windows/Visual Studio and need to convert a string to wstring, you could use:

#include <AtlBase.h>
#include <atlconv.h>
...
string s = "some string";
CA2W ca2w(s.c_str());
wstring w = ca2w;
printf("%s = %ls", s.c_str(), w.c_str());

Same procedure for converting a wstring to string (sometimes you will need to specify a codepage):

#include <AtlBase.h>
#include <atlconv.h>
...
wstring w = L"some wstring";
CW2A cw2a(w.c_str());
string s = cw2a;
printf("%s = %ls", s.c_str(), w.c_str());

You could specify a codepage, and even UTF-8 (that's pretty nice when working with JNI/Java). A standard way of converting a std::wstring to a UTF-8 std::string is shown in this answer.

// 
// using ATL
CA2W ca2w(str, CP_UTF8);

// 
// or the standard way taken from the answer above
#include <codecvt>
#include <locale>
#include <string>

// convert UTF-8 string to wstring
std::wstring utf8_to_wstring (const std::string& str) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.from_bytes(str);
}

// convert wstring to UTF-8 string
std::string wstring_to_utf8 (const std::wstring& str) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.to_bytes(str);
}

If you want to know more about codepages, there is an interesting article on Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.

These CA2W (Convert Ansi to Wide=unicode) macros are part of ATL and MFC String Conversion Macros, samples included.

Sometimes you will need to disable security warning #4995; I don't know of another workaround (it happened to me when I compiled for Windows XP in VS2012).

#pragma warning(push)
#pragma warning(disable: 4995)
#include <AtlBase.h>
#include <atlconv.h>
#pragma warning(pop)

Edit: Well, according to this article, the article by Joel appears to be: "while entertaining, it is pretty light on actual technical details". Article: What Every Programmer Absolutely, Positively Needs To Know About Encoding And Character Sets To Work With Text.

Answer 6

Answered by Mark Lakata

Here's a way of combining string, wstring and mixed string constants into a wstring. Use the wstringstream class.

This does NOT work for multi-byte character encodings. This is just a dumb way of throwing away type safety and expanding 7-bit characters from std::string into the lower 7 bits of each character of std::wstring. This is only useful if you have 7-bit ASCII strings and you need to call an API that requires wide strings.

#include <sstream>
#include <string>

std::string narrow = "narrow";
std::wstring wide = L"wide";

std::wstringstream cls;
cls << " abc " << narrow.c_str() << L" def " << wide.c_str();
std::wstring total= cls.str();

Answer 7

Answered by Ghominejad

From char* to wstring:

const char* str = "hello worlddd"; // string literals are const; plain char* is ill-formed since C++11
wstring wstr (str, str+strlen(str));

From string to wstring:

string str = "hello worlddd";
wstring wstr (str.begin(), str.end());

Note this only works well if the string being converted contains only ASCII characters.
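
To illustrate why, here is a small sketch (the helper names widen_each_char and is_ascii are made up for this example). The range constructor copies one char into one wchar_t without decoding anything, so a character that takes several bytes in UTF-8, such as é (bytes C3 A9), ends up as several wide characters instead of one. On platforms where char is signed, those bytes may additionally be sign-extended.

```cpp
#include <string>

// Hypothetical helper: the element-wise copy shown above.
std::wstring widen_each_char(const std::string& s)
{
    return std::wstring(s.begin(), s.end());
}

// Hypothetical helper: true when every byte is 7-bit ASCII,
// which is the case in which the element-wise copy is faithful.
bool is_ascii(const std::string& s)
{
    for (unsigned char c : s)
        if (c > 0x7F)
            return false;
    return true;
}
```

A caller could check is_ascii(s) first and fall back to a real decoding step (for example one of the UTF-8 conversions in the other answers) when it returns false.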

Answer 8

Answered by vladon

Using Boost.Locale:

ws = boost::locale::conv::utf_to_utf<wchar_t>(s);

Answer 9

Answered by Matthias Ronge

This variant is my favourite in real life. It converts the input, if it is valid UTF-8, to the respective wstring. If the input is corrupted, the wstring is constructed out of the single bytes. This is extremely helpful if you cannot really be sure about the quality of your input data.

std::wstring convert(const std::string& input)
{
    try
    {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
        return converter.from_bytes(input);
    }
    catch(const std::range_error&) // invalid UTF-8: fall back to one wchar_t per byte
    {
        size_t length = input.length();
        std::wstring result;
        result.reserve(length);
        for(size_t i = 0; i < length; i++)
        {
            result.push_back(input[i] & 0xFF);
        }
        return result;
    }
}

Answer 10

Answered by Isma Rekathakusuma

String to wstring

std::wstring Str2Wstr(const std::string& str)
{
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}

wstring to string

std::string Wstr2Str(const std::wstring& wstr)
{
    typedef std::codecvt_utf8<wchar_t> convert_typeX;
    std::wstring_convert<convert_typeX, wchar_t> converterX;
    return converterX.to_bytes(wstr);
}
