Source: http://stackoverflow.com/questions/7469296/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Is there any built-in function that converts wstring or wchar_t* to UTF-8 in Linux?
Asked by Amir Saniyan
I want to convert wstring to UTF-8 Encoding, but I want to use built-in functions of Linux.
Is there any built-in function that converts wstring or wchar_t* to UTF-8 in Linux with a simple invocation?
Example:
wstring str = L"file_name.txt";
wstring mode = L"a";
fopen([FUNCTION](str), [FUNCTION](mode)); // Simple invoke.
cout << [FUNCTION](str); // Simple invoke.
Accepted answer by Kerrek SB
The C++ language standard has no notion of explicit encodings. It only contains an opaque notion of a "system encoding", for which wchar_t is a "sufficiently large" type.
To convert from the opaque system encoding to an explicit external encoding, you must use an external library. The library of choice would be iconv() (from WCHAR_T to UTF-8), which is part of POSIX and available on many platforms, although on Windows the WideCharToMultiByte function is guaranteed to produce UTF-8.
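As a rough illustration of the iconv() route, here is a minimal sketch. The helper name to_utf8 is hypothetical (it is not part of iconv or any library), and the code assumes that the platform's iconv accepts the encoding name "WCHAR_T", as glibc does, and declares its input parameter as char**.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a library function: converts a wide string to
// UTF-8 with iconv. Assumes the "WCHAR_T" encoding name is supported.
std::string to_utf8(const std::wstring& in)
{
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // iconv takes a non-const char**; it only reads the input bytes.
    char* src = reinterpret_cast<char*>(const_cast<wchar_t*>(in.data()));
    std::size_t src_left = in.size() * sizeof(wchar_t);

    std::string out;
    char buf[256];
    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        // Keep whatever was converted before checking for errors.
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the chunk buffer filled up; loop and continue.
        if (rc == (std::size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv conversion failed");
        }
    }
    iconv_close(cd);
    return out;
}
```

With such a helper, the call from the question could be written as fopen(to_utf8(str).c_str(), to_utf8(mode).c_str()). Opening and closing the descriptor on every call keeps the sketch short; code that converts many strings would probably reuse one descriptor.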
C++11 adds new UTF-8 literals in the form of std::string s = u8"Hello World: \U0010FFFF";. Those are already in UTF-8, but they cannot interface with the opaque wstring other than through the way I described.
See this question for a bit more background.
Answer by thiton
Certainly there is no function built into Linux, because the name Linux refers to the kernel only, which doesn't have anything to do with it. I seriously doubt that the libc that comes with gcc has such a function, and
$ man -k utf
supports this theory. But there are plenty of good UTF-8 libraries around. I personally recommend the iconv library for such conversions.
Answer by David Heffernan
It's quite plausible that wcstombs will do what you need if what you actually want to do is convert from wide characters to the current locale.
If not, then you probably need to look to ICU, Boost, or similar.
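For the wcstombs case, a minimal sketch might look like the following. The helper name to_locale_bytes is hypothetical, and the code relies on the POSIX behavior that a null destination makes wcstombs return the required length. The output is UTF-8 only when the current LC_CTYPE locale uses UTF-8.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a wide string to the multibyte encoding of
// the current LC_CTYPE locale, which is UTF-8 only if that locale says so.
std::string to_locale_bytes(const std::wstring& in)
{
    // With a null destination, POSIX wcstombs returns the number of bytes
    // needed, not counting the terminating null.
    std::size_t len = std::wcstombs(nullptr, in.c_str(), 0);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in current locale");

    std::string out(len, '\0');
    // Write exactly len bytes; std::string keeps its own terminator.
    std::wcstombs(&out[0], in.c_str(), len);
    return out;
}
```

A program would normally call std::setlocale(LC_ALL, "") from <clocale> once at startup so that the user's locale, rather than the default "C" locale, decides the target encoding.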
Answer by Cubbi
If/when your compiler supports enough of C++11, you could use wstring_convert:
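#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <iostream>
#include <locale>    // std::wstring_convert (deprecated since C++17)
#include <string>

int main()
{
    // Converter between wchar_t strings and UTF-8 byte strings.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8_conv;
    std::wstring str = L"file_name.txt";
    std::cout << utf8_conv.to_bytes(str) << '\n';
}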
Tested with clang++ 2.9/libc++ on Linux and Visual Studio 2010 on Windows.